Abstract
MESPEUS is a freely accessible database of carefully selected metal coordination groups found in metalloprotein structures from the Protein Data Bank (PDB). The database contains geometric information on metal sites in proteins, covering 40 metal types. To determine the metal coordination completely, the symmetry-related units of each protein structure are generated using the appropriate space-group symmetry operations and taken into account. This gives a more complete description of the metal coordination geometry by including all coordinating atoms. The user-friendly web interface allows users to search directly for a metal site of interest using several options, including metal element, metal–donor distance, coordination number, donor residue group and structural resolution. These searches can be carried out singly or in combination. The details of a metal site, and of all metal sites in the whole structure, can be displayed graphically in the interactive web interface. MESPEUS is updated automatically every month by synchronizing with the PDB. An investigation of the Mg–ATP interaction demonstrates how MESPEUS can be used to extract information about metal sites by selecting structural and coordination features. MESPEUS is available at http://mespeus.nchu.edu.tw/.
Introduction
Metal elements are essential to all living organisms: they participate in numerous important biological functions and processes, in many cases by forming specific complexes with a wide range of proteins, known generally as ‘metalloproteins’. It has been estimated that about one-third of proteins require metals for their function (1), and about 30–40% of enzymes contain metals at their catalytic centers (2). These metals act as cofactors in various enzymatic reactions, including those catalyzed by ubiquitous proteins such as superoxide dismutase, carbonic anhydrase and alcohol dehydrogenase. Metals also contribute to the maintenance of protein structure (e.g. zinc fingers) and to the transport and storage of small molecules (e.g. hemoglobin, myoglobin and hemocyanin) (3).
The physicochemical properties of the metal and the geometry of its coordination give rise to a variety of highly specific biological functions (4,5). Studying the interaction between metals and proteins is therefore important for understanding the functions and mechanisms of metalloproteins. Furthermore, geometric information on metal coordination in proteins can help when fitting a model to electron-density maps from medium-resolution X-ray or cryo-electron microscopy (cryo-EM) structures.
The number of structures deposited in the Protein Data Bank continues to grow exponentially (6). Although X-ray crystallography provides the majority of structures (172 903 in total by May 2023), cryo-EM contributions are growing rapidly (7,8) (13 156 in total). Several metalloprotein databases based on PDB data have been published. For example, MDB (9) is a historical metalloprotein database describing the geometries and properties of metal sites. It improves the completeness of coordination-group information by extracting the first- and second-shell donors surrounding a metal site. The dbTEU database (10) focused on an extensive investigation of trace elements, mainly Cu, Mo, Ni, Co and Se, and characterized the metalloproteomes of organisms. The ZincBind (11) database is specific to Zn-binding proteins; the Zn-binding protein structures it contains form a representative set based on 90% sequence identity. Unfortunately, the three databases described above are no longer available. MetalPDB (12) stores metal sites derived from structures grouped by both structural and sequence identity, and represents metal sites in proteins or nucleic acids as Minimal Functional Sites (MFSs). InterMetalDB (13) focuses on metal sites found in biological assemblies; metal coordination sites are constructed from one asymmetric unit after selecting a subset of the deposited coordinates with certain coordinate operations (such as symmetry translation, rotation and their combination). MeDBA (14) is a relatively new database targeting the interactions between metal ions and small-molecule ligands in metalloenzymes. Its main features are the manual curation of metalloenzymes into different categories, together with information on the metal coordination strength involved in catalysis.
A PDB file usually contains only one asymmetric unit of a structure model. If a metal site lies on a crystallographic rotation axis, some of its coordinating donor atoms may belong to symmetry-related copies that are not present in the PDB file. The coordination number of such a site will then be incorrectly low. To the best of our knowledge, none of the published metalloprotein databases has generated complete coordination shells by using crystallographic symmetry to generate the missing atoms. Figure 1 illustrates how misleading the results can be when, for example, a metal site lies on a symmetry axis and only one asymmetric unit is considered.
Figure 1.
Illustration of how metal coordination results can differ between using the asymmetric unit alone and generating all symmetry-related atoms. (A) A protein structure downloaded from the PDB website (PDB ID: 1PPT); the asymmetric unit consists of an extended polyproline-like helix and an alpha-helix that run roughly antiparallel. The Zn atom appears to be coordinated only by glycine residue atoms (brown sticks). (B) Based on the crystal space-group information, two additional symmetry-related units can be generated (gray ribbons). The Zn atom is in fact coordinated by three donor groups (GLY, ASN and HIS; brown sticks) from each asymmetric unit, rather than by glycine alone. (C) Close-up view of the Zn coordination group in the three asymmetric units. All structure diagrams were generated with PyMOL (16).
MESPEUS (MEtal Sites in Proteins at Edinburgh University), a database of the geometry of metal sites in proteins, was developed under the supervision of Dr. Marjorie Harding and Prof. Malcolm Walkinshaw and was first published in 2008 (15). The database has been widely used and cited in studies of metal coordination geometry. We have rebuilt the MESPEUS system by redesigning the database and renewing its content using an updated algorithm for metal site analysis. In total, 375 201 metal sites have been found in 71 552 protein structures (as of May 2023). Metals including Ca, Co, Cu, Fe, K, Mg, Mn, Na, Ni, Zn and 30 other elements have been treated in detail. Contacts between these metals and atoms in the asymmetric unit or in neighbouring symmetry-related units are treated as chemical bonds to the protein. To keep the data current and the service sustainable, the MESPEUS system updates its database automatically every month by synchronizing with the Protein Data Bank archive, performing the metal site analysis and renewing the content. In addition, a web-based user interface has been developed for the MESPEUS database. It facilitates the search for metal sites of interest through several options, including combinations of different metals, donor types and structure resolution. The retrieved metal–donor contact distances and coordination geometries can be evaluated further. Individual metal sites can be displayed graphically on the web page for interactive inspection, and the statistical profile of a search result can be presented as a set of histograms.
Materials and methods
Protein structure data collection
MESPEUS stores the geometric information of metal sites derived from metalloprotein structures deposited in the PDB up to May 2023 (17). PDB entries of proteins and nucleic acids containing one or more metal atoms were identified. Structures determined by NMR and structures deposited without resolution information were excluded, so the identified structures come mainly from X-ray crystallography and cryo-EM. Unless otherwise specified, the analyses presented in this study are based on structural models at a resolution of 2.5 Å or better. When the asymmetric unit contains multiple protein molecules with metal(s), all of the metal sites are collected and stored in the database. The metals selected are the 10 most common elements found in biological systems: Ca, Co, Cu, Fe, K, Mg, Mn, Na, Ni and Zn. An additional 30 metals are also covered: Ag, Al, Au, Ba, Be, Cd, Cr, Cs, Ga, Gd, Hg, Ir, Li, Mo, Pb, Pd, Pr, Pt, Rb, Re, Rh, Ru, Sr, Tb, Tl, U, V, W, Y and Yb. Some information is extracted directly from the PDB files, including (i) the protein name, class and title, (ii) the structure resolution (Å), (iii) the space group and cell dimensions, (iv) the refinement program used, (v) the R factor and Rfree and (vi) the B-factors and occupancies of the atoms involved in the coordination group.
The geometric information of the metal sites in each PDB file is identified and extracted. The donor atoms from the protein or ligands are identified, and the metal–donor distances in each coordination group, the difference between each actual distance and the corresponding target distance (described in more detail in the next section) and the distortion of the coordination geometry are then calculated and evaluated. The names of all atoms involved in a metal site (metal and donors) are stored as they appear in the PDB files.
Target distance
To identify the donor atoms surrounding a metal atom, we use the idea of a ‘target distance’: the most likely distance between the metal of interest and a ligand atom, as established by previous studies. The target distances are tabulated in Supplementary Table S1. Atoms other than carbon and phosphorus that lie within the target distance plus a tolerance of 0.75 Å of a metal are recognized as coordinating donors. For instance, the target metal–ligand distances lie in the range 1.79–2.95 Å for O and 2.02–2.88 Å for N (non-amino-acid donors excluded). The common donor atoms in a metal coordination group are those listed in Figure 4: O, N or S atoms from amino acid side chains or the main-chain carbonyl group, as well as atoms from substrates, water molecules or other non-protein molecules in the structure. The target distances between metals and their associated donor groups were characterized in our previous studies (15,18). They were obtained from a series of analyses of metal coordination groups found in PDB protein structures combined with information from small-molecule metal complexes deposited in the Cambridge Structural Database (CSD) (19). The CSD entries used were required to have R factor ≤ 0.065; these complexes are analogues that mainly mimic the functional groups commonly found in proteins, including phenolates, thiolates, alcohols, imidazole groups, carboxylate groups of amino acid side chains and water molecules. The observed distances for the various metal–donor pairs were then applied to PDB data (structures determined at a resolution ≤ 1.25 Å), and the agreement between the CSD- and PDB-derived values was subjected to statistical analysis, yielding a set of objectively reliable values known as the ‘target distances’. The metal–donor distance distributions of the 10 metals most common in PDB structures are shown in Supplementary Figure S1.
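The donor-selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the MESPEUS code: the function names are hypothetical, and the two target distances in `TARGET` are placeholders chosen only so the example runs (the real values are in Supplementary Table S1). Only the 0.75 Å tolerance and the exclusion of C and P come from the text. The per-site RMSD between observed and target distances, which the result pages report, is included as `site_rmsd`.

```python
import math

TOLERANCE = 0.75               # Å added to the target distance (from the text)
EXCLUDED_ELEMENTS = {"C", "P"}  # never counted as donors (from the text)

# HYPOTHETICAL placeholder targets (Å), keyed by (metal, donor element).
# The real values are tabulated in Supplementary Table S1.
TARGET = {("MG", "O"): 2.09, ("MG", "N"): 2.20}


def find_donors(metal_el, metal_xyz, atoms, target=TARGET, tol=TOLERANCE):
    """atoms: iterable of (atom_name, element, (x, y, z)).

    Returns (atom_name, distance, distance - target) for each atom within
    target + tol of the metal. Pairs without a target are skipped.
    """
    donors = []
    for name, element, xyz in atoms:
        if element in EXCLUDED_ELEMENTS or (metal_el, element) not in target:
            continue
        d = math.dist(metal_xyz, xyz)
        t = target[(metal_el, element)]
        if d <= t + tol:
            donors.append((name, d, d - t))
    return donors


def site_rmsd(donors):
    """Root-mean-square difference between observed and target distances."""
    return math.sqrt(sum(dev ** 2 for _, _, dev in donors) / len(donors))


# Made-up atoms around an Mg at the origin.
atoms = [
    ("OW", "O", (2.1, 0.0, 0.0)),    # within 2.09 + 0.75 Å: donor
    ("CB", "C", (2.0, 0.0, 0.0)),    # carbon: excluded
    ("NE2", "N", (0.0, 3.5, 0.0)),   # beyond 2.20 + 0.75 Å: not a donor
    ("OD1", "O", (0.0, 0.0, 2.8)),   # within 2.84 Å: donor
]
donors = find_donors("MG", (0.0, 0.0, 0.0), atoms)
print([name for name, _, _ in donors], round(site_rmsd(donors), 3))
```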
Figure 4.
Query page of the MESPEUS web interface. Users can input a PDB ID directly to investigate all of the metal sites in that protein. Alternatively, users can specify a combination of query criteria to search for metal sites of interest: select metal types; specify the metal coordination number with different logical operators; choose the type of donor residue group and atom (sub-options) or input the name of a non-protein donor (e.g. ATP); and set the maximum structure resolution (Å). The donor groups and atoms available for selection are listed below.
Symmetry-related unit generation
As discussed in the Introduction (Figure 1), it is important to account for space-group symmetry in order to generate interactions that are not apparent from the atomic coordinates of the single asymmetric unit that can be downloaded from the PDB. Symmetry-related asymmetric units can be generated from the symmetry operators and space-group information defined in the PDB file. Using the PyMOL library (16), we developed an in-house program that generates the symmetry-related units for each protein structure; structures with undefined space-group information were excluded. The program generates the symmetry-related units within 30 Å of the atoms of the asymmetric unit and writes them out as a PDB-format file for metal site analysis. Among the generated symmetry-related copies, only one representative coordination group for each metal site is selected and stored in the database.
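The following sketch illustrates the principle behind symmetry expansion rather than the in-house PyMOL-based program: each space-group operator (R, t) is applied to fractional coordinates, neighbouring lattice translations are added, and images that fall within a cutoff of the metal are kept. To stay short it assumes an orthorhombic cell, so converting between fractional and Cartesian coordinates is a per-axis scaling; the function names, the toy cell and the P222 example are hypothetical. The demo places a metal on a 2-fold axis, a situation analogous to Figure 1: with the identity operator alone one donor is found, and with the full operator set its symmetry mate is found as well.

```python
import itertools
import math


def apply_op(op, frac):
    """Apply a symmetry operator (R, t) to fractional coordinates: x' = R x + t."""
    rot, trans = op
    return tuple(sum(rot[i][j] * frac[j] for j in range(3)) + trans[i] for i in range(3))


# Assumption: orthorhombic cell (alpha = beta = gamma = 90 degrees), so the
# fractional <-> Cartesian conversion is a per-axis scaling by (a, b, c).
def to_cart(frac, cell):
    return tuple(f * edge for f, edge in zip(frac, cell))


def to_frac(cart, cell):
    return tuple(x / edge for x, edge in zip(cart, cell))


def coordinating_images(metal_xyz, atoms, ops, cell, cutoff):
    """Return (atom name, operator index, lattice shift, distance) for every
    symmetry image of every atom lying within `cutoff` angstroms of the metal."""
    hits, seen = [], set()
    for name, xyz in atoms:
        frac = to_frac(xyz, cell)
        for k, op in enumerate(ops):
            image = apply_op(op, frac)
            # Neighbouring cells only: sufficient when cutoff << cell edges.
            for shift in itertools.product((-1, 0, 1), repeat=3):
                cart = to_cart(tuple(f + s for f, s in zip(image, shift)), cell)
                key = (name, tuple(round(c, 3) for c in cart))
                if key in seen:  # same position reached via another operator
                    continue
                seen.add(key)
                d = math.dist(metal_xyz, cart)
                if d <= cutoff:
                    hits.append((name, k, shift, round(d, 3)))
    return hits


IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
# The four operators of space group P222: (x,y,z), (-x,-y,z), (-x,y,-z), (x,-y,-z).
P222 = [
    (IDENTITY, (0, 0, 0)),
    (((-1, 0, 0), (0, -1, 0), (0, 0, 1)), (0, 0, 0)),
    (((-1, 0, 0), (0, 1, 0), (0, 0, -1)), (0, 0, 0)),
    (((1, 0, 0), (0, -1, 0), (0, 0, -1)), (0, 0, 0)),
]

# A metal on the 2-fold axis along z, with one donor in the asymmetric unit.
cell = (10.0, 10.0, 10.0)
metal = (0.0, 0.0, 2.5)
atoms = [("O1", (2.0, 0.0, 2.5))]
identity_only = coordinating_images(metal, atoms, P222[:1], cell, cutoff=2.5)
with_symmetry = coordinating_images(metal, atoms, P222, cell, cutoff=2.5)
print(len(identity_only), len(with_symmetry))  # 1 2
```

Keying the seen-set on rounded coordinates prevents an atom that sits on a special position from being counted twice when two operators map it to the same place.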
Database construction
Considering the efficiency of data storage and compatibility with various scripts, a relational database was chosen for the present study, and a set of tables was designed to organize the data relationally. Keys, including primary keys, unique keys and indexes, have been defined on particular table columns to maintain referential integrity and to improve query performance. In some PDB files, two alternative positions for a donor atom or metal have been determined during refinement. These disordered atoms are labelled A and B (or C in rare cases) in the PDB file. Such disorder also occurs in some well-determined structures at high resolution. In such cases, the donor atoms with the highest occupancy are recognized and stored, and an error flag is set for each disordered metal atom.
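A minimal sketch of the alternate-location handling described above, assuming standard fixed-column PDB ATOM/HETATM records: for each atom (keyed here by chain, residue number, residue name and atom name) the conformer with the highest occupancy is kept, and atoms with more than one altLoc label are returned separately so they can be flagged. The function names and demo records are hypothetical; ties in occupancy simply keep the first conformer encountered.

```python
from collections import defaultdict


def parse_atom(line):
    """Fixed-column parse of a PDB ATOM/HETATM record (standard PDB columns)."""
    return {
        "name": line[12:16].strip(),
        "altloc": line[16].strip(),
        "resname": line[17:20].strip(),
        "chain": line[21],
        "resseq": line[22:27].strip(),  # residue number plus insertion code
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
    }


def resolve_altlocs(lines):
    """Keep the highest-occupancy conformer of each atom and return the set of
    atoms that had alternative locations (candidates for a disorder flag)."""
    groups = defaultdict(list)
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            atom = parse_atom(line)
            key = (atom["chain"], atom["resseq"], atom["resname"], atom["name"])
            groups[key].append(atom)
    kept, disordered = {}, set()
    for key, confs in groups.items():
        kept[key] = max(confs, key=lambda a: a["occupancy"])
        if len({a["altloc"] for a in confs}) > 1:
            disordered.add(key)
    return kept, disordered


# Made-up demo records; atom-name alignment within columns 13-16 is simplified.
FMT = "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
lines = [
    FMT.format("HETATM", 1, "MG", "A", "MG", "A", 501, "", 1.0, 2.0, 3.0, 0.60, 20.0),
    FMT.format("HETATM", 2, "MG", "B", "MG", "A", 501, "", 1.5, 2.0, 3.0, 0.40, 22.0),
    FMT.format("ATOM", 3, "OD1", "", "ASP", "A", 10, "", 3.0, 2.0, 3.0, 1.00, 15.0),
]
kept, disordered = resolve_altlocs(lines)
print(kept[("A", "501", "MG", "MG")]["altloc"], sorted(disordered))
```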
Web interface development
The web interface of MESPEUS was developed mainly with React (https://reactjs.org/), a JavaScript library for building user interfaces, and the Next.js framework (https://nextjs.org/). Next.js provides server-side rendering and allows complex interfaces to be built with less code, which eases future maintenance. With this user-friendly interface, users can set a series of query options to access information on particular metal sites without any knowledge of a database query language. The interface displays the metal coordination groups that match the query conditions and presents each metal site with its distances, angles and coordination geometry. JSmol (20), an interactive viewer for molecular structures, is embedded in the interface to display the architecture of metal coordination groups, so users can inspect a metal site in graphical detail and also see its position within the whole metalloprotein structure. The interactions between the metal and donor atoms can be illustrated as a 2D diagram by LigPlot+ (21). After a query, a set of histograms is generated to illustrate the search result statistically, including the distributions of metal–donor distance, structure resolution and coordination number. The histogram bins are drawn separately so that users can inspect and click on them easily, and the histograms support drill-down, allowing users to examine query results at the desired level of detail. Histogram data can be downloaded in SVG, PNG and CSV formats.
Results and discussion
Content of MESPEUS
Metal sites and coordination numbers
In total, the current version of the MESPEUS database stores information on 375 201 metal sites found in 71 552 protein structures after applying crystallographic symmetry; the database is updated monthly and is currently growing by ∼400 structures per month. Table 1 lists the statistical profiles of these metal sites grouped by the 10 common metal types. As shown in column (1), the five most frequent metals are, in order, Mg, Zn, Fe, Ca and Na. Overall, >78% of the metal sites were determined without disorder, i.e. both the metal and donor atoms have occupancy 1.0 (column (2)). Of these sites, 11.85% do not interact directly with protein, as no amino acid atoms are present in the coordination group (column (3)). The proportion of such non-protein-interacting metals is particularly large for Mg and Co. Almost 30% of Mg sites do not interact with protein, and many of these interact only with DNA or RNA (the MESPEUS database stores about 770 DNA, 763 RNA and 38 DNA/RNA complex structures without protein). Some are hydrated ions (e.g. [Mg(OH2)6]2+). Co sites mainly occur as the cobalt hexammine ion ([Co(NH3)6]3+; 64.74% of the 825 sites listed in column (3)) or in cobalamin (14.3%). Other coordination groups composed of non-protein interactions include atoms from small-molecule ligands (e.g. ATP, heme groups, iron–sulfur clusters, chlorophyll derivatives). Table 1 also shows that for 464 of the metal sites in column (2), the donors are found only after applying crystallographic symmetry operators (column (4)).
Table 1.
Statistical profiles of the sites of 10 metals (Ca, Co, Cu, Fe, K, Mg, Mn, Na, Ni and Zn) in the MESPEUS database
| Metal | (1) All sites in database at resolution ≤2.5 Å | (2) Sites from col. (1) with metal and donor occupancy = 1.0 | (3) Sites from col. (2) without metal–protein interactions (% of col. (2)) | (4) Sites from col. (2) whose donors are all generated by crystallographic symmetry | (5) Sites from col. (2), excluding col. (4), in structures at ≤1.5 Å | (6) Sites from col. (5) with CN underestimated without crystallographic symmetry (% of col. (5)) |
|---|---|---|---|---|---|---|
| Mg | 44 209 | 39 578 | 11 730 (29.64) | 142 | 2802 | 681 (24.30) |
| Zn | 39 681 | 32 202 | 471 (1.46) | 61 | 3690 | 404 (10.95) |
| Fe | 39 307 | 21 788 | 334 (1.53) | 8 | 2226 | 28 (1.26) |
| Ca | 32 134 | 28 255 | 1708 (6.04) | 57 | 3092 | 415 (13.42) |
| Na | 22 562 | 19 617 | 2674 (13.63) | 154 | 3067 | 490 (15.98) |
| Mn | 10 610 | 8104 | 658 (8.12) | 8 | 486 | 29 (5.97) |
| K | 7997 | 6422 | 965 (15.03) | 20 | 522 | 91 (17.43) |
| Cu | 6901 | 4013 | 65 (1.62) | 3 | 527 | 33 (6.26) |
| Ni | 3615 | 2591 | 100 (3.86) | 5 | 377 | 24 (6.37) |
| Co | 3256 | 2244 | 825 (36.76) | 6 | 214 | 28 (13.08) |
| Total | 210 272 | 164 814 | 19 530 (11.85) | 464 | 17 003 | 2223 (13.07) |
All of the metal sites are determined by applying crystallographic symmetry. The results listed in Columns (2) to (6) are derived from the metal sites present in Column (1). The occupancy values of the metals and donor atoms listed in Columns (2) to (6) are all equal to 1. Column (1): the number of metal sites grouped by metal types; (2): metal sites with full occupancy; (3): number of metal sites with no protein donor atom(s) after applying symmetry; (4): all of the metal-coordinating atoms (including any ligand atoms) are only found in the generated symmetry-related units; (5): at least one metal-coordinating atom is present prior to applying symmetry and its structure is determined at 1.5 Å or better; (6): metal sites with underestimated coordination number when crystallographic symmetry is not applied.
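The percentages in Table 1 follow directly from the counts. As a quick arithmetic check, a sketch with the counts copied from the table recomputes the ratios in columns (3) and (6), the column totals, and the share of column (2) sites that fall into column (5) (10.32%). The dictionary names are illustrative only.

```python
# Counts copied from Table 1: metal -> (col. 2, col. 3, col. 5, col. 6)
TABLE1 = {
    "Mg": (39578, 11730, 2802, 681),
    "Zn": (32202, 471, 3690, 404),
    "Fe": (21788, 334, 2226, 28),
    "Ca": (28255, 1708, 3092, 415),
    "Na": (19617, 2674, 3067, 490),
    "Mn": (8104, 658, 486, 29),
    "K": (6422, 965, 522, 91),
    "Cu": (4013, 65, 527, 33),
    "Ni": (2591, 100, 377, 24),
    "Co": (2244, 825, 214, 28),
}
# Percentages as printed in Table 1: (col. 3 / col. 2, col. 6 / col. 5)
REPORTED = {
    "Mg": (29.64, 24.30), "Zn": (1.46, 10.95), "Fe": (1.53, 1.26),
    "Ca": (6.04, 13.42), "Na": (13.63, 15.98), "Mn": (8.12, 5.97),
    "K": (15.03, 17.43), "Cu": (1.62, 6.26), "Ni": (3.86, 6.37),
    "Co": (36.76, 13.08),
}


def pct(part, whole):
    """Percentage rounded to two decimals, as in Table 1."""
    return round(100 * part / whole, 2)


for metal, (c2, c3, c5, c6) in TABLE1.items():
    print(f"{metal:>2}  col3/col2 = {pct(c3, c2):6.2f}%  col6/col5 = {pct(c6, c5):6.2f}%")

totals = [sum(row[i] for row in TABLE1.values()) for i in range(4)]
print("Totals:", totals)
print("col3/col2:", pct(totals[1], totals[0]),
      "col6/col5:", pct(totals[3], totals[2]),
      "col5/col2:", pct(totals[2], totals[0]))
```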
As summarized in column (5) of Table 1, about 10.32% of the metal sites listed in column (2) come from very high-resolution structures (≤1.5 Å). These high-resolution structures were analysed in more detail, and the results with and without crystallographic symmetry operations were compared. Figure 2 shows the distribution of coordination numbers (CN) before and after applying crystallographic symmetry as grey and black bars, respectively. As shown in Figure 2A, lower coordination numbers (1–5) are more prevalent before symmetry is applied; after applying crystallographic symmetry, the numbers of metal sites with coordination numbers 6–9 increase. The most frequent coordination numbers are 4 and 6, in agreement with the result reported by Dokmanić in 2007 (4). Figure 2B shows even more clearly the effect of including symmetry-related ligand atoms on the coordination number (column (6), Table 1): sites that appeared to have coordination numbers of 1–5 were corrected to values >5. These results clearly indicate that using the asymmetric unit alone to determine metal sites underestimates the coordination number in many cases. In the redesigned MESPEUS database, which includes symmetry-related metal-coordinating atoms, the CN statistics change significantly. The change is especially clear for coordination numbers 1–3, which are obviously too low and are largely corrected by applying symmetry. On average, 13.07% of the metal sites listed in column (5) have been corrected, and the correction is more dramatic for Mg, K and Na sites (>15%, column (6)).
Figure 2.
Distribution of metal coordination numbers (CN) calculated without applying crystallographic symmetry to the atoms in the asymmetric unit (grey bars) compared with CN calculated after applying crystallographic symmetry (black bars). Note that the total number of metal sites in the grey and black bars is identical. Metals include Ca, Co, Cu, Fe, K, Mg, Mn, Na, Ni and Zn, and the occupancy values of metal and donor atoms are restricted to 1.0. (A) In total, 17 003 sites from structures at 1.5 Å resolution or better were selected (i.e. those in column (5), Table 1). (B) The 2223 of these 17 003 sites whose CN was underestimated when crystallographic symmetry was not applied (i.e. those in column (6), Table 1). Their CN values are corrected because additional coordinating donor atoms are found after applying crystallographic symmetry. Note that the frequencies of the small coordination numbers CN = 1 and 2 are significantly overestimated by an asymmetric-unit analysis; after symmetry correction, most of the corresponding sites move to CN = 6.
Shape of a coordination group
MESPEUS stores information on metal coordination geometry and its distortion from ideality. The actual metal–donor interbond angles are calculated and compared with the values of an ideal geometry, and the distortion is evaluated as an RMSD (root mean square deviation). Regardless of bond type, coordination numbers 4, 5 and 6 have regular shapes, and a coordination group can be (i) tetrahedral, with equivalent bond angles of 109.5°, or square planar, with angles of 90° and 180°, (ii) trigonal bipyramidal or tetragonal pyramidal or (iii) octahedral, with angles of 90° and 180°. The RMSD values are obtained with the following equation:
$$\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\alpha_i - \alpha_{\mathrm{ideal}}\right)^2}$$
where $\alpha_i$ is an interbond angle measured in the coordination group, $\alpha_{\mathrm{ideal}}$ is the corresponding ideal angle in the regular shape, and $n = \mathrm{CN}(\mathrm{CN}-1)/2$ is the number of angles required to describe a shape with coordination number CN. The smaller the RMSD value, the closer the actual shape is to the ideal geometry.
The RMSD of a coordination group can also indicate the nearest shape when the group could adopt alternative geometries. As mentioned above, coordination numbers 4 and 5 can each adopt more than one geometry. To determine the nearest shape in such cases, the RMSD values for the alternative geometries are calculated and compared. For CN = 4, for example, the nearest shape is tetrahedral when $\mathrm{RMSD}_{\text{tetrahedral}} < \mathrm{RMSD}_{\text{square planar}}$, and square planar otherwise.
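The shape RMSD and the nearest-shape rule for CN = 4 can be sketched as follows. The text does not say how each observed angle is paired with an ideal angle when the ideal set contains different values (as for square planar, with four 90° and two 180° angles); this sketch assumes that both lists are sorted and paired in order. The tetrahedral angle is taken as arccos(−1/3) ≈ 109.47°, and all names are hypothetical.

```python
import math
from itertools import combinations

TETRAHEDRAL = math.degrees(math.acos(-1 / 3))  # 109.47°, quoted as 109.5° in the text

# Ideal interbond angles for CN = 4: n = CN(CN - 1)/2 = 6 angles per shape.
IDEAL_CN4 = {
    "tetrahedral": [TETRAHEDRAL] * 6,
    "square planar": [90.0] * 4 + [180.0] * 2,
}


def angle(metal, a, b):
    """Donor-metal-donor angle in degrees."""
    u = [p - m for p, m in zip(a, metal)]
    v = [p - m for p, m in zip(b, metal)]
    cos = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding noise


def interbond_angles(metal, donors):
    return [angle(metal, a, b) for a, b in combinations(donors, 2)]


def shape_rmsd(observed, ideal):
    # Assumption: observed and ideal angles are paired after sorting both lists.
    pairs = zip(sorted(observed), sorted(ideal))
    return math.sqrt(sum((o - i) ** 2 for o, i in pairs) / len(observed))


def nearest_shape(metal, donors, ideals=IDEAL_CN4):
    """Return the shape with the smallest RMSD, plus all RMSD values."""
    observed = interbond_angles(metal, donors)
    scores = {name: shape_rmsd(observed, ideal) for name, ideal in ideals.items()}
    return min(scores, key=scores.get), scores


origin = (0.0, 0.0, 0.0)
tetra = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
planar = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
print(nearest_shape(origin, tetra)[0])   # tetrahedral
print(nearest_shape(origin, planar)[0])  # square planar
```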
Besides the RMSD of a shape, the MESPEUS database also stores the deviation from the $\bar{4}$ symmetry of a regular tetrahedron for coordination number 4, as illustrated in Figure 3. This $\bar{4}$-symmetry RMSD, $\delta_{\bar{4}}$, is useful as a quality indicator for the reported angles and the geometry of the site; a smaller $\delta_{\bar{4}}$ value indicates an observed shape closer to the ideal geometry.
Figure 3.
Illustration of $\bar{4}$ symmetry in a regular tetrahedron for coordination number 4. The coordination group is composed of four donors A, B, C and D around a metal M. (A) A, M and D lie in the plane of the page, whereas B is in front of it and C is behind. The $\bar{4}$ axis is shown as the dashed line bisecting the angle AMD (and BMC). The other two $\bar{4}$ axes (e.g. the one bisecting AMB and CMD) are not shown. (B) The tetrahedron becomes square planar when it is squashed along the $\bar{4}$ axis, and the angles should then satisfy AMD = BMC and AMB = AMC = DMC = DMB. The $\bar{4}$-symmetry RMSD, $\delta_{\bar{4}}$, is calculated in this study as follows: (i) m1 is the mean of AMD and BMC; (ii) m2 is the mean of AMB, AMC, DMC and DMB; (iii) $\delta_{\bar{4}}$ is one sixth of the sum of the squares of AMD − m1, BMC − m1, AMB − m2, AMC − m2, DMB − m2 and DMC − m2. This value is calculated for each of the three $\bar{4}$ axes, and the minimum is stored.
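The $\delta_{\bar{4}}$ steps in the Figure 3 caption translate directly into code. In this sketch, with hypothetical function names, the three $\bar{4}$ axes are enumerated as the three ways of splitting the four donors into two pairs (A,D | B,C and so on), and, following the caption, the value is a mean of squared deviations with no square root taken. Both a regular tetrahedron and a perfect square plane give zero, since the square plane keeps $\bar{4}$ symmetry.

```python
import math
from itertools import combinations


def angle(metal, a, b):
    """Donor-metal-donor angle in degrees."""
    u = [p - m for p, m in zip(a, metal)]
    v = [p - m for p, m in zip(b, metal)]
    cos = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def delta_bar4(metal, donors):
    """Deviation from 4-bar symmetry for CN = 4, following the Figure 3 caption:
    one sixth of the summed squared deviations, minimized over the three axes."""
    if len(donors) != 4:
        raise ValueError("defined for coordination number 4 only")
    ang = {frozenset(p): angle(metal, donors[p[0]], donors[p[1]])
           for p in combinations(range(4), 2)}
    best = math.inf
    # Each 4-bar axis bisects a pair of opposite angles, e.g. AMD and BMC
    # (donor indices 0..3 stand for A..D).
    for p1, p2 in (((0, 3), (1, 2)), ((0, 1), (2, 3)), ((0, 2), (1, 3))):
        axial_keys = (frozenset(p1), frozenset(p2))
        axial = [ang[k] for k in axial_keys]
        others = [v for k, v in ang.items() if k not in axial_keys]
        m1 = sum(axial) / 2   # mean of the two bisected angles
        m2 = sum(others) / 4  # mean of the remaining four angles
        value = (sum((a - m1) ** 2 for a in axial)
                 + sum((o - m2) ** 2 for o in others)) / 6
        best = min(best, value)
    return best


origin = (0.0, 0.0, 0.0)
tetra = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
planar = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
distorted = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (-1, 0, 0)]
for label, donors in (("tetra", tetra), ("planar", planar), ("distorted", distorted)):
    print(label, round(delta_bar4(origin, donors), 3))
```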
The web interface
The MESPEUS web interface allows users to submit a query through various search options describing the interactions in a metal site, together with a resolution limit. Alternatively, a PDB ID can simply be entered to display all of the metal sites found in that protein. The interface is designed so that a search can be set up easily from the main page shown in Figure 4. A list of options defining metal–donor interactions is provided on the page, as tabulated below the figure; these interactions are those observed in the PDB collection and stored in the MESPEUS database. In the example in Figure 4, a query has been specified for Mg interacting with ATP, with the maximum structure resolution set at 2.5 Å. The first result page (Figure 5) lists all of the metal–donor distances, the coordination number, the shape (for coordination numbers 4–6), the donor atom identities and the difference from the target distance for each metal–donor pair, together with an RMSD value for the overall metal site. The query results can be displayed as histograms showing the distributions of metal–donor distances, structure resolutions and coordination numbers. Because the histograms support drill-down, clicking on the bin marked by * trims the results shown on the page to the samples whose Mg–OATP distances lie between 1.8 and 1.9 Å. The histogram data and the query results can be downloaded as CSV files for further local investigation. An exclusion function is available for de-selecting undesired entries before downloading or calculating distributions. The metal site details can be displayed graphically by clicking on the metal name or the PDB ID in the result list (marked by ** and ***).
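The drill-down behaviour described above amounts to binning distances and then filtering on one bin. A minimal sketch, assuming 0.1 Å bins (matching the 1.8–1.9 Å example) and hypothetical field names for the result rows; the rows themselves are made up. Distances are binned in integer milli-ångströms so that values such as 1.9 Å do not land in the wrong bin through floating-point error, and the same bin index is used for filtering so the histogram and the filtered table always agree.

```python
import csv
import io
from collections import Counter


def bin_index(distance, bin_milli=100):
    """Half-open bin [lo, lo + width) in integer milli-angstroms (0.1 Å assumed)."""
    return round(distance * 1000) // bin_milli


def histogram(rows, bin_milli=100):
    """Count rows per distance bin; key 18 means [1.8, 1.9) Å for 0.1 Å bins."""
    return Counter(bin_index(r["distance"], bin_milli) for r in rows)


def drill_down(rows, index, bin_milli=100):
    """Keep only the rows in one bin, like clicking that histogram bar."""
    return [r for r in rows if bin_index(r["distance"], bin_milli) == index]


def to_csv(rows, fields=("pdb_id", "donor", "distance")):
    """Serialize rows as CSV text, as for a local download."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up result rows with hypothetical field names.
rows = [
    {"pdb_id": "demo1", "donor": "O1B", "distance": 1.85},
    {"pdb_id": "demo2", "donor": "O2G", "distance": 2.07},
    {"pdb_id": "demo3", "donor": "O1G", "distance": 1.88},
]
hist = histogram(rows)
selected = drill_down(rows, 18)  # the 1.8-1.9 Å bin
print(dict(hist))
print(to_csv(selected))
```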
When a metal site includes donor atoms identified in symmetry-related units, these units are displayed in translucent color, as shown in Figures 6 and 7. Both interactive views use JSmol (20) to present the structures. A set of buttons next to the views lets users manipulate the structures easily, for example to look up and centre the specified metal, show the metal–donor angles or distances, reveal the location(s) of the metal site(s) in the protein, and rotate the structure. Users can therefore inspect the structures and metal sites directly on these two pages.
Figure 5.
Result page of the search for Mg–ATP interactions, which yielded 2459 examples in 472 proteins. The distributions of the observed Mg–OATP distances, structure resolutions and coordination numbers can be shown as histograms. From left to right, the table columns list the measured Mg–OATP distance, coordination number, estimated shape with the RMSD from the ideal geometry, metal name, donor group (atom name, residue name, chain ID and residue ID), PDB ID, resolution, the RMSD between the observed metal–donor distances and the target distances for the metal site, and the difference between the observed and target distance. The histograms and table can be downloaded in different formats for further investigation. *: click to restrict the results to this distance range; the table content is updated accordingly. **: link to a page showing the metal site graphically (as in Figure 6). ***: click to show the whole metalloprotein with the tabulated metal site details (as in Figure 7).
Figure 6.
Page displaying the specified metal site in a protein (PDB ID: 6AAP). The coordination group of the metal site is shown in 3D by JSmol (left), together with its location in the whole metalloprotein (right). Symmetry-related units are displayed in translucent color. The buttons next to the views allow users to manipulate the 3D pictures easily, for example to centre and rotate the view and to add other information. Clicking the ‘2D’ button shows the metal site interactions as a 2D diagram generated by LigPlot+. Information on the site geometry is listed below the pictures. In this example, the Mg atom (green) is coordinated by O atoms from GLU, the ATP phosphates and two water molecules; two of these donors, from GLU and HOH, are located in symmetry-related units.
Figure 7.
Page showing the specified metalloprotein (PDB ID: 6AAP). When a PDB ID is entered on the main page or selected from the results listed in Figure 5, the 3D structure of the metalloprotein is displayed by JSmol, together with the details of the metal sites found in the protein and its model information. Symmetry-related units are displayed in translucent color. Here, the protein contains three Mg–ATP sites in chain A, and two donor atoms of the third metal site come from symmetry-related units. *: click to centre the view on the metal site. **: link to the page in Figure 6 showing the details of the specified metal site.
Application of MESPEUS database and web interface
A case study surveying Mg–ATP interactions was carried out using the MESPEUS database and web interface. These interactions are common and are essential in ATP-dependent catalysis (22). The MESPEUS database provides a rich source of structural information for a detailed analysis of the geometry of metal–ATP interactions. With the query specification shown in Figure 4, the web interface retrieves all interactions between Mg and OATP; in total, 2459 observations were found in the database using a 2.5 Å resolution cutoff. An example of an Mg site is shown in Figures 6 and 7. The query results were then downloaded for further investigation and filtered to remove identical sites found in asymmetric units, proteins with >70% sequence identity to any other protein in the set, and structures with geometrical disorder (donor atom occupancy not equal to 1.0). In total, 844 observations in 380 representative Mg–ATP sites from 282 proteins remained. Protein–Mg interactions occur mainly through ASP and GLU (about 57.20% of all coordinating amino acids). ATP may interact with Mg through one, two or three phosphate groups. The most common binding pattern is through the β- and γ-phosphate groups, followed by the α-, β- and γ-phosphate groups (33.68% and 28.24% of all patterns, respectively). Mg rarely coordinates two O atoms of the same phosphate group, but this did occur in 24 cases. The coordination number distribution of the representative Mg sites is shown in Figure 8. At resolution ≤2.5 Å, 60.79% of Mg sites have coordination number 6. Some observations from high-resolution structures have coordination numbers lower than 4 even though symmetry-related units were taken into account, which raises questions about the completeness of refinement for these structures. Table 2 shows the mean Mg–OATP-phosphate distance as a function of structural resolution.
The table excludes observations with distance >2.6 Å as they are considered to be outliers. Using the 56 observations taken from the highest resolution structures (≤1.5 Å) gives a mean Mg–OATP-phosphate distance of 2.037 Å (standard deviation: 0.110 Å).
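The occupancy filter and the phosphate binding-pattern count described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MESPEUS implementation: the `Contact` record layout and site identifiers are hypothetical, ATP phosphate oxygens are assumed to follow the PDB atom names O1A to O3A, O1B to O3B and O1G to O3G, and the removal of identical sites and of proteins with >70% sequence identity is omitted because it needs a separate sequence-comparison step.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Contact:
    """One Mg-donor contact (hypothetical record layout)."""
    site_id: str        # hypothetical Mg site identifier
    donor_residue: str  # residue name of the donor, e.g. "ATP" or "ASP"
    donor_atom: str     # PDB atom name, e.g. "O2G"
    occupancy: float    # occupancy of the donor atom
    distance: float     # Mg-donor distance in angstroms

# Assumed PDB naming: O1A-O3A, O1B-O3B and O1G-O3G belong to the alpha, beta and
# gamma phosphates. O3A and O3B actually bridge two phosphates; this sketch
# assigns them by their final letter for simplicity.
GROUP_BY_SUFFIX = {"A": "alpha", "B": "beta", "G": "gamma"}
GROUP_ORDER = ("alpha", "beta", "gamma")

def phosphate_group(atom_name):
    """Return the phosphate group of an ATP oxygen, or None for other atoms."""
    if len(atom_name) == 3 and atom_name[0] == "O" and atom_name[1] in "123":
        return GROUP_BY_SUFFIX.get(atom_name[2])
    return None

def fully_ordered_sites(contacts):
    """Group contacts by site; drop sites with any donor occupancy other than 1.0."""
    sites = {}
    for c in contacts:
        sites.setdefault(c.site_id, []).append(c)
    return {sid: cs for sid, cs in sites.items()
            if all(c.occupancy == 1.0 for c in cs)}

def atp_groups(site_contacts):
    """Phosphate group of every ATP phosphate oxygen coordinating the Mg."""
    groups = (phosphate_group(c.donor_atom) for c in site_contacts
              if c.donor_residue == "ATP")
    return [g for g in groups if g is not None]

def binding_pattern(site_contacts):
    """Ordered tuple of the phosphate groups that bind the Mg, e.g. ('beta', 'gamma')."""
    present = set(atp_groups(site_contacts))
    return tuple(g for g in GROUP_ORDER if g in present)

def same_group_pair(site_contacts):
    """True if two O atoms of the same phosphate group coordinate the Mg."""
    return any(n >= 2 for n in Counter(atp_groups(site_contacts)).values())

# Toy input (invented values, not MESPEUS data).
example_contacts = [
    Contact("s1", "ATP", "O2B", 1.0, 2.05),
    Contact("s1", "ATP", "O1G", 1.0, 2.08),
    Contact("s1", "ASP", "OD1", 1.0, 2.10),
    Contact("s2", "ATP", "O1G", 0.5, 2.10),  # partially occupied: site dropped
    Contact("s3", "ATP", "O1A", 1.0, 2.07),
    Contact("s3", "ATP", "O2B", 1.0, 2.04),
    Contact("s3", "ATP", "O3G", 1.0, 2.06),
    Contact("s4", "ATP", "O1G", 1.0, 2.11),
    Contact("s4", "ATP", "O2G", 1.0, 2.13),
]
example_sites = fully_ordered_sites(example_contacts)
pattern_counts = Counter(binding_pattern(cs) for cs in example_sites.values())
```

In this toy set, site s2 is discarded by the occupancy filter, s1 binds through the β- and γ-phosphates, s3 through all three, and s4 is an example of two O atoms from the same (γ) phosphate group coordinating the Mg.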
Figure 8.
Distribution of coordination numbers for Mg–ATP sites at different structural resolutions. Observations are shown as percentages grouped by resolution. The actual numbers of observations are provided in Supplementary Table S2. RES: resolution.
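A percentage distribution of the kind plotted in Figure 8 can be tabulated with a short Python sketch. The input format, the function name `cn_distribution` and the cumulative binning (resolution ≤ each limit, as in Table 2) are assumptions for illustration; the coordination number of each site is assumed to be precomputed, including donors from symmetry-related units.

```python
from collections import Counter

def cn_distribution(sites, max_resolutions):
    """Percentage of Mg sites with each coordination number, per resolution limit.

    sites: iterable of (resolution_in_angstroms, coordination_number) pairs.
    Bins are cumulative (resolution <= limit), an assumed choice that mirrors
    Table 2 rather than the exact binning used for Figure 8.
    """
    sites = list(sites)
    table = {}
    for limit in max_resolutions:
        numbers = [cn for res, cn in sites if res <= limit]
        counts = Counter(numbers)
        total = len(numbers)
        # An empty bin yields an empty dict, so no division by zero occurs.
        table[limit] = {cn: 100.0 * n / total for cn, n in sorted(counts.items())}
    return table

# Toy input: five invented sites, not MESPEUS data.
example_table = cn_distribution(
    [(1.4, 6), (1.9, 6), (2.2, 5), (2.4, 6), (2.4, 4)], [2.5, 1.5]
)
```

For this toy input, three of the five sites at ≤2.5 Å are six-coordinate (60%), while the single site at ≤1.5 Å gives 100% for coordination number 6.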
Table 2.
Mean Mg–OATP-phosphate distance ordered by structural resolution
| Max. resolution (Å) | Proteins | Metals | Observations | Mean distance, Å (STD) |
|---|---|---|---|---|
| 2.5 | 271 | 363 | 768 | 2.161 (0.179) |
| 2.3 | 208 | 272 | 594 | 2.146 (0.175) |
| 2.0 | 132 | 172 | 380 | 2.126 (0.167) |
| 1.8 | 75 | 98 | 224 | 2.098 (0.162) |
| 1.5 | 19 | 22 | 56 | 2.037 (0.110) |
Observations with Mg–OATP-phosphate distances >2.6 Å are excluded. STD: standard deviation.
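The summary statistics in Table 2 can be outlined with the following Python sketch. The record layout and identifiers are hypothetical; proteins are assumed to be counted by PDB ID and metals by Mg site identifier, and the sample standard deviation (`statistics.stdev`) is assumed because the text does not state which form was used.

```python
import statistics
from typing import NamedTuple

class MgOObservation(NamedTuple):
    """One Mg-O(ATP phosphate) contact (hypothetical record layout)."""
    pdb_id: str        # structure identifier, used to count proteins
    site_id: str       # Mg site identifier, used to count metals
    resolution: float  # structure resolution in angstroms
    distance: float    # Mg-O distance in angstroms

def distance_table(observations, max_resolutions, outlier_cutoff=2.6):
    """Summarise Mg-O distances for structures at or below each resolution limit."""
    # Drop outliers first, as in the Table 2 footnote (distance > 2.6 A).
    kept = [o for o in observations if o.distance <= outlier_cutoff]
    rows = []
    for limit in max_resolutions:
        subset = [o for o in kept if o.resolution <= limit]
        if not subset:
            continue
        distances = [o.distance for o in subset]
        # Sample standard deviation; needs at least two values.
        std = statistics.stdev(distances) if len(distances) > 1 else 0.0
        rows.append({
            "max_resolution": limit,
            "proteins": len({o.pdb_id for o in subset}),
            "metals": len({o.site_id for o in subset}),
            "observations": len(subset),
            "mean": round(statistics.mean(distances), 3),
            "std": round(std, 3),
        })
    return rows

# Toy input (invented values, not MESPEUS data).
example_obs = [
    MgOObservation("P001", "P001_A_MG1", 1.4, 2.00),
    MgOObservation("P001", "P001_A_MG1", 1.4, 2.10),
    MgOObservation("P002", "P002_A_MG1", 2.2, 2.20),
    MgOObservation("P002", "P002_A_MG1", 2.2, 2.80),  # > 2.6 A: excluded
]
example_rows = distance_table(example_obs, [2.5, 1.5])
```

Because each row includes every structure at or below its resolution limit, the rows are nested subsets, which matches the decreasing counts from the 2.5 Å row to the 1.5 Å row of Table 2.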
Conclusion
The MESPEUS database stores and analyses structural and geometric information on 375 201 metal sites taken from 71 552 structures currently available in the Protein Data Bank. The metalloprotein descriptors available through the MESPEUS interface include the calculated deviation of each metal site from ideal coordination geometries. The user-friendly web interface permits immediate identification and display of metal sites of interest, and individual metal sites can be graphically displayed on the web page for interactive inspection. The statistical profile of any search can be presented as a series of histograms. A major advance in this version of MESPEUS is the inclusion of crystallographic symmetry to ensure completeness of the coordination sphere around the metal. The content of the MESPEUS database is automatically updated monthly, synchronized with updates to the Protein Data Bank.
Acknowledgements
We would like to thank Prof. Malcolm D. Walkinshaw and Dr Marjorie M. Harding for insightful comments, Dr Reuven Pnini for thoughtful discussion on symmetry operations, and Mr Xi-Yu Wang for technical support with the website.
Author contributions: Geng-Yu Lin and Yu-Cheng Su: Performed the experiments, analysed the data and contributed equally to this work. Yen Lin Huang: Assisted with web interface development. Kun-Yi Hsin: Designed the experiments, implemented the algorithm and wrote the paper.
Contributor Information
Geng-Yu Lin, Department of Animal Science, National Chung Hsing University, Taichung 402, Taiwan.
Yu-Cheng Su, Department of Animal Science, National Chung Hsing University, Taichung 402, Taiwan.
Yen Lin Huang, Department of Animal Science, National Chung Hsing University, Taichung 402, Taiwan.
Kun-Yi Hsin, Department of Animal Science, National Chung Hsing University, Taichung 402, Taiwan.
Data availability
All data are incorporated into the article and its online supplementary material. MESPEUS is available at http://mespeus.nchu.edu.tw/.
Supplementary data
Supplementary Data are available at NAR Online.
Funding
National Science and Technology Council, Taiwan, R.O.C. [110-2313-B-005-005]; Council of Agriculture Executive Yuan, Taiwan, R.O.C. [112AS-9.2.1-AD-U1]; National Chung Hsing University, Taiwan, R.O.C. [10737059G]. Funding for open access charge: National Chung Hsing University, Taiwan.
Conflict of interest statement. None declared.
References
- 1. Waldron K.J., Rutherford J.C., Ford D., Robinson N.J. Metalloproteins and metal sensing. Nature. 2009; 460:823–830.
- 2. Prabhulkar S., Tian H., Wang X., Zhu J.-J., Li C.-Z. Engineered proteins: redox properties and their applications. Antioxid. Redox Signaling. 2012; 17:1796–1822.
- 3. Wilson C.J., Apiyo D., Wittung-Stafshede P. Role of cofactors in metalloprotein folding. Q. Rev. Biophys. 2004; 37:285–314.
- 4. Dokmanić I., Šikić M., Tomić S. Metals in proteins: correlation between the metal-ion type, coordination number and the amino-acid residues involved in the coordination. Acta Crystallogr. Sect. D. Biol. Crystallogr. 2008; 64:257–263.
- 5. Barber-Zucker S., Shaanan B., Zarivach R. Transition metal binding selectivity in proteins and its correlation with the phylogenomic classification of the cation diffusion facilitator protein family. Sci. Rep. 2017; 7:16381.
- 6. Burley S.K., Bhikadiya C., Bi C., Bittrich S., Chao H., Chen L., Craig P.A., Crichlow G.V., Dalenberg K., Duarte J.M. RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning. Nucleic Acids Res. 2023; 51:D488–D508.
- 7. Burley S.K., Berman H.M., Chiu W., Dai W., Flatt J.W., Hudson B.P., Kaelber J.T., Khare S.D., Kulczyk A.W., Lawson C.L. Electron microscopy holdings of the Protein Data Bank: the impact of the resolution revolution, new validation tools, and implications for the future. Biophys. Rev. 2022; 14:1281–1301.
- 8. Burley S.K., Bhikadiya C., Bi C., Bittrich S., Chen L., Crichlow G.V., Duarte J.M., Dutta S., Fayazi M., Feng Z. RCSB Protein Data Bank: celebrating 50 years of the PDB with new tools for understanding and visualizing biological macromolecules in 3D. Protein Sci. 2022; 31:187–208.
- 9. Castagnetto J.M., Hennessy S.W., Roberts V.A., Getzoff E.D., Tainer J.A., Pique M.E. MDB: the metalloprotein database and browser at the Scripps Research Institute. Nucleic Acids Res. 2002; 30:379–382.
- 10. Zhang Y., Gladyshev V.N. dbTEU: a protein database of trace element utilization. Bioinformatics. 2010; 26:700–702.
- 11. Ireland S.M., Martin A.C. ZincBind—the database of zinc binding sites. Database. 2019; 2019:baz006.
- 12. Putignano V., Rosato A., Banci L., Andreini C. MetalPDB in 2018: a database of metal sites in biological macromolecular structures. Nucleic Acids Res. 2018; 46:D459–D464.
- 13. Tran J.B., Krezel A. InterMetalDB: a database and browser of intermolecular metal binding sites in macromolecules with structural information. J. Proteome Res. 2021; 20:1889–1901.
- 14. Yu J.-L., Wu S., Zhou C., Dai Q.-Q., Schofield C.J., Li G.-B. MeDBA: the Metalloenzyme data bank and analysis platform. Nucleic Acids Res. 2023; 51:D593–D602.
- 15. Hsin K., Sheng Y.-g., Harding M.M., Taylor P., Walkinshaw M.D. MESPEUS: a database of the geometry of metal sites in proteins. J. Appl. Crystallogr. 2008; 41:963–968.
- 16. DeLano W.L. PyMOL: an open-source molecular graphics tool. CCP4 Newsl. Protein Crystallogr. 2002; 40:82–92.
- 17. wwPDB consortium. Protein Data Bank: the single global archive for 3D macromolecular structure data. Nucleic Acids Res. 2019; 47:D520–D528.
- 18. Harding M.M. Small revisions to predicted distances around metal sites in proteins. Acta Crystallogr. Sect. D. Biol. Crystallogr. 2006; 62:678–682.
- 19. Ferrence G.M., Tovee C.A., Holgate S.J., Johnson N.T., Lightfoot M.P., Nowakowska-Orzechowska K.L., Ward S.C. CSD communications of the Cambridge Structural Database. IUCrJ. 2023; 10:6–15.
- 20. Hanson R.M., Prilusky J., Renjian Z., Nakane T., Sussman J.L. JSmol and the next-generation web-based representation of 3D molecular structure as applied to proteopedia. Isr. J. Chem. 2013; 53:207–216.
- 21. Laskowski R.A., Swindells M.B. LigPlot+: multiple ligand–protein interaction diagrams for drug discovery. J. Chem. Inf. Model. 2011; 51:2778–2786.
- 22. Buelens F.P., Leonov H., de Groot B.L., Grubmüller H. ATP–magnesium coordination: protein structure-based force field evaluation and corrections. J. Chem. Theory Comput. 2021; 17:1922–1930.