Abstract
The worldwide Protein Data Bank (wwPDB) is the international collaboration that manages the deposition, processing and distribution of the PDB archive. The online PDB archive is a repository for the coordinates and related information for more than 38 000 structures, including proteins, nucleic acids and large macromolecular complexes that have been determined using X-ray crystallography, NMR and electron microscopy techniques. The founding members of the wwPDB are RCSB PDB (USA), MSD-EBI (Europe) and PDBj (Japan) [H.M. Berman, K. Henrick and H. Nakamura (2003) Nature Struct. Biol., 10, 980]. The BMRB group (USA) joined the wwPDB in 2006. The mission of the wwPDB is to maintain a single archive of macromolecular structural data that are freely and publicly available to the global community. Additionally, the wwPDB provides a variety of services to a broad community of users. The wwPDB website at http://www.wwpdb.org/ provides information about services provided by the individual member organizations and about projects undertaken by the wwPDB.
HISTORY AND BACKGROUND
The Protein Data Bank (PDB) was founded in 1971 to provide a repository for three-dimensional (3D) structure data of experimentally determined biological macromolecules (1–3). The PDB archive contains 3D coordinate data, information about the chemical content such as polymer sequence and ligand chemistry, information about the experiment used to derive the structure and some qualitative descriptions of the structure. When the PDB was in its infancy, the archive contained seven structures, and its entries were composed of loosely structured free text. Today, the PDB archive contains close to 40 000 structures and relies upon strict ontologies that define the content of these entries.
The data contained in the PDB are generated and submitted by scientists from around the globe to sites in the United States, Europe and Asia. The worldwide PDB (wwPDB) was established in 2003 to formally recognize the international nature of the PDB archive (2,4) and to ensure that the data files remain uniform in content and format. The founding members are the RCSB PDB (USA) (1), the Macromolecular Structure Database at the European Bioinformatics Institute (MSD-EBI) (5) and the Protein Data Bank Japan (PDBj) at Osaka University. These wwPDB sites share responsibilities in data deposition, processing and distribution of the PDB archive, and agree to support a single, standardized archive of structural data (Table 1). The BioMagResBank (BMRB) at the University of Wisconsin-Madison (USA) (6) became a member in 2006 and will be a deposition site for primary experimental data and PDB data.
Table 1.
The wwPDB Advisory Committee (wwPDBAC) consists of representatives appointed by each member site as well as representatives of the international X-ray, NMR and electron microscopy (EM) communities. The wwPDBAC meets yearly and provides advice about policies governing the content, format and distribution of the PDB data files.
The website (http://www.wwpdb.org/) contains the formal agreement for the operation of the wwPDB organization, links to the deposition and access sites, and news and announcements about policies and projects related to the wwPDB.
MEMBER DEPOSITION SITES
Advances in protein cloning, expression, labeling and purification, through to structure determination, have resulted in a rapid increase in the rate at which new protein structures are determined. Progress is also being made in structure determinations of nucleic acids, particularly RNA molecules. A key feature of the wwPDB is that its tools can efficiently capture and curate data as the number of depositions grows exponentially (Table 1). Although the sites are physically dispersed and use three different tools for data capture and processing (ADIT, ADIT-NMR and AutoDep), all the data are annotated and processed using common standards. To ensure that the core data are represented uniformly, the wwPDB sites actively collaborate to exchange core reference information (e.g. the dictionary description for ligands) and to ensure that standard practices are followed. The annotators at all sites maintain daily communication via video teleconferencing, exchange visits and email; they are currently extending and updating the annotation manuals, which will be made publicly available.
Every week, the data processed at each site are forwarded to the RCSB PDB for inclusion in the archive. At present, the RCSB PDB is the archive keeper and as such has sole write access to the PDB archive.
Statistics about the PDB structures deposited and processed by the wwPDB are available from http://www.wwpdb.org/stats.html (Tables 2 and 3).
Table 2.
Year | Total depositions | Deposited to RCSB PDB | Deposited to PDBj | Deposited to EBI | Processed by RCSB PDB | Processed by PDBj | Processed by EBI |
---|---|---|---|---|---|---|---|
2000 | 2983 | 2445 | 10 | 528 | 2294 | 161 | 528 |
2001 | 3286 | 2673 | 118 | 495 | 2407 | 384 | 495 |
2002 | 3563 | 2769 | 289 | 505 | 2401 | 657 | 505 |
2003 | 4830 | 3488 | 673 | 669 | 3135 | 1026 | 669 |
2004 | 5508 | 3796 | 900 | 812 | 3083 | 1613 | 812 |
2005 | 6677 | 4506 | 1166 | 1005 | 3562 | 2110 | 1005 |
2006 | 4728 | 3239 | 725 | 764 | 2659 | 1305 | 764 |
Total | 31 575 | 22 916 | 3881 | 4778 | 19 545 | 7252 | 4778 |
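The internal consistency of the figures in Table 2 can be checked with a few lines of arithmetic. The following Python sketch is an illustration only: the values are copied from the table, while the names `ROWS`, `PRINTED_TOTALS`, `row_mismatches` and `column_mismatches` are hypothetical and introduced here for the example. It tests whether the per-site counts add up to each yearly total, and whether the yearly counts add up to the printed Total row.

```python
# Values copied from Table 2. Each row is:
# (total, deposited to RCSB PDB, PDBj, EBI, processed by RCSB PDB, PDBj, EBI)
ROWS = {
    2000: (2983, 2445, 10, 528, 2294, 161, 528),
    2001: (3286, 2673, 118, 495, 2407, 384, 495),
    2002: (3563, 2769, 289, 505, 2401, 657, 505),
    2003: (4830, 3488, 673, 669, 3135, 1026, 669),
    2004: (5508, 3796, 900, 812, 3083, 1613, 812),
    2005: (6677, 4506, 1166, 1005, 3562, 2110, 1005),
    2006: (4728, 3239, 725, 764, 2659, 1305, 764),
}
PRINTED_TOTALS = (31575, 22916, 3881, 4778, 19545, 7252, 4778)
COLUMNS = (
    "total",
    "deposited to RCSB PDB", "deposited to PDBj", "deposited to EBI",
    "processed by RCSB PDB", "processed by PDBj", "processed by EBI",
)


def row_mismatches(rows):
    """Return the years whose per-site counts do not sum to the yearly total."""
    bad_years = []
    for year, (total, *sites) in rows.items():
        deposited, processed = sum(sites[:3]), sum(sites[3:])
        if deposited != total or processed != total:
            bad_years.append(year)
    return bad_years


def column_mismatches(rows, printed):
    """Return {column: (summed, printed)} where the column sum differs from the Total row."""
    sums = [sum(column) for column in zip(*rows.values())]
    return {
        name: (summed, shown)
        for name, summed, shown in zip(COLUMNS, sums, printed)
        if summed != shown
    }


print(row_mismatches(ROWS))
print(column_mismatches(ROWS, PRINTED_TOTALS))
```

Every yearly row is self-consistent. The yearly "Processed by" counts for RCSB PDB and PDBj, however, sum to 19 541 and 7256, each four entries away from the printed totals of 19 545 and 7252. The check only flags this difference; it cannot say which figure is in error.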
Table 3.
Year | Total |
---|---|
2000 | 2632 |
2001 | 2840 |
2002 | 3018 |
2003 | 4185 |
2004 | 5230 |
2005 | 5421 |
2006 | 4154 |
Total | 27 480 |
DATA ACCESS: MEMBER FTP AND WEBSITES
The ‘PDB archive’ is the collection of flat files that are maintained in three different formats: the legacy PDB file format; the PDB exchange format that follows the mmCIF syntax (http://deposit.pdb.org/mmcif/); and the PDBML/XML format (7) that is a direct translation of the PDB exchange format. Each wwPDB site distributes the same PDB archive via FTP. The archive is updated weekly.
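As an illustration of the legacy PDB file format, the following minimal Python sketch reads the fixed-width fields of a single ATOM record. The column positions are taken from the published PDB format guide rather than from this article, and the `AtomRecord` type, the `parse_atom_line` function and the sample line are hypothetical, introduced only for this example.

```python
from typing import NamedTuple


class AtomRecord(NamedTuple):
    """A subset of the fields carried by an ATOM/HETATM line (hypothetical type)."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one fixed-width ATOM/HETATM line; comments give 1-based columns."""
    if line[0:6] not in ("ATOM  ", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    return AtomRecord(
        serial=int(line[6:11]),        # columns 7-11: atom serial number
        name=line[12:16].strip(),      # columns 13-16: atom name
        res_name=line[17:20].strip(),  # columns 18-20: residue name
        chain_id=line[21],             # column 22: chain identifier
        res_seq=int(line[22:26]),      # columns 23-26: residue sequence number
        x=float(line[30:38]),          # columns 31-38: x coordinate in angstroms
        y=float(line[38:46]),          # columns 39-46: y coordinate in angstroms
        z=float(line[46:54]),          # columns 47-54: z coordinate in angstroms
    )


# Made-up sample line, assembled field by field so that each width is explicit.
SAMPLE_LINE = (
    "ATOM  " + "    1" + " " + " N  " + " " + "MET" + " " + "A"
    + "   1" + " " + "   " + "  27.340" + "  24.430" + "   2.614"
)

record = parse_atom_line(SAMPLE_LINE)
print(record)
```

Because the legacy format is positional, fields must be sliced by column rather than split on whitespace, since adjacent fields can run together. The PDB exchange (mmCIF) and PDBML/XML formats label every data item explicitly, so they do not depend on column positions.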
Time-stamped snapshots of the PDB archive are added each year to ftp://snapshots.rcsb.org. They provide a frozen copy of the archive as it appeared at that time for research and historical purposes. The most recent snapshot was added in January 2006. It includes the 34 421 experimentally determined coordinate files that were current (i.e. not obsolete) as of January 3, 2006, and the directory containing the frozen content as of January 6, 2005. Scripts are available to download all, or part, of a snapshot automatically.
In addition to providing access to the PDB archive, each wwPDB site offers databases and websites that give different views and analyses of the structural data contained within the PDB archive (8–14).
DATA UNIFORMITY
wwPDB members collaborate to ensure the uniformity of the PDB archive. The PDB Exchange Dictionary consolidates content from a variety of dictionaries and includes extensions to describe NMR, EM and protein production data (15). wwPDB data processing, exchange and annotation depend upon this dictionary and the mmCIF format (16) to help make the data more consistent across the archive.
In the past, querying across the complete PDB archive has been limited by missing, erroneous and inconsistently reported data, nomenclature and functional annotation. The evolution of experimental methods, of functional knowledge of proteins and of the methods used to process these data has introduced various inconsistencies into the PDB archive and has inspired different versions of the PDB format.
Over the years, the MSD-EBI, PDBj and the RCSB PDB have been working individually on correcting errors in the archive. Under the wwPDB banner, these groups are now working to integrate all remediation efforts into a single consistent collection of data files. This work includes improving the representation of PDB small molecule data, assessing the required chemical definitions and their correspondences in PDB entries, resolving any remaining differences in the macromolecular sequences assigned by each group and resolving differences in primary citation assignments. The BMRB has been collaborating with MSD-EBI and RCSB PDB on standardizing restraint data associated with PDB depositions (17,18).
The remediated data (PDB V.2) will be made available for public review in 2007 and will form the basis of the wwPDB websites. The data released before remediation (PDB V.1) will continue to be available for the historical record.
PHASING OUT THEORETICAL MODEL DEPOSITIONS TO THE PDB ARCHIVE
Effective October 15, 2006, PDB depositions were restricted to atomic coordinates that are substantially determined by experimental measurements on specimens containing biological macromolecules. This policy was recommended by a working group composed of structural and computational biologists and endorsed by the wwPDB Advisory Committee. Thus, theoretical model depositions (such as models determined purely in silico using, for example, homology or ab initio methods) are no longer accepted.
NEWS AND ANNOUNCEMENTS
The News section of the wwPDB website gives information about the outcome of the wwPDBAC meetings and policy statements affecting the PDB data files. A recent example is the announcement of the policy for the archiving of in silico models (19).
Acknowledgments
The RCSB PDB is operated by Rutgers, The State University of New Jersey and the San Diego Supercomputer Center and the Skaggs School of Pharmacy and Pharmaceutical Sciences at the University of California, San Diego. It is supported by funds from the National Science Foundation, the National Institute of General Medical Sciences, the Office of Science, Department of Energy, the National Library of Medicine, the National Cancer Institute, the National Center for Research Resources, the National Institute of Biomedical Imaging and Bioengineering, the National Institute of Neurological Disorders and Stroke and the National Institute of Diabetes and Digestive and Kidney Diseases. E-MSD gratefully acknowledges the support of the Wellcome Trust (GR062025MA), the EU (TEMBLOR, NMRQUAL and IIMS), CCP4, the BBSRC, the MRC and EMBL. PDBj is supported by grant-in-aid from the Institute for Bioinformatics Research and Development, Japan Science and Technology Agency (BIRD-JST), and the Ministry of Education, Culture, Sports, Science and Technology (MEXT). The BMRB is supported by NIH grant LM05799 from the National Library of Medicine. Funding to pay the Open Access publication charges for this article was provided by the agencies supporting the RCSB PDB.
Conflict of interest statement. None declared.
REFERENCES
- 1. Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., Shindyalov I.N., Bourne P.E. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
- 2. Berman H.M., Henrick K., Nakamura H. Announcing the worldwide Protein Data Bank. Nature Struct. Biol. 2003;10:980. doi: 10.1038/nsb1203-980.
- 3. Bernstein F.C., Koetzle T.F., Williams G.J.B., Meyer E.F., Brice M.D., Rodgers J.R., Kennard O., Shimanouchi T., Tasumi M. Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 1977;112:535–542. doi: 10.1016/s0022-2836(77)80200-3.
- 4. Henrick K., Berman H.M., Nakamura H. The Protein Data Bank and the wwPDB. In: Jorde L.B., Little P.F.R., Dunn M.J., Subramaniam S., editors. Encyclopedia of Genomics, Proteomics, and Bioinformatics. Vol. 7. John Wiley & Sons Ltd, Chichester; 2005. pp. 3335–3339.
- 5. Golovin A., Oldfield T.J., Tate J.G., Velankar S., Barton G.J., Boutselakis H., Dimitropoulos D., Fillon J., Hussain A., Ionides J.M., et al. E-MSD: an integrated data resource for bioinformatics. Nucleic Acids Res. 2004;32:D211–D216. doi: 10.1093/nar/gkh078.
- 6. Ulrich E.L., Markley J.L., Kyogoku Y. Creation of a nuclear magnetic resonance data repository and literature database. Protein Seq. Data Anal. 1989;2:23–37.
- 7. Westbrook J., Ito N., Nakamura H., Henrick K., Berman H.M. PDBML: the representation of archival macromolecular structure data in XML. Bioinformatics. 2005;21:988–992. doi: 10.1093/bioinformatics/bti082.
- 8. Deshpande N., Addess K.J., Bluhm W.F., Merino-Ott J.C., Townsend-Merino W., Zhang Q., Knezevich C., Xie L., Chen L., Feng Z., et al. The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res. 2005;33:D233–D237. doi: 10.1093/nar/gki057.
- 9. Kouranov A., Xie L., de la Cruz J., Chen L., Westbrook J., Bourne P.E., Berman H.M. The RCSB PDB information portal for structural genomics. Nucleic Acids Res. 2006;34:D302–D305. doi: 10.1093/nar/gkj120.
- 10. Tagari M., Tate J., Swaminathan G.J., Newman R., Naim A., Vranken W., Kapopoulou A., Hussain A., Fillon J., Henrick K., et al. E-MSD: improving data deposition and structure quality. Nucleic Acids Res. 2006;34:D287–D290. doi: 10.1093/nar/gkj163.
- 11. Henrick K., Thornton J.M. PQS: a protein quaternary file server. Trends Biochem. Sci. 1998;23:358–361. doi: 10.1016/s0968-0004(98)01253-5.
- 12. Kinoshita K., Nakamura H. eF-site and PDBjViewer: database and viewer for protein functional sites. Bioinformatics. 2004;20:1329–1330. doi: 10.1093/bioinformatics/bth073.
- 13. Standley D.M., Toh H., Nakamura H. GASH: an improved algorithm for maximizing the number of equivalent residues between two protein structures. BMC Bioinformatics. 2005;6:221. doi: 10.1186/1471-2105-6-221.
- 14. Wako H., Kato M., Endo S. ProMode: a database of normal mode analyses on protein molecules with a full-atom model. Bioinformatics. 2004;20:2035–2043. doi: 10.1093/bioinformatics/bth197.
- 15. Westbrook J., Yang H., Feng Z., Berman H.M. The use of mmCIF architecture for PDB data management. In: Hall S.R., McMahon B., editors. International Tables for Crystallography. Vol. G. The Netherlands: Springer, Dordrecht; 2005. pp. 539–543.
- 16. Fitzgerald P.M.D., Westbrook J.D., Bourne P.E., McMahon B., Watenpaugh K.D., Berman H.M. Macromolecular dictionary (mmCIF). In: Hall S.R., McMahon B., editors. International Tables for Crystallography. Vol. G. The Netherlands: Springer, Dordrecht; 2005. pp. 295–443.
- 17. Doreleijers J.F., Mading S., Maziuk D., Sojourner K., Yin L., Zhu J., Markley J.L., Ulrich E.L. BioMagResBank database with sets of experimental NMR constraints corresponding to the structures of over 1400 biomolecules deposited in the Protein Data Bank. J. Biomol. NMR. 2003;26:139–146. doi: 10.1023/a:1023514106644.
- 18. Doreleijers J.F., Nederveen A.J., Vranken W., Lin J., Bonvin A.M., Kaptein R., Markley J.L., Ulrich E.L. BioMagResBank databases DOCR and FRED containing converted and filtered sets of experimental NMR restraints and coordinates from over 500 protein PDB structures. J. Biomol. NMR. 2005;32:1–12. doi: 10.1007/s10858-005-2195-0.
- 19. Berman H.M., Burley S.K., Chiu W., Sali A., Adzhubei A., Bourne P.E., Bryant S.H., Dunbrack J.R.L., Fidelis K., Frank J., et al. Outcome of a workshop on archiving structural models of biological macromolecules. Structure. 2006;14:1211–1217. doi: 10.1016/j.str.2006.06.005.