Author manuscript; available in PMC: 2012 Nov 19.
Published in final edited form as: Structure. 2012 Mar 7;20(3):391–396. doi: 10.1016/j.str.2012.01.010

The Protein Data Bank at 40: Reflecting on the Past to Prepare for the Future

Helen M Berman 1,*, Gerard J Kleywegt 2, Haruki Nakamura 3, John L Markley 4
PMCID: PMC3501388  NIHMSID: NIHMS410286  PMID: 22404998

Abstract

A symposium celebrating the 40th anniversary of the Protein Data Bank archive (PDB), organized by the Worldwide Protein Data Bank, was held at Cold Spring Harbor Laboratory (CSHL) October 28–30, 2011. PDB40’s distinguished speakers highlighted four decades of innovation in structural biology, from the early era of structural determination to future directions for the field.


Structural biology was born in Cambridge, England in the 1950s, when the race to the DNA double helix reached the finish line (Franklin and Gosling, 1953; Watson and Crick, 1953; Wilkins et al., 1953) and the first three-dimensional (3D) crystal structures of hemoglobin and myoglobin (Kendrew et al., 1958; Perutz et al., 1960) were determined. In the years that followed, a slow but steady trickle of new protein structures brought unexpected insights into the principles and consequences of protein structure and evolution, and contributed to our growing understanding of the intricate relationships between protein sequence, structure, and function.

In 1971, a landmark meeting was held at Cold Spring Harbor Laboratory (CSHL) entitled “Structure and Function of Proteins at the Three Dimensional Level.” At that symposium, the earliest 3D structures were described by the pioneers of structural biology in a way that led David Phillips to announce structural biology’s “coming of age” (Cold Spring Laboratory Press, 1972). The meeting also provided a venue for ongoing conversations about what it would mean for all scientists to have access to the structural data (Berman, 2008). These discussions culminated in the offer by Walter Hamilton to host the Protein Data Bank (PDB) at Brookhaven National Laboratory (BNL) (Protein Data Bank, 1971). The rest, as the saying goes, is history (Figure 1).

Figure 1. Timeline of Key PDB Events and Structural Biology Highlights, 1971–2011.

(Left) Key events in the evolution of the PDB.

(Right) Selected key structures in the field of structural biology (Ban et al., 2000; Carter et al., 2000; Schluenzen et al., 2000; Henderson et al., 1990; Driscoll et al., 1989; Drew et al., 1981; Wang et al., 1979; Kim et al., 1973; Robertus et al., 1974).

A symposium celebrating the 40th anniversary of the PDB, organized by the leadership of the current guardians of the PDB archive, the Worldwide Protein Data Bank (wwPDB; http://wwpdb.org) (Berman et al., 2003), was held at CSHL October 28–30, 2011. Many people involved in the PDB’s past and present were in attendance: staff members of the current wwPDB partners, past PDB BNL heads Tom Koetzle and Joel Sussman, and former staff, including the PDB’s longest-serving data processor, Frances Bernstein, who annotated entries from 1974 until 1998 (Figure 2). “PDB40” was attended by almost 300 scientists from all over the world, several of whom had been at the seminal 1971 meeting. Thanks to generous funding from the National Science Foundation, National Institutes of Health, Wellcome Trust, and Japan Society for the Promotion of Science, as well as more than 20 industrial sponsors, 34 students from as far away as India and South Africa were able to participate in the meeting. Almost 100 posters were presented, and in spite of rain, snow (yes, snow in New York in October), and wind pelting the tent in which they were displayed, they engendered lively discussions until the very last minute of the meeting.

Figure 2. PDB Staff Members, Past and Present, at PDB40.

Members of the PDB team who attended the conference, including members from RCSB PDB, PDBe, PDBj, BMRB, and BNL. Photo by Constance Brukin used with permission. Additional information about the symposium is available from the wwPDB site (http://wwpdb.org).

The program boasted 19 distinguished speakers. Several speakers who had been a part of that early era of structural biology described their experiences determining structures before the advent of cryocooling, high-speed computers, intense X-rays, powerful detectors and automated phasing and model-building software. Michael Rossmann, who had worked in Max Perutz’s laboratory, described how he built brass models and measured coordinates with a lead plumb. Richard Henderson, who determined one of the first structures using electron microscopy (EM), recounted how when he was part of the chymotrypsin group, it took so long to measure the coordinates with the plumb (a cord with a lead bob) that the values changed because the string stretched! A research assistant at the time of the 1971 meeting, Jane Richardson showed how colored models were built using Tygon tubing filled with fluorescein. Kurt Wüthrich, who was awarded the Nobel Prize for developing the methods used to determine biomacromolecular structures by NMR spectroscopy, and others paid homage to Dick Dickerson and Irving Geis for their marvelous hand-drawn depictions of DNA and protein structures (Figure 3) (Dickerson, 1997).

Figure 3. Myoglobin Fold, 1987.

Illustration, Irving Geis. The protein chain is shown in blue, and the heme is shown as a gold disk. Image from the Irving Geis Collection, Howard Hughes Medical Institute. Rights owned by HHMI. Not to be reproduced without permission. Reprinted here with permission.

Many speakers commented on the importance of data sharing and the pioneering and exemplary role that the PDB has played in ensuring that biological data are kept in the public domain. Tribute was given to structural biologists such as Fred Richards, Dick Dickerson, and Max Perutz, who rallied their colleagues to deposit their data in spite of the considerable initial resistance of some, as pointed out by Hans Deisenhofer. Following the guidelines published in 1989 (International Union of Crystallography, 1989), virtually every journal requires deposition of coordinates and experimental data as a prerequisite to publication.

It was pointed out that the PDB was conceived as a global resource from the outset. In 1971, it was jointly operated at Brookhaven and the Cambridge Crystallographic Data Centre (CCDC) (Protein Data Bank, 1971). Nowadays, structures and experimental data are deposited at and processed by the wwPDB partner sites in America (RCSB PDB; http://rcsb.org), Europe (PDBe; http://pdbe.org), and Japan (PDBj; http://pdbj.org). A formal Memorandum of Understanding among the three partner institutes was signed in 2003 (Berman et al., 2003). In 2006, the BioMagResBank (BMRB) joined the wwPDB partnership to collect and curate experimental NMR data belonging to PDB entries. The wwPDB now operates as a cohesive unit of equal partners to steward the world’s structural data. wwPDB members collaborate on all issues related to the archive, including deposition and annotation policies, requirements and procedures, formats, validation standards, description of chemical components, interactions with journals, distribution, remediation, and weekly updates to the archive. All four partners maintain independent websites and services through which the archived data are presented in a variety of ways to the many user communities.

The impact that technological advances have had on the field of structural biology was emphasized in many ways. Stephen Burley pointed out that these advances have made it possible to determine a typical protein structure in 20 days rather than the 20 man-years it took 40 years ago. Nowadays, synchrotron radiation is used for 75% of the X-ray crystallographic structures deposited in the PDB. In the early 1980s, synchrotrons made it possible for Wayne Hendrickson to develop new phasing methods based on anomalous dispersion, thereby facilitating the direct determination of structures with far fewer crystals than had been possible previously. Soichi Wakatsuki showed dramatic video footage of the earthquake damage at the Photon Factory. Fortunately, the damage has now been repaired and data collection once more goes on. For the future, the community eagerly anticipates the next leap into nano-crystallography using resources such as X-ray free-electron lasers.

In the early days of protein crystallography, protein structures were built as physical models and the atomic coordinates measured by hand. Such models have long since been replaced by well-refined structures thanks to major developments in interactive graphics model building and reciprocal-space refinement methods. Such methods work very well at relatively high resolution, but at resolutions lower than 3–4 Å, crystallographers often still struggle. New methods such as Deformable Elastic Networks, described by Axel Brunger, are designed to get the most out of very low-resolution data. In the field of EM, Richard Henderson demonstrated the problems that microscopists face because the samples are moving. He pointed out that these problems must be solved in order for the field of cryo-EM to make the leap to higher resolution structure determination.

Jane Richardson showed how she developed the now famous ribbon depictions of protein structures and described the evolution of the validation methods developed in her laboratory. Her interest in de novo design led to the recognition that hydrogen atoms are key to the success of these designs. The realization that many structures have serious steric clashes, which can be identified and resolved if explicit hydrogen atoms are taken into account, inspired the use of “hydrogenated” crystal structures in model validation, and was implemented in MolProbity (Davis et al., 2004).

Common themes in many of the presentations on state-of-the-art experimental structural biology included the use of groups of structures to understand complex biological systems, the use of more than one experimental method to study such systems, and an increasing number of successes in tackling membrane-protein structures. Cheryl Arrowsmith showed how she uses high-throughput X-ray and NMR methods to understand epigenetics and brute-force screening to find inhibitors against histone-modifying proteins. Using both X-ray and NMR methods, Susan Taylor’s group has made major inroads into understanding protein kinases and the human kinome. Angela Gronenborn, well known for her work in NMR spectroscopy, described how she has added cryo-EM to her arsenal of methods to study and understand HIV-capsid assembly. Wah Chiu gave an impressive account of the use of cryo-EM to investigate complex molecular machines in isolation and as part of cells and organelles. Wayne Hendrickson focused on X-ray studies to understand plant stomata (guard cells) and presented his work on membrane structures with novel pore channels. Using solution NMR methods, Ad Bax studies influenza fusion protein and its interactions when inserted into membranes. Mei Hong presented the use of sophisticated solid-state NMR techniques to study influenza membrane proteins.

The talks on bioinformatics and computational biology highlighted how the availability of tens of thousands of structures has enabled entirely new ways of doing science. For example, Janet Thornton uses bioinformatics approaches to try to understand enzyme families and function. In her presentation, she emphasized the importance of integrating structural data with those from other sources. Andrej Sali described an integrated approach to structure determination. Using data with relatively limited information content from a variety of biophysical and other methods, he demonstrated how it was possible to arrive at a very plausible model for a nuclear pore complex. David Baker explained how he uses experimental and modeling methods to design and predict structures. In a unique demonstration of community involvement and outreach, computer-game players were able to successfully fold proteins using software developed in Baker’s lab. David Searls, who is well known for his work on describing DNA as a language, is now using protein structure to create a new grammar.

The combined presentations about the history, state-of-the-art trends, and future directions of structural biology demonstrated that although the field has grown and blossomed almost beyond recognition, the characteristics that were apparent in the 1971 meeting (and which may explain much of the success of the field) are still the same. First and foremost, there is the curiosity and determination to understand biology in molecular terms and, increasingly, on larger scales (either physically, e.g., at the level of organelles and cells, or conceptually, at the level of proteomes and diseases). Whereas in 1971 the only available structure-determination method was X-ray crystallography, nowadays many other biophysical methods, such as NMR spectroscopy, cryo-EM, and small-angle scattering are contributing structural data and insights. Even more exciting is the combined use of multiple methods to study large and complex systems and the fact that molecular modeling approaches are becoming sophisticated enough to integrate the data from the many methods. As Soichi Wakatsuki and others stated, we have entered the era of an “integrated structural biology.” Another theme that has characterized the field since 1971 is the never-ending push to develop and apply new technologies and methods to solve problems and remove bottlenecks. Early on, the focus was on new instruments and computer technologies to determine structures from samples sometimes provided by others. Structural biologists have now developed new ways to crystallize proteins and to produce large amounts of pure samples. Today, high-throughput structure determination is a reality, although structural biologists will always continue to push the boundaries and tackle problems that are ambitious and difficult, but whose solution will bring important rewards and insights (and, occasionally, a Nobel Prize). The field continues to be populated with scientists of uncommon passion and persistence and with a strong sense of collegiality, community, and collaboration.

This 40th anniversary symposium highlighted some of the challenges to the wwPDB as it manages the PDB archive of the future. Although the growth is no longer exponential, it is still formidable, especially in an era of flat or possibly diminished funding. Moreover, deposited structures are increasing in both size and complexity. More and more structures are determined in complex with small-molecule ligands, such as peptide inhibitors and antibiotics. Whereas the majority of structures in the PDB are still determined by X-ray crystallographic methods, the largest relative growth is seen for structures determined using EM methods (Figure 4). Furthermore, as was noted several times at the symposium, hybrid methods in which multiple experimental approaches are combined with modeling are being used increasingly to unravel the structures of large systems not amenable to any single method.

Figure 4. Number of Structures Released Per Year, Organized by Experimental Method.

X-ray structure counts are shown in red, EM structure counts in purple, and NMR structure counts in orange.

The wwPDB partners have anticipated many of these challenges and trends and have mechanisms in place to meet them. Validation task forces (VTFs) made up of community experts have been set up for each of the major structure-determination methods represented in the PDB. These VTFs advise the wwPDB as to what data must be archived to ensure that the depositions can be validated and what type of validation procedures should be used. Starting in 2012, EM volume maps will be collected, archived, and distributed by the wwPDB. The internal working format for the PDB archive, called PDBx, can accommodate any structure-determination method and structures of any size with none of the restrictions imposed by the legacy 80-column punched-card format (Bernstein et al., 1977). Following a workshop at the EBI in September 2011, a wwPDB Format Working Group is now implementing and refining this format for most of the commonly used structure-determination packages. The providers of these software packages are collaborating on its implementation with a self-imposed deadline of January 1, 2013. The planned launch of a new common wwPDB Deposition and Annotation System in 2012 will allow users to deposit models and data derived by single or multiple methods. Annotators at all wwPDB data centers will use these powerful new tools to process, annotate, and validate the depositions much more efficiently.
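To make the restrictions of the legacy fixed-column format concrete, the following minimal Python sketch reads one coordinate record by column position. The column ranges follow the commonly documented ATOM/HETATM layout of the legacy PDB file format; the names `AtomRecord`, `parse_atom_line`, and `SAMPLE`, as well as the coordinate values, are illustrative assumptions and are not taken from any wwPDB software.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one fixed-column ATOM/HETATM record.

    The format documentation numbers columns from 1; Python slices are
    0-based, so columns 7-11 become line[6:11], and so on.
    """
    line = line.ljust(80)  # pad short lines to the full card width
    if line[0:6] not in ("ATOM  ", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    return AtomRecord(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


# Field widths translate directly into hard limits on what a file can hold.
MAX_ATOM_SERIAL = 10**5 - 1  # five digits for the atom serial number
MAX_CHAIN_ID_CHARS = 1       # a single character for the chain identifier

# Illustrative record assembled field by field so the widths are visible.
SAMPLE = (
    "ATOM  "    # columns 1-6: record name
    "    1"     # columns 7-11: atom serial number
    " "         # column 12: unused
    " N  "      # columns 13-16: atom name
    " "         # column 17: alternate location indicator
    "MET"       # columns 18-20: residue name
    " "         # column 21: unused
    "A"         # column 22: chain identifier
    "   1"      # columns 23-26: residue sequence number
    "    "      # columns 27-30: insertion code and padding
    "  27.340"  # columns 31-38: x coordinate
    "  24.430"  # columns 39-46: y coordinate
    "   2.614"  # columns 47-54: z coordinate
)

if __name__ == "__main__":
    print(parse_atom_line(SAMPLE))
```

Because each field has a fixed width, a single file in this layout cannot number more than 99,999 atoms or label chains with more than one character, which becomes a problem for very large assemblies such as ribosomes. A format that stores values under named data items rather than at fixed positions does not carry these limits.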

ACKNOWLEDGMENTS

PDB40 was organized by the wwPDB directors and Stephen K. Burley (Eli Lilly & Company). We gratefully acknowledge the support of our PDB40 host, Cold Spring Harbor Laboratory. Thanks to Christine Zardecki for her help in coordinating the meeting. The wwPDB is supported by: RCSB PDB (NSF DBI 0829586, NIGMS, DOE, NLM, NCI, NINDS, and NIDDK), PDBe (EMBL-EBI, Wellcome Trust, BBSRC, NIGMS, and EU), PDBj (NBDC-JST) and BMRB (NLM).

REFERENCES

  1. Ban N, Nissen P, Hansen J, Moore PB, Steitz TA. Science. 2000;289:905–920. doi: 10.1126/science.289.5481.905.
  2. Berman HM. Acta Crystallogr. A. 2008;64:88–95. doi: 10.1107/S0108767307035623.
  3. Berman HM, Henrick K, Nakamura H. Nat. Struct. Biol. 2003;10:980. doi: 10.1038/nsb1203-980.
  4. Bernstein FC, Koetzle TF, Williams GJB, Meyer EF, Jr, Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M. J. Mol. Biol. 1977;112:535–542. doi: 10.1016/s0022-2836(77)80200-3.
  5. Carter AP, Clemons WM, Brodersen DE, Morgan-Warren RJ, Wimberly BT, Ramakrishnan V. Nature. 2000;407:340–348. doi: 10.1038/35030019.
  6. Cold Spring Laboratory Press. Cold Spring Harbor Symposia on Quantitative Biology. 1972;Vol 36.
  7. Davis IW, Murray LW, Richardson JS, Richardson DC. Nucleic Acids Res. 2004;32(Web Server issue):W615–W619. doi: 10.1093/nar/gkh398.
  8. Dickerson RE. Protein Sci. 1997;6:2483–2484.
  9. Drew HR, Wing RM, Takano T, Broka C, Tanaka S, Itakura K, Dickerson RE. Proc. Natl. Acad. Sci. USA. 1981;78:2179–2183. doi: 10.1073/pnas.78.4.2179.
  10. Driscoll PC, Gronenborn AM, Beress L, Clore GM. Biochemistry. 1989;28:2188–2198. doi: 10.1021/bi00431a033.
  11. Franklin RE, Gosling RG. Nature. 1953;171:740–741. doi: 10.1038/171740a0.
  12. Henderson R, Baldwin JM, Ceska TA, Zemlin F, Beckmann E, Downing KH. J. Mol. Biol. 1990;213:899–929. doi: 10.1016/S0022-2836(05)80271-2.
  13. International Union of Crystallography. Acta Crystallogr. A. 1989;45:658.
  14. Kendrew JC, Bodo G, Dintzis HM, Parrish RG, Wyckoff H, Phillips DC. Nature. 1958;181:662–666. doi: 10.1038/181662a0.
  15. Kim SH, Quigley GJ, Suddath FL, McPherson A, Sneden D, Kim JJ, Weinzierl J, Rich A. Science. 1973;179:285–288. doi: 10.1126/science.179.4070.285.
  16. Perutz MF, Rossmann MG, Cullis AF, Muirhead H, Will G, North ACT. Nature. 1960;185:416–422. doi: 10.1038/185416a0.
  17. Protein Data Bank. Nat. New Biol. 1971;233:223.
  18. Robertus JD, Ladner JE, Finch JT, Rhodes D, Brown RS, Clark BFC, Klug A. Nature. 1974;250:546–551. doi: 10.1038/250546a0.
  19. Schluenzen F, Tocilj A, Zarivach R, Harms J, Gluehmann M, Janell D, Bashan A, Bartels H, Agmon I, Franceschi F, Yonath A. Cell. 2000;102:615–623. doi: 10.1016/s0092-8674(00)00084-2.
  20. Wang AH-J, Quigley GJ, Kolpak FJ, Crawford JL, van Boom JH, van der Marel GA, Rich A. Nature. 1979;282:680–686. doi: 10.1038/282680a0.
  21. Watson JD, Crick FHC. Nature. 1953;171:737–738. doi: 10.1038/171737a0.
  22. Wilkins MHF, Stokes AR, Wilson HR. Nature. 1953;171:738–740. doi: 10.1038/171738a0.
