Abstract
Seriously flawed and even fictional models of biomolecular crystal structures, although rare, still persist in the record of structural repositories and databases. The ensuing problems of database contamination and of persistent publications based on incorrect structure models must be effectively addressed. The burden cannot simply be left to the critical voices who make the effort to contribute dissenting comments that are mostly ignored. The entire structural biology community, and particularly the journal editors who exercise significant power in this respect, must engage in a constructive dialogue lest structural biology lose its credibility as an evidence-based empirical science.
Keywords: biomolecular crystallography, X-ray crystal structure models, structural databases, database integrity, error correction
Graphical Abstract
Flawed crystal structure models polluting structural repositories provide convincing examples of systemic problems affecting the entire biomedical research community and offer a timely starting point for an overdue discussion of research integrity. The burden of correcting the scientific record cannot simply be left to critical voices who make the effort to craft dissenting comments that are mostly ignored.
Falsehood flies, and truth comes limping after it, so that when men come to be undeceived, it is too late; the jest is over, and the tale hath had its effect.
— Jonathan Swift (1667–1745)
1. When one bad apple spoils the barrel
Readers of publications based on the analysis of macromolecular structures can rightfully expect that the accompanying structural models, deposited in the Protein Data Bank (PDB; [1]), are properly built and refined. Macromolecular crystallographers have been at the forefront of establishing standards for data and model deposition, and most leading journals have followed the ethics standards of the community by requesting mandatory deposition of model coordinates and, more recently, of the diffraction data. While the path from raw diffraction data to processed structure factors, as currently deposited, and from there to the resulting electron density reconstruction, is of respectable mathematical objectivity, the interpretation of this experimental electron density in terms of an atomic model allows for significant individual freedom, inversely proportional to the quality of the experiment and to the competence of the interpreter. As a result, the expectation that a model should accurately reflect the underlying evidence in the form of electron density is, on occasion, spectacularly betrayed for a variety of reasons.
While the call of the structural community for data and model deposition has been welcomed by almost all professional societies and journal editors, much less attention has been paid to spelling out how to deal with publications presenting demonstrably incorrect structure models and how, in order to maintain database integrity, to expunge those models from the public data repositories. Wrong and implausible models are not just a minor nuisance; they impede data mining, distort meta-analyses, and in all likelihood also diminish the impact and credibility of the journals reporting structural studies [2, 3]. Moreover, publications containing incorrect models can induce futile research by others and waste resources in a ripple effect that is difficult or even impossible to stop.
We believe it is of great importance that the structural biology community at large, with the assistance of journal editors and reviewers, agree on a clearly outlined path for responding expeditiously to publications with demonstrated serious errors in the reported structure models, and define an effective mechanism to flag or eliminate those models from the structural record. At present, database contamination is exacerbated by the PDB policy that a retraction (or obsoleting, in PDB terminology) of model coordinates is possible only when the author of the original entry requests or permits it. A criticized author only rarely agrees to this step, and database integrity remains compromised.
2. When self-correction fails
Drawing on our experience of the difficulties in correcting the structural record over roughly the last decade, we present selected real examples below and, in the following sections, propose a set of recommendations aimed at restoring the integrity of the published record and of structural databases.
Contaminated model data invalidate meta-analyses
In a recent paper published in the journal Proteins [4], the authors proposed novel Zn2+ coordination patterns in protein structures based on a blind analysis of data collected from the PDB. On closer inspection, it turned out that most, if not all, of the “new” coordination geometries were fictitious, as they were largely based on flawed models of incorrectly interpreted protein crystal structures. Because this invalid analysis propounded a new view and classification of an important branch of structural science (biological metal coordination), we considered a speedy response essential, especially in view of its effect on future meta-analyses. Since the problems with the paper in question were clearly crystallography-related, we sent a manuscript expressing our reservations to Acta Crystallographica D. The reviewers were very supportive and clearly recommended publication. However, one of the reviewers raised the question of the best venue for our critique, wondering whether the correction should perhaps be sent to the original journal. By executive decision of the editors of Acta Crystallographica, and after six weeks under consideration there, our paper was flatly rejected. We resubmitted it to Proteins, where it was finally accepted several months later [5]. In this case the original journal was ultimately willing to participate in correcting the record. Had there been clear and generally accepted rules about where to submit such comments (first to the originating, likely non-specialist journal, where they may be received less than enthusiastically; to a specialized trade journal; or to a high-visibility journal), the delay could have been avoided and the ripple effect stopped much earlier.
When demonstrated fraud is not enough
In a few salient instances, journals have been remarkably unwilling to correct the published record, despite formally insisting on the need to ensure good practices and reproducibility [6]. This reluctance may go so far as to shield from criticism even papers that represent demonstrated fraud. A recent example is the retraction [7] of the paper describing a fraudulent structure of complement protein C3b [8]. Correction of this record has a long history. The original paper was published as a research letter in Nature, yet the paper that first raised the possibility of fraud took more than six months to appear, and then only in the online version of the journal [9]. The final retraction was based on the finding of fraud by an investigation at the University of Alabama that had been completed in 2009; it thus took the journal seven years to acknowledge problems with the original publication, which is now ten years old. Several other affected entries, however, had been obsoleted by the PDB earlier, raising questions about the consistency of its policies.
Particularly relevant is the fact that the affected database entry was removed from the PDB not after publication of the conclusions of the fraud investigation, but only after the journal had published the retraction notice. In this notice, the corresponding author explicitly does not agree to the retraction. This has an important consequence: it was not author agreement but the journal's retraction notice that prompted the PDB to remove (obsolete) the entry. This places enormous responsibility on journal editors, who have, perhaps unwittingly, become the de facto wardens of database integrity.
Another case illustrating the often enigmatic delays between critique and retraction is that of the botulinum neurotoxin B light chain/Sb2 peptide complex. After the original publication in 2000 [10], the peptide was shown to be absent as early as 2001 [11, 12]. Yet the model was retracted only in 2009 [13], after a long series of protracted but unpublished letters to the editors. The ripple effect of such delays can have devastating consequences, and not only for structural biology, as pointed out in response to a series of ABC transporter models published from 2001 onward with incorrect handedness [14, 15].
Immune against retraction
An interesting case illustrating the reluctance of journal editors to retract publications, even while acknowledging the problem, is the recent exchange in the Journal of Immunology (http://jimmunol.org/content/196/2/521.2.full), in which seven antibody crystal structures [16–18] were found not to contain meaningful models of the bound peptide antigens [19, 20]. After considering the responses [21, 22], the journal editor stated, in agreement with the reviewers, that “The consensus of these experts is clear that the quality of the data and the level of noise within the electron density map in the Salunke study preclude tracing peptide residues within the X-ray crystal structures” [23]. By the accepted rules of empirical epistemology, a bold claim must be supported by correspondingly strong evidence. As Laplace argued [24, 25], the more extraordinary the event, the greater the need for strong proofs to support it, because for those who attest to it, the possibilities of deceiving or of having been deceived become more probable the less probable the event itself is. Thus, a strong claim of a peptide bound in a specific pose requires equally strong evidence, which in the above papers is simply absent. Even more disturbing, these peptides also fail the test of prior probability: their conformations lie in the zeroth percentile of expected stereochemistry, indicating high-energy, strained conformations for which even stronger experimental evidence would be necessary. Needless to say, presenting such an improbable model without evidence, in violation of the most basic stereochemistry known to every student of structural biology, is deeply troubling. If models with practically zero posterior probability can remain in the PDB, it is hard to argue that database integrity is assured.
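Laplace's principle, and the interplay of prior and posterior probability invoked above, can be written compactly in the odds form of Bayes' theorem. This is one standard formalization offered as an illustration, not a formula taken from the cited papers, and the symbols are our own: $M$ is the claimed model (for example, a peptide bound in a specific pose), $\neg M$ the alternative (no interpretable peptide), and $D$ the deposited diffraction data.

$$
\underbrace{\frac{P(M \mid D)}{P(\neg M \mid D)}}_{\text{posterior odds}}
= \underbrace{\frac{P(D \mid M)}{P(D \mid \neg M)}}_{\text{likelihood ratio}}
\times
\underbrace{\frac{P(M)}{P(\neg M)}}_{\text{prior odds}}
$$

A strained, high-energy conformation has small prior odds, so the likelihood ratio contributed by the data must be correspondingly large. With purely illustrative numbers, prior odds of 1:1000 combined with data that favor the model tenfold give posterior odds of $10 \times \tfrac{1}{1000} = \tfrac{1}{100}$, still strongly against the claimed pose. If the density at the site is essentially uninformative about the peptide, the likelihood ratio is close to 1 and the posterior simply remains at the low prior.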
The persistence of such implausible models without proper evidence certainly invites a public discussion of why journal editors seem to think that a retraction (which, as noted above, appears to be the de facto minimum required for the PDB to obsolete a model when the responsible authors do not agree to such action) would be more detrimental to their journal’s reputation than a claim that can best be described, in the words of Langmuir and Hall [26], as pathological science: “The first characteristic of pathological science is that the effect being studied is often at the limits of detectability or has very low statistical significance.” Three related structures by the same corresponding author, suffering from the same fatal shortcomings, have been published in Immunity [27]. A critical comment submitted in late 2014 drew an acknowledging note from the communicating editor, followed in May 2016 by an admittedly apologetic e-mail promising future action. The models in question were featured prominently in an editorial preview [28] and have been cited 67 times according to Scopus.
Trivial mistake, mere nuisance, carelessness, or capital error?
Some of the problems we encountered may be less severe; nevertheless, journals are often unwilling to address them. In the paper by Schumacher et al. in Science [29], we identified several important inconsistencies. While eight structures were listed in the refinement tables, only six were actually deposited in the PDB, and one of them turned out not to correspond at all to what was claimed in the paper. That structure (PDB ID 4ru7) was ultimately replaced (5fc0), but the DNA sequence was no longer the same. Other structure models claimed much better geometry than reported in the public PDB validation reports. A letter we sent to Science several months ago alerting the journal to the problem was immediately rejected, even though the problem was acknowledged. Ultimately, a formal correction and update were announced by Science during the proof stage of this Viewpoint article.
3. A call for action
From the examples and arguments presented above, it is evident that journal editors wield enormous influence and should thus accept the commensurate responsibility of taking a leading role in helping the structural biology community maintain the validity and integrity of structural databases. Biomolecular crystallography enjoys the great advantage of objective, data-driven, and reliable validation procedures. The problems exemplified by incorrect crystallographic models thus provide a solid starting point for an overdue discussion of research integrity that affects the entire biomedical research community.
An ounce of prevention…
Preventing future disputes would greatly reduce, though by no means eliminate, the need to address critical comments on incorrect models and to correct or remove them. To this end, manuscript reviewers must be provided with all the information necessary to judge model quality, presented in a form accessible also to non-expert reviewers. The salient point is that no sincere assessment is possible without evaluating, on a local level at the site of biological interest, the match between the model and the experimental evidence of electron density. This need for local inspection has long been recognized [30], and its importance for review was re-emphasized a decade ago [31]. Encouragingly, even editors of non-specialist journals are beginning to realize that requiring, as a minimum, the PDB validation reports might prevent the most egregious transgressions of model plausibility. We quote from [23]: “The Journal of Immunology now requires that the PDB Summary Validation Report (available only recently) be included with submission of the manuscript so that it is available to editors and reviewers during the review process.” Unfortunately, in many situations the PDB summary validation reports are insufficient to establish the validity of claims and models that rest on local interpretation of electron density. From an informal poll of colleagues, we conclude that most journal editors forward to manuscript authors any reviewer requests for structure factors and model coordinates to allow full validation of the claims; we are aware of only a single case in which such a request was denied and the paper subsequently rejected. However, we are also aware of dissenting opinions concerning full disclosure [31] and possible conflicts of interest. Here again, clear policies are missing, and journal editors should actively engage with the structural biology and crystallography community to develop acceptable, system-wide guidelines.
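The local comparison between model and electron density discussed above is commonly quantified by a real-space correlation coefficient (RSCC): the correlation between the observed density and the density calculated from the model, sampled at the same grid points around a residue or ligand. The function below is a minimal sketch of that idea under simplifying assumptions: both density samples are taken as already given, and the masking, map weighting, and model-density calculation that real validation programs perform are omitted. The function name and all example values are hypothetical.

```python
from math import sqrt
from typing import Sequence


def real_space_cc(rho_obs: Sequence[float], rho_calc: Sequence[float]) -> float:
    """Pearson correlation between observed and model-calculated electron
    density sampled at the same grid points around one residue or ligand.

    Sketch only: assumes both samples are already on a common local grid."""
    n = len(rho_obs)
    if n != len(rho_calc) or n < 2:
        raise ValueError("need two equally sized samples of at least 2 points")
    mean_obs = sum(rho_obs) / n
    mean_calc = sum(rho_calc) / n
    # Covariance and variances about the local means (the 1/n factors cancel).
    cov = sum((o - mean_obs) * (c - mean_calc) for o, c in zip(rho_obs, rho_calc))
    var_obs = sum((o - mean_obs) ** 2 for o in rho_obs)
    var_calc = sum((c - mean_calc) ** 2 for c in rho_calc)
    if var_obs == 0.0 or var_calc == 0.0:
        raise ValueError("density samples must not be constant")
    return cov / sqrt(var_obs * var_calc)


if __name__ == "__main__":
    # Hypothetical density values (arbitrary units) at five grid points
    # covering a modeled peptide residue.
    model = [0.0, 1.0, 1.6, 0.7, 0.1]
    supported = [0.1, 0.9, 1.5, 0.8, 0.2]  # observed density follows the model
    noise = [0.3, -0.2, 0.1, 0.4, -0.1]    # observed density is essentially noise
    print(f"supported: {real_space_cc(supported, model):.2f}")
    print(f"noise:     {real_space_cc(noise, model):.2f}")
```

A value near 1, as for the supported sample, indicates that the density follows the model locally; a low or negative value, as for the noise-only sample, means the model at that site is not backed by the map. What counts as acceptable depends on resolution and map quality, so no cutoff is hard-coded in this sketch.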
…still requires a pound of cure…
For models that are indisputably incorrect, reviewer opinions almost always converge, and the need to remove the model from the database should be evident. Alas, without author approval this is at present formally impossible, and a process must be developed to address the issue objectively and independently of personal grievances. Possibilities include clearly flagging such entries in the PDB, providing alternate entries (meaningful only if an improved model can be obtained), or simply deleting the model from the database. Journal editors will need to engage with the structural community, including the PDB, to develop policies based on objective and sound evaluation criteria. What level of evidence must be produced to trigger the retraction of a paper whose conclusions rest entirely on an invalid structure model?
An interesting situation arises in less clear-cut cases, when a paper reports an incorrect model but also contains independent evidence for a claim that may still be valid, so that outright retraction would not be warranted. At first sight, a conceivable solution to the problem of invalid models contaminating the structural databases might be to publish a partial “correction” that allows removal of the implausible model from the PDB while leaving the otherwise unaffected conclusions of the publication intact. However, this approach carries its own perils. It may be justified in rare cases where the model (and its generator) are practically detached from the paper, as exemplified by the diffraction-data fabrication case of Bet v 1 [32, 33]. As a universal modus operandi, however, such model-only retraction is problematic at best: the relative ease with which macromolecular structures can now be determined, the de facto commoditization of biomolecular crystallography, introduces a moral hazard. Following this logic, one might well reason: “Let us just add some crystallography to support our hypothesis: it increases our impact and street credibility, and if something turns out to be wrong, first, there is slim chance it gets noticed, and if it does, we simply write a correction and forget the bad structure model.” Here again, journal editors must develop appropriate policies and communicate them to their contributors.
…but with hopeful outlook
We are firmly convinced that the serious problems of database contamination and of persistent papers based on incorrect models can and must be effectively addressed. The burden cannot simply be placed on the whistleblowers who make the effort to dissent and to prepare thoughtful but mostly futile comments. The entire structural biology community, and particularly the journal editors, who exercise significant power in this respect, must engage in a constructive dialogue lest structural biology lose its credibility as an evidence-based empirical science.
Acknowledgments
BR enjoys partial support from the Austrian Science Fund (FWF) under project P28395-B26. This work was supported in part by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research, and by grant U01-HG008424 to WM.
Abbreviations
- PDB: Protein Data Bank
- DNA: deoxyribonucleic acid
References
- 1. Berman H. The Protein Data Bank: a historical perspective. Acta Crystallogr. 2008;A64:88–95. doi: 10.1107/S0108767307035623.
- 2. Minor W, Dauter Z, Helliwell JR, Jaskolski M, Wlodawer A. Safeguarding Structural Data Repositories against Bad Apples. Structure. 2016;24:216–20. doi: 10.1016/j.str.2015.12.010.
- 3. Dauter Z, Wlodawer A, Minor W, Jaskolski M, Rupp B. Avoidable errors in deposited macromolecular structures: an impediment to efficient data mining. IUCrJ. 2014;1:179–93. doi: 10.1107/S2052252514005442.
- 4. Yao S, Flight RM, Rouchka EC, Moseley HNB. A less-biased analysis of metalloproteins reveals novel zinc coordination geometries. Proteins: Structure, Function, and Bioinformatics. 2015;83:1470–1487. doi: 10.1002/prot.24834.
- 5. Raczynska JE, Wlodawer A, Jaskolski M. Prior knowledge or freedom of interpretation? A critical look at a recently published classification of “novel” Zn binding sites. Proteins: Structure, Function, and Bioinformatics. 2016. doi: 10.1002/prot.25024.
- 6. McNutt M. Journals unite for reproducibility. Science. 2014;346:679. doi: 10.1126/science.aaa1724.
- 7. Abdul Ajees A, Gunasekaran K, Volanakis JE, Narayana SV, Kotwal GJ, Krishna Murthy HM. Retraction: The structure of complement C3b provides insights into complement activation and regulation. Nature. 2016;532:268. doi: 10.1038/nature16523.
- 8. Abdul Ajees A, Gunasekaran K, Volanakis JE, Narayana SV, Kotwal GJ, Murthy HM. The structure of complement C3b provides insights into complement activation and regulation. Nature. 2006;444:221–5. doi: 10.1038/nature05258.
- 9. Janssen JC, Read RJ, Brunger AT, Gros P. Crystallographic evidence for deviating C3b structure? Nature. 2007;448:E1–E3. doi: 10.1038/nature06102.
- 10. Hanson MA, Stevens RC. Cocrystal structure of synaptobrevin-II bound to botulinum neurotoxin type B at 2.0 Å resolution. Nat Struct Biol. 2000;7:687–692. doi: 10.1038/77997.
- 11. Rupp B, Segelke BW. Questions about the structure of the botulinum neurotoxin B light chain in complex with a target peptide. Nat Struct Biol. 2001;8:643–664. doi: 10.1038/90361.
- 12. Hanson MA, Stevens RC. Response to Rupp and Segelke. Nat Struct Biol. 2001;8:664.
- 13. Hanson MA, Stevens RC. Retraction: Cocrystal structure of synaptobrevin-II bound to botulinum neurotoxin type B at 2.0 Å resolution. Nat Struct Mol Biol. 2009;16:795. doi: 10.1038/nsmb0709-795.
- 14. Dawson RJP, Locher KP. Structure of a bacterial multidrug ABC transporter. Nature. 2006;443:180–185. doi: 10.1038/nature05155.
- 15. Petsko GA. And the second shall be first. Genome Biol. 2007;8:103–105. doi: 10.1186/gb-2007-8-2-103.
- 16. Khan T, Salunke DM. Structural elucidation of the mechanistic basis of degeneracy in the primary humoral response. J Immunol. 2012;188:1819–27. doi: 10.4049/jimmunol.1102701.
- 17. Khan T, Salunke DM. Adjustable locks and flexible keys: plasticity of epitope-paratope interactions in germline antibodies. J Immunol. 2014;192:5398–405. doi: 10.4049/jimmunol.1302143.
- 18. Tapryal S, Gaur V, Kaur KJ, Salunke DM. Structural evaluation of a mimicry-recognizing paratope: plasticity in antigen-antibody interactions manifests in molecular mimicry. J Immunol. 2013;191:456–63. doi: 10.4049/jimmunol.1203260.
- 19. Stanfield RL, Pozharski E, Rupp B. Comment on Three X-ray Crystal Structure Papers. J Immunol. 2016;196:521–524. doi: 10.4049/jimmunol.1501343.
- 20. Stanfield RL, Pozharski E, Rupp B. Additional Comment on Three X-ray Crystal Structure Papers. J Immunol. 2016;196:528–530. doi: 10.4049/jimmunol.1502281.
- 21. Salunke DM, Khan T, Gaur V, Tapryal S, Kaur K. Response to Comment on Three X-ray Crystal Structure Papers. J Immunol. 2016;196:524–8. doi: 10.4049/jimmunol.1501474.
- 22. Salunke DM, Khan T, Gaur V, Tapryal S, Kaur K. Response to Additional Comment on Three X-ray Crystal Structure Papers. J Immunol. 2016;196:530. doi: 10.4049/jimmunol.1502468.
- 23. Fink PJ. Comments from the Editor-in-Chief. J Immunol. 2016;196:521. doi: 10.4049/jimmunol.1502280.
- 24. Laplace PS. Essai philosophique sur les probabilités. Paris: Bachelier; 1814.
- 25. Truscott FW, Emory FL. P. S. Laplace: A Philosophical Essay on Probabilities. CreateSpace Independent Publishing Platform; 2015.
- 26. Langmuir I, Hall RN. Pathological Science. Phys Today. 1989;42:36–48.
- 27. Sethi DK, Agarwal A, Manivel V, Rao KV, Salunke DM. Differential epitope positioning within the germline antibody paratope enhances promiscuity in the primary immune response. Immunity. 2006;24:429–38. doi: 10.1016/j.immuni.2006.02.010.
- 28. Mariuzza RA. Multiple Paths to Multispecificity. Immunity. 2006;24:359–361. doi: 10.1016/j.immuni.2006.03.009.
- 29. Schumacher MA, Tonthat NK, Lee J, Rodriguez-Castaneda FA, Chinnam NB, Kalliomaa-Sanford AK, Ng IW, Barge MT, Shaw PL, Barilla D. Structures of archaeal DNA segregation machinery reveal bacterial and eukaryotic linkages. Science. 2015;349:1120–4. doi: 10.1126/science.aaa9046.
- 30. Brändén CI, Jones TA. Between objectivity and subjectivity. Nature. 1990;343:687–689.
- 31. Rupp B. Real-space solution to the problem of full disclosure. Nature. 2006;444:817. doi: 10.1038/444817b.
- 32. Rupp B. Detection and analysis of unusual features in the structural model and structure-factor data of a birch pollen allergen. Acta Crystallogr. 2012;F68:366–376. doi: 10.1107/S1744309112008421.
- 33. Zaborsky N, Brunner M, Wallner M, Himly M, Karl T, Schwarzenbacher R, Ferreira F, Achatz G. Response to Detection and analysis of unusual features in the structural model and structure-factor data of a birch pollen allergen. Acta Crystallogr. 2012;F68:377. doi: 10.1107/S1744309112008433.