BMJ. 2001 Jul 21;323(7305):166. doi: 10.1136/bmj.323.7305.166

Code of conduct is needed for publishing raw data

Gunther Eysenbach, Eun-Ryoung Sa
PMCID: PMC1120796  PMID: 11463695

Editor—In his article, Hutchon showed the benefits of publishing raw data online.1 Opening up raw data for research has strong parallels with the "open source" movement in the software industry, where developers freely distribute source code and allow others to use and modify it.2 The open source community has learnt that this rapid evolutionary process produces better software than the traditional closed model, in which only a very few programmers can see the source and everybody else must blindly use an opaque block of bits (www.opensource.org).

Publishing raw data may similarly improve the speed and quality of research, because other researchers can reanalyse the data to verify results or to draw new conclusions. Preprint servers and innovative e-journals offer ways to share data and encourage other scholars to take part in the research process.2 Since its launch, the Journal of Medical Internet Research (www.jmir.org) has explicitly invited authors to attach original data that readers could download and analyse dynamically, for example with Java applets.3 To date, however, no author has submitted a paper with raw data. Are authors perhaps afraid that other researchers will analyse their data too thoroughly, "cream off" the interesting results, publish them, and thus preclude further papers by the original authors? In open source genomics research, debates over priority, authorship, and credit for in-depth analysis of shared data have already arisen.4,5 If researcher A makes the complete dataset public and researcher B discovers a new relation or other "publishable" result in it, what rights of first publication does researcher A have? Researcher B could probably publish the new findings with a simple reference to the original dataset, which may be unsatisfactory for researcher A, especially if he or she planned to carry out further analyses of the same data.

We may need a clearer code of practice on this issue. In the open source software industry, everybody who amends open source code to produce more advanced software agrees that the resulting software must itself be open source, a practice that could be applied analogously in biomedical publishing. We could also encourage a practice in which authors who made the original raw data available (and any subsequent authors who generated further results from these data) are invited to act as co-authors on any subsequent publications. This prospect may increase researchers' willingness to open up their raw data in the first place.

References

  • 1. Hutchon DJR. Infopoints: Publishing raw data and real time statistical analysis on e-journals. BMJ. 2001;322:529–530 (3 March). doi: 10.1136/bmj.322.7285.530.
  • 2. Eysenbach G. The impact of preprint servers and electronic publishing on biomedical research. Curr Opin Immunol. 2000;12:499–503. doi: 10.1016/s0952-7915(00)00127-8.
  • 3. Eysenbach G. Welcome to the Journal of Medical Internet Research. J Med Internet Res. 1999;1:e5. Available at www.jmir.org/1999/1/e5/ (accessed 2 March 2001).
  • 4. Russ AP, Aparicio SA, Carlton MB. Open-source work even more vital to genome project than to software. Nature. 2000;404:809. doi: 10.1038/35009255.
  • 5. Anonymous. Debates over credit for the annotation of genomes. Nature. 2000;405:719. doi: 10.1038/35015742.

Articles from BMJ : British Medical Journal are provided here courtesy of BMJ Publishing Group
