Editorial
BMJ. 2004 Jan 10;328(7431):61–62. doi: 10.1136/bmj.328.7431.61

Preserving today's scientific record for tomorrow

LOCKSS marries age old concepts of librarianship with modern technology

Victoria Reich, David Rosenthal
PMCID: PMC314037  PMID: 14715578

Let us save what remains: not by vaults and locks which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident.

Thomas Jefferson1

Information stored on paper can survive for millennia; information stored digitally today may not be recoverable this time next week. With seven million pages of new information added to the world wide web each day, the volatility of websites has emerged as an urgent problem, especially as websites are becoming the version of record for scientific journals. Three studies of links in peer reviewed journals all found their useful life to be a few years. 2-4 For Stewart Brand, president of the Long Now Foundation, “This is not a good way to run a civilization.”5 For librarians whose mission is to transmit today's intellectual, cultural, and historical output to the future, it's fast becoming a nightmare. A project initiated by Stanford University Libraries is coming to their aid.

Called LOCKSS (for “Lots of Copies Keeps Stuff Safe”), it aims to provide librarians with a cheap and easy way to collect, preserve, and provide access to their own, local copy of web published material (http://lockss.stanford.edu). The project has developed software that converts a personal computer into a digital preservation appliance. If a publisher gives permission, the appliance collects content by slowly crawling the publisher's site in the manner of a search engine. Access to the collected content is transparent; the appliance acts like a web cache to deliver requested pages from the publisher, or stored pages if the publisher fails to respond. In this way a library's readers see the subscribed pages at their original location, even though the publisher may no longer provide them there.
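The cache-like behaviour described above can be illustrated with a minimal sketch. It is not the LOCKSS software itself: the class name `PreservationCache`, its methods, and the `fetch` callable (a stand-in for an HTTP request to the publisher) are all hypothetical, and the sketch assumes the publisher has already given permission to collect the pages.

```python
from typing import Callable, Dict, Iterable, Optional


class PreservationCache:
    """Toy model of an appliance: publisher first, stored copy as fallback."""

    def __init__(self, fetch: Callable[[str], Optional[bytes]]) -> None:
        # `fetch` is a hypothetical stand-in for a request to the publisher.
        # It returns the page bytes, or None if the publisher does not respond.
        self._fetch = fetch
        self._store: Dict[str, bytes] = {}

    def crawl(self, urls: Iterable[str]) -> None:
        """Collect and keep a local copy of each page that can be fetched."""
        for url in urls:
            page = self._fetch(url)
            if page is not None:
                self._store[url] = page

    def get(self, url: str) -> Optional[bytes]:
        """Serve the publisher's page if available, else the stored copy."""
        page = self._fetch(url)
        if page is not None:
            return page
        return self._store.get(url)


# Usage: a dictionary plays the role of the publisher's site.
site = {"http://example.org/article": b"full text"}
cache = PreservationCache(lambda url: site.get(url))
cache.crawl(list(site))
site.clear()  # the publisher stops serving the page
page = cache.get("http://example.org/article")  # still served from the store
```

The reader always asks for the original address; only the source of the bytes changes when the publisher stops responding.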

These appliances do not stand alone but are linked via the internet. They continually audit each other's content, comparing their versions by voting on its digest (a unique value computed from the content). If an appliance finds its copy outvoted and thus probably damaged, it can repair the damage from the appliances that outvoted it.6 LOCKSS uses this process of mutual audit and repair because the alternative, making careful backups and manually auditing the backup copies, is very expensive. Librarians' defence against irreplaceable loss has always rested on redundancy (one library burns but only one of many copies of a work is destroyed). LOCKSS provides for Jefferson's “multiplication of copies,” but with an electronic twist.
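The idea of audit by voting can be sketched as a simple majority vote over digests. This is a deliberately simplified illustration, not the LOCKSS polling protocol: the function names, the choice of SHA-256 as the digest, and the rule of repairing only when a strict majority agrees are all assumptions made for the example.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def digest(content: bytes) -> str:
    """Compute a digest of the content (SHA-256 is assumed here)."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(copies: Dict[str, bytes]) -> List[str]:
    """Vote on digests and overwrite outvoted copies with the majority copy.

    `copies` maps a hypothetical appliance name to the content it holds.
    Returns the names of the appliances whose copies were repaired.
    """
    votes = Counter(digest(content) for content in copies.values())
    winner, count = votes.most_common(1)[0]
    if count <= len(copies) / 2:
        return []  # no strict majority, so no copy can be trusted for repair
    good = next(c for c in copies.values() if digest(c) == winner)
    repaired = []
    for name, content in list(copies.items()):
        if digest(content) != winner:
            copies[name] = good  # repair from an appliance that outvoted it
            repaired.append(name)
    return repaired


# Usage: three libraries hold the same article, one copy is damaged.
copies = {"lib_a": b"text", "lib_b": b"text", "lib_c": b"te#t"}
repaired = audit_and_repair(copies)
```

Comparing digests rather than whole documents keeps the exchange small; the full content only needs to travel when a repair is required.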

The LOCKSS system, which initially used content provided by the BMJ and is adding other titles at an increasing rate, is in beta testing at 80 libraries worldwide and should go into production in spring 2004. Some 50 publishers of academic journals are supporting the project.

As flaws in digital preservation systems may not come to light until it is too late to save their content, diversity is essential. Fortunately, LOCKSS is not the only game in town. The Internet Archive makes heroic, if inevitably only partly successful, efforts to archive the entire web (www.archive.org/). The Dutch National Library is cooperating with the publisher Elsevier to preserve its journals.7 Debate continues over the economic and technical advantages of distributed versus centralised approaches to archiving, as national and institutional libraries plan for the digital future.

The LOCKSS team hopes to extend these techniques to other forms of content, for example less formal journals in the arts and humanities, and government documents. With such tools libraries can continue to serve as society's memory.

Competing interests: The LOCKSS program has received cash and support in kind from Sun Microsystems. Other computer companies currently support researchers who are contributing to the program. DR was employed by Sun Microsystems until November 2002 and holds shares in Sun Microsystems and in other computer hardware and software companies.

References


Articles from BMJ : British Medical Journal are provided here courtesy of BMJ Publishing Group
