Plant Physiology
Editorial
2003 Jul;132(3):1131–1134. doi: 10.1104/pp.103.022541

Databases in the Biological Sciences. A User's Guide to the Current Copyright Landscape¹

Gabriel M Ramsey 1, Elizabeth A Howard 1,*
PMCID: PMC526266  PMID: 12857794

The prevalence of databases and database analysis in the biological sciences has increased dramatically in recent years. The sequence for an entire plant genome, Arabidopsis, is now complete. With this development, a corresponding need for means of protecting the value associated with databases has arisen. Owners of databases have turned to intellectual property laws as an appropriate mechanism for protecting and leveraging that value. This creates situations where database users may inadvertently infringe upon the intellectual property contained in a database or embodied in the format of the database itself.

Although public databases are by design available to all, proprietary databases must be used in accordance with the procedures instituted by the creators of these databases. One mechanism used by owners of private databases to protect their product is to limit disclosure through contractual restrictions and maintain the information as trade secrets. Licensing agreements prohibiting unauthorized copying of a database have been upheld. However, this legal regime requires the maintenance of secrecy. For reasons ranging from theft of data to intentional disclosure based on practical, ideological, or competitive motivations, such databases may ultimately become publicly accessible. Owners of these databases must be vigilant to guard against inadvertent disclosures of their databases in published journals or public Web sites, and subscribers to the databases must be careful both to prevent dissemination of the databases to unauthorized users and to themselves restrict their use of the database to the terms allowed under their agreement with the database provider.

How do the owners and maintainers of databases protect their content and format? The strongest protection of databases is presently provided by the European Union's (EU) Database Directive. This regime was developed in response to the perception that existing European copyright protections were inadequate, leaving database owners unnecessarily exposed. The same limitations that led the EU to extend the reach of European copyright through the Database Directive can be perceived in the U.S. copyright framework as well. This discussion provides an overview of the EU database laws and current U.S. copyright law as it pertains to databases and considers the choices to be made by database users and owners.

ROBUST PROTECTION FOR BIOLOGICAL SCIENCES DATABASES. THE EU DATABASE DIRECTIVE

Currently, the most robust global protections for databases exist in the EU's 1996 “Directive on the Legal Protection of Databases.” The regime grew out of the limitations of copyright, the system still used in the United States, in protecting both the substance of data in databases and the format of the databases themselves. Because databases tend toward standardization for ease of use, copyright law offers minimal protection for the database format itself: copyright protection requires “creativity.” This requirement is best illustrated by the seminal case considered by the Supreme Court of the United States, Feist Publications, Inc. v. Rural Telephone Service Co., Inc. (499 U.S. 340, 363, 1991). In this case, the Supreme Court held that telephone “white pages” employing an alphabetical organization lack sufficient creativity for protection because such organization “... is not only unoriginal, it is practically inevitable.” Similarly, organization of scientific data by obviously functional and widely used categories or keywords (e.g. database sequence information organized by “gene name,” “protein name,” “author names,” and “organism names”) may lack the requisite creativity for copyright (for an example of standard nucleotide-nucleotide search results incorporating largely functional fields and raw sequence data, see http://www.ncbi.nlm.nih.gov/blast/). However, many have observed that the very database attributes that make copyright protection difficult are also those that generate the most value. Further, considerable time and money may be invested in the creation of these databases. Thus, the EU's Database Directive was put in place to create the proper incentives to reward such investment and to protect the real value that results.

The primary feature of the Database Directive is that it replaces the “creativity” or “originality” requirement of copyright with protection for the effort and investment in accumulation of data. The Directive dramatically achieves this goal by applying copyright-type protection to certain compilations of data regardless of creative organization (see Council Directive 96/9, 1996 O.J. [L77/20]). The creator of a database can prevent the “extraction and/or reutilization of the whole or of a substantial part” of a database, as long as the creator can show that there has been “qualitatively and/or quantitatively a substantial investment in either the obtaining, verification or presentation of the contents” (Council Directive 96/9, 1996 O.J. [L77/7]). What constitutes “substantial” remains an open question that will only finally be resolved by guidance from the courts. Thus, protection for biological databases in the EU extends further than in the United States, which continues to rely on copyright, as will be discussed.

Users of European databases should be aware that the protection under the EU Directive lasts for 15 years and starts anew if the database has been substantially changed. Again, the threshold for what constitutes “substantial” is omitted from the language of the Directive; thus, a database user is well advised to err on the side of assuming that all EU databases are protected, absent a clear basis for believing otherwise. Protection under the regime is available primarily to creators of databases in EU Member States and then only on a reciprocal basis. However, as the implementing legislation in individual member states has played out, even this aggressive framework has not yet decisively dealt with the issue of data protection. First, a number of member states have been reluctant to enact implementing legislation, and there has been confusion regarding what the Directive requires of such legislation (see Hugenholtz, 2001). Second, courts interpreting such legislation have done so inconsistently, although all have generally issued narrow interpretations.

The breadth of protection in Europe, then, is far greater than in the United States under U.S. copyright laws. In fact, to date, the courts of several countries (specifically, Belgium, France, Germany, Spain, and the UK) have issued injunctions or awarded damages under the Directive. U.S. companies can certainly capitalize on this fact by moving some of their operations to Europe or by finding a local partner to benefit from the Directive. What this means for users of privately available databases housed in Europe is that use of the database will be subject not only to the terms of the license under which access to the database is provided but also to the broad protections afforded by the EU Directive. What this means for users of publicly available databases housed in Europe is that they should refrain from wholesale copying of part or all of a database without investigating whether the database is protected from this very activity.

U.S. COPYRIGHT LAW MAY PROVIDE ONLY LIMITED PROTECTION

Many of the concerns that led to the development of the EU Database Directive are clearly seen in an analysis of U.S. copyright law. Although U.S. copyright can certainly provide protection for database owners, that protection is narrow, potentially risky, and largely fails to create incentives for database development. On the other hand, users of databases housed in the United States are less likely to accidentally infringe upon database owner rights. The Copyright Act is designed to protect “expression” created by an “author,” but expressly denies protection to any “idea, procedure, process, system, method of operation, concept, principle, or discovery” [17 U.S.C. § 102(b)]. In other words, copyright does not protect discovered scientific principles and ideas or discovered “facts.” Thus, much information in emerging databases in the biological sciences, particularly pure research results, is not protectable.

A DNA sequence, for example, is not protectable in and of itself under copyright laws; rather, it is the organization of sequence information that may lend itself to copyright protection. Organization of sequence data, however, is one of the hallmarks of a useful database. The main benefits of databases to users lie not in housing pure data, but rather in allowing scientists to predict the meaning of those data. Even so, the form of expression chosen to articulate findings would be protectable only to the extent that it was not the only way to express the results. This can be problematic in scientific contexts because many theories and facts “lend themselves to a very limited manner of expression” (Silva v. MacLaine, 697 F. Supp. 1423, 1428; E.D. Mich. 1988).

Thus, there are relatively significant constraints on the ability of copyright to protect the substance of the data contained in biological databases. The data are likely to constitute non-copyrightable principles and natural facts. Further, accurate articulation of those principles and facts may be so limited by functional necessity that sufficiently “creative” expression may be impossible or undesirable.

COPYRIGHT PROTECTION FOR BIOLOGICAL DATABASES

Although copyright protection for individual pieces of data within biological databases is limited, “compilations” of such data may constitute copyrightable subject matter (see 17 U.S.C. § 103). When an author exercises the requisite creativity in choosing which facts to include and the placement or arrangement of those facts, such compilations may be protectable.

However, while databases are not per se un-copyrightable under U.S. copyright law, significant hurdles exist to obtaining copyright protection. First, courts have not wholly grappled with compilation “creativity,” and, thus, reliance on this theory may hold considerable risk. “Creativity” in this context usually amounts to a matter of characterization. That is, if a compilation is derived from a process characterized as “thoughtful” selection, protection may be afforded. However, selection or arrangement characterized as “obvious,” “typical,” or “routine” would be unprotected. As discussed above, the Supreme Court has observed that although originality “does not require that facts be presented in an innovative or surprising way,” it is equally true that “the selection and arrangement of facts cannot be so mechanical or routine as to require no creativity whatsoever” (Feist Publications, Inc. v. Rural Telephone Service Co., Inc., 499 U.S. 340, 362; 1991).

In the biological sciences context, collections of data in discrete database records returned by search queries are potentially protectable compilations (for example, such a query result would be a list of genes derived from a BLAST search, clustering analysis of mRNA, or metabolite expression data). In reality, however, database structures can be much more dynamic, with constantly changing records. Also, the discussion of whether such a database is copyrightable is complicated by the question of whether the physical or logical organization of the computer database itself is copyrightable. If this organization is driven by external factors, copyright protection may not be available (see Matthew Bender & Co. v. West Publishing Co., 158 F.3d 674, 682; 2d Cir. 1998; “the creative spark is missing” where “external factors so dictate selection that any person composing a compilation... would necessarily select the same categories of information”). This may be particularly problematic in scientific fields where a premium is placed on consistency and “creative” selection or organization of data may lack utility. Further, in tension with database copyright protection is the denial of protection for any “procedure,” “process,” “system,” or “method of operation.” Database structure/organization may derive from purely functional needs and may be viewed in functional terms, rather than as the result of “creative authorship.”

On the other hand, a given category is more likely to be sufficiently creative if it is only one of numerous possible choices (see e.g. American Dental Ass'n. v. Delta Dental Plans Ass'n., 126 F.3d 977, 979; 7th Cir. 1997; taxonomy of procedures was creative where they “could be classified... in any of a dozen different ways”). Similarly, copyright protection is more likely if the selection of data populating a given database record is more easily characterized as a matter of “discretion,” “personal taste,” or “judgment” (see Matthew Bender & Co. v. West Publishing Co., 158 F.3d 674, 682; 2d Cir. 1998). For example, sufficient creativity in composing the database may exist in selecting data carefully to create a database of narrow relevance—such as answering a particular question. However, even in that situation, courts would likely be inclined to deny protection if the database owner simply incorporated all relevant data on the question. That is, the “decisions” to incorporate certain information must be motivated by discretion or judgment, rather than scientific necessity. It has been observed that “creativity inheres in making nonobvious choices from among more than a few options” (see Matthew Bender & Co. v. West Publishing Co., 158 F.3d 674, 682, 2d Cir. 1998; Kregos v. Associated Press, 937 F.2d 700, 704, 2d Cir. 1991; decision to express pitching performance via nine statistics of many available statistics and data combinations sufficiently creative).

The second problem with database copyright protection is that copying data from the database is not infringement if the selection and arrangement in the new work is not substantially similar to the original work. That is, although limited protection against wholesale copying of all or a substantial portion of the database is afforded, the database owner is left exposed to rearrangement in noninfringing formats and would not be able to prevent uses of individual pieces of information. Thus, a database user's use of small portions of data from a database, even though copyrighted, may not constitute infringement.

From the perspective of the database user, then, the likelihood of encountering a copyrighted database in the United States is not high, although it is almost certain that use of any private database is going to be subject to restrictions set by the particular agreement under which access was obtained. Thus, it is important that database users understand the source of their database and that care is taken to understand the terms governing use of that database. This caution extends not only to those who seek to reproduce a given database but also to those who may themselves wish to display the database or to use design features of the database for their own private or publicly accessible database. If the database is private, any such use may be restricted or prevented by contract and as such may expose the unauthorized user to liability.

ATTEMPTING TO FILL THE GAPS. NEW PROPOSALS TO EXTEND U.S. COPYRIGHT PROTECTION FOR DATABASES

Several broad statutes, modeled on the European regime, have been proposed and rejected by Congress. These provisions have been defeated in the United States largely because of the Constitutional need to carefully balance free speech values and proprietary rights. The primary objection voiced by the Department of Justice to one of these proposals in the 105th Congress was that the First Amendment imposes “significant constraints on the ability of the government to restrict the dissemination of information that has been publicly disclosed.” Although this balance is already built into the United States copyright regime, there has been substantial disagreement regarding how the balance would work in database protection statutes that effectively extend copyright protection.

One state has recently considered passing a database protection bill. The Georgia Senate unanimously passed a bill in 2001 that is very similar in substance and scope to the EU Directive (see Georgia Database Protection and Economic Development Act of 2001, S. 214, 2001). The statute would essentially prevent unauthorized extraction of data from a database for use in commerce. The bill was held over to the 2002 session and assigned to a special judiciary subcommittee. However, it is likely that it would be even more difficult for a state to pass such legislation because in addition to First Amendment hurdles, it may also be preempted by the existing federal copyright law. Any such state legislation would ultimately have to be drawn extremely narrowly to avoid these barriers.

DIFFICULT CHOICES FOR BIOLOGICAL DATABASE OWNERS

Given the current landscape, owners of biological databases located in the United States are faced with potentially difficult decisions. On the one hand, such companies may opt to rely on U.S. copyright law. If a company's database contains more subjective content that is further from pure research results or if the database is arranged in a particularly creative format, such a strategy may be perfectly adequate. However, it seems likely that, in most cases, database owners will have to grapple with this question on an ongoing basis, given the nature and substance of many biological databases. The investment in this ongoing struggle to discern the scope of copyright and to adjust the database toward protection may be excessively burdensome.

Alternatively, database owners may turn to the clearer protections in Europe by moving databases. Although the initial cost of such an investment may be substantial, the company would avoid many of the uncertainties of copyright law. However, even taking advantage of this solution leaves the problem that the Directive is not recognized in the United States. Thus, at the end of the day, it is likely that most database owners will have to grapple with U.S. copyright laws at least to some degree. The result may be that database owners will hesitate to rely on any single protective regime, continuing to utilize licensing and trade secret measures to the extent possible, with an eye toward falling within the European Directive and U.S. copyright law where necessary.

¹ This manuscript was solicited by Robert Last as an Editor's Choice paper. Elizabeth A. Howard is a partner and Gabriel M. Ramsey is an associate in the intellectual property department of Orrick, Herrington & Sutcliffe's Silicon Valley office. The views expressed herein are solely those of the authors and should not be attributed to the firm or any of its clients.

References

  1. Hugenholtz PB (2001) The New Database Right: Early Case Law from Europe. Presentation at the Ninth Annual Conference on International IP Law and Policy, Fordham University School of Law, April 19–20, 2001. http://www.ivir.nl/publications/hugenholz/fordham2001.html

Articles from Plant Physiology are provided here courtesy of Oxford University Press
