2008 Aug 30;37(Database issue):D921–D924. doi: 10.1093/nar/gkn546

Table 1.

Déjà vu content by category and category definitions

| Duplication type | Count | Description |
| --- | --- | --- |
| DISTINCT | 1379 | Citations can be highly similar for a number of reasons, including when they describe related but clearly distinct publications. For example, a pair of citations identified by computational text similarity that, on inspection, is clearly a continuation of a study that has evolved, with text that presents new information, is categorized as a distinct and unique work. |
| DUPLICATE | 2443 | A pair of citations that is identical or nearly identical. The citations report on a study with the same or very similar results and conclusions. |
| ERRATUM | 188 | Only a fraction of the MEDLINE records that are apparently corrections to previous entries are marked as errata. If a title/abstract pair is labeled as an erratum, or if it is clear that a correction has been made (author list, spelling, small changes to abstract or title wording, etc.), the erratum classification is used. |
| SANCTIONED | 1619 | Citations can be highly similar for a number of reasons, some of which play a special, important and legitimate role in the reporting of science. Examples include periodic reviews, periodic guidelines, specialized databases and specialized Federal Register citations. Citation pairs of this type, identified through computational text similarity, have been manually classified as sanctioned. |
| NO ABSTRACT | 16 | In some cases highly similar titles are flagged as potential duplicates, but the non-identity MEDLINE record does not contain an abstract. We designate such a pair 'NO ABSTRACT' to indicate that its status cannot be determined. |
| UNVERIFIED | 69115 | Déjà vu is a database of duplicate publications identified using a number of different techniques, the principal one being text similarity comparison. Putative duplicates identified by any of these techniques are initially loaded into the unverified categories, prior to human verification and assignment to another category. Because our software also inspects the author lists, each pair is loaded into an unverified category with either overlapping authors (SA) or not (DA). |
| TOTAL | 74760 | |

Up-to-date statistics and definitions are available at http://spore.swmed.edu/dejavu/help and http://spore.swmed.edu/dejavu/statistics/.
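
The UNVERIFIED entry in Table 1 says that unverified pairs are split by whether the two author lists overlap (SA) or not (DA). The source does not describe the matching rule, so the following is only a minimal sketch of one way such a split could be made. The function names, the name normalization and the "at least one shared name" criterion are all assumptions for illustration, not the Déjà vu implementation.

```python
from typing import Iterable


def normalize_author(name: str) -> str:
    """Lowercase a name and drop punctuation so 'Smith, J.' matches 'smith j'.

    Hypothetical normalization; the real software may compare names differently.
    """
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def unverified_category(authors_a: Iterable[str], authors_b: Iterable[str]) -> str:
    """Return 'SA' if the two author lists share at least one name, else 'DA'.

    Assumption: a single shared (normalized) name counts as overlapping authors.
    """
    # Drop names that normalize to an empty string so they cannot match each other.
    set_a = {n for n in map(normalize_author, authors_a) if n}
    set_b = {n for n in map(normalize_author, authors_b) if n}
    return "SA" if set_a & set_b else "DA"
```

Under these assumptions, a pair whose author lists share no normalized name, or where one list is empty, falls into DA; a stricter rule (for example, requiring identical first authors) would only need a different comparison in `unverified_category`.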