Table 1.
Duplication type | Count | Description |
---|---|---|
DISTINCT | 1379 | Citations can be highly similar for a number of reasons, including describing related but clearly distinct publications. This category is used for a pair of citations flagged by computer similarity that, on inspection, is clearly not a duplicate: for example, a continuation of a study that has evolved, where the text reports new information and constitutes a distinct and unique work. |
DUPLICATE | 2443 | A pair of citations that was identical or nearly identical. The citations report on a study with the same or very similar results and conclusions. |
ERRATUM | 188 | Only a fraction of the MEDLINE records that are apparently corrections to previous entries are marked as errata. If a title/abstract pair is labeled as an erratum, or if it is clear that a correction has been made (author list, spelling, small changes to abstract or title wording, etc.), then the ERRATUM classification is used. |
SANCTIONED | 1619 | There are a number of reasons for different citations to have a high level of similarity, some of which play a special, very important, and very legitimate role in the reporting of science. Examples include periodic reviews, periodic guidelines, specialized databases and specialized federal register citations. Citation pairs of this type, identified through computer text similarity, have been manually classified into the SANCTIONED category. |
NO ABSTRACT | 16 | In some cases highly similar titles are flagged as potential duplicates, but the non-identity MEDLINE record does not contain an abstract. We designate such a pair as ‘NO ABSTRACT’ to indicate that its status cannot be determined. |
UNVERIFIED | 69115 | Deja vu is a database of duplicate publications identified using a number of different techniques, the principal one being text similarity comparison. Putative duplicates identified by any of these techniques are initially loaded into the unverified categories, prior to human verification and assignment to another category. Because our software also inspects the author lists, they are loaded into unverified categories that either have overlapping authors (SA) or do not (DA). |
TOTAL | 74760 | |
Up-to-date statistics and definitions are available at http://spore.swmed.edu/dejavu/help and http://spore.swmed.edu/dejavu/statistics/.
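
As a quick consistency check on Table 1, the short Python sketch below re-adds the category counts and derives how many pairs have been assigned to a category other than UNVERIFIED. The counts are copied from the table; the variable names are illustrative only and are not part of the Deja vu software.

```python
# Category counts copied from Table 1 (names below are illustrative).
counts = {
    "DISTINCT": 1379,
    "DUPLICATE": 2443,
    "ERRATUM": 188,
    "SANCTIONED": 1619,
    "NO ABSTRACT": 16,
    "UNVERIFIED": 69115,
}

# Sum of all categories; should match the TOTAL row of the table.
total = sum(counts.values())

# Pairs that have been assigned to any category other than UNVERIFIED.
classified = total - counts["UNVERIFIED"]

# Fraction of the database still awaiting human verification.
unverified_share = counts["UNVERIFIED"] / total

print(f"total pairs:       {total}")
print(f"classified pairs:  {classified}")
print(f"unverified share:  {unverified_share:.1%}")
```

The sum reproduces the TOTAL row, and the large unverified share reflects that entries are loaded automatically before any manual inspection.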