2014 Jan 16;2014:bat086. doi: 10.1093/database/bat086

Table 2.

Categorization of article fields

| Category | Field group | Applicability | Selectivity | Accuracy | Notes |
| --- | --- | --- | --- | --- | --- |
| Decisive | PMID | Medium/low | Single | High | |
| Decisive | DOI | Medium/low | Single | High | |
| Reliable, partially decisive | Year | High (rarely null) | Low | High | |
| Reliable, partially decisive | ISSN and EISSN | Medium | Medium | High | |
| Reliable, partially decisive | Journal name | High | Medium | Medium | Abbreviations are common |
| Reliable, partially decisive | Title | High | High | Medium | Cases with missing parts exist |
| Useful but not reliable | Paging group (volume, issue and page) | Medium | High | Medium | Missing parts are common |
| Useful but not reliable | Author list | High | High | Low | Missing authors are not rare; name word order may vary |

Record fields can be categorized according to their applicability, selectivity and accuracy. Applicability is the number of non-null values divided by the total number of records; a field with very few null or empty values has high applicability. The average selectivity of a field is 1 − 1/(number of unique field values); if each value of a field is shared by only very few records, the field has high selectivity. In particular, a field said to have single selectivity, or to be decisive, is one in which every non-null value is unique among records. Accuracy is the average probability that any value in a field is correct. A field with low accuracy is a poor indicator of duplication.
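As a rough illustration, applicability and selectivity can be computed directly from a column of field values. Accuracy cannot, because it requires knowing which values are correct. The Python sketch below follows the definitions above. The function names and the sample PMID and year values are hypothetical and not taken from the article, and treating empty strings as null is an assumption.

```python
from collections import Counter


def _non_null(values):
    # Assumption: both None and the empty string count as null/empty.
    return [v for v in values if v not in (None, "")]


def applicability(values):
    """Number of non-null values divided by the total number of records."""
    if not values:
        return 0.0
    return len(_non_null(values)) / len(values)


def selectivity(values):
    """Average selectivity: 1 - 1/(number of unique field values)."""
    unique = set(_non_null(values))
    if not unique:
        return 0.0
    return 1 - 1 / len(unique)


def has_single_selectivity(values):
    """True if every non-null value occurs in exactly one record."""
    counts = Counter(_non_null(values))
    return all(c == 1 for c in counts.values())


# Hypothetical sample columns, one entry per record.
pmids = ["101", None, "102", "", "103"]
years = [2010, 2010, 2011, None]

print(applicability(pmids), selectivity(pmids), has_single_selectivity(pmids))
print(applicability(years), selectivity(years), has_single_selectivity(years))
```

In this toy data the PMID column is null in two of five records but never repeats a value, which matches the "decisive" profile in the table. The year column is rarely null but repeats values, so on its own it can only narrow down candidate duplicates.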