Table 2.
Category | Field group | Notes |
---|---|---|
Decisive | PMID | Medium/low applicability; single selectivity; high accuracy |
Decisive | DOI | Medium/low applicability; single selectivity; high accuracy |
Reliable, partially decisive | Year | High applicability (rarely null); low selectivity; high accuracy |
Reliable, partially decisive | ISSN and EISSN | Medium applicability; medium selectivity; high accuracy |
Reliable, partially decisive | Journal name | High applicability; medium selectivity; medium accuracy; abbreviations are common |
Reliable, partially decisive | Title | High applicability; high selectivity; medium accuracy; some records have missing parts |
Useful but not reliable | Paging group (volume, issue and page) | Medium applicability; high selectivity; medium accuracy; missing parts are common |
Useful but not reliable | Author list | High applicability; high selectivity; low accuracy; omitted authors are not rare; name word order may vary |
Record fields can be categorized by their applicability, selectivity and accuracy. Applicability is the number of non-null values divided by the total number of records; a field with very few null or empty values has high applicability. The average selectivity of a field is 1 − 1/(number of unique field values); if each value of a field is shared by only a few records, the field has high selectivity. In particular, a field with single selectivity (a decisive field) is one in which every non-null value is unique among the records. Accuracy is the average probability that a value in the field is correct. A field with low accuracy is a poor indicator of duplication.
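The three measures can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce Table 2: each field is assumed to be a list of values with one entry per record, `None` and blank strings are treated as null, unique values are counted over non-null entries only, and accuracy is estimated against a hypothetical list of verified reference values (`truth`), which the text does not specify. All function names are illustrative.

```python
from typing import Any, Optional, Sequence


def is_null(value: Any) -> bool:
    """Assumption: None and blank strings both count as null/empty."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def applicability(values: Sequence[Any]) -> float:
    """Number of non-null values divided by the total number of records."""
    if not values:
        return 0.0
    non_null = sum(1 for v in values if not is_null(v))
    return non_null / len(values)


def selectivity(values: Sequence[Any]) -> float:
    """1 - 1/u, where u is the number of unique non-null values (0.0 if none)."""
    unique = {v for v in values if not is_null(v)}
    if not unique:
        return 0.0
    return 1 - 1 / len(unique)


def is_decisive(values: Sequence[Any]) -> bool:
    """Single selectivity: every non-null value occurs in exactly one record."""
    non_null = [v for v in values if not is_null(v)]
    return len(non_null) == len(set(non_null))


def accuracy(values: Sequence[Any], truth: Sequence[Any]) -> Optional[float]:
    """Fraction of non-null values equal to a hypothetical verified reference value."""
    pairs = [(v, t) for v, t in zip(values, truth) if not is_null(v)]
    if not pairs:
        return None
    return sum(1 for v, t in pairs if v == t) / len(pairs)


# Toy data for five records (illustrative only).
pmids = ["12345", None, "67890", "", "24680"]
years = ["2001", "2001", "2003", "2001", "2003"]
true_years = ["2001", "2002", "2003", "2001", "2003"]  # hypothetical ground truth

print(applicability(pmids))          # 0.6
print(selectivity(years))            # 0.5
print(is_decisive(pmids))            # True
print(is_decisive(years))            # False
print(accuracy(years, true_years))   # 0.8
```

Under the stated formula, selectivity only approaches 1 as the number of distinct values grows, so it is useful for ranking fields against each other; whether a field is decisive is a separate, stricter check that no non-null value is repeated.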