PeerJ. 2020 Sep 28;8:e9916. doi: 10.7717/peerj.9916

Table 2. The automated filters used in this study.

| Test | Type | Basis | Rationale |
|---|---|---|---|
| Biodiversity institutions | Error | Gazetteer-based | Records may have coordinates at the location of biodiversity institutions, e.g., because they were erroneously entered with the physical location of the specimen, or because they represent individuals from captivity or horticulture that were not clearly labeled as such |
| Equal lat/lon | Error | Gazetteer-based | Coordinates with equal latitude and longitude are usually indicative of data entry errors |
| Sea | Error | Gazetteer-based | Coordinates from terrestrial organisms in the sea are usually indicative of data entry errors, e.g., swapped latitude and longitude |
| Zeros | Error | Gazetteer-based | Coordinates with plain zeros are often indicative of data entry errors |
| Capitals | Unfit | Gazetteer-based | Records may be assigned to the coordinates of country capitals based on a vague locality description |
| Duplicates | Unfit | Gazetteer-based | Duplicated records may add unnecessary computational burden, in particular for large-scale biodiversity analyses and distribution modelling for many species |
| Political centroids | Unfit | Gazetteer-based | Records may be assigned to the coordinates of the centroids of political entities based on a vague locality description |
| Urban areas | Unfit | Gazetteer-based | Records from urban areas are not necessarily errors, but often represent imprecise records automatically geo-referenced from vague locality descriptions, or old records from different land-use types |
| Basis of record | Unfit | Meta-data | Records might be unsuitable or unreliable for certain analyses depending on their source, e.g., ‘fossil’ or ‘unknown’ |
| Collection year | Unfit | Meta-data | Coordinates from old records are more likely to be imprecise or erroneous, since they are derived by geo-referencing the locality description and the names or borders of places may have changed since collection |
| Coordinate precision | Unfit | Meta-data | Records may be unsuitable for a study if their coordinate precision is coarser than the scale of the analysis |
| Identification level | Unfit | Meta-data | Records may be unsuitable if they are not identified to species level |
| Individual count | Unfit | Meta-data | Records may be unsuitable if the number of recorded individuals is 0 or if the count is too high. This may be related to data-entry or data-basing problems (e.g., defaulting to 0 for numerical values), may indicate records from DNA barcoding, and in some cases may indicate records of absence |
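As an illustration of how some of the filters in Table 2 operate, the following minimal Python sketch flags records using only the tests that need no external reference data: Zeros, Equal lat/lon, Duplicates, Collection year, Coordinate precision and Individual count. It is not the implementation used in this study. The `Record` fields, the function names and all thresholds (`min_year`, `max_count`, `max_uncertainty_m`) are hypothetical choices made for the example, as are the exact definitions of "zero" (either coordinate exactly 0) and "duplicate" (same species and coordinates). The remaining gazetteer-based tests (e.g., Sea, Capitals, Urban areas) require reference geometries and are omitted here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical occurrence record; field names are illustrative only."""
    species: str
    lat: float
    lon: float
    year: Optional[int] = None              # collection year, if known
    individual_count: Optional[int] = None  # number of individuals, if recorded
    uncertainty_m: Optional[float] = None   # coordinate uncertainty in metres


def flag_record(rec: Record, min_year: int = 1945, max_count: int = 99,
                max_uncertainty_m: float = 100_000.0) -> List[str]:
    """Return the names of the per-record tests that this record fails.

    All thresholds are assumptions for this sketch, not values from the study.
    """
    flags = []
    if rec.lat == 0 or rec.lon == 0:          # Zeros (assumed: either value is 0)
        flags.append("zeros")
    if rec.lat == rec.lon:                    # Equal lat/lon
        flags.append("equal_lat_lon")
    if rec.year is not None and rec.year < min_year:
        flags.append("collection_year")
    if rec.individual_count is not None and (
        rec.individual_count == 0 or rec.individual_count > max_count
    ):
        flags.append("individual_count")
    if rec.uncertainty_m is not None and rec.uncertainty_m > max_uncertainty_m:
        flags.append("coordinate_precision")
    return flags


def flag_duplicates(records: List[Record]) -> List[bool]:
    """Mark each record whose (species, lat, lon) already occurred earlier."""
    seen = set()
    is_duplicate = []
    for rec in records:
        key = (rec.species, rec.lat, rec.lon)
        is_duplicate.append(key in seen)
        seen.add(key)
    return is_duplicate


def apply_filters(records: List[Record]) -> List[List[str]]:
    """Combine per-record tests and the duplicate test into one flag list each."""
    results = []
    for rec, dup in zip(records, flag_duplicates(records)):
        flags = flag_record(rec)
        if dup:
            flags.append("duplicates")
        results.append(flags)
    return results


# Small made-up example; the values are not taken from the study's data.
records = [
    Record("Panthera onca", -3.1, -60.0, 2005, 1, 50.0),
    Record("Panthera onca", 0.0, 0.0, 2005, 1, 50.0),
    Record("Panthera onca", -3.1, -60.0, 1901, 0, 250_000.0),
]
for rec, flags in zip(records, apply_filters(records)):
    print(rec.species, rec.lat, rec.lon, flags)
```

Returning the names of the failed tests, rather than dropping records directly, keeps the "Error" and "Unfit" decisions separate: a study can then discard likely errors unconditionally while deciding case by case which "Unfit" flags matter at its analysis scale.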