2022 Sep 29;8:e1085. doi: 10.7717/peerj-cs.1085

Table 2. Tasks for the heuristics-based evaluation of epidemiological data using the Wikidata SPARQL endpoint.

Each validation task is given with its identifier, a brief description of the heuristic validation criteria, and a sample statement that fails those criteria. See the section “Constraint-driven heuristics-based validation of epidemiological data” for definitions of the epidemiological variables.

| Task | Description | Sample filtered deficient statement |
|---|---|---|
| Validating qualifiers of COVID-19 epidemiological statements | | |
| V1 | Verify that Z is a date > November 01, 2019 | COVID-19 pandemic in X <number of cases> 5 <point in time> March 25, 20 |
| V2 | Verify that Q is any subclass (P279*) of medical diagnosis (Q177719) | COVID-19 pandemic in X <number of cases> 5 <point in time> March 25, 2020 <determination method> COVID-19 Dashboard |
| Ensuring the cumulative pattern of c, d, r, and t | | |
| V3 | Identify c, d, r and t statements whose value on date Z+1 is less than the value on date Z (verify that $d_Z \le d_{Z+1}$, $r_Z \le r_{Z+1}$, $t_Z \le t_{Z+1}$, and $c_Z \le c_{Z+1}$) | (COVID-19 pandemic in X <number of cases> 5 <point in time> March 25, 2020) AND (COVID-19 pandemic in X <number of cases> 6 <point in time> March 24, 2020) |
| V4 | Find missing values of c, d, r and t on date Z+1 where the corresponding values on dates Z and Z+2 are equal | (COVID-19 pandemic in X <number of cases> 5 <point in time> March 24, 2020) AND (COVID-19 pandemic in X <number of cases> 6 <point in time> March 26, 2020) AND (COVID-19 pandemic in X <number of cases> no value <point in time> March 25, 2020) |
| Validating values of epidemiological data for a given date | | |
| V5 | Identify c, d, r, h, and t statements with negative values | COVID-19 pandemic in X <number of cases> -5 <point in time> March 25, 2020 |
| V6 | Identify h statements whose value is greater than the number of cases for a date Z | (COVID-19 pandemic in X <number of hospitalized cases> 15 <point in time> March 25, 2020) AND (COVID-19 pandemic in X <number of cases> 5 <point in time> March 25, 2020) |
| V7 | Identify c statements whose value is greater than or equal to the number of clinical tests for a date Z | (COVID-19 pandemic in X <number of clinical tests> 4 <point in time> March 25, 2020) AND (COVID-19 pandemic in X <number of cases> 5 <point in time> March 25, 2020) |
| V8 | Identify c statements whose value is less than the number of deaths for a date Z | (COVID-19 pandemic in X <number of deaths> 10 <point in time> March 25, 2020) AND (COVID-19 pandemic in X <number of cases> 5 <point in time> March 25, 2020) |
| V9 | Identify c statements whose value is less than the number of recoveries for a date Z | (COVID-19 pandemic in X <number of recoveries> 10 <point in time> March 25, 2020) AND (COVID-19 pandemic in X <number of cases> 5 <point in time> March 25, 2020) |
| V10 | Compare the epidemiological variables of a general outbreak with those of its components | (COVID-19 pandemic in X <number of cases> 10 <point in time> March 25, 2020) AND (COVID-19 pandemic in Y <number of cases> 5 <point in time> March 25, 2020) WHERE X is a district of Y |
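
The value-based tasks V3 to V9 can be illustrated outside SPARQL with a minimal Python sketch. This is not the authors' implementation, which runs against the Wikidata SPARQL endpoint. It assumes, from the sample statements in the table, that c, d, r, t, and h stand for the number of cases, deaths, recoveries, clinical tests, and hospitalized cases. The in-memory layout (a dictionary from date to variable values), the function names, and the sample data are all hypothetical. V1, V2, and V10 are left out because they depend on qualifier and item-hierarchy data that this layout does not model.

```python
from datetime import date, timedelta

# Hypothetical in-memory layout: {date: {"c": cases, "d": deaths, ...}}.
# A missing key or a None value stands for "no value".
CUMULATIVE = ("c", "d", "r", "t")
ONE_DAY = timedelta(days=1)


def check_cumulative(series):
    """V3: flag (variable, Z, Z+1) where the value at Z+1 is below the value at Z."""
    flagged = []
    for day in sorted(series):
        nxt = day + ONE_DAY
        for var in CUMULATIVE:
            today = series[day].get(var)
            tomorrow = series.get(nxt, {}).get(var)
            if today is not None and tomorrow is not None and tomorrow < today:
                flagged.append((var, day, nxt))
    return flagged


def find_fillable_gaps(series):
    """V4: flag (variable, Z+1, value) where Z+1 is missing and Z, Z+2 are equal."""
    flagged = []
    for day in sorted(series):
        mid, after = day + ONE_DAY, day + 2 * ONE_DAY
        for var in CUMULATIVE:
            left = series[day].get(var)
            right = series.get(after, {}).get(var)
            missing = series.get(mid, {}).get(var) is None
            if left is not None and left == right and missing:
                flagged.append((var, mid, left))
    return flagged


def check_same_day(series):
    """V5 to V9: flag (task, date) for negative values and cross-variable conflicts."""
    flagged = []
    for day in sorted(series):
        values = series[day]
        if any(x < 0 for x in values.values() if x is not None):
            flagged.append(("V5", day))
        c = values.get("c")
        if c is None:
            continue
        rules = (
            ("V6", "h", lambda other: other > c),   # hospitalized > cases
            ("V7", "t", lambda other: c >= other),  # cases >= clinical tests
            ("V8", "d", lambda other: c < other),   # cases < deaths
            ("V9", "r", lambda other: c < other),   # cases < recoveries
        )
        for task, var, violated in rules:
            other = values.get(var)
            if other is not None and violated(other):
                flagged.append((task, day))
    return flagged


# Made-up sample data loosely modelled on the table's example statements.
sample = {
    date(2020, 3, 24): {"c": 6, "r": 2, "t": 40},
    date(2020, 3, 25): {"c": 5, "d": 10, "t": 4},
    date(2020, 3, 26): {"c": 6, "r": 2, "t": 50},
}

for check in (check_cumulative, find_fillable_gaps, check_same_day):
    print(check.__name__, check(sample))
```

On this sample, the drop in c and t between March 24 and March 25 is reported under V3, the missing r on March 25 under V4, and the March 25 values of c against t and d under V7 and V8. Each function returns a list of flagged items rather than raising, which mirrors the table's framing of these tasks as filters that surface candidate statements for review.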