Table 2. Tasks for the heuristics-based evaluation of epidemiological data using the Wikidata SPARQL endpoint.
Each validation task is given with its identifier, a brief description of the heuristic validation criteria and an example where the data does not fit them. See the section “Constraint-driven heuristics-based validation of epidemiological data” for definitions of the epidemiological variables.
| Task | Description | Sample filtered deficient statement |
|---|---|---|
| Validating qualifiers of COVID-19 epidemiological statements | ||
| V1 | Verify that Z is a date after November 1, 2019 | COVID-19 pandemic in X <number of cases> 5 <point in time> March 25, 20 |
| V2 | Verify that Q is any subclass (P279*) of medical diagnosis (Q177719) | COVID-19 pandemic in X <number of cases> 5 <point in time> March 25, 2020 <determination method> COVID-19 Dashboard |
| Ensuring the cumulative pattern of c, d, r, and t | ||
| V3 | Identify c, d, r and t statements whose value for date Z+1 is lower than the value for date Z (verify that d(Z) ≤ d(Z+1), r(Z) ≤ r(Z+1), t(Z) ≤ t(Z+1) and c(Z) ≤ c(Z+1)) | (COVID-19 pandemic in X <number of cases> 5 <point in time> March 25, 2020) AND (COVID-19 pandemic in X <number of cases> 6 <point in time> March 24, 2020) |
| V4 | Find missing values of c, d, r and t for date Z+1 where the corresponding values for dates Z and Z+2 are equal | (COVID-19 pandemic in X <number of cases> 5 <point in time> March 24, 2020) AND (COVID-19 pandemic in X <number of cases> 6 <point in time> March 26, 2020) AND (COVID-19 pandemic in X <number of cases> no value <point in time> March 25, 2020) |
| Validating values of epidemiological data for a given date | ||
| V5 | Identify c, d, r, h and t statements with negative values | COVID-19 pandemic in X <number of cases> -5 <point in time> March 25, 2020 |
| V6 | Identify h statements whose value is greater than the number of cases for a date Z | (COVID-19 pandemic in X <number of hospitalized cases> 15 <point in time> March 25, 2020) AND (COVID-19 pandemic in X <number of cases> 5 <point in time> March 25, 2020) |
| V7 | Identify c statements whose value is greater than or equal to the number of clinical tests for a date Z | (COVID-19 pandemic in X <number of clinical tests> 4 <point in time> March 25, 2020) AND (COVID-19 pandemic in X <number of cases> 5 <point in time> March 25, 2020) |
| V8 | Identify c statements whose value is lower than the number of deaths for a date Z | (COVID-19 pandemic in X <number of deaths> 10 <point in time> March 25, 2020) AND (COVID-19 pandemic in X <number of cases> 5 <point in time> March 25, 2020) |
| V9 | Identify c statements whose value is lower than the number of recoveries for a date Z | (COVID-19 pandemic in X <number of recoveries> 10 <point in time> March 25, 2020) AND (COVID-19 pandemic in X <number of cases> 5 <point in time> March 25, 2020) |
| V10 | Compare the epidemiological variables of a general outbreak with those of its components | (COVID-19 pandemic in X <number of cases> 10 <point in time> March 25, 2020) AND (COVID-19 pandemic in X <number of cases> 5 <point in time> March 25, 2020) WHERE X is a district of Y |
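
To make the logic of several tasks concrete, the following Python sketch applies the criteria of V1, V3 and V5 to V9 to a small in-memory data set. It is only an illustration under stated assumptions and not the SPARQL queries run against the Wikidata endpoint: the input layout (a mapping from variable letter to a date-to-value mapping for a single outbreak item) and the names `validate`, `CUMULATIVE`, `EARLIEST` and `CROSS_CHECKS` are hypothetical.

```python
from datetime import date, timedelta

CUMULATIVE = {"c", "d", "r", "t"}   # variables expected never to decrease (V3)
EARLIEST = date(2019, 11, 1)        # V1: every date must fall after this day

# Same-day comparisons against the number of cases c (V6 to V9):
# (task, other variable, variable reported as deficient, test that is True when deficient)
CROSS_CHECKS = [
    ("V6", "h", "h", lambda c, other: other > c),
    ("V7", "t", "c", lambda c, other: c >= other),
    ("V8", "d", "c", lambda c, other: c < other),
    ("V9", "r", "c", lambda c, other: c < other),
]


def validate(series):
    """Check {variable: {date: value}} against V1, V3 and V5 to V9.

    Returns a sorted list of (task, variable, date) tuples, one per deficient value.
    """
    issues = []
    for var, by_date in series.items():
        for day, value in by_date.items():
            if day <= EARLIEST:                    # V1: date not after November 1, 2019
                issues.append(("V1", var, day))
            if value < 0:                          # V5: negative value
                issues.append(("V5", var, day))
            next_day = day + timedelta(days=1)
            next_value = by_date.get(next_day)
            # V3: a cumulative variable must not drop from date Z to date Z+1
            if var in CUMULATIVE and next_value is not None and next_value < value:
                issues.append(("V3", var, next_day))
    for day, c in series.get("c", {}).items():
        for task, other, flagged, is_deficient in CROSS_CHECKS:
            other_value = series.get(other, {}).get(day)
            if other_value is not None and is_deficient(c, other_value):
                issues.append((task, flagged, day))
    return sorted(issues)


# Values taken from the sample deficient statements of V3, V6, V7 and V8 in Table 2.
d24, d25 = date(2020, 3, 24), date(2020, 3, 25)
sample = {"c": {d24: 6, d25: 5}, "d": {d25: 10}, "h": {d25: 15}, "t": {d25: 4}}
for task, variable, day in validate(sample):
    print(task, variable, day.isoformat())
```

With these sample values, the sketch reports the drop in c on March 25, 2020 (V3) and the same-day inconsistencies with h, t and d (V6, V7 and V8). V2, V4 and V10 are left out because they need, respectively, the class hierarchy of determination methods, gap detection across three dates, and the part-of relations between outbreak items.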