Low-cost sensor systems for measuring air quality have received widespread scientific and media attention in recent years. Improving the data quality of such systems has become an established technical methodology: units are colocated at traditional air quality monitoring stations equipped with reference instrumentation, and each unit is field-calibrated using various statistical techniques. Methods range from (multi)linear regression to more complex statistical techniques, often using additional predictor variables such as air temperature or relative humidity (e.g., Spinelle et al.(4)), and occasionally data not actually measured by the sensor system itself (e.g., station observations or model output). Most of these techniques improve the agreement between sensor-derived data and reference data, in many cases eliminating issues such as chemical interferences and sensor-to-sensor variability. However, it is not always clear to what extent the data resulting from such processing are still a true and independent measurement by the sensor system, rather than some blend of secondary data and model prediction. Noticing this development, Hagler et al. (2018)(2) warned that some systems may use predictor variables for calibration in such a way that a line is crossed from justifiable, empirical correction of a known artifact to a method that is essentially a predictive statistical model. In addition, the processing steps carried out along the way are often not clearly communicated. The current lack of governmental or third-party standards for low-cost sensor performance(5) and the occasional lack of distinction between sensors and sensor systems further complicate data processing.
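The field-calibration approach described above can be sketched as an ordinary least-squares fit of the reference concentration against the raw sensor signal, air temperature, and relative humidity. This is a minimal illustration assuming NumPy and a purely linear model; it is not the specific procedure of Spinelle et al.(4) or of any manufacturer, and the function names and synthetic data are hypothetical. Under the levels proposed in Table 1, the same fit would yield Level-2A data if T and RH come from onboard sensors and Level-2B data if they come from a nearby station, while adding predictors not demonstrably related to the measurement principle would move the result toward Level-3.

```python
import numpy as np


def _design_matrix(signal, temperature, rel_humidity):
    """Stack an intercept column with the predictor variables."""
    signal = np.asarray(signal, dtype=float)
    return np.column_stack([
        np.ones_like(signal),
        signal,
        np.asarray(temperature, dtype=float),
        np.asarray(rel_humidity, dtype=float),
    ])


def fit_calibration(signal, temperature, rel_humidity, reference):
    """Fit reference ~ b0 + b1*signal + b2*T + b3*RH by least squares."""
    X = _design_matrix(signal, temperature, rel_humidity)
    coeffs, *_ = np.linalg.lstsq(X, np.asarray(reference, dtype=float), rcond=None)
    return coeffs


def apply_calibration(coeffs, signal, temperature, rel_humidity):
    """Apply previously fitted coefficients to (new) sensor data."""
    return _design_matrix(signal, temperature, rel_humidity) @ coeffs


# Hypothetical synthetic colocation data: the raw signal responds to the
# reference concentration but is also biased by temperature and humidity.
rng = np.random.default_rng(0)
reference = rng.uniform(5.0, 60.0, 200)
temperature = rng.uniform(-5.0, 30.0, 200)
rel_humidity = rng.uniform(20.0, 95.0, 200)
signal = 2.0 + 0.8 * reference + 0.3 * temperature - 0.05 * rel_humidity

coeffs = fit_calibration(signal, temperature, rel_humidity, reference)
print(np.round(coeffs, 4))  # approximately [-2.5, 1.25, -0.375, 0.0625]
```

Because the synthetic signal is noise-free, the fit recovers the inverse of the simulated sensor response exactly; with real colocation data the residuals would reflect noise, drift, and effects the linear model does not capture.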
Adding to the observations and recommendations made by Hagler et al. (2018)(2), we have further noticed substantial and consistent confusion, within both the scientific community and the interested public, regarding the amount and type of processing applied to sensor data, and regarding the point at which derived data can be considered to have lost a meaningful link to quantitative traceability. This issue is highly relevant for air quality sensors because, in most countries, air quality targets and standards are set out in primary legislation, and demonstrating attainment of those targets carries demanding traceability requirements. Clarity regarding the level of sensor data processing is therefore important both for evaluating sensor technology and for correctly using and interpreting its data.
To address this challenge, we propose a unified terminology of processing levels for low-cost air quality sensor systems. A strict sequence of processing levels is already common practice in satellite remote sensing, where it has been in wide use across multiple agencies for decades.(1) We have adapted these levels into a sequence of processing levels for data from low-cost air quality sensor systems (Table 1).
Table 1: Proposed Processing Levels for Low-Cost Sensor Systems for Air Quality^a
| Level | Name | Definition | Example: Gas sensors | Example: Particle sensors |
|---|---|---|---|---|
| Level-0 | Raw measurements | Original signal produced by the sensor system | Voltage corresponding to the measured quantity, e.g., current for electrochemical sensors, resistance or conductance for metal oxide sensors, or transmitted light intensity for infrared sensors | Voltage corresponding to light scattered in nephelometers, or to particle counts in the size bins of optical particle counters |
| Level-1 | Intermediate geophysical quantities | Estimate derived from the corresponding Level-0 data using basic physical principles or simple calibration equations, without compensation schemes | For electrochemical sensors, NO2 concentration in μg/m³ or ppb, using only Level-0 data from the NO2 sensor itself with no corrections beyond factory calibration; essentially “raw data in concentration units” | Binned particle counts or PM mass concentration in μg/m³ derived from Level-0 data using simple calibration and an assumed particle density |
| Level-2A | Standard geophysical quantities | Estimate using the sensor plus other onboard sensors that have been demonstrated as appropriate for artifact correction and are directly related to the measurement principle | NO2 concentration in μg/m³ or ppb derived from onboard NO2/NO/O3 sensors, corrected for interferences and/or T/RH effects using onboard data | PM concentration in μg/m³, corrected for T/RH effects using onboard-measured T/RH |
| Level-2B | Standard geophysical quantities, extended | As Level-2A, but also using external data demonstrated as appropriate for artifact correction and directly related to the measurement principle | As Level-2A, but using external T/RH from a nearby station | As Level-2A, but using external T/RH from a nearby station |
| Measurement/prediction boundary | | | | |
| Level-3 | Advanced geophysical quantities | Estimate using the sensor plus internal/external data to adjust values, not constrained to data inputs proven to cause measurement bias or related to the measurement principle | NO2 concentration in μg/m³ or ppb, corrected for T/RH effects and using data from nearby meteorological stations or models | PM concentration in μg/m³, corrected for T/RH effects and using data from nearby stations or models |
| Level-4 | Spatially continuous geophysical quantities | Spatially continuous maps derived from a network of distributed sensor systems | Map of NO2 concentrations in μg/m³ or ppb, e.g., derived by assimilating sensor network data into a physical model | Map of PM2.5 concentrations in μg/m³, e.g., derived by assimilating sensor network data into a physical model |
^a T/RH stands for temperature and relative humidity. The spatial support of all levels except Level-4 is point measurements at single locations, whether for an individual sensor system or for an entire network.
^b For the measurement/prediction boundary, see Hagler et al. (2018).(2)
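As a concrete illustration of the Level-0 to Level-1 step in Table 1, the sketch below converts an electrochemical sensor's working-electrode voltage to an NO2 mixing ratio using only a zero offset and a sensitivity, i.e., a simple linear calibration with no compensation for temperature, humidity, or cross-interferences. The function names, the linear form, and the example offset and sensitivity are hypothetical; real sensors and manufacturers may use other electrode configurations and equations. The ppb-to-μg/m³ conversion assumes a fixed molar volume of 24.45 L/mol (ideal gas at 25 °C and 101.325 kPa).

```python
# Hypothetical Level-0 -> Level-1 conversion for an electrochemical NO2 sensor.
# Assumption: a linear factory calibration (zero offset + sensitivity) with no
# T/RH or interference compensation, consistent with Level-1 in Table 1.

NO2_MOLAR_MASS_G_PER_MOL = 46.0055
MOLAR_VOLUME_L_PER_MOL = 24.45  # ideal gas at 25 degC, 101.325 kPa (assumed)


def level0_to_level1_ppb(working_mv: float, zero_offset_mv: float,
                         sensitivity_mv_per_ppb: float) -> float:
    """Convert a raw working-electrode voltage (mV) to NO2 in ppb."""
    if sensitivity_mv_per_ppb == 0:
        raise ValueError("sensitivity must be non-zero")
    return (working_mv - zero_offset_mv) / sensitivity_mv_per_ppb


def ppb_to_ugm3(ppb: float,
                molar_mass_g_per_mol: float = NO2_MOLAR_MASS_G_PER_MOL,
                molar_volume_l_per_mol: float = MOLAR_VOLUME_L_PER_MOL) -> float:
    """Convert a mixing ratio in ppb to a mass concentration in ug/m3."""
    return ppb * molar_mass_g_per_mol / molar_volume_l_per_mol


# Example with hypothetical factory values: 280 mV zero offset, 0.5 mV/ppb.
no2_ppb = level0_to_level1_ppb(300.0, 280.0, 0.5)
print(no2_ppb)                          # 40.0
print(round(ppb_to_ugm3(no2_ppb), 1))   # 75.3
```

Only the sensor's own signal and fixed constants enter the calculation, which is what keeps the result at Level-1; feeding in onboard T/RH for a correction would make it Level-2A.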
The proposed processing levels range from Level-0, indicating the raw signal output by the electronically interfaced sensor, to Level-4, representing a spatially continuous map of concentrations derived from a network of sensor systems, for example using spatial interpolation or data assimilation into a chemical transport model.(3) The levels therefore represent a sequence from least to most processed information. Loosely mirroring the processing levels typically used for satellite data, Level-0 represents raw instrument output, Level-2 represents the standard product used for most scientific applications, and Level-4 represents a combination of the data with other spatial data sources (e.g., a model). However, the specifics of the proposed levels differ from those used in remote sensing to accommodate the unique requirements of low-cost sensor data.
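Level-4 products turn point values from a network into a continuous surface. As a deliberately simple stand-in for the data-fusion and assimilation approaches mentioned above, the sketch below uses inverse distance weighting (IDW) over planar coordinates. The function name, the default power of 2, and the example values are assumptions for illustration only; IDW uses neither model information nor spatial covariance, so it is not the method of Schneider et al.(3).

```python
import numpy as np


def idw_interpolate(station_xy, station_values, grid_xy, power=2.0):
    """Inverse-distance-weighted estimates at grid points from point observations."""
    station_xy = np.asarray(station_xy, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    grid_xy = np.asarray(grid_xy, dtype=float)

    # Pairwise distances, shape (n_grid, n_stations).
    dist = np.linalg.norm(grid_xy[:, None, :] - station_xy[None, :, :], axis=-1)

    result = np.empty(len(grid_xy))
    on_station = dist == 0.0
    hit = on_station.any(axis=1)

    # Grid points that coincide with a station take that station's value.
    result[hit] = station_values[on_station[hit].argmax(axis=1)]

    # All other grid points: weighted mean with weights 1 / d**power.
    weights = 1.0 / dist[~hit] ** power
    result[~hit] = weights @ station_values / weights.sum(axis=1)
    return result


# Two hypothetical sensor systems reporting NO2 (ppb) along a line.
stations = [(0.0, 0.0), (2.0, 0.0)]
values = [10.0, 30.0]
grid = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
print(idw_interpolate(stations, values, grid))  # approximately [10, 12, 20]
```

The map inherits whatever processing level the input points carry, which is one reason the level of the underlying network data should be reported alongside any Level-4 product.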
The usability of data at each processing level depends on the end-user application. Level-1 or Level-2 data, if they meet the relevant standards, may be useful for measuring progress against air quality targets. Level-4 is a blended product using data from multiple sources and may be most useful for public information systems. Note that the level designation merely represents the amount of processing applied to the data set and does not reflect data quality; the latter needs to be ensured using appropriate QA/QC strategies when sensor systems are deployed. Furthermore, the levels do not have to be passed through sequentially; they can serve as labels describing the approximate amount of processing applied to a data product (e.g., going directly from Level-0 to Level-2). The levels also imply nothing about where processing takes place (e.g., in the sensor system itself or in the cloud) or whether data are available in near real-time. Most of the sensor systems currently available on the market offer Level-1 or Level-2 data, although some open systems also provide Level-0 data. We consider the step from Level-2 to Level-3 to be the transition point from true measurement to a type of statistical prediction or modeling. All levels except Level-4 apply to individual sensor systems as well as to entire networks. Exploiting this “network knowledge”, however, can add significant value to the data. Such cases are mostly covered by Level-3, but once network-based processing techniques are more mature, they could conceivably receive their own terminology.
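One way to make the level designation machine-readable, so that it can travel with a data product, is to encode Table 1 as a small data structure. The sketch below is a hypothetical encoding: the `ProcessingChain` fields are our own simplification of the table's definitions, and the rule of assigning the highest applicable level follows the point above that levels are labels for the approximate amount of processing rather than steps that must be passed through in order. The `is_measurement` helper marks the Level-2/Level-3 boundary.

```python
from dataclasses import dataclass
from enum import Enum


class ProcessingLevel(Enum):
    L0 = "Level-0"
    L1 = "Level-1"
    L2A = "Level-2A"
    L2B = "Level-2B"
    L3 = "Level-3"
    L4 = "Level-4"


# Levels on the measurement side of the measurement/prediction boundary.
MEASUREMENT_LEVELS = frozenset({
    ProcessingLevel.L0, ProcessingLevel.L1,
    ProcessingLevel.L2A, ProcessingLevel.L2B,
})


@dataclass(frozen=True)
class ProcessingChain:
    """Hypothetical summary of the inputs used to produce a data product."""
    converted_to_concentration: bool = False    # raw signal -> concentration units
    onboard_artifact_correction: bool = False   # other onboard sensors, justified
    external_artifact_correction: bool = False  # external data, justified
    unconstrained_inputs: bool = False          # inputs not proven to cause bias
    spatially_continuous: bool = False          # output is a map, not points


def assign_level(chain: ProcessingChain) -> ProcessingLevel:
    """Label a product with the highest level its processing reaches."""
    if chain.spatially_continuous:
        return ProcessingLevel.L4
    if chain.unconstrained_inputs:
        return ProcessingLevel.L3
    if chain.external_artifact_correction:
        return ProcessingLevel.L2B
    if chain.onboard_artifact_correction:
        return ProcessingLevel.L2A
    if chain.converted_to_concentration:
        return ProcessingLevel.L1
    return ProcessingLevel.L0


def is_measurement(level: ProcessingLevel) -> bool:
    """True for Level-0 to Level-2B, False beyond the boundary."""
    return level in MEASUREMENT_LEVELS


# NO2 corrected using onboard T/RH and onboard NO/O3 sensors:
level = assign_level(ProcessingChain(converted_to_concentration=True,
                                     onboard_artifact_correction=True))
print(level.value, is_measurement(level))  # Level-2A True
```

Deciding whether an input is "demonstrated as appropriate" remains a scientific judgment; the code only records that judgment consistently once it has been made.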
It should be noted that we do not believe any of the described levels is inherently better than the others; they simply serve different purposes and user communities. However, we do think it is essential that the type and amount of processing performed on a given sensor data set be communicated transparently so that data users can make informed decisions. This is particularly important for scientific, operational, and policy applications, where methods have to be thoroughly documented and their fitness for purpose demonstrated.
We believe that the presented harmonized terminology of processing levels can contribute toward this goal without necessarily requiring sensor manufacturers to disclose their proprietary algorithms (although entirely open systems are preferable, particularly for scientific applications). We further hope that adoption of the suggested processing levels (or a derivative of them) within the community will simplify and improve communication between manufacturers, researchers, and other users. Overall, we think that a unified terminology is a first step toward improved data integrity and transparency, and that it will ultimately lead to better use of this new technology.
Footnotes
The authors declare no competing financial interest.
The views expressed in this paper are those of the authors and do not necessarily reflect the views or policies of the United States Environmental Protection Agency. It has been subjected to Agency review and approved for publication. Mention of trade names or commercial products does not constitute an endorsement or recommendation for use.
References
- EOS Data Panel (1986). Earth Observing System: Report of the EOS Data Panel. Data and Information System, Volume IIa. National Aeronautics and Space Administration, Goddard Space Flight Center.
- Hagler GS, Williams R, Papapostolou V, and Polidori A. (2018). Air Quality Sensors and Data Adjustment Algorithms: When Is It No Longer a Measurement? Environmental Science & Technology, 52(10), 5530–5531.
- Schneider P, Castell N, Vogt M, Dauge FR, Lahoz WA, and Bartonova A. (2017). Mapping urban air quality in near real-time using observations from low-cost sensors and model information. Environment International, 106, 234–247.
- Spinelle L, Gerboles M, Villani MG, Aleixandre M, and Bonavitacola F. (2015). Field calibration of a cluster of low-cost available sensors for air quality monitoring. Part A: Ozone and nitrogen dioxide. Sensors and Actuators B: Chemical, 215, 249–257.
- Williams R, Duvall R, Kilaru V, Hagler G, Benedict K, Rice J, Kaufman A, et al. (2019). Deliberating Performance Targets Workshop: Potential Paths for Sensor Progress. Atmospheric Environment. Published online April 19, 2019. doi:10.1016/j.aeaoa.2019.100031.
