Development and machine learning-based calibration of low-cost multiparametric stations for the measurement of CO2 and CH4 in air

R Biagi; M Ferrari; S Venturi; M Sacco; G Montegrossi; F Tassi

doi:10.1016/j.heliyon.2024.e29772

Heliyon. 2024 Apr 24;10(9):e29772. doi: 10.1016/j.heliyon.2024.e29772

PMCID: PMC11076643 PMID: 38720758

Abstract

The pressing issue of atmospheric pollution has prompted the exploration of affordable methods for measuring and monitoring air contaminants as complements to standard methods, able to produce high-density data in time and space. The main challenge of this low-cost approach concerns the in-field accuracy and reliability of the sensors. This study presents the development of low-cost stations for high-time-resolution measurements of CO₂ and CH₄ concentrations, calibrated via an in-field machine learning-based method. The calibration models were built on measurements performed in parallel with the low-cost sensors and a CRDS analyzer for CO₂ and CH₄ as the reference instrument, accounting for air temperature and relative humidity as external variables.

To ensure versatility across locations, diversified datasets were collected, consisting of measurements performed in various environments and seasons. The calibration models, built using 70 % of the data for training, 15 % for validation, and 15 % for testing, demonstrated robustness, with R² values from 0.8781 to 0.9827 for CO₂ and from 0.7312 to 0.9410 for CH₄, and mean absolute errors ranging from 3.76 to 1.95 ppm for CO₂ and from 0.03 to 0.01 ppm for CH₄. These promising results pave the way for extending these stations to monitor additional air contaminants, such as PM, NO_x, and CO, through the same calibration process, and for integrating them with remote data transmission modules to facilitate real-time access, control, and processing for end-users.

Keywords: Air quality, Low-cost sensors, Machine learning, Greenhouse gases

Highlights

• Development of low-cost stations for CO₂ and CH₄ concentration measurements in air.
• In-field calibrations using a reference instrument in different environmental settings.
• Calibration of sensors based on a machine-learning algorithm.
• ML algorithm providing data corrected for the factors that affect LCS performance.
• LCSs reliably useful for air quality monitoring.

1. Introduction

Air quality has emerged as one of the most pressing environmental issues of the modern era, posing significant risks to human health, global climate, and the overall well-being of ecosystems. According to the World Health Organization, almost all of the global population breathes air with levels of harmful pollutants exceeding those recommended by guidelines, causing up to 4.2 million premature deaths worldwide [1], primarily due to the onset of cardiovascular and respiratory diseases, e.g., Ref. [2], as well as cancers, e.g., Refs. [3,4]. Furthermore, several drivers of air pollution, e.g., activities involving fossil fuel combustion, contribute significantly to the emission of major climate forcers, including carbon dioxide (CO₂) and methane (CH₄), which are greenhouse gases accountable for global warming. Therefore, policies that reduce air pollution offer a win-win strategy for climate change mitigation and human health safeguarding. Such policies, however, rely on effective air quality monitoring. Traditional methods involving sophisticated equipment at stationary monitoring sites, while long-standing, are hindered by high setup and maintenance costs [5]. This results in insufficient monitoring coverage in rural and non-urban areas, particularly in resource-limited regions and developing countries, and in limited temporal and spatial data resolution that fails to capture the significant variability that atmospheric pollutant concentrations exhibit depending on local sources and features of the surrounding environment [6,7].

In recent years, a paradigm shift in air quality monitoring has occurred with the rise of low-cost sensors (LCSs) for detecting a wide variety of atmospheric pollutants, from particulate matter to gaseous compounds. Gas sensors mostly rely on metal oxide semiconductor (MOS) and electrochemical (EC) technologies, whilst non-dispersive infrared (NDIR) and photo-ionization detector (PID) technologies are less commonly used. MOS sensors detect the target gas through the changes in electrical properties (i.e., resistance or conductivity) caused by the adsorption of the gas on a semiconductor film exposed to the air [8]. EC sensors generally operate in amperometric mode, wherein the electrochemical reactions between the target gas and an electrolyte produce a current dependent on the gas concentration [8]. The NDIR technology, widely applied in CO₂ sensors [9–11], is based on the spectroscopic principle that the gas concentration is proportional to the amount of infrared (IR) light absorbed by the gas molecules in the air, measured as the difference between the amount of light radiated by the IR lamp and the amount of IR light received by the detector [12,13]. In PID sensors, the air samples are ionized by UV light; this ionization process releases electrons and creates positively charged ions that generate an electric current as the output signal. The concentration of the target gas determines the number of ions produced, resulting in a higher or lower current [14].

Extensive research has been conducted on LCSs, e.g., Refs. [5,7,12,14–23], pointing out that the new sensing technologies, though they cannot replace traditional equipment, can create new opportunities for broadening access to air quality monitoring. LCSs provide a cost-effective means to measure atmospheric pollutant levels in real time, which may enable the tracking of emission sources [5,24]. This great potential must be accompanied by an evaluation of the accuracy and reliability of the data measured by LCSs compared to those of reference instruments. A shared concern is that these sensors cannot be employed out-of-the-box relying on manufacturer-provided conversion models for calibration [25,26], since it cannot be assumed that they respond to the target pollutant in the same way under standard conditions and in the outdoor environments where they would be deployed [27,28]. In fact, LCS responses depend on environmental temperature and humidity, are affected by cross-sensitivity to other species, and can change as the sensors age due to factors such as poisoning [5,29]. In MOS and EC sensors, these limitations are related to the physicochemical properties of the sensors, which depend on the type of electrolyte, electrode, or semiconductor material used, e.g., Refs. [30–32], whilst NDIR sensors for CO₂ suffer from cross-sensitivity at high humidity, since H₂O absorbs at the same infrared wavelengths as CO₂ [12]. Many studies have shown that some of these constraints can be overcome with careful data processing and network design [27,29,33–38]. Given the non-linearity and cross-sensitivity of these sensors, the challenge lies in developing a model that can convert the measured sensor parameter into an output that accounts for external variables.

Regression-based models (e.g., linear regression, orthogonal regression, multiple linear regression, polynomial regression) can provide reasonable results and are still widely used for the calibration of LCSs [39]. Despite their many advantages, their calibration coefficients generally change under varying meteorological and microenvironmental conditions, and they do not describe the very complex system of pollutant formation and dispersion in the air [28,40]. Machine learning (ML)-based algorithms have recently emerged as a promising avenue for tackling calibration problems, enhancing the applicability and reducing the effort required in this process [29,41]. Compared to the other methods, ML techniques are problem-specific and data-driven, so they usually achieve higher accuracy [40]. The general idea of these approaches is to co-locate LCSs next to a reference station and to train a supervised model that can correct the error of the LCSs [18]. Among the different categories of supervised learning, the most common are (i) Random Forest (RF), an ensemble learning method that works by constructing a multitude of decision trees during the training phase, the results of which are used collectively to produce the final output [42]; (ii) Gradient Boosted Decision Tree, which, like RF, is an ensemble learning method but, instead of combining the results of multiple decision trees at the end of the process, combines them during the process itself [43]; (iii) Artificial Neural Networks, which are structures consisting of a large number of parallel and strongly interconnected processing units simulating the physiology of the human brain, where each processing unit is similar to a biological neuron and all neurons are organized into layers; the first layer receives the input and, through an activation function, produces outputs, which are analyzed by the next layer of neurons [44].

In this study, we present the assembly of a network of low-cost stations, equipped with NDIR sensors for CO₂ and MOS sensors for CH₄, as well as sensors for air temperature and relative humidity, based on Arduino UNO Rev3 microcontroller boards and fitted with data loggers. We aim to improve the calibration strategies of low-cost sensors by using the LinearForestRegressor (LFR) algorithm, available in the Python library linear-tree by Cerliani [45], an ensemble machine learning algorithm that combines the strength of linear regression models with the nonparametric learning ability of RF. The choice was driven by the algorithm's relative simplicity and robustness, as well as its short data processing time, which are pivotal characteristics for making these technologies accessible to (almost) everyone. The calibration approach involved the simultaneous collection of measurements with both the low-cost sensors and the Picarro G2201i Cavity Ring-Down Spectroscopy (CRDS) analyzer, used as a reference for CO₂ and CH₄, while also accounting for air temperature and relative humidity. In order to develop a general calibration model and mitigate site transferability issues, i.e., the decline in the performance of calibrated devices when moved from one location to another, e.g., Ref. [41], we built the calibration model on a dataset that encompassed several measurements collected in different environments and seasons. This approach broadens the scope of the training dataset to cover a wide range of concentrations and environmental conditions.

2. Materials and methods

2.1. Low-cost station design

Fig. 1 displays the overall architecture of the low-cost stations. They are powered by a supply unit (1) consisting of a rechargeable 12 V car battery (to be equipped with a solar panel for extended battery life) and a voltage regulator to drop the voltage to 5 V (i.e., the operating voltage of Arduino [46]). Alternatively, they can be powered using a 9 V charging cable to be connected to a 220 V socket, depending on the availability of electrical current. The core of the setup is an Arduino UNO Rev3 board based on the Atmel ATmega328P microcontroller (2) [46]. The board has been programmed through the Arduino IDE software, an integrated development environment in C/C++ (https://www.arduino.cc/en/software), exploiting the manufacturers' libraries and the hosting code available online.

The sensing unit consists of a Sensirion SCD30 sensor for CO₂ (3), a Figaro NGM2611-E13 sensor for CH₄ (4), and an Adafruit DHT22 sensor for air temperature (T) and relative humidity (RH) (5). The Sensirion SCD30 is a digital CO₂ sensor based on NDIR technology. It measures CO₂ concentrations in the range of 400-10,000 ppm with a declared accuracy of (±30 ppm) and a response time of 2 s [47]. In addition to CO₂, it measures the temperature (from −40 to 70 °C with an accuracy of ± (0.4 °C + 0.023 × (T [°C] – 25 °C))) and humidity (0–100 % with an accuracy of ±3 % RH) of the surrounding environment using a thermistor and a capacitive humidity sensor, respectively [47]. The sensor communicates via I²C or UART bus; in this study, the I²C bus connection was used.

The Figaro NGM2611-E13 is an analog module for natural gas alarms based on the Figaro TGS 2611-E00, which operates on the MOS principle. In the presence of CH₄, the sensing area (a metal oxide semiconductor, such as SnO₂ or TiO₂, in the form of granular micro-crystals; see Ref. [15] and references therein for more information) responds to the target gas molecules with a proportional decrease in resistance (R_s) [15,23,48], calculated from the following equation [48]:

R_{s} = (\frac{V_{c}}{V_{L}} - 1) \times R_{L}

where V_c is the total circuit voltage across both the sensing area and the reference resistor (5 V), V_L is the output voltage across the reference resistor and varies in response to how the sensing area resistance (R_s) varies, and R_L is a reference resistor connected in series with the sensing area. According to Refs. [15,23], it can be challenging to determine R_L, so it could be advantageous to calculate the relative sensor response as follows:

\frac{R_{s}}{R_{0}} = \frac{\frac{V_{c}}{V_{L}} - 1}{\frac{V_{c}}{V_{0}} - 1}

where R₀ represents the empirical reference resistance, i.e., the sensor resistance at the lowest output voltage measured in clean air (V₀) [23]. The R_s/R₀ ratio was used to convert the sensor signal output to CH₄ concentration readings in the calibration procedure (Section 2.2). The sensor underwent a factory calibration at 5,000 ppm, 20 °C, and 65 % RH, and the detection range specified by the manufacturer is 500–10,000 ppm [48]. Although this range lies well above atmospheric concentrations, the NGM2611-E13 has been successfully used for measuring indoor [49] and outdoor [15] ambient concentrations of methane (2–9 ppm), and for flux measurements from water bodies [23,50].
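As a minimal sketch of the two equations above, the following Python snippet computes R_s and the R_s/R₀ ratio from output voltages. The function names and the example voltages are hypothetical and chosen only for illustration; they are not taken from the stations' firmware.

```python
def sensor_resistance(v_l, v_c=5.0, r_l=1.0):
    """R_s = (V_c / V_L - 1) * R_L for a sensor in series with a load resistor."""
    if not 0.0 < v_l < v_c:
        raise ValueError("V_L must lie strictly between 0 and V_c")
    return (v_c / v_l - 1.0) * r_l


def relative_response(v_l, v_0, v_c=5.0):
    """R_s / R_0; the load resistance R_L cancels out of the ratio."""
    return sensor_resistance(v_l, v_c) / sensor_resistance(v_0, v_c)


# Hypothetical voltages (V), for illustration only.
v_clean_air = 1.0   # V_0: output voltage taken as the clean-air reference
v_sample = 1.25     # V_L: output voltage in air richer in CH4
ratio = relative_response(v_sample, v_clean_air)
print(ratio)  # 0.75
```

Because R_L appears in both the numerator and the denominator, the ratio can be computed without knowing its value, which is the practical advantage of working with R_s/R₀.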

The Adafruit DHT22 employs a capacitive humidity sensor and a thermistor for the measurement of ambient conditions. It measures relative humidity within a range of 0–100 % with an accuracy of 2–5 %, and temperature from −40 to 80 °C with an accuracy of ±0.5 °C. The sensor then generates a digital signal on the data pin [51].

The architecture also includes an Adafruit DS3231 Real-Time Clock (RTC) as a precise temporal reference (6). Finally, the data logger, consisting of an Adafruit MicroSD Breakout Board (7), records and stores the data collected by the sensors with a time resolution of 10 s in a text file on a micro-SD card. A technical note with the circuit scheme and the programming code is accessible in the Supplementary Material.

2.2. Calibration procedure

2.2.1. Measuring instruments

Aiming to correct the response of the low-cost stations in real-world environments, we developed a calibration procedure based on an ML algorithm using data measured by six low-cost stations for CO₂ and CH₄ (hereafter named station M, i.e., the mother station, and stations 1, 2, 3, 4, and 5; the architecture and functions are explained in Section 2.1) and a Picarro G2201i, the latter being used as the reference instrument. The Picarro G2201i (hereafter referred to as Picarro) is a high-frequency (1 measurement per second) CRDS analyzer of CO₂ and CH₄ concentrations (in ppm). Its operating interval ranges from 380 (average atmospheric values) to 2,000 ppm for CO₂, from 1.8 to 12 ppm for CH₄ in high-precision mode, and from 10 to 1,000 ppm for CH₄ in high-range mode [52]. The Picarro was calibrated at the beginning of each measuring period using the following standards (Air Liquide): (i) 380, 500, and 1,000 ppm CO₂, (ii) 1.8, 5, and 10 ppm CH₄. The precision was within 0.2 ppm CO₂ and 0.05 ppm CH₄. The instrument was further checked at the end of the measurements.

2.2.2. Measuring sites

One of the major concerns when calibrating LCSs regards site transferability, i.e., moving a calibrated device from the location where the calibration has been performed to another one, which usually leads to a performance loss due to measurement conditions beyond the training domain (see Ref. [41] and references therein). To mitigate this issue and create cost-effective stations capable of delivering robust performance in different locations, i.e., over a wide range of concentrations and ambient conditions, the datasets used for the ML-based calibration procedure included measurements performed in different seasons and a variety of environmental settings, as follows: (i) Municipality of Scandicci (Metropolitan area of Florence, Tuscany), representing a widely urbanized and industrialized area; (ii) locality of Galluzzo (south of Florence, Tuscany), chosen as a sub-urban site; (iii) localities of Renazzo and (iv) Barbiano in the Po Plain (the former in Ferrara Province and the latter in Ravenna Province, Emilia-Romagna), characterized by the presence of two domestic wells emitting notably high concentrations of CH₄ (up to 16 ppm of CH₄ measured in air); (v) Vulcano Island (Aeolian Archipelago, Sicily) and (vi) Municipality of Pozzuoli (Naples, Campania), chosen as hydrothermal end-members characterized by considerable concentrations of H₂S in the air (up to hundreds of ppb), which can possibly act as an interfering species for LCSs; (vii) an industrial plant extracting and refining CO₂ in the Municipality of Montepulciano (Siena, Tuscany); (viii) the Padule di Fucecchio wetland, the largest Italian inland wetland, stretching between the provinces of Florence, Pistoia, Lucca and Pisa (northwestern Tuscany). It should be pointed out that, due to sporadic malfunctions in some stations, the calibration datasets varied in the amount of data, both for CO₂ and CH₄. The measurements covered the summer, fall, and winter of 2022, and the winter and part of spring of 2023.

Minute averages were obtained from the datasets acquired at each sampling site, both for the low-cost stations and the reference instrument, and were then used for the calibration treatment.
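The minute-averaging step can be sketched as follows, assuming that each record is a (timestamp, value) pair and that the low-cost and reference series are paired on the minutes they share. The function name and the readings are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def minute_averages(records):
    """Average (timestamp, value) pairs over each calendar minute."""
    buckets = defaultdict(list)
    for stamp, value in records:
        buckets[stamp.replace(second=0, microsecond=0)].append(value)
    return {minute: fmean(values) for minute, values in sorted(buckets.items())}


# Hypothetical CO2 readings (ppm): 10 s data from a station, 1 s data from the reference.
station = [
    (datetime(2022, 7, 1, 12, 0, 0), 420.0),
    (datetime(2022, 7, 1, 12, 0, 10), 422.0),
    (datetime(2022, 7, 1, 12, 0, 20), 424.0),
    (datetime(2022, 7, 1, 12, 1, 0), 430.0),
    (datetime(2022, 7, 1, 12, 1, 10), 434.0),
]
reference = [
    (datetime(2022, 7, 1, 12, 0, 0), 410.0),
    (datetime(2022, 7, 1, 12, 0, 1), 412.0),
    (datetime(2022, 7, 1, 12, 1, 0), 415.0),
]

station_avg = minute_averages(station)
reference_avg = minute_averages(reference)

# Keep only the minutes present in both series: (raw value, reference value).
paired = {m: (station_avg[m], reference_avg[m])
          for m in sorted(station_avg.keys() & reference_avg.keys())}
```

Averaging both series onto the same one-minute grid removes the mismatch between the 10 s logging interval of the stations and the 1 s frequency of the reference instrument.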

2.2.3. Calibration methods

The calibration models were constructed using the LFR algorithm, available in the library linear-tree for Python by Cerliani (2022) (https://github.com/cerlymarco/linear-tree). The LFR is an ensemble machine learning algorithm, revised starting from the work of Ref. [53], which generalizes the well-known RF algorithm by combining it with linear models. RF is one of the best-performing learning algorithms in environmental science since it easily adapts to nonlinearities found in environmental data [54]. It is a supervised algorithm based on the construction of multiple decision trees that follows the concept of ensemble learning, where the combination of multiple ML models results in predictions that are more reliable than those of individual models. Each decision tree consists of a series of nodes, which branch out into multiple tree levels until reaching the final one, known as the leaf node. Each leaf node contains one or more samples extracted from the training data. The prediction made by each tree for any set of predictors is determined by calculating the average of these samples [41]. To prevent the trees from becoming correlated with each other, RF enhances their diversity by having them grow from distinct training data subsets. This is achieved through a process known as bagging, which involves the creation of training data by repeatedly sampling from the original dataset with replacement. In other words, data are drawn from the initial sample to form each subset, with no data being permanently removed from the input sample. Consequently, some data may be included multiple times during training, while others may not be used at all. Thus, greater stability is achieved [55].

However, being a completely non-parametric predictive algorithm, RF may display some limitations in describing the relationship between the response and the predictors, running into issues of underfitting, which occurs when the model is too simple to capture the complexity of the data, or overfitting, i.e., when the model is too complex and fits the training data too closely, but generalizes poorly to new data. Moreover, RF is not able to extrapolate when predictions are required on data that fall outside the domain of the training dataset. To address these limitations and achieve an accurate model over a wider concentration range, the LFR algorithm first fits a linear model on the whole dataset; then, an RF is trained on the same dataset using the residuals of the previous step as the target. The final predictions are the sum of the raw linear predictions and the residuals modeled by the RF [45]. In this way, the strength of linear models improves the nonparametric learning ability of tree-based algorithms. The signals from the low-cost sensors (i.e., CO₂ concentrations and R_s/R₀ ratio for CO₂ and CH₄ sensors, respectively), which have been generically renamed raw concentrations in Fig. 2, and the environmental variables were set as predictors (X), or features of the models, whilst the reference station signal represented the prediction target (y).
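The two-stage idea described above (a linear fit followed by a nonparametric model of its residuals) can be illustrated with a deliberately simplified, dependency-free sketch. It is not the linear-tree implementation: it uses a single predictor, and the random forest is replaced by a much simpler stand-in, namely the mean residual within equal-width bins of the predictor. All function names and data are hypothetical.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    mx, my = fmean(x), fmean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def fit_residual_bins(x, residuals, n_bins=4):
    """Stand-in for the forest: mean residual inside equal-width bins of x."""
    lo = min(x)
    width = (max(x) - lo) / n_bins
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for xi, ri in zip(x, residuals):
        k = min(int((xi - lo) / width), n_bins - 1)
        sums[k] += ri
        counts[k] += 1
    return lo, width, [s / c if c else 0.0 for s, c in zip(sums, counts)]


def predict(model, xi):
    """Final prediction = linear prediction + modeled residual."""
    (a, b), (lo, width, means) = model
    k = min(max(int((xi - lo) / width), 0), len(means) - 1)
    return a + b * xi + means[k]


# Hypothetical raw signal (x) and reference values (y) with a step nonlinearity.
x = [float(v) for v in range(400, 601, 10)]
y = [0.8 * xi + 60.0 + (5.0 if xi > 500 else -5.0) for xi in x]

line = fit_line(x, y)
residuals = [yi - (line[0] + line[1] * xi) for xi, yi in zip(x, y)]
model = (line, fit_residual_bins(x, residuals))

mse_linear = fmean(r ** 2 for r in residuals)
mse_combined = fmean((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y))
beyond_range = predict(model, 700.0)  # the linear term keeps extrapolating
```

In this toy case the residual stage lowers the training error with respect to the linear fit alone, while the linear term lets the prediction at x = 700 exceed the largest training target, something a purely piecewise-constant model cannot do.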

Fig. 2. Scheme of the calibration procedure. Six low-cost stations for CO₂ and CH₄ were co-located with a reference measurement station (Picarro G2201i). Air temperature and relative humidity were also measured as key environmental variables that can disturb the sensors' signal outputs. The low-cost sensor signals (i.e., CO₂ concentrations and R_s/R₀ ratio for CO₂ and CH₄ sensors, respectively), which have been generically renamed *raw concentrations*, and the environmental variables were set as predictors (X), or features of the models, whilst the reference station signal represented the prediction target (y). The time resolution was set to minute averages. We trained separate calibration models for each CO₂ and CH₄ sensor with the Linear Forest Regression (LFR) machine learning algorithm. The trained models were evaluated using the R² coefficient and the mean absolute error (MAE), assessing the 95 % confidence interval through the bootstrap technique (1,000 bootstrap samples). The ability of the models to predict unknown data was evaluated on out-of-sample test data, i.e., on data that were not used during the training phases, using the R² coefficient and the MAE.

Before the construction of the predictive models, the entire datasets were cleaned of outliers through the interquartile range (IQR) statistical method. Then, the datasets were divided into three parts to construct the predictive models: training, validation, and test data. The training and validation datasets were used during the learning phase. The test dataset was used afterward to evaluate the quality of the model. In this way, it was possible to determine the ability of the model to predict new cases not used during the learning phase. The training datasets were 70 % of the primary datasets, whereas the validation and test datasets included the remaining 15 % and 15 %, respectively. This type of splitting is commonly used in the supervised training of ML models [56–58], allowing sufficient data for training and model quality control. The degrees of freedom of the algorithms were tuned by selecting the best hyperparameter values through GridSearchCV (available in the Scikit-Learn library for Python). These hyperparameters control the growth of the random forest and the shape of the decision trees, limiting overfitting and yielding a model with good generalization capability, i.e., the ability to transfer the high accuracy achieved in the training phase to the test one. The trained models and the test data were evaluated through the coefficient of determination R² and the mean absolute error (MAE). To assess the 95 % confidence interval of R² and MAE in the training datasets, bootstrap elaborations were performed with the construction of 1,000 samples through resampling. This process involved repeatedly selecting and training the model on different subsets of the training data, allowing us to capture a range of performance outcomes and quantify the uncertainty associated with our R² and MAE estimates. A scheme of the procedure's steps is reported in Fig. 2.
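The data-handling steps of this paragraph can be sketched in plain Python under a few stated assumptions: the IQR rule uses the conventional 1.5 × IQR fences (the multiplier is not given in the text), the 70/15/15 split is a random shuffle, and the bootstrap resamples (reference, prediction) pairs rather than re-training the model on each resample, which is a lighter simplification of the procedure described above. All names, including `calibrate`, and all data are hypothetical.

```python
import random
from statistics import fmean, quantiles


def iqr_filter(rows, key, k=1.5):
    """Keep rows whose key value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles([key(r) for r in rows], n=4, method="inclusive")
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if lo <= key(r) <= hi]


def split_70_15_15(rows, seed=0):
    """Shuffle a copy, then cut it into training, validation and test parts."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_train, n_val = round(0.70 * len(rows)), round(0.15 * len(rows))
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def mae(y_true, y_pred):
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    mean_t = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, seed=0):
    """95 % percentile interval of a metric over resampled pairs."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    return scores[int(0.025 * n_boot)], scores[int(0.975 * n_boot) - 1]


# Hypothetical (raw, reference) pairs in ppm, plus one spike in the raw signal.
noise = random.Random(1)
rows = [(400.0 + i, 420.0 + i + noise.uniform(-2, 2)) for i in range(100)]
rows.append((5000.0, 430.0))

clean = iqr_filter(rows, key=lambda r: r[0])
train, val, test = split_70_15_15(clean)


def calibrate(raw):
    """Hypothetical stand-in for a trained calibration model."""
    return raw + 20.0


y_true = [ref for _, ref in test]
y_pred = [calibrate(raw) for raw, _ in test]
test_mae, test_r2 = mae(y_true, y_pred), r2(y_true, y_pred)
ci_low, ci_high = bootstrap_ci(y_true, y_pred, mae)
print(len(clean), len(train), len(val), len(test))  # 100 70 15 15
```

Fixing the random seeds keeps the split and the bootstrap interval reproducible between runs, which helps when comparing hyperparameter settings.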

3. Results

In the following sections, the summary descriptive statistical parameters of the calibration datasets are reported for each station (Table 1 and Table 2). Counts, minimum, maximum, mean, and standard deviation values of the CO₂ and CH₄ signals measured by the low-cost stations (referred to as CO₂_station and R_s/R₀ in Table 1 and Table 2, respectively, and hereafter broadly referred to in the text as raw concentrations), as well as those relative to the reference instrument (CO₂_Picarro, Table 1, and CH₄_Picarro, Table 2), are described together with the environmental parameters (T and RH).
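For readers who wish to reproduce summaries with the same rows as Table 1 and Table 2 (count, mean, std, min, quartiles, max), a minimal sketch using only the Python standard library is given below. It assumes the sample standard deviation and linearly interpolated quartiles; the function name and the values are hypothetical and are not taken from the datasets.

```python
from statistics import fmean, quantiles, stdev


def describe(values):
    """Summary statistics with the same row labels as Tables 1 and 2."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": fmean(values),
        "std": stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25 %": q1,
        "50 %": q2,
        "75 %": q3,
        "max": max(values),
    }


# Hypothetical CO2 values (ppm), for illustration only.
summary = describe([404.0, 430.0, 457.0, 495.0, 592.0])
for label, value in summary.items():
    print(label, value)
```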

Table 1.

Summary descriptive statistical parameters of CO₂ datasets collected for each station and used for the calibration procedure. The concentrations of CO₂, of both the Picarro reference instrument and the low-cost stations, are in ppm; temperature is in °C; relative humidity is in %.

Station 1					Station 2
	CO₂_Picarro	CO₂_station	T	RH		CO₂_Picarro	CO₂_station	T	RH
units	(ppm)	(ppm)	(°C)	(%)	units	(ppm)	(ppm)	(°C)	(%)
count	30377	30377	30377	30377	count	10732	10732	10732	10732
mean	467	518	22.2	60	mean	433	485	31.5	38
std	43.6	49.0	5.5	15.9	std	14.5	25.0	5.2	13.6
min	404	397	7.1	14	min	409	406	13.4	14
25 %	430	479	18.4	48	25 %	422	467	27.6	26
50 %	457	510	20.8	67	50 %	428	479	31.7	36
75 %	495	554	25.6	72	75 %	437	505	35.6	47
max	592	654	37.8	79	max	475	558	47.2	90
Station 3					Station 4
	CO₂_Picarro	CO₂_station	T	RH		CO₂_Picarro	CO₂_station	T	RH
units	(ppm)	(ppm)	(°C)	(%)	units	(ppm)	(ppm)	(°C)	(%)
count	9340	9340	9340	9340	count	9459	9459	9459	9459
mean	430	628	32.2	35	mean	429	337	31.8	36
std	10.6	43.6	5.0	11.9	std	10.1	20.0	5.0	12.7
min	406	517	16.2	13	min	409	276	21.2	14
25 %	422	592	28.3	24	25 %	422	323	28.1	25
50 %	426	621	32.6	33	50 %	426	333	32.4	34
75 %	433	665	36.2	42	75 %	433	349	35.7	46
max	460	778	43.4	73	max	459	406	41.8	78
Station 5					Station M
	CO₂_Picarro	CO₂_station	T	RH		CO₂_Picarro	CO₂_station	T	RH
units	(ppm)	(ppm)	(°C)	(%)	units	(ppm)	(ppm)	(°C)	(%)
count	9398	9398	9398	9398	count	9765	9765	9765	9765
mean	429	506	31.6	36	mean	430	362	30.7	41
std	9.4	21.0	5.2	12.4	std	10.8	16.7	5.2	13.2
min	407	445	16.5	14	min	407	314	14.1	15
25 %	422	492	27.8	25	25 %	422	350	26.7	29
50 %	426	504	32.5	34	50 %	426	360	31.3	39
75 %	432	520	35.7	45	75 %	433	373	34.5	50
max	457	575	42.0	77	max	462	419	42.1	77

Table 2.

Summary descriptive statistical parameters of CH₄ datasets collected for each station and used for the calibration procedure. R_s/R₀ is the relative response of the low-cost sensor (see Section 2.1) and is dimensionless. The CH₄ concentrations measured by the Picarro reference instrument are in ppm; temperature is in °C; and relative humidity is in %.

Station 1					Station 2
	CH₄_Picarro	Rₛ/R₀	T	RH		CH₄_Picarro	Rₛ/R₀	T	RH
units	(ppm)		(°C)	(%)	units	(ppm)		(°C)	(%)
count	29607	29607	29607	29607	count	3221	3221	3221	3221
mean	2.14	0.78	22.0	61	mean	2.05	0.46	33.2	40
std	0.096	0.064	5.3	15.8	std	0.057	0.026	4.2	11.7
min	1.99	0.595	6.7	14	min	1.96	0.40	25.8	19
25 %	2.05	0.733	18.4	50	25 %	2.00	0.44	29.7	30
50 %	2.12	0.764	20.6	67	50 %	2.04	0.46	33.0	41
75 %	2.21	0.817	25.2	72	75 %	2.09	0.49	37.0	49
max	2.42	0.944	38.5	79	max	2.23	0.54	42.4	62
Station 3					Station 4
	CH₄_Picarro	Rₛ/R₀	T	RH		CH₄_Picarro	Rₛ/R₀	T	RH
units	(ppm)		(°C)	(%)	units	(ppm)		(°C)	(%)
count	3228	3228	3228	3228	count	3230	3230	3230	3230
mean	2.05	0.87	33.5	40	mean	2.05	0.34	33.1	41
std	0.057	0.05	4.1	11.4	std	0.057	0.020	4.2	11.7
min	1.96	0.72	26.3	18	min	1.96	0.28	25.4	20
25 %	2.00	0.83	30.1	30	25 %	2.00	0.33	29.4	31
50 %	2.04	0.85	33.3	40	50 %	2.04	0.34	33.3	41
75 %	2.09	0.92	36.9	49	75 %	2.09	0.36	36.8	49
max	2.22	1	43.4	60	max	2.23	0.39	41.8	62
Station 5					Station M
	CH₄_Picarro	Rₛ/R₀	T	RH		CH₄_Picarro	Rₛ/R₀	T	RH
units	(ppm)		(°C)	(%)	units	(ppm)		(°C)	(%)
count	10337	10337	10337	10337	count	9810	9810	9810	9810
mean	2.04	0.35	31.2	37	mean	2.05	0.07	30.4	41
std	0.04	0.024	5.1	13.1	std	0.038	0.004	5.2	13.9
min	1.96	0.26	20.6	14	min	1.97	0.06	20.5	15
25 %	2.01	0.33	27.2	26	25 %	2.01	0.07	26.3	29
50 %	2.03	0.35	31.7	37	50 %	2.03	0.07	30.4	41
75 %	2.07	0.37	35.4	46	75 %	2.07	0.07	34.5	51
max	2.15	0.41	42.0	77	max	2.15	0.08	42.1	77

3.1. CO₂ and environmental parameters datasets

The dataset gathered for station 1 was the largest, with a total of 30,377 data, and presented the widest CO₂ concentration range measured by the Picarro (from 404 to 592 ppm, mean value: 467 ppm, standard deviation: 43.6 ppm), whilst the raw concentrations recorded by the CO₂ low-cost sensor ranged from 397 to 654 ppm, with a mean value of 518 ppm and a standard deviation of 49.0 ppm. Temperature and relative humidity ranged from 7.1 to 37.8 °C, and from 14 to 79 %, respectively (mean values of 22.2 °C and 60 %, and standard deviations of 5.5 °C and 15.9 %, respectively) (Table 1).

The dataset collected for station 2 (10,732 data) displayed CO₂ concentrations from 409 to 475 ppm (mean value: 433 ppm, standard deviation: 14.5 ppm), and from 406 to 558 ppm (mean value: 485, standard deviation: 25.0), for the Picarro and the low-cost sensor, respectively. The temperature reached a minimum value of 13.4 °C and a maximum of 47.2 °C (mean value: 31.5 °C, standard deviation: 5.2 °C), while the relative humidity ranged from 14 to 90 % (mean value: 38 %, standard deviation: 13.6 %) (Table 1).

Concerning station 3, the model was built on a dataset of 9,340 data, with CO₂ concentrations between 406 and 460 ppm for the Picarro (mean value: 430 ppm, standard deviation: 10.6 ppm), and raw concentrations between 517 and 778 ppm (mean value: 628 ppm, standard deviation: 43.6 ppm). The temperature and relative humidity ranged from 16.2 to 43.4 °C (mean value: 32.2 °C, standard deviation: 5.0 °C) and from 13 to 73 % (mean value: 35 %, standard deviation: 11.9 %), respectively (Table 1).

Station 4's dataset (9,459 counts) displayed CO₂ concentrations measured by the Picarro ranging from 409 to 459 ppm, with a mean value of 429 ppm and a standard deviation of 10.1 ppm, and CO₂ raw concentrations ranging from 276 to 406 ppm, with a mean value of 337 ppm and a standard deviation of 20.0 ppm. The temperature varied from a minimum of 21.2 °C to a maximum of 41.8 °C, with a mean value of 31.8 °C (standard deviation: 5.0 °C). The relative humidity ranged from 14 to 78 %, with a mean value of 36 % and a standard deviation of 12.7 % (Table 1).

The dataset of station 5 included 9,398 data. The CO₂ concentrations measured by the Picarro ranged from 407 to 457 ppm (mean value: 429 ppm, standard deviation: 9.4 ppm), whilst those of the low-cost sensor ranged from 445 to 575 ppm, with a mean value of 506 ppm and a standard deviation of 21.0 ppm. The temperature and relative humidity varied from 16.5 to 42.0 °C, and from 14 to 77 %, respectively, with mean values of 31.6 °C (standard deviation: 5.2 °C) and 36 % (standard deviation: 12.4 %), respectively (Table 1).

Finally, station M's dataset was made of 9,765 data, displaying CO₂ concentrations that varied from 407 to 462 ppm for the Picarro (mean value: 430 ppm, and standard deviation: 10.8 ppm), and from 314 to 419 ppm for the low-cost sensor (mean value: 362 ppm, and standard deviation: 16.7 ppm). The temperature was on average 30.7 °C (standard deviation: 5.2 °C), ranging between 14.1 and 42.1 °C; whilst the relative humidity ranged from 15 to 77 %, with a mean value of 41 % and a standard deviation of 13.2 % (Table 1).

3.2. CH₄ and environmental parameters datasets

Analogously to CO₂, station-specific datasets of CH₄ reference concentrations, raw sensor data, and the related environmental parameters were gathered to train and evaluate the calibration models. The sensor signal output was expressed as the R_s/R₀ ratio (see Section 2.1), which served as the raw input to be converted to CH₄ concentrations.
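As an illustration of how an R_s/R₀ ratio can be obtained from the analog output of a metal-oxide sensor, the sketch below applies the voltage-divider relation commonly given for Figaro TGS-type sensors. The supply voltage, load resistance, and R₀ value are hypothetical placeholders and not the settings of this study, which are described in Section 2.1.

```python
def sensor_resistance(v_out, v_supply=5.0, r_load=10_000.0):
    """Sensor resistance R_s (ohm) from the voltage measured across the load resistor.

    Voltage-divider relation: R_s = (V_supply / V_out - 1) * R_load.
    v_supply and r_load are assumed example values, not the paper's.
    """
    if not 0.0 < v_out < v_supply:
        raise ValueError("v_out must lie strictly between 0 and v_supply")
    return (v_supply / v_out - 1.0) * r_load


def rs_r0_ratio(v_out, r0, v_supply=5.0, r_load=10_000.0):
    """Dimensionless R_s/R_0 ratio used as the raw CH4 signal."""
    return sensor_resistance(v_out, v_supply, r_load) / r0


# Hypothetical reading: 2.5 V across a 10 kOhm load, with an assumed R_0 of 20 kOhm.
ratio = rs_r0_ratio(v_out=2.5, r0=20_000.0)
print(f"R_s/R_0 = {ratio:.2f}")
```

Since R₀ is a fixed reference resistance, the ratio is a rescaling of R_s that makes signals from different sensor units easier to compare.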

As it was for CO₂, the dataset for CH₄ calibration relative to station 1 was the largest, including 29,607 data, with CH₄ concentration values measured by the Picarro ranging from 1.99 to 2.42 ppm (mean value: 2.14 ppm, standard deviation: 0.096 ppm), and raw values (R_s/R₀) measured by the low-cost sensor ranging from 0.59 to 0.94 (mean value: 0.78, standard deviation: 0.064). The environmental parameters varied from 6.7 to 38.5 °C (mean value: 22.0 °C, standard deviation: 5.3 °C) for the temperature, and from 14 to 79 % (mean value: 61 %, standard deviation: 15.8 %) for the relative humidity (Table 2).

Station 2's dataset, counting 3,221 data, displayed CH₄ concentrations between 1.96 and 2.23 ppm (mean value: 2.05 ppm, standard deviation: 0.057 ppm), and R_s/R₀ values ranging from 0.40 to 0.54 (mean value: 0.46, standard deviation: 0.026). The temperature was on average 33.2 °C, with a minimum of 25.8 °C and a maximum of 42.4 °C (standard deviation: 4.2 °C), while relative humidity varied from 19 to 62 %, with a mean value of 40 % and a standard deviation of 11.7 % (Table 2).

For station 3, the calibration dataset consisted of 3,228 data, with Picarro's CH₄ concentrations ranging from 1.96 to 2.22 ppm (mean value: 2.05 ppm, standard deviation: 0.057 ppm), and R_s/R₀ ratios from 0.72 to 1 (mean value: 0.87, standard deviation: 0.05). The temperature varied between 26.3 and 43.4 °C, with a mean value of 33.5 °C (standard deviation: 4.1 °C), whilst the relative humidity ranged from 18 to 60 %, with a mean value of 40 % (standard deviation: 11.4 %) (Table 2).

Station 4 (3,230 data) displayed mean values of 2.05 ppm for CH₄ concentrations recorded by the Picarro (minimum value: 1.96 ppm, maximum value: 2.23 ppm, standard deviation: 0.057 ppm), and 0.34 for the R_s/R₀ ratio (minimum value: 0.28, maximum value: 0.39, standard deviation: 0.020). Temperature and relative humidity of air varied from 25.4 to 41.8 °C, and from 20 to 62 %, respectively, with mean values of 33.1 °C (standard deviation: 4.2 °C) and 41 % (standard deviation: 11.7 %), respectively (Table 2).

In the calibration dataset collected for station 5 (counting 10,337 data), CH₄ concentrations varied from 1.96 to 2.15 ppm, with a mean value of 2.04 ppm and a standard deviation of 0.04 ppm. Meanwhile, the raw data of the low-cost sensor ranged from 0.26 to 0.41, with a mean value of 0.35 and a standard deviation of 0.024. The mean value of the temperature was 31.2 °C, with values ranging from 20.6 to 42.0 °C (standard deviation: 5.1 °C), while relative humidity was between 14 and 77 %, with a mean value of 37 % and a standard deviation of 13.1 % (Table 2).

Finally, for Station M a total of 9,810 data were gathered for the calibration dataset, with CH₄ concentrations acquired by the Picarro ranging from 1.97 to 2.15 ppm (mean value: 2.05 ppm, standard deviation: 0.038 ppm), and the R_s/R₀ ratio from 0.06 to 0.08 (mean value: 0.07, standard deviation: 0.004). The environmental parameters varied from 20.5 to 42.1 °C for the temperature and from 15 to 77 % for the relative humidity, with mean values of 30.4 °C (standard deviation: 5.2 °C) and 41 % (standard deviation: 13.9 %), respectively (Table 2).

4. Discussion

4.1. Assessing models fit on training data

After the calibration models had been built for each station, the goodness of fit between the models' output concentrations and the reference instrument concentrations was assessed for the training phase (i.e., on the 70 % of each dataset randomly selected to build the models). Through the bootstrap statistical technique, the median values of the R² coefficient and of the MAE of each model were paired with the relative 95 % confidence intervals (reported in square brackets [] in the following text), which made it possible to evaluate the variability of the scores, and thus to assess the accuracy of the models and the uncertainty associated with their predictions. The distributions of both the R² coefficient (Fig. 3A, Fig. 4A) and the MAE (Fig. 3B, Fig. 4B) were plotted as frequency histograms with superimposed kernel density estimates (KDE), using 20 bins for the histograms and the default parameters bw_method='scott' and bw_adjust=1 to calculate the KDE bandwidth (seaborn library).
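The bootstrap step described above can be sketched as follows. This is a minimal illustration, assuming that paired (reference, predicted) values are resampled with replacement and that the confidence interval is taken from the 2.5th and 97.5th percentiles of the resulting scores; the exact resampling scheme of the study is not detailed here. The data are synthetic, and all function names are illustrative. Only the Python standard library is used to keep the example self-contained.

```python
import random
import statistics


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    return statistics.fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def bootstrap_scores(y_true, y_pred, metric, n_boot=1000, seed=0):
    """Score n_boot resamples (with replacement) of the paired data."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    return sorted(scores)


def median_and_ci(sorted_scores, level=95.0):
    """Median and percentile confidence interval of the bootstrap scores."""
    tail = (100.0 - level) / 2.0
    return (statistics.median(sorted_scores),
            percentile(sorted_scores, tail),
            percentile(sorted_scores, 100.0 - tail))


# Synthetic example: reference CO2 (ppm) and predictions with ~1.5 ppm noise.
rng = random.Random(42)
reference = [rng.uniform(400.0, 460.0) for _ in range(300)]
predicted = [c + rng.gauss(0.0, 1.5) for c in reference]

r2_med, r2_lo, r2_hi = median_and_ci(bootstrap_scores(reference, predicted, r2_score))
mae_med, mae_lo, mae_hi = median_and_ci(
    bootstrap_scores(reference, predicted, mean_absolute_error))
print(f"R2  = {r2_med:.4f} [{r2_lo:.4f}, {r2_hi:.4f}]")
print(f"MAE = {mae_med:.2f} [{mae_lo:.2f}, {mae_hi:.2f}] ppm")
```

The printed format mirrors the "median [lower, upper]" notation used in the text.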

Fig. 3 — A) R² across 1,000 bootstrap samples of the training models for CO₂, showing median and 95 % confidence intervals. B) MAE across 1,000 bootstrap samples of the training models for CO₂, showing median and 95 % confidence intervals.
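The histogram-plus-KDE construction used for these distribution plots can be approximated without a plotting library. The sketch below assumes a Gaussian kernel whose bandwidth follows Scott's rule, i.e. the sample standard deviation multiplied by n^(-1/5) and by bw_adjust, which should correspond to the seaborn defaults quoted in the text. The scores are synthetic stand-ins for bootstrap R² values, and the helper names are illustrative.

```python
import math
import random
import statistics


def scott_bandwidth(samples, bw_adjust=1.0):
    """Scott's rule for 1-D data: h = bw_adjust * std * n**(-1/5)."""
    n = len(samples)
    return bw_adjust * statistics.stdev(samples) * n ** (-1.0 / 5.0)


def gaussian_kde(samples, grid, bw_adjust=1.0):
    """Evaluate a Gaussian kernel density estimate on the points of grid."""
    h = scott_bandwidth(samples, bw_adjust)
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
            for x in grid]


def histogram(samples, bins=20):
    """Counts per equal-width bin between the sample minimum and maximum."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        k = min(int((s - lo) / width), bins - 1)  # the maximum falls in the last bin
        counts[k] += 1
    return counts, lo, width


# Synthetic bootstrap-like scores (not the paper's data).
rng = random.Random(1)
scores = [rng.gauss(0.9900, 0.0005) for _ in range(400)]

counts, start, width = histogram(scores, bins=20)
h = scott_bandwidth(scores)
n_grid = 801
x_min, x_max = min(scores) - 5.0 * h, max(scores) + 5.0 * h
dx = (x_max - x_min) / (n_grid - 1)
grid = [x_min + i * dx for i in range(n_grid)]
density = gaussian_kde(scores, grid)
area = sum(density) * dx  # numerical check that the density integrates to about 1
print(f"bandwidth = {h:.6f}, area under KDE = {area:.3f}")
```

Checking that the estimated density integrates to approximately 1 is a quick sanity test of the bandwidth and normalization.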

All the models for CO₂ and CH₄ calibration were well correlated with the reference values during the training phase, with R² values ranging from 0.9855 [0.9844, 0.9865] (station 5) to 0.9974 [0.9972, 0.9975] (station 1) for CO₂, and from 0.9611 [0.9598, 0.9624] (station 1) to 0.9918 [0.9912, 0.9924] (station M) for CH₄ (R² scores and confidence intervals for each station are reported in Fig. 3A for CO₂ and in Fig. 4A for CH₄). The R² distribution curves, for both CO₂ and CH₄ models, displayed normal distributions with narrow confidence intervals around the median values (red dashed lines in Fig. 3, Fig. 4), indicating that the predictive models were robust and accurate. CO₂ calibration models showed on average slightly higher R² values than those obtained for CH₄. This may be due to the generally larger size of the datasets used to train the CO₂ models and to the wider range of concentrations experienced during the training window, whereas CH₄ values remained around those of the atmospheric background. Each CO₂ model was thus trained on a more diversified dataset, resulting in higher R² values and a more accurate predictive ability. CH₄ models, on the other hand, having been trained on smaller and less variable datasets, produced lower, but still excellent, R² values.

Further evaluation of the calibration models' accuracy was carried out based on the MAE and its 95 % confidence interval, for both CO₂ and CH₄ (Fig. 3B and Fig. 4B, respectively). In particular, the MAE ranged from 0.71 [0.69, 0.73] to 1.44 [1.42, 1.47] ppm for CO₂ (stations 4 and 1, respectively; Fig. 3B) and from 0.0023 [0.00225, 0.00240] to 0.0127 [0.0126, 0.0129] ppm for CH₄ (stations M and 1, respectively; Fig. 4B). Although the CO₂ model for station 1 showed the best R² score, its MAE was higher than that of the other models. This may be due to training performed with a dataset that had on average higher reference concentration values (as reported in Table 1, Section 3.1). Analogously, station 1's model for CH₄ calibration showed the highest MAE, but in this case it was associated with the lowest R² value. Nevertheless, the magnitude of these MAEs can be considered more than satisfactory, confirming the good performance of the calibration models thus trained. Like the R² distributions, the MAE distributions (Fig. 3B, Fig. 4B) suggested that the trained models can be regarded, as a first approximation, as reliable and robust.

4.2. Evaluation of models using test data

To test the performance of the calibration models, they were applied to the testing data that were not used for model fitting (i.e., the 15 % of the total dataset reserved for testing). This was a key step to further assess the quality and the generalization ability of the models when predicting new data, providing an unbiased estimate of model effectiveness.
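The random 70/15/15 partition into training, validation, and test subsets can be sketched as follows. This is a minimal illustration with a hypothetical helper name and seed; the splitting routine actually used in the study is not specified in this section. The sample count is that of the station 3 CO₂ dataset (9,340 data).

```python
import random


def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Randomly partition sample indices into train, validation and test subsets.

    The test subset receives whatever remains after the first two subsets.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(9340)
print(len(train_idx), len(val_idx), len(test_idx))
```

Keeping the three index sets disjoint is what ensures that the test subset provides an evaluation on data the model has never seen.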

The scatter plots in Fig. 5A and B depict, respectively, CO₂ and CH₄ concentrations resulting from correction using the calibration models (full points), juxtaposed with the raw sensors' signals (shaded points), alongside the actual measured reference concentrations. It is evident that linear regression fit models between the raw sensors' signals and the reference data are entirely unsuitable, as indicated by R² scores ranging from −21.3543 (station 4, Table 3) to −0.1899 (station 1, Table 3) for CO₂ sensors, and from −2,506 (station M, Table 3) to −403 (station 5, Table 3) for CH₄ sensors (note that the R² values were computed using the function sklearn.metrics.r2_score, which can return negative values). On the other hand, the R² values obtained on the test data for CO₂ indicated excellent performance, ranging from 0.8781 (station 5, Table 3) to 0.9827 (station 1, Table 3), with MAE values of 2.22 and 3.76 ppm, respectively (Table 3). Regression lines fitted to the test data (Table 3) had a slope (m) close to or equal to 1, demonstrating the efficiency of the calibration procedure. However, for some stations, the y-axis intersection (b) differed from 0. While this deviation from the origin fell within the mean absolute error for stations 1 and 2 (Table 3), the y-axis intersection values for stations 4 and 5 were −10.03 and +6.08 ppm, respectively (Table 3). Therefore, these shifts from zero are to be taken into account when using stations 4 and 5 for CO₂ measurements.
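To illustrate why R² can be strongly negative for raw signals, the sketch below evaluates the definition R² = 1 − SS_res/SS_tot, which is the one implemented by sklearn.metrics.r2_score, on invented numbers, and fits a least-squares line to the calibrated values. The data are purely illustrative, and taking the reference as x and the calibrated output as y is an assumption about how the regression lines in Table 3 were defined.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (can be negative)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least-squares slope m and intercept b of y = m*x + b."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Invented values (ppm): the raw signal follows the reference but with a large offset.
reference = [420.0, 425.0, 430.0, 435.0, 440.0]
raw = [c + 200.0 for c in reference]
calibrated = [421.0, 424.0, 430.0, 436.0, 439.0]

r2_raw = r2_score(reference, raw)         # offset dominates: far below zero
r2_cal = r2_score(reference, calibrated)  # close to 1 after calibration
slope, intercept = fit_line(reference, calibrated)
print(f"R2 raw = {r2_raw:.1f}, R2 calibrated = {r2_cal:.3f}")
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

A constant offset that is large compared with the spread of the reference values makes SS_res much greater than SS_tot, hence the negative score. The line fit also shows how sensitive the intercept is when data lie far from the origin: a line through the point (430, 430) with slope 1.02 has an intercept of 430 − 1.02 × 430 = −8.6 ppm, the same order as the value reported for station 4.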

Table 3.

R² scores reached by each calibration model, for both CO₂ and CH₄, and the relative mean absolute error (MAE, in ppm) and mean absolute percentage error (MAPE). R² values between the raw sensors’ response and the reference values are also reported (R² raw concentrations).

	CO₂					CH₄
ID	R² raw concentrations	R² calibration	MAE (ppm)	MAPE	Regression line (y = mx + b)	R² raw concentrations	R² calibration	MAE (ppm)	MAPE	Regression line (y = mx + b)
Station 1	−0.1899	0.9827	3.76	0.81 %	y = 1.00x - 0.85	−471	0.7312	0.034	1.58 %	y = 1.00x - 0.000
Station 2	−4.0782	0.9467	2.19	0.50 %	y = 1.01x - 2.36	−603	0.8988	0.012	0.60 %	y = 1.01x - 0.025
Station 3	−20.3856	0.8906	2.24	0.52 %	y = 0.99x + 3.58	−553	0.8830	0.013	0.66 %	y = 1.00x + 0.006
Station 4	−21.3543	0.9167	1.95	0.45 %	y = 1.02x - 10.03	−804	0.9077	0.012	0.60 %	y = 1.03x - 0.061
Station 5	−13.1229	0.8781	2.22	0.52 %	y = 0.99x + 6.08	−403	0.9016	0.008	0.40 %	y = 1.01x - 0.012
Station M	−15.4463	0.8969	2.14	0.50 %	y = 1.01x - 4.05	−2507	0.9410	0.006	0.30 %	y = 1.01x - 0.030
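For reference, the two error metrics reported in Table 3 can be computed as in the following sketch. The CH₄ values are invented for illustration and are not taken from the supplementary data.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: mean of |reference - prediction|, in the units of the data."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """MAPE: mean of |reference - prediction| / |reference|, as a percentage."""
    ratios = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Invented CH4 values in ppm.
reference = [2.00, 2.05, 2.10, 2.20]
predicted = [2.01, 2.04, 2.12, 2.18]

mae = mean_absolute_error(reference, predicted)
mape = mean_absolute_percentage_error(reference, predicted)
print(f"MAE = {mae:.3f} ppm, MAPE = {mape:.2f} %")
```

Because CH₄ concentrations stay close to 2 ppm, a MAE of about 0.01 ppm corresponds to a MAPE of roughly 0.5 %, which is consistent with the orders of magnitude in the table.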

Compared to CO₂, the calibrations on CH₄ data achieved a lower correlation, with predictions relatively more dispersed with respect to the reference values (R² values ranging from 0.7312 to 0.9410, and corresponding MAEs of 0.03 and 0.01 ppm, for stations 1 and M, respectively; Table 3), but with slope and y-axis intersection values close to 1 and 0 ppm, respectively (Table 3). As mentioned above, this limitation stems from the relatively smaller training datasets gathered for CH₄ and the reduced variability in the recorded concentrations. Moreover, although station 1 had the highest counts in the CH₄ datasets, it exhibited the poorest performance during the test phase, reflecting the lowest scores achieved in the training window (Section 4.1). This could potentially be improved through further hyperparameter tuning, which may not yet have yielded optimal results, and by taking into account other potential interferents not considered in this study. Nevertheless, the results are highly promising, yielding the model's best generalization to date. This makes it possible to detect concentration fluctuations at levels as low as tens of ppb, even against background CH₄ values, a level of sensitivity and precision that would not have been expected based on the sensor manufacturer's datasheet.

Moreover, the MAEs calculated on the test and validation datasets were compared during the post-training phase (Fig. 6) to evaluate whether the models were subject to overfitting. The MAEs calculated on the validation and test datasets are comparable: the differences MAE test − MAE validation fall within a small range around 0, which points to a low degree of overfitting [59].
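The overfitting check described above reduces to a difference between two MAEs. A minimal sketch follows, assuming predictions are available for both held-out subsets; the numbers and the tolerance are hypothetical and are not values from this study.

```python
def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae_gap(val_true, val_pred, test_true, test_pred):
    """Difference MAE(test) - MAE(validation); values near 0 suggest little overfitting."""
    return (mean_absolute_error(test_true, test_pred)
            - mean_absolute_error(val_true, val_pred))


# Invented CO2 values in ppm for the two held-out subsets.
val_true, val_pred = [430.0, 435.0, 440.0], [431.0, 433.0, 441.0]
test_true, test_pred = [428.0, 432.0, 445.0], [430.0, 431.0, 443.0]

gap = mae_gap(val_true, val_pred, test_true, test_pred)
TOLERANCE_PPM = 1.0  # hypothetical threshold chosen only for this example
print(f"MAE test - MAE validation = {gap:.2f} ppm")
print("comparable" if abs(gap) <= TOLERANCE_PPM else "possible overfitting")
```

A gap that is small relative to the MAE itself indicates that performance on the test subset is in line with that on the validation subset.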

5. Conclusions

In recent years, increasing awareness of the harmful impact of air pollution on human health, the global climate, and ecosystems has emphasized the need to seek cost-effective approaches for measuring and monitoring air pollution, able to increase the availability of high-density and comprehensive data across time and space. This study demonstrates that the LFR machine learning algorithm, when applied to low-cost CO₂ and CH₄ sensors, can provide accurate data to evaluate air quality. Table 4 displays the performance of the LFR calibration determined in this study, along with results from other calibration studies that used sensors with the same operating principles [15,17,23,60,61]. It is noteworthy that studies involving machine learning show, on average, the highest R² scores, indicating that non-parametric regression models are better suited to address the challenges imposed by low-cost sensors. The approach proposed for the quantification of CO₂ and CH₄ in this study showed marked improvement relative to previous efforts, with the models' output exhibiting excellent correlations with the reference values (R² values of at least 0.8781 for CO₂ and 0.7312 for CH₄). Such high R² values underline the models' effectiveness in capturing variations in atmospheric gas concentrations. Furthermore, the fractional error of the proposed models at a 1-min time resolution was minimal, at less than 1 % for CO₂ and between 0.3 % and 2 % for CH₄. These small fractional errors corresponded to mean absolute errors of less than 4 ppm for CO₂ and less than 40 ppb for CH₄. This analytical precision is fundamental for air quality monitoring and for understanding the evolution of greenhouse gases, for which even minor fluctuations in concentration levels can have significant implications. Very good results were also achieved by Ref. [23] (Table 4) through a two-step calibration approach involving several linear, power, and Michaelis-Menten-based equations (Table 4, mean R² values between 0.58 and 1.00). However, it is important to note that their calibration was conducted under laboratory conditions. Furthermore, their study focused on using the Figaro TGS 2611-E00 for measurements in flux chambers, and the equations they propose may not be optimized for CH₄ background concentrations [23].

Table 4.

Performance (R² values) of regression models on test data from this study and previous studies [15,17,23,60,61] using CO₂ and CH₄ low-cost sensors.

Target gas	Sensor	Study	Regression type	R²	Study location
CO₂	ELT S–100H	Spinelle et al., 2017	LR	0.021-0.71	Po Valley, Italy
			MLR	0.16
			ANN (machine learning)	0.79
	ELT S-100/300	Casey et al., 2019	ANN (machine learning)	0.85	Greeley, CO
	Sensirion NDIR SCD 30	This study	LFR (machine learning)	0.88-0.98	Several locations in Italy
CH₄	Figaro TGS 2600	Eugster and Kling, 2012	LM	0.2	Toolik Lake, AK
	Figaro TGS 2600	Collier-Oxandale et al., 2018	Inverted LM	0.37-0.76	Los Angeles, CA
	Figaro TGS 2600	Collier-Oxandale et al., 2018	Inverted LM	0.33-0.46	Platteville, CO
	Figaro TGS 2600	Casey et al., 2019	ANN (machine learning)	0.66	Greeley, CO
	TGS 2611-E00	Bastviken et al., 2020	Step 1: linear, power and Michaelis-Menten equations	0.58-1.00	Laboratory experiments (flux chambers)
	TGS 2611-E00	Bastviken et al., 2020	Step 2: linear and power functions	0.58-1.00	Laboratory experiments (flux chambers)
	TGS 2611-E00	This study	LFR (machine learning)	0.73-0.94	Several locations in Italy

The successful application of the LFR model to CO₂ and CH₄ low-cost sensor data indicates the potential of this approach for widespread use in air quality monitoring, both in research and practical applications. In fact, the accuracy and cost-effectiveness of this method make it a valuable tool for identifying trends and mitigating air pollution in various settings, possibly integrating the monitoring stations with sensors for other air contaminants (e.g., PM, NO_x, CO). However, there are still avenues for further improvement and exploration in this field. The study findings suggest that additional tuning of hyperparameters could enhance the performance of some models, potentially reducing the fractional error even further. Moreover, although the proposed calibration approach offers promising results with relatively straightforward implementation, site-specific data collection would be necessary to strengthen the calibration dataset before employing these stations and calibrations for study or monitoring purposes. Additionally, a key improvement will involve integrating the low-cost stations with LoRaWAN-type remote data transmission modules, based on a low-energy radio communication protocol that will enable the seamless uploading of air quality data onto a centralized web server, facilitating real-time access, control, and processing for end-users. This perspective is pivotal to ensure a network of low-cost monitoring stations capable of overcoming the spatial heterogeneity that afflicts current monitoring systems. Indeed, real-time access to comprehensive air quality data would be instrumental in several domains. Firstly, it could assist regulatory bodies and policymakers in monitoring and implementing environmental standards. Secondly, it could equip researchers with updated and high-resolution data to support studies of pollutant species and forecasting models. Finally, it could provide a strong foundation for didactic purposes, enabling communities to actively engage in environmental awareness and prompt actions to safeguard public health.

Data availability statement

Data are included in the article's Supplementary Material section.

CRediT authorship contribution statement

R. Biagi: Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. M. Ferrari: Writing – original draft, Validation, Software, Methodology, Formal analysis, Data curation, Conceptualization, Investigation. S. Venturi: Writing – review & editing, Validation, Methodology, Data curation, Conceptualization, Investigation. M. Sacco: Conceptualization, Methodology, Software, Writing – review & editing. G. Montegrossi: Writing – review & editing, Methodology, Conceptualization. F. Tassi: Project administration, Methodology, Conceptualization, Validation, Writing – review & editing, Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The authors wish to thank the parish of San Felice (FI), the tennis club of San Giusto (FI), Mr. Marco Spisso, the Osservatorio Vesuviano (INGV section of Naples), Dr. Francesco Magi, and Centro di Ricerca, Documentazione e Promozione del Padule di Fucecchio, for logistical support provided during measurement collection. Dr. Antonio Randazzo (INGV, Rome, Italy) is kindly acknowledged for the help provided during the advancement of this study. Finally, the reviewers are kindly acknowledged for their generous and valuable comments on the manuscript.

Footnotes

Appendix A. Supplementary data to this article can be found online at https://doi.org/10.1016/j.heliyon.2024.e29772.

Appendix A. Supplementary data

The following are the Supplementary data to this article:

Multimedia component 1: mmc1.xlsx (1.3 MB, xlsx)
Multimedia component 2: mmc2.xlsx (567.4 KB, xlsx)
Multimedia component 3: mmc3.xlsx (497.6 KB, xlsx)
Multimedia component 4: mmc4.xlsx (512.7 KB, xlsx)
Multimedia component 5: mmc5.xlsx (508.1 KB, xlsx)
Multimedia component 6: mmc6.xlsx (523.2 KB, xlsx)
Multimedia component 7: mmc7.docx (954 KB, docx)
Multimedia component 8: mmc8.xlsx (1.6 MB, xlsx)
Multimedia component 9: mmc9.xlsx (229.5 KB, xlsx)
Multimedia component 10: mmc10.xlsx (229.6 KB, xlsx)
Multimedia component 11: mmc11.xlsx (469 KB, xlsx)
Multimedia component 12: mmc12.xlsx (798.8 KB, xlsx)
Multimedia component 13: mmc13.xlsx (671 KB, xlsx)

References

1. WHO. Ambient (outdoor) air pollution. 2022. https://www.who.int/news-room/fact-sheets/detail/ambient-(outdoor)-air-quality-and-health (accessed August 29, 2023).
2. Andersen Z.J., Kristiansen L.C., Andersen K.K., Olsen T.S., Hvidberg M., Jensen S.S., Ketzel M., Loft S., Sørensen M., Tjønneland A., Overvad K., Raaschou-Nielsen O. Stroke and long-term exposure to outdoor air pollution from nitrogen dioxide: a cohort study. 2011. doi: 10.1161/STROKEAHA.111.629246/-/DC1.
3. Raaschou-Nielsen O., Andersen Z.J., Hvidberg M., Jensen S.S., Ketzel M., Sørensen M., Hansen J., Loft S., Overvad K., Tjønneland A. Air pollution from traffic and cancer incidence: a Danish cohort study. Environ. Health. 2011;10. doi: 10.1186/1476-069X-10-67.
4. Brunekreef B., Holgate S.T. Air pollution and health. Lancet. 2002;360:1233–1242. doi: 10.1016/S0140-6736(02)11274-8.
5. Clements A.L., Griswold W.G., Abhijit R.S., Johnston J.E., Herting M.M., Thorson J., Collier-Oxandale A., Hannigan M. Low-cost air quality monitoring tools: from research to practice (A workshop summary). Sensors. 2017;17. doi: 10.3390/s17112478.
6. Zimmerman N., Presto A.A., Kumar S.P.N., Gu J., Hauryliuk A., Robinson E.S., Robinson A.L., Subramanian R. A machine learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring. Atmos. Meas. Tech. 2018;11:291–313. doi: 10.5194/amt-11-291-2018.
7. Idrees Z., Zheng L. Low cost air pollution monitoring systems: a review of protocols and enabling technologies. J Ind Inf Integr. 2020;17. doi: 10.1016/J.JII.2019.100123.
8. Rai A.C., Kumar P., Pilla F., Skouloudis A.N., Di Sabatino S., Ratti C., Yasar A., Rickerby D. End-user perspective of low-cost sensors for outdoor air pollution monitoring. Sci. Total Environ. 2017;607–608:691–705. doi: 10.1016/j.scitotenv.2017.06.266.
9. Ródenas García M., Spinazzé A., Branco P.T.B.S., Borghi F., Villena G., Cattaneo A., Di Gilio A., Mihucz V.G., Gómez Álvarez E., Lopes S.I., Bergmans B., Orłowski C., Karatzas K., Marques G., Saffell J., Sousa S.I.V. Review of low-cost sensors for indoor air quality: features and applications. Appl. Spectrosc. Rev. 2022;57:747–779. doi: 10.1080/05704928.2022.2085734.
10. Afroz R., Guo X., Cheng C.W., Delorme A., Duruisseau-Kuntz R., Zhao R. Investigation of indoor air quality in university residences using low-cost sensors. Environ. Sci. J. Integr. Environ. Res.: Atmosphere. 2023;3:347–362. doi: 10.1039/d2ea00149g.
11. Toat M., Soekirno S., Faisal F. Monitoring system for carbon dioxide gas concentration using NDIR sensors. AIP Conf. Proc. 2023;2604. doi: 10.1063/5.0116876.
12. Dinh T.V., Choi I.Y., Son Y.S., Kim J.C. A review on non-dispersive infrared gas sensors: improvement of sensor detection limit and interference correction. Sensor. Actuator. B Chem. 2016;231:529–538. doi: 10.1016/j.snb.2016.03.040.
13. Narayana M.V., Jalihal D., Shiva Nagendra S.M. Establishing A sustainable low-cost air quality monitoring setup: a survey of the state-of-the-art. Sensors. 2022;22. doi: 10.3390/s22010394.
14. Spinelle L., Gerboles M., Kok G., Persijn S., Sauerwald T. Review of portable and low-cost sensors for the ambient air monitoring of benzene and other volatile organic compounds. Sensors. 2017;17. doi: 10.3390/s17071520.
15. Eugster W., Kling G.W. Performance of a low-cost methane sensor for ambient concentration measurements in preliminary studies. Atmos. Meas. Tech. 2012;5:1925–1934. doi: 10.5194/amt-5-1925-2012.
16. Martinez R.R., Santaren D., Laurent O., Cropley F., Mallet C., Ramonet M., Caldow C., Rivier L., Broquet G., Bouchet C., Juery C., Ciais P. The potential of low-cost tin-oxide sensors combined with machine learning for estimating atmospheric CH4 variations around background concentration. Atmosphere. 2021;12:1–22. doi: 10.3390/atmos12010107.
17. Collier-Oxandale A., Gordon Casey J., Piedrahita R., Ortega J., Halliday H., Johnston J., Hannigan M.P. Assessing a low-cost methane sensor quantification system for use in complex rural and urban environments. Atmos. Meas. Tech. 2018;11:3569–3594. doi: 10.5194/amt-11-3569-2018.
18. Concas F., Mineraud J., Lagerspetz E., Varjonen S., Liu X., Puolamäki K., Nurmi P., Tarkoma S. Low-cost outdoor air quality monitoring and sensor calibration. ACM Trans. Sens. Netw. 2021;17. doi: 10.1145/3446005.
19. Snyder E.G., Watkins T.H., Solomon P.A., Thoma E.D., Williams R.W., Hagler G.S.W., Shelow D., Hindin D.A., Kilaru V.J., Preuss P.W. The changing paradigm of air pollution monitoring. Environ. Sci. Technol. 2013;47:11369–11377. doi: 10.1021/es4022602.
20. Morawska L., Thai P.K., Liu X., Asumadu-Sakyi A., Ayoko G., Bartonova A., Bedini A., Chai F., Christensen B., Dunbabin M., Gao J., Hagler G.S.W., Jayaratne R., Kumar P., Lau A.K.H., Louie P.K.K., Mazaheri M., Ning Z., Motta N., Mullins B., Rahman M.M., Ristovski Z., Shafiei M., Tjondronegoro D., Westerdahl D., Williams R. Applications of low-cost sensing technologies for air quality monitoring and exposure assessment: how far have they gone? Environ. Int. 2018;116:286–299. doi: 10.1016/j.envint.2018.04.018.
21. Sun L., Wong K.C., Wei P., Ye S., Huang H., Yang F., Westerdahl D., Louie P.K.K., Luk C.W.Y., Ning Z. Development and application of a next generation air sensor network for the Hong Kong marathon 2015 air quality monitoring. Sensors. 2016;16. doi: 10.3390/s16020211.
22. Kumar P., Morawska L., Martani C., Biskos G., Neophytou M., Di Sabatino S., Bell M., Norford L., Britter R. The rise of low-cost sensing for managing air pollution in cities. Environ. Int. 2015;75:199–205. doi: 10.1016/j.envint.2014.11.019.
23. Bastviken D., Nygren J., Schenk J., Parellada Massana R., Thanh Duc N. Technical note: facilitating the use of low-cost methane (CH4) sensors in flux chambers-calibration, data processing, and an open-source make-it-yourself logger. Biogeosciences. 2020;17:3659–3667. doi: 10.5194/bg-17-3659-2020.
24. Liu X., Jayaratne R., Thai P., Kuhn T., Zing I., Christensen B., Lamont R., Dunbabin M., Zhu S., Gao J., Wainwright D., Neale D., Kan R., Kirkwood J., Morawska L. Low-cost sensors as an alternative for long-term air quality monitoring. Environ. Res. 2020;185. doi: 10.1016/j.envres.2020.109438.
25. Lewis A.C., Lee J.D., Edwards P.M., Shaw M.D., Evans M.J., Moller S.J., Smith K.R., Buckley J.W., Ellis M., Gillot S.R., White A. Evaluating the performance of low cost chemical sensors for air pollution research. Faraday Discuss. 2016;189:85–103. doi: 10.1039/C5FD00201J.
26. Borrego C., Costa A.M., Ginja J., Amorim M., Coutinho M., Karatzas K., Sioumis T., Katsifarakis N., Konstantinidis K., De Vito S., Esposito E., Smith P., André N., Gérard P., Francis L.A., Castell N., Schneider P., Viana M., Minguillón M.C., Reimringer W., Otjes R.P., von Sicard O., Pohle R., Elen B., Suriano D., Pfister V., Prato M., Dipinto S., Penza M. Assessment of air quality microsensors versus reference methods: the EuNetAir joint exercise. Atmos. Environ. 2016;147:246–263. doi: 10.1016/J.ATMOSENV.2016.09.050.
27. Cordero J.M., Borge R., Narros A. Using statistical methods to carry out in field calibrations of low cost air quality sensors. Sensor. Actuator. B Chem. 2018;267:245–254. doi: 10.1016/j.snb.2018.04.021.
28. Castell N., Dauge F.R., Schneider P., Vogt M., Lerner U., Fishbain B., Broday D., Bartonova A. Can commercial low-cost sensor platforms contribute to air quality monitoring and exposure estimates? Environ. Int. 2017;99:293–302. doi: 10.1016/j.envint.2016.12.007.
29. De Vito S., Esposito E., Salvato M., Popoola O., Formisano F., Jones R., Di Francia G. Calibrating chemical multisensory devices for real world applications: an in-depth comparison of quantitative machine learning approaches. Sensor. Actuator. B Chem. 2018;255:1191–1210. doi: 10.1016/j.snb.2017.07.155.
30. Korotcenkov G. Metal oxides for solid-state gas sensors: what determines our choice? Mater. Sci. Eng., B. 2007;139:1–23. doi: 10.1016/j.mseb.2007.01.044.
31. Helm I., Jalukse L., Leito I. Measurement uncertainty estimation in amperometric sensors: a tutorial review. Sensors. 2010;10:4430–4455. doi: 10.3390/s100504430.
32. Wang C., Yin L., Zhang L., Xiang D., Gao R. Metal oxide gas sensors: sensitivity and influencing factors. Sensors. 2010;10:2088–2106. doi: 10.3390/s100302088.
33. Popoola O.A.M., Carruthers D., Lad C., Bright V.B., Mead M.I., Stettler M.E.J., Saffell J.R., Jones R.L. Use of networks of low cost air quality sensors to quantify air quality in urban settings. Atmos. Environ. 2018;194:58–70. doi: 10.1016/j.atmosenv.2018.09.030.
34. Penza M., Suriano D., Villani M.G., Spinelle L., Gerboles M. Towards air quality indices in smart cities by calibrated low-cost sensors applied to networks. IEEE SENSORS 2014 Proceedings. 2014; pp. 2012–2017.
35. Heimann I., Bright V.B., McLeod M.W., Mead M.I., Popoola O.A.M., Stewart G.B., Jones R.L. Source attribution of air pollution by spatial scale separation using high spatial density networks of low cost air quality sensors. Atmos. Environ. 2015;113:10–19. doi: 10.1016/j.atmosenv.2015.04.057.
36. Spinelle L., Gerboles M., Villani M.G., Aleixandre M., Bonavitacola F. Field calibration of a cluster of low-cost available sensors for air quality monitoring. Part A: ozone and nitrogen dioxide. Sens Actuators B Chem. Elsevier B.V.; 2015; pp. 249–257.
37. Sun L., Westerdahl D., Ning Z. Development and evaluation of a novel and cost-effective approach for low-cost NO2 sensor drift correction. Sensors. 2017;17. doi: 10.3390/s17081916.
38. Kim J., Shusterman A.A., Lieschke K.J., Newman C., Cohen R.C. The Berkeley Atmospheric CO2 Observation Network: field calibration and evaluation of low-cost air quality sensors. Atmos. Meas. Tech. 2018;11:1937–1946. doi: 10.5194/amt-11-1937-2018.
39. Karagulian F., Barbiere M., Kotsev A., Spinelle L., Gerboles M., Lagler F., Redon N., Crunaire S., Borowiak A. Review of the performance of low-cost sensors for air quality monitoring. Atmosphere. 2019;10. doi: 10.3390/atmos10090506.
40. Liang L. Calibrating low-cost sensors for ambient air monitoring: techniques, trends, and challenges. Environ. Res. 2021;197. doi: 10.1016/j.envres.2021.111163.
41. Nowack P., Konstantinovskiy L., Gardiner H., Cant J. Machine learning calibration of low-cost NO2 and PM10 sensors: non-linear algorithms and their impact on site transferability. Atmos. Meas. Tech. 2021;14:5637–5655. doi: 10.5194/amt-14-5637-2021.
42. Breiman L. Random Forests. 2001.
43. Bauer E., Chan P., Stolfo S., Wolpert D. An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants. 1999.
44. Walczak S., Cerpa N. Artificial neural networks. 2003. http://www.emsl.pnl.gov:2080/proj/neuron/neural/sys
45. Cerliani M. linear-tree. 2022. https://github.com/cerlymarco/linear-tree
46. Arduino®. Arduino® UNO R3 Product Reference Manual. 2023.
47. Sensirion. Datasheet Sensirion SCD30 sensor module. 2020. www.sensirion.com
48. Figaro. Figaro technical information for TGS2611: REV 04/23. 2023. https://www.figarosensor.com/product/docs/ngm2611-e13_product%20information%28fusa%29_rev03.pdf
49. van den Bossche M., Rose N.T., De Wekker S.F.J. Potential of a low-cost gas sensor for atmospheric methane monitoring. Sensor. Actuator. B Chem. 2017;238:501–509. doi: 10.1016/J.SNB.2016.07.092.
50. Thanh Duc N., Silverstein S., Wik M., Crill P., Bastviken D., Varner R.K. Technical note: greenhouse gas flux studies: an automated online system for gas emission measurements in aquatic environments. Hydrol. Earth Syst. Sci. 2020;24:3417–3430. doi: 10.5194/hess-24-3417-2020.
51. Adafruit. DHT11, DHT22 and AM2302 Sensors. 2023. https://cdn-learn.adafruit.com/downloads/pdf/dht.pdf
52. PICARRO INC ©. δ13C in CH4 and CO2 Gas Analyzer. 2023. https://www.picarro.com/support/library/documents/g2201i_analyzer_datasheet
53. Zhang H., Nettleton D., Zhu Z. Regression-Enhanced Random Forests. 2017.
54. Schonlau M., Zou R.Y. The random forest algorithm for statistical learning. STATA J. 2020;20:3–29. doi: 10.1177/1536867X20909688.
55. Rodriguez-Galiano V., Sanchez-Castillo M., Chica-Olmo M., Chica-Rivas M. Machine learning predictive models for mineral prospectivity: an evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol. Rev. 2015;71:804–818. doi: 10.1016/j.oregeorev.2015.01.001.
56. Nguyen Q.H., Ly H.B., Ho L.S., Al-Ansari N., Van Le H., Tran V.Q., Prakash I., Pham B.T. Influence of data splitting on performance of machine learning models in prediction of shear strength of soil. Math. Probl Eng. 2021;2021. doi: 10.1155/2021/4832864.
57.Muraina I.O. Ideal dataset splitting ratios in machine learning algorithms: general concerns for data scientists and data analysts. 2022. https://www.researchgate.net/publication/358284895
58.Vrigazova B. The proportion for splitting data into training and test set for the bootstrap in classification problems. Bus. Syst. Res. 2021;12:228–242. doi: 10.2478/bsrj-2021-0015. [DOI] [Google Scholar]
59.Lupi A., Luppichini M., Barsanti M., Bini M., Giannecchini R. Machine learning models to complete rainfall time series databases affected by missing or anomalous data. Earth Sci Inform. 2023 doi: 10.1007/s12145-023-01122-4. [DOI] [Google Scholar]
60.Spinelle L., Gerboles M., Villani M.G., Aleixandre M., Bonavitacola F. Field calibration of a cluster of low-cost commercially available sensors for air quality monitoring. Part B: NO, CO and CO2. Sensor. Actuator. B Chem. 2017;238:706–715. doi: 10.1016/j.snb.2016.07.036. [DOI] [Google Scholar]
61.Casey J.G., Collier-Oxandale A., Hannigan M. Performance of artificial neural networks and linear models to quantify 4 trace gas species in an oil and gas production region with low-cost sensors. Sensor. Actuator. B Chem. 2019;283:504–514. doi: 10.1016/j.snb.2018.12.049. [DOI] [Google Scholar]

Associated Data

Supplementary Materials

Multimedia component 1: mmc1.xlsx (1.3 MB, xlsx)

Multimedia component 2: mmc2.xlsx (567.4 KB, xlsx)

Multimedia component 3: mmc3.xlsx (497.6 KB, xlsx)

Multimedia component 4: mmc4.xlsx (512.7 KB, xlsx)

Multimedia component 5: mmc5.xlsx (508.1 KB, xlsx)

Multimedia component 6: mmc6.xlsx (523.2 KB, xlsx)

Multimedia component 7: mmc7.docx (954 KB, docx)

Multimedia component 8: mmc8.xlsx (1.6 MB, xlsx)

Multimedia component 9: mmc9.xlsx (229.5 KB, xlsx)

Multimedia component 10: mmc10.xlsx (229.6 KB, xlsx)

Multimedia component 11: mmc11.xlsx (469 KB, xlsx)

Multimedia component 12: mmc12.xlsx (798.8 KB, xlsx)

Multimedia component 13: mmc13.xlsx (671 KB, xlsx)

Data Availability Statement

Data are included in the article's Supplementary Material section.
