Twelve-Year Analysis of NO2 Concentration Measurements at Belisario Station (Quito, Ecuador) Using Statistical Inference Techniques

Wilmar Hernandez; Alfredo Mendez

doi:10.3390/s20205831

. 2020 Oct 15;20(20):5831. doi: 10.3390/s20205831

Twelve-Year Analysis of NO₂ Concentration Measurements at Belisario Station (Quito, Ecuador) Using Statistical Inference Techniques

Wilmar Hernandez ^1,^*, Alfredo Mendez ²

PMCID: PMC7602597 PMID: 33076389

Abstract

In this paper, a robust analysis of nitrogen dioxide (NO₂) concentration measurements taken at Belisario station (Quito, Ecuador) was performed. The data used for the analysis constitute a set of measurements taken from 1 January 2008 to 31 December 2019. Furthermore, the analysis was carried out in a robust way, defining variables that represent years, months, days and hours, and classifying these variables based on estimates of the central tendency and dispersion of the data. The estimators used here were classic, nonparametric, based on a bootstrap method, and robust. Additionally, confidence intervals based on these estimators were built, and these intervals were used to categorize the variables under study. The results of this research showed that the NO₂ concentration at Belisario station is not harmful to humans. Moreover, it was shown that this concentration tends to be stable across the years, changes slightly during the days of the week, and varies greatly when analyzed by months and hours of the day. Here, the precision provided by both nonparametric and robust statistical methods served to comprehensively proof the aforementioned. Finally, it can be concluded that the city of Quito is progressing on the right path in terms of improving air quality, because it has been shown that there is a decreasing tendency in the NO₂ concentration across the years. In addition, according to the Quito Air Quality Index, most of the observations are in either the desirable level or acceptable level of air pollution, and the number of observations that are in the desirable level of air pollution increases across the years.

Keywords: statistical inference, classic analysis, nonparametric analysis, robust analysis, classic confidence interval, nonparametric confidence interval, bootstrap confidence interval, robust confidence interval, classification and categorization of NO₂ concentration measurements

1. Introduction

Nitrogen dioxide (NO₂) is a yellowish brown toxic and irritant gas, and together with nitric oxide (NO) is known as nitrogen oxides (NOx) [1,2]. Several negative health effects are attributed to continued exposure to this pollutant, such as: acute bronchitis, asthma, reduced lung capacity, allergies, eye and mucous membrane irritation [3]. It is a secondary precursor to ozone (O₃) and particulate matter PM_2.5 [4,5].

NO₂ is mainly formed from the oxidation of NO as a result of combustion in vehicle engines and combustion plants with oxygen (O₂) from the air [1]:

2 NO + O_{2} \leftrightarrow 2 {NO}_{2}

(1)

To a lesser extent atmospheric NO₂ comes from natural sources such as volcanic eruptions, atmospheric electric shocks, and biomass combustion. It is a very strong oxidant; it reacts with water (H₂O) and OH radical producing nitric acid (HNO₃) [6]:

3 {NO}_{2} + H_{2} O \to 2 {HNO}_{3} + NO

(2)

OH + {NO}_{2} \to {HNO}_{3}

(3)

The particles that form from this acid can be suspended in the air or fall like acid rain [1]. The World Health Organization (WHO) recommends a daily NO₂ concentration of $0.11 ppm (200 μ g / m^{3})$ average of 1 h once a year and $0.023 ppm (40 μ g / m^{3})$ as an annual arithmetic average to preserve health [7]. NO₂ production volumes correspond to vehicular traffic; being this way, they are generally higher in cities than in the countryside.

The negative effects that air pollution due to NO₂ concentrations has on human health and the way in which this pollution is produced, lead to the need for an in-depth statistical analysis of the behavior of this pollutant in environments where its generation is more likely. In this sense, the city of Quito, the capital of Ecuador, is a good place to carry out this analysis taking into account the information collected by measurement instruments for more than a decade.

In this paper, Quito was chosen to carry out the analysis mentioned above because this city is an example of how the environmental impact of vehicular traffic, traffic jams and poor fuel quality can affect pollution levels. Likewise, the growth of the city, of industrial zones and the fact that Quito is surrounded by a mountain range are factors that also influence the concentration of pollutants in this city [8].

Taking into account what has been said above, the main objective of this paper is to carry out the robust analysis of NO₂ concentration measurements in one the most important air-quality monitoring stations in Quito. To this end, the Belisario air-quality monitoring station [9] was chosen to carry out the research, and 12 years of NO₂ concentration measurements (i.e., from 1 January 2008 to 31 December 2019) were analyzed. Other pollutants that are also measured at Belisario station are the following: CO (carbon monoxide), SO₂ (sulfur dioxide), O₃ (ozone), and PM_2.5 (fine particulate matter = particles with a diameter less than or equal to $2.5 μ m$ ) [8,10]. However, it is important to mention that this paper focused on the robust analysis of the behavior of NO₂ concentration at Belisario station, because NO₂ is a pollutant that is predominant in the appearance and duration of health problems [3]. In fact, the study of toxicity mechanisms due to NO₂ in humans is a subject of great interest to the international scientific community [3]. For example, it is of great interest to understand the relationship between exposure to NO₂ and sensitivity to viral infections, among other things.

The analysis performed in this paper was aimed at estimating the central tendency and dispersion of the data, grouping and classifying data, and determining similarities and differences between data. Furthermore, the analysis was performed by using both robust and nonrobust confidence intervals, and classical, nonparametric, resampling and robust analysis methodologies were used [11,12,13,14,15].

A research in which the statistical analysis of NO₂ concentration measurements was carried out is shown in [16]. In [16], in order to perform air-quality monitoring measurements in an area of Puerto Rico, low-cost particulate matter (PM_2.5) and NO₂ sensors were placed across eight locations of that area. In [16], the NO₂ concentration measurements were taken from October 2016 to February 2017, spatial and temporal trends of PM_2.5 and NO₂ were analyzed, and the measurements were collected using the U.S. Environmental Protection Agency (EPA)-designed Citizen Science Air Monitor (CSAM).

The analysis carried out in [16] was based on the use of classical inference methods, where linear regressions were used to normalize each low-cost sensor and weather station with a reference signal. Additionally, the correlation function was used to find the degree of linear dependence between sensor measurements and the median of the reference signal. Moreover, in [16] the Pearson coefficient and the coefficient of divergence were used to explore the spatial variability between CSAM locations. Furthermore, the coefficient of variance was used to calculate the precision between sensors.

In order to safeguarding human health by establishing limits for air pollution due to NO₂, among other air pollution variables, air quality estimations were conducted in Milan, Italy, from 2013 to 2016 [17]. The research presented in [17] was carried out by using machine learning and deep learning models aimed at obtaining a robust estimate of pollutants. Additionally, in [17] the following configurations were employed: a linear regressor, a multilayer perceptron with Bayesian regularization, a random forest regressor, and a long-short term memory network.

The concept of robustness used in [17] has different meanings. First, it was argued that the first linear model used in [17] can be seen as a standard ordinary least squares algorithm that reduces the influence of strong outliers in many cases. Furthermore, the authors of [17] suggested that the linear models used in the sense of their paper were used as a robust baseline.

Second, in [17] it was stated that due to the sampling procedure and ensemble learning of random forests, the latter provide solutions that are robust because they do not suffer from overfitting in the same way that simpler regression trees do.

Third, and finally, in [17] the F1-score was used as a robustness metric against unbalanced classes in a multi-class categorization problem, which was associated with the estimated time series produced by the trained models during an evaluation process of the regression estimates.

Seasonal variations in NO_x concentrations in Changchun (Jilin, China) were studied in [18]. In that paper, both monthly and daily average NO_x concentration variations were also studied. Additionally, in [18] the lineal dependence between NO_x and NO₂, among other air pollutants in Changchun, was found. Moreover, in [18] the coefficient of divergence was employed to assess the differences between the spatial distribution of NO₂ concentrations at several monitoring sites from 2016 to 2018.

In addition, the sensitivity of SO₂, NO and NO₂ concentrations to the relevant factors that had an effect on the air quality inside a 20% biodiesel air-conditioned bus was studied in [19]. The bus used in [19] was chosen from a fleet of Toledo (OH, USA) and continuous monitoring of the abovementioned pollutants were conducted with indoor temperature and indoor relative humidity as comfort parameters. The study time of [19] comprised the spring, summer, autumn, and winter seasons from April 2007 to March 2008. Additionally, in [19] a summary statistic of the seasons was presented and a linear dependence was detected among month, season and other variables.

Furthermore, in [19] the regression tree method was used to study the sensitivity for in-bus air pollutant concentrations (i.e., SO₂, NO, and NO₂), and the analysis of variance was used to determine the statistically significant variables. Lastly, in [19] the quantification of the relationship between the in-bus air pollutant concentrations and the statistically significant variables was carried out, and the dynamics of in-bus pollution was compared with atmospheric physics.

The papers shown above were focused on the statistical analysis of sets of NO₂ concentration measurements that were carried out over long periods of time. In these papers, tools of parametric statistical inference or artificial intelligence or machine learning were used to estimate the central tendency of the data and, in some cases, to carry out its modeling. In this sense, the results obtained in these papers were satisfactory. Nevertheless, several of these papers lack of an exhaustive analysis of the central tendency of the data and, in general, of the dispersion of the data. Furthermore, they did not establish the similarities and differences that allow the construction of robust confidence intervals in which the central tendency of the data is found, with at least the 95% confidence level.

This is important, because robust statistical inference [13,14,15] allows significant conclusions to be drawn about the data, even when few data are available and without the need to eliminate outliers, which could be carriers of important information about the physical system that generated them.

Something worth mentioning is that in practically all the papers shown above [16,17,18,19] the authors faced the problem of missing data, and used different tools to fill in the gaps. However, these values are artificial and their sole purpose is to put an estimate of the true value that should go in those gaps. On the contrary, however, using robust statistics in the sense of [13,14,15] the researchers do not have to do the above, because the analysis does not require having a large amount of data to draw significant conclusions, nor does the data distribution need to be Gaussian or parametric. This last characteristic is also shared by nonparametric statistical inference [11,12].

The research presented in this paper is in total agreement with what was said in the previous paragraphs and can be seen as a continuation of the research presented in [4,5,20,21].

At this point, it is important to mentioned that an analysis of air pollution variables in Quito was also carried out in [8]. However, in the report presented in [8] the robust analysis of air pollution variables was not performed, and only the mean value and the maximum values of the set of observations were taken into account. Therefore, the research presented in this paper, together with the research presented in [4,5,20,21], could be used as reference material to study in a rigorous, comprehensive way the behavior of the NO₂ concentration at Belisario station, from January 2008 to December 2019. Belisario is one of the most important air-quality monitoring stations of the Ministry of the Environment of Ecuador and belongs to Quito Metropolitan Atmospheric Monitoring Network [8].

The objectives of this paper are as follows:

(1)
Group the NO₂ concentration measurements, taken from 1 January 2008 to 31 December 2019 at Belisario station, in sets of variables that represent the years, months, days of the week, and hours of the day.
(2)
Obtain estimates of the central tendency of the data and their dispersion, using classic, nonparametric, resampling, and robust methods.
(3)
Categorize the data and find confidence intervals that allow quantifying the differences between categories.
(4)
Find periodic behaviors in the variables.

Other research in which some robust statistics tools have also been used in some way are the following. In [22], robust linear regression models were used to reduce the influence of outliers in a least square fitting problem using M-estimation. The air pollution variables under study in [22] were the following: carbon monoxide (CO), carbon dioxide (CO₂), nitrogen dioxide (NO₂), ozone (O₃), volatile organic compounds (VOC), and particulate matter (both PM_2.5 and PM₁₀).

Furthermore, some of the robust statistical methods used in the research presented in this paper were also used in [23] to compare results obtained in [23] for different lenses and calibration processes, in a three-dimensional reconstruction problem of archeological remains. Moreover, robust linear regression was also used in [24] to estimate black carbon concentration in two sites in Helsinki, Finland.

Finally, it is important to mention that nonparametric statistical methods have also been used to estimate the behavior of air pollution variables [25,26,27,28,29,30,31,32]. However, the results obtained using nonparametric methods are inferior to those obtained using robust methods, in the sense that robust methods are practically immune to the influence of extreme values. Therefore, robust methods generate confidence intervals that are narrower than those generated using nonparametric methods [5,20,21].

In this paper, in order to summarize the set of NO₂ concentration measurements, summary statistics are provided in Section 2. In addition, the analysis of the sets of measurements using nonparametric methods is carried out in Section 3, and the robust analysis of these sets is carried out in Section 4. Section 5 of this paper provides a discussion of the results, and the conclusions of the paper are presented in Section 6.

2. Summary Statistics of 12 Years of NO₂ Concentration Measurements at Belisario Station

According to [8], measurements were performed with Thermo Scientific Models 42C and 42i NO-NO₂-NO_x analyzers [33,34]. These measuring instruments are used as measurement standards in many countries. For example, EPA has designated them as reference and equivalent methods [35]. The detailed explanation of the experimental conditions in which the measurements of air pollutants are carried out in Quito is given in [8].

In Quito, the monthly behavior of the NO₂ concentration seems to repeat every year [8]. In addition, in [8] it is said that the lowest concentrations of NO₂ occur in the summer and the highest in the months of March, October, and November. In this regard, it is important to mention that, as stated in [8], there could be reasons, such as rains, high speed winds and volcanic eruptions, that cause the NO₂ concentration to rise or fall at certain times.

The data collection process in this research started on 1 January 2008 and ended on 31 December 2019, and the sampling period was equal to one hour [8,10]. Therefore, 12 years of NO₂ concentration measurements corresponding to 105,193 data will be analyzed. However, since some data did not appear and others had negative values, the research was carried out with at least 96% of all possible data. In other words, less than 4% of the total of all possible data was lost. Furthermore, in accordance with [36], at least 75% percent of scheduled samples for each year were collected.

In this research, NO₂ concentration measurements were divided into four families of sets. These families consisted of the sets that represent the 12 years under study, the sets that represent the 12 months of the year, the sets that represent 7 days a week, and the sets that represent the 24 h of the day in groups of 2 h. Additionally, the following variables were formed:

$X_{k}$ , $k = 1, \dots, 12$ , stands for the set of samples collected in year $2007 + k$ .
$Y_{k}$ , $k = 1, \dots, 12$ , stands for the set of samples collected in the k-th month of the year.
$Z_{k}$ , $k = 1, \dots, 7$ , stands for the set of samples collected on the k-th day of the week.
$W_{k}$ , $k = 1, \dots, 12$ , stands for the set of samples collected at each of the 24 h of the day but with hours in groups of 2 h.

In this paper, since there were no problems related to the lack of data to carry out the analysis, the time instants in which no information was recorded were not taken into account, because one of the advantages of robust statistical analysis is that this type of analysis allows to draw significant conclusions even with a few samples. As can be seen, in the case study in this paper there was a huge number of samples. Therefore, taking into account the above, there were no data scarcity problems. Moreover, the variables under study were considered to be linearly independent, because the linear correlation values between the variables were very close to zero.

In order to understand the characteristics of the NO₂ concentration, making a preliminary analysis of the data, Table 1 shows a statistical summary of the samples collected. Additionally, Figure 1 shows the box plot diagrams of the NO₂ concentration measurements by years, and the moving averages (MAs) of these measurements are shown in Figure 2, Figure 3 and Figure 4. Figure 2 shows the MA of the sequence consisting of all samples collected during the 12 years, Figure 3 shows the MA of the samples of the first six years, and Figure 4 shows the MA of the samples of the last six years. Time series studies use this type technique to analyze trends of variables [37,38].

Table 1.

Summary statistics of the NO₂ concentration measurements.

Year	Count	$Mean (μ g / m^{3})$	$Median (μ g / m^{3})$	$Standard Deviation (μ g / m^{3})$	Skewness	Kurtosis	$Minimum (μ g / m^{3})$	$Maximum (μ g / m^{3})$	Outliers %
$2008 (X_{1})$	8420	30.6714	29.190	13.8055	0.7028	3.7409	1.57	104.06	1.90
$2009 (X_{2})$	8463	29.0274	27.510	13.5032	0.8793	4.7740	1.39	121.16	1.86
$2010 (X_{3})$	8568	29.1409	27.970	13.0294	0.7802	4.5930	1.71	122.61	1.75
$2011 (X_{4})$	8462	27.4369	26.010	11.8807	0.6098	3.5480	1.46	84.99	1.67
$2012 (X_{5})$	8591	24.6597	23.380	11.8465	0.6422	3.4568	0.42	81.60	1.56
$2013 (X_{6})$	8288	28.0209	26.835	12.4174	0.8509	5.0159	1.81	114.84	1.68
$2014 (X_{7})$	8647	27.9431	26.430	12.8477	0.8255	4.9352	2.08	149.67	1.54
$2015 (X_{8})$	8529	27.0562	25.130	13.4024	0.8700	4.1198	0.29	110.47	2.03
$2016 (X_{9})$	8496	26.8132	25.610	12.8345	0.6392	3.6433	0	100.45	1.33
$2017 (X_{10})$	8282	23.4152	22.220	10.8507	0.7390	3.8362	0	88.03	1.79
$2018 (X_{11})$	8333	28.8103	27.180	12.7965	0.6763	3.5120	0	85.58	1.55
$2019 (X_{12})$	8474	25.9156	24.465	11.7287	0.6842	3.6442	2.17	89.66	1.33
Total	101,553	27.4108	25.950	12.7553	0.7747	4.2050	0	149.67	1.66

Open in a new tab

Box plots of the variables. The outliers are shown by using red circles and, according to QAQI [8], the dashed straight line indicates the separation between the desirable level of pollution (i.e., $[0, 100 μ g / m^{3})$ ) and the acceptable level of pollution (i.e., $[100 μ g / m^{3}, 200 μ g / m^{3})$ ).

Moving average of the sequence consisting of the NO₂ concentration measurements from 1 January 2008 to 31 December 2019.

Moving average of the NO₂ concentration measurements during the years from $2008$ to $2013$ : $X_{1} (2008)$ , $X_{2} (2009)$ , $X_{3} (2010)$ , $X_{4} (2011)$ , $X_{5} (2012)$ , and $X_{6} (2013)$ .

Moving average of the NO₂ concentration measurements during the years from $2014$ to 2019: $X_{7} (2014)$ , $X_{8} (2015)$ , $X_{9} (2016)$ , $X_{10} (2017)$ , $X_{11} (2018)$ , and $X_{12} (2019)$ .

In the box plots shown in Figure 1, a dashed straight line has been included indicating the separation between having a desirable level of pollution due to the NO₂ concentration at Belisario station (i.e., $[0, 100 μ g / m^{3})$ ) and an acceptable level of pollution (i.e., $[100 μ g / m^{3}, 200 μ g / m^{3})$ ). This classification, regarding the air pollution due to the NO₂ concentration, is established in Quito by the Quito Air Quality Index (QAQI) [8].

The type of smoothing by using MA employed in this paper is the following: Given the sequence $x$ of length $k$ , find the average of the data set $x_{h}, x_{h - 1}, \dots, x_{h - m + 1}$ for each $h \geq m$ , where $x_{h}$ is the value of the sequence at the $h$ position and $m < k$ . This is done in order to make that each particular datum loses its individual influence, although this process makes that the researcher loses $m - 1$ observations when analyzing the data. Here, the size of the MA was 720 due to the fact that this number is the number of data that there is in a 30-day month [5].

Table 1 shows that, for each of the variables under study, all the medians are less than the means. Furthermore, it is shown that the value of skewness is greater than zero and that all kurtosis values are greater than 3.4. In this sense, it should be noted that in the years 2013 and 2014 the kurtosis values are close to 5. Moreover, the values of the standard deviations are not small when compared with the values of the means, and Figure 1 shows that there are many outliers. All this indicates that the variables come from heavy-tailed distributions, or that their behavior may be due to the existence of a mix of distributions [14,39]. The aforementioned shows that these observations do not come from variables whose distribution is Gaussian [40].

In Figure 1, it can be seen that half of the years under study have observations that are above the desirable level of pollution [8]. Nevertheless, every year the NO₂ concentration presents abnormally high observations. The latter again indicates that the variables under study do not come from Gaussian distributions, although the percentage of outliers does not exceed 2.1%.

Figure 2 shows that the NO₂ concentration was stable during the 12 years of the study. In addition, it is observed that the maximum is reached in the third quarter of the year and the minimum in the second quarter. This is in accordance with what was said in [8]. Moreover, Figure 2, Figure 3 and Figure 4 show that once smoothing is performed, where each particular value loses importance with respect to the analysis in general, the values of the observations do not even reach half the maximum value of the desirable level of pollution. Therefore, exceeding the desirable level of pollution occurs at specific moments, which are never exceeded in a sustainable manner.

Due to the fact that there is a huge number of observations, in this paper classical statistical inference was tried to be used. However, as all the variables came from heavy-tailed distributions, several variable transformations [40] had to be carried out in order to make the variables fit a Gaussian distribution.

For the purpose of achieving smooth transformations around zero and to work with differentiable functions in the interval defined by the range of measured values, a transformation of the $T_{i} = \sqrt{X_{i} + 1}$ type was used, where $X_{i}$ was the NO₂ concentration in the i-th year, $i = 1, \dots, 12$ (i.e., X₁(2018),X₂(2019), …, X₁₂(2019))). One of the advantages of working with smooth transformations is that they share all the good properties that smooth functions have. Therefore, it is recommended to always look for simple transformations of the data to make them fit known distributions, starting with the Gaussian distribution [40].

In this research, with the variable change made, it was possible to adjust all the $T_{i}$ variables to normal random variables with p-values greater than $0.05$ , except for the years $2011 (X_{4})$ and $2015 (X_{8})$ . However, the year 2011 could be adjusted to observations from a normal distribution by using the $T_{4} = \sqrt{X_{4} + 4}$ transformation, and the year 2015 could also be adjusted to observations from a normal distribution by using the $T_{8} = \sqrt{X_{8} + 0.5}$ transformation. In both cases, the adjustments were achieved with p-values greater than 0.05. This leads to the use of classical inference, although the results will be compared with the results provided by nonparametric statistics and robust statistics. In Figure 5, Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10, the box plots and the smoothed observations are shown by using moving averages for the months of the year ( $Y_{i}, i = 1, \dots, 12$ ), for the days of the week ( $Z_{i}, i = 1, \dots, 7$ ), and for the hours of the day in groups of two hours ( $W_{i}, i = 1, \dots, 12$ ).

Box plot of data: $Y_{1}$ (January), $Y_{2}$ (February), $Y_{3}$ (March), $Y_{4}$ (April), $Y_{5}$ (May), $Y_{6}$ (June), $Y_{7}$ (July), $Y_{8}$ (August), $Y_{9}$ (September), $Y_{10}$ (October), $Y_{11}$ (November), and $Y_{12}$ (December).

Box plot of data: $Z_{1}$ (Monday), $Z_{2}$ (Tuesday), $Z_{3}$ (Wednesday), $Z_{4}$ (Thursday), $Z_{5}$ (Friday), $Z_{6}$ (Saturday), and $Z_{7}$ (Sunday).

Box plot of data: $W_{1}$ (0:00–1:00), $W_{2}$ (2:00–3:00), $W_{3}$ (4:00–5:00), $W_{4}$ (6:00–7:00), $W_{5}$ (8:00–9:00), $W_{6}$ (10:00–11:00), $W_{7}$ (12:00–13:00), $W_{8}$ (14:00–15:00), $W_{9}$ (16:00–17:00), $W_{10}$ (18:00–19:00), $W_{11}$ (20:00–21:00), and $W_{12}$ (22:00–23:00).

Moving average of the NO₂ concentrations: $Y_{1}$ (January), $Y_{2}$ (February), $Y_{3}$ (March), $Y_{4}$ (April), $Y_{5}$ (May), $Y_{6}$ (June), $Y_{7}$ (July), $Y_{8}$ (August), $Y_{9}$ (September), $Y_{10}$ (October), $Y_{11}$ (November), and $Y_{12}$ (December).

Moving average of the NO₂ concentrations: $Z_{1}$ (Monday), $Z_{2} ($ Tuesday), $Z_{3}$ (Wednesday), $Z_{4}$ (Thursday), $Z_{5}$ (Friday), $Z_{6}$ (Saturday), and $Z_{7}$ (Sunday).

Moving average of the NO₂ concentrations: $W_{1}$ (0:00–1:00), $W_{2}$ (2:00–3:00), $W_{3}$ (4:00–5:00), $W_{4}$ (6:00–7:00), $W_{5}$ (8:00–9:00), $W_{6}$ (10:00–11:00), $W_{7}$ (12:00–13:00), $W_{8}$ (14:00–15:00), $W_{9}$ (16:00–17:00), $W_{10}$ (18:00–19:00), $W_{11}$ (20:00–21:00), and $W_{12}$ (22:00–23:00).

The box plots shown in Figure 5, Figure 6 and Figure 7 show that the variables taken into account have extreme observations on the right, which are generally close to each other, although occasionally some of the variables present very extreme observations. Therefore, the desirable NO₂ concentration level is sometimes exceeded.

The behavior between the months of the year is very similar (see Figure 8) and it is also similar between the days of the week (see Figure 9), except on Sundays. On the other hand, when analyzing the behavior of the NO₂ concentration during daylight hours (see Figure 7 and Figure 10), it is observed that the concentration seems to increase at certain hours in the morning (approximately at 8:00) and at certain hours of the night (approximately at 20:00).

To what has been said above, it must be added that there are no differences between the months, the days of the week, and the hours of the day with respect to the concentration of NO₂ across the years. This is because the moving average graphs of each of the variables (see Figure 8, Figure 9 and Figure 10) seem to have behaviors close to periodicity, which is more noticeable when the measurement period is shorter (see Figure 9 and Figure 10).

Here, it is important to mention that this type of periodic behavior has also been observed in the analysis of other air pollution variables [21]. However, the perfect understanding of this behavior requires an in-depth analysis, because at first glance it is seen that the possible period of the signal depends on the time interval that is chosen for its representation. Therefore, if there is some kind of periodicity, it manifests itself in a modulated or variable way. Nevertheless, obtaining a mathematical model that captures the possible periodicity of this signal, together with amplitude, frequency and phase modulations, is beyond the scope of this paper and remains a future research topic.

In the same way that was done for the analysis by years, it was also attempted to make the variables that represent the months, days and hours of the day fit a Gaussian distribution. This process was conducted by carrying out several variable transformations. In the case under study, it was possible to fit the months to variables from Gaussian distributions with a p-value greater than 0.05 by using a transformation of the $T_{i} = \sqrt{Y_{i} + 4}$ type. However, the variables representing the days of the week and hours of the day could not be fitted to any statistical distribution that were not a heavy-tailed distribution. Therefore, in this paper, for these last two types of variables, classical inference methods will be considered to a lesser extent than nonparametric and robust inference methods.

3. Analysis of NO₂ Concentration Measurements Using Nonparametric Methods

This section is aimed at testing whether the variables have different medians by using nonparametric statistical methods. In this research, the variability of the measurements could be caused by some distinguishing features of the years, months, days, and hours of the day in which the measurements were taken. In addition, random causes could be generated by undesirable climatic conditions or by measurement noise of the measuring instrument.

In this paper, the variables under analysis have been considered to be linearly independent, because the linear correlation between them was very close to zero. Therefore, if there were some kind of linear dependency between the variables, that dependency would be very weak. Furthermore, it is important to mention that one of the objectives of this paper is to study whether the distributions of the variables are equal or not between them, and analyze the differences that may exist between the variables. Therefore, the study of the study of different types of nonlinear dependencies that may exist between variables is beyond the scope of the paper.

Here, the Wilcoxon rank sum test [11,12] was used to test whether the observations come from distributions with equal medians. This test was also satisfactorily used in [4,20,21,25,26]. The null hypothesis was $H_{0} : M e d i a n = M_{0}$ , and the alternative hypothesis was $H_{1} : M e d i a n \neq M_{0}$ . In short, if it is considered that both the null hypothesis is true and the observations are stable, then half of the observations will be less than $M_{0}$ and the others will be greater than $M_{0}$ . For the analysis, the confidence level was $(1 - α)$ , being $α = 0.05$ the significance level. Thus, the bilateral nonparametric confidence intervals for the median were calculated as in [4,20].

With a confidence level equal to 95%, the limits of the confidence intervals for the median that were found in this paper are shown in Table 2. Additionally, a graphical representation of the confidence intervals for both the median and the transformed mean are shown in Figure 11. These confidence intervals are both classic and nonparametric. In Figure 11, the axis of the abscissa represents the variables (that is, the years) and the ordinate axis represents the NO₂ concentration. In addition, in Figure 11, the nonparametric confidence interval centered on the median is represented to the left of each variable, while the classic confidence interval for the mean of the transformed data is represented on the right. Moreover, the x that is shown above each of the classic confidence intervals is the mean of the untransformed data of the displayed variable.

Table 2.

Confidence interval limits for the median, with $α = 0.05$ .

Variable	$Lower Limit (μ g / m^{3})$	$Upper Limit (μ g / m^{3})$
$X_{1} (2008)$	28.84	29.59
$X_{2} (2009)$	27.21	27.86
$X_{3} (2010)$	27.68	28.31
$X_{4} (2011)$	25.75	26.32
$X_{5} (2012)$	23.12	23.69
$X_{6} (2013)$	26.55	27.16
$X_{7} (2014)$	26.18	26.83
$X_{8} (2015)$	24.82	25.48
$X_{9} (2016)$	25.30	25.98
$X_{10} (2017)$	21.91	22.47
$X_{11} (2018)$	26.79	27.53
$X_{12} (2019)$	24.19	24.75

Open in a new tab

95% classic and nonparametric confidence intervals for the mean of the transformed data and the median of the untransformed data, respectively. The nonparametric confidence interval is shown to the left of each variable and the classical confidence interval to the right of each variable. The x above each confidence interval is the mean of the untransformed variable, and the horizontal dashed lines are used to separate the categories. $X_{1}$ ( $2008)$ , $X_{2}$ ( $2009)$ , $X_{3}$ ( $2010)$ , $X_{4}$ ( $2011)$ , $X_{5}$ ( $2012)$ , $X_{6}$ ( $2013)$ , $X_{7}$ ( $2014)$ , $X_{8}$ ( $2015)$ , $X_{9}$ ( $2016)$ , $X_{10}$ ( $2017)$ , $X_{11}$ ( $2018)$ , and $X_{12}$ ( $2019)$ .

In Figure 11, all the means of the untransformed data are outside both the median-centered nonparametric confidence intervals and the classic confidence intervals for the transformed data, although the classic confidence intervals were built for the mean of the transformed data. Additionally, Figure 11 shows that there is a parallelism between the intervals found by the two methods (that is, the classic method and the nonparametric method) and that, in each variable, both confidence intervals contain the median of the data. However, it is important to mention that in the case of nonparametric intervals, the median is included in these intervals by construction of the intervals, which is not the case with the classic intervals. This fact corroborates that the data analysis carried out using nonparametric methods and whose distribution is not a normal distribution, produce satisfactory results in terms of location measurements.

Analyzing Figure 11, it can be seen how five categories have been established to classify the data, which are separated by horizontal dashed lines. The year 2008 is in the first category. The years 2009, 2010, 2011, 2013, 2014, 2015, 2016, and 2018 are in the second category. The year 2019 is in the third category. The year 2012 is in the fourth category and, lastly, the year 2017 is in the fifth category.

At this point, it is important to mention that all the categories are separated from each other by an NO₂ concentration equal to a unit of measurement. Although, the third category has an amplitude of three units of measurement. However, there are differences in the widths of the measurement intervals. On the other hand, it is also verified again that the confidence intervals are at the desirable level of air pollution, according to QAQI [8]. Furthermore, in no case do the values of the confidence intervals come close to the acceptable level of air pollution due to the NO₂ concentration, which shows that the desirable level of air pollution is exceeded in a circumstantial way.

Next, Figure 12, Figure 13 and Figure 14 show the nonparametric confidence intervals for the medians of the variables that represent the months, days and hours of the day in groups of two hours.

95% nonparametric confidence intervals for the medians: $Y_{1}$ (January), $Y_{2}$ (February), $Y_{3}$ (March), $Y_{4}$ (April), $Y_{5}$ (May), $Y_{6}$ (June), $Y_{7}$ (July), $Y_{8}$ (August), $Y_{9}$ (September), $Y_{10}$ (October), $Y_{11}$ (November), and $Y_{12}$ (December).

95% nonparametric confidence intervals for the medians: $Z_{1}$ (Monday), $Z_{2}$ (Tuesday), $Z_{3}$ (Wednesday), $Z_{4}$ (Thursday), $Z_{5}$ (Friday), $Z_{6}$ (Saturday), and $Z_{7}$ (Sunday).

95% nonparametric confidence intervals for the medians: $W_{1}$ (0:00–1:00), $W_{2}$ (2:00–3:00), $W_{3}$ (4:00–5:00), $W_{4}$ (6:00–7:00), $W_{5}$ (8:00–9:00), $W_{6}$ (10:00–11:00), $W_{7}$ (12:00–13:00), $W_{8}$ (14:00–15:00), $W_{9}$ (16:00–17:00), $W_{10}$ (18:00–19:00), $W_{11}$ (20:00–21:00), and $W_{12}$ (22:00–23:00).

As can be seen in Figure 12, five categories are established for the months using the nonparametric confidence intervals, which coincide with the results of the Wilcoxon rank sum test [11,12], taking into account low p-values. The NO₂ concentration level, according to the months, seems to behave periodically across the years. The lowest concentration levels occur in summer, then it grows to the highest level obtained at the beginning of autumn, to lower and stabilize in winter and spring. At the end of spring, the decline in concentration begins. Although the variation is very small, the larger the median, the grater the width of the nonparametric confidence intervals. This feature has already been seen in other studies [20].

With respect to the analysis of the NO₂ concentration by weeks, Figure 13 shows that this concentration decreases by 20% on weekends compared to working days, where there is a maximum on Fridays. Additionally, the NO₂ concentration remains stable during the working days. The results obtained with these nonparametric confidence intervals are analogous to those obtained by using the Wilcoxon rank sum test. Four categories are established for the NO₂ concentration: one formed on Sunday, another formed on Friday, a third formed on Tuesday, Wednesday and Thursday, and the last one formed on Monday and Saturday. This last category represents the transition from weekdays to weekends.

Finally, in the study of the hours of the day, Figure 14 shows that the NO₂ concentration has two minimums, two maximums, and hours of transition between these extremes. The minimums occur at approximately 2:00 and 14:00 each day, while the maximums occur at 9:00 and 19:00. Furthermore, the increases and decreases in NO₂ concentrations are very pronounced. These go from concentration values ranging from $18 μ g / m^{3}$ to $33 μ g / m^{3}$ or from $20 μ g / m^{3}$ to $36 μ g / m^{3}$ , representing jumps in value almost twice the concentration. There are many categories similar to those obtained with the Wilcoxon rank sum test, but these can be summarized in five: one formed by each minimum, another formed by each maximum, and the rest of the categories are transition categories from one state to another.

4. Robust Analysis of the NO₂ Concentration Measurements

Robust Statistics is concerned with carrying out the analysis using statistics that suffer little variation compared to samples that present observations that are far from the vast majority of the data [13,14,15]. Therefore, this paper is aimed at both obtaining measures of central tendency and scale that are insensitive to extreme observations and assessing which parameters can help characterize the variables that are under consideration.

Extreme observations have little influence on the behavior of robust estimators, because with these estimators the influence curves are bounded [41]. These curves are used to characterize robust estimators and are intended to measure the influence that one observation has on the others. Furthermore, in addition to the property of being bounded, the influence curves have other properties, such as continuity and differentiability.

In this paper, robust estimators are applied to the ordered sample [12] of $O_{1}, \dots, O_{n}$ , which is $O_{(1)} \leq O_{(2)} \leq \dots \leq O_{(n)}$ , where $O_{(1)}$ stands for the observation that has the smallest value, $O_{(2)}$ stands for the observation that has the second smallest value, and so on.

4.1. Estimators of Central Tendency and Scale

In this section, location statistics will be used to indicate around which values most of the data are grouped. In addition, these values will be used to obtain deductions that determine the center of the distributions. The statistics used in this section can be found in [13,14,15,42,43,44].

L-location estimators:

Trimean $(T M)$ [15,42].
$α$ -trimmed mean $(T (α))$ [13,14,15].
$α$ -winsorized mean $(W (α))$ [13].

M-location estimators:

Andrew’s wave $(T_{w a} (c))$ [13,15].
Biweight $(T_{b i} (c))$ [13,14].

Scale estimators:

Sample standard deviation $(s_{x})$ [13,14].
Mean absolute deviation $(M A D_{m e a n})$ [13,14].
Median absolute deviation ( $M A D)$ [13,14].
One-half of the fourth-spread $(S R H)$ [13,43].
Least median squares $(L M S)$ [14].
Winsorized standard error ( $s^{W} (α)$ ) [15].
Andrew’s wave ( $s_{ω a} (c)$ ) [13].
Biweight ( $S_{b i} (c)$ ) [13,14].
Estimator based on a subrange $(C_{n}^{α})$ [44].

Table 3 shows the point estimates of location and the point estimates of scale are shown in Table 4. A graphical representation of the aforementioned location and scale estimators is shown in Figure 15 and Figure 16, respectively.

Table 3.

Point estimates of location $(μ g / m^{3})$ .

Year	Mean	$Median m_{e}$	$Trimean T m$	$0.2 - Trimmed Mean T (0.2)$	$0.3 - Trimmed Mean T (0.3)$	$0.2 - Winsorized Mean W (0.2)$	$0.3 - Winsorized Mean W (0.3)$	$Andrew ’ s Wave T_{w a} (2.4 π)$	$Biweight T_{b i} (9)$
$2008 (X_{1})$	30.6714	29.1900	29.4600	30.1907	28.3553	29.7156	29.4234	29.9821	29.9665
$2009 (X_{2})$	29.0274	27.5100	27.8175	27.4588	25.5255	28.0363	27.7640	28.2836	28.2726
$2010 (X_{3})$	29.1409	27.9700	28.1075	29.8856	29.1451	28.3203	28.0605	28.5222	28.5242
$2011 (X_{4})$	27.4369	26.0100	26.4650	26.9828	24.9782	26.7571	26.4540	26.9136	26.8660
$2012 (X_{5})$	24.6597	23.3800	23.6725	22.8323	20.3695	23.9037	23.6632	24.1463	24.1215
$2013 (X_{6})$	28.0209	26.8350	27.0750	27.9641	24.9496	27.2365	27.0573	27.4127	27.4106
$2014 (X_{7})$	28.0209	26.8350	27.0750	27.9641	24.9496	27.2365	27.0573	27.4127	27.4106
$2015 (X_{8})$	27.9431	26.4300	26.8250	25.8820	24.2028	27.1266	26.7925	27.3246	27.2912
$2016 (X_{9})$	27.0562	25.1300	25.6250	25.3397	22.3397	25.9492	25.5318	26.2370	26.1992
$2017 (X_{10})$	26.8132	25.6100	25.8300	24.5848	22.0544	26.0465	25.7487	26.3029	26.2872
$2018 (X_{11})$	23.4152	22.2200	22.4400	23.3262	23.3533	22.6466	22.3710	22.8758	22.8619
$2019 (X_{12})$	28.8103	27.1800	27.6150	28.3698	25.1177	27.9231	27.5771	28.2248	28.1775
All years	27.4108	25.9500	26.2850	27.0175	27.1500	26.5410	26.2272	26.7754	26.7526

Open in a new tab

Table 4.

Point estimates of scale $(μ g / m^{3})$ .

Year	$s_{x}$	$m A D_{m e a n}$	$m A D$	$S R H$	$L m S$	$s^{W} (0.2)$	$s_{w a} (2.4 π)$	$s_{b i} (9)$	$C_{n}^{0.2}$
$2008 (X_{1})$	13.8055	10.8025	8.7000	8.7900	8.5300	8.3386	13.4431	13.4482	12.7180
$2009 (X_{2})$	13.5032	10.4953	8.5100	8.6250	8.4100	8.2029	12.8821	12.9066	12.5176
$2010 (X_{3})$	13.0294	10.1525	8.2800	8.3350	8.2150	7.9726	12.6009	12.6008	12.2625
$2011 (X_{4})$	11.8807	9.3367	7.4600	7.6000	7.3750	7.3040	11.7007	11.6932	11.0964
$2012 (X_{5})$	11.8465	9.3780	7.7500	7.8050	7.6050	7.4137	11.6913	11.6828	11.3332
$2013 (X_{6})$	12.4174	9.6233	7.9650	7.9850	7.8200	7.5782	11.8653	11.8836	11.7341
$2014 (X_{7})$	12.8477	10.0650	8.2400	8.4300	8.0400	7.9912	12.4651	12.4657	12.0803
$2015 (X_{8})$	13.4024	10.4634	8.4100	8.6400	8.0400	8.1693	12.8284	12.8536	11.9345
$2016 (X_{9})$	12.8345	10.1818	8.5650	8.6400	8.3300	8.2007	12.6827	12.6662	12.3354
$2017 (X_{10})$	10.8507	8.5253	6.9600	7.0100	6.7900	6.7041	10.5899	10.5901	10.1125
$2018 (X_{11})$	12.7965	10.1228	8.3600	8.5200	8.0150	8.0536	12.5949	12.5894	11.9163
$2019 (X_{12})$	11.7287	9.2944	7.6050	7.7100	7.5000	7.3684	11.5476	11.5364	11.3150
All years	12.7553	9.9982	8.1800	8.2700	8.0250	7.8793	12.3747	12.3789	11.9345

Open in a new tab

Graphical representation of the location estimates for the twelve years under study. Location estimators: mean, median, trimean, 0.2-trimmed mean, 0.3-trimmed mean, 0.2-winsorized mean, 0.3-winsorized mean, Andrew’s wave, and biweight.

Graphical representation of the scale estimates for the twelve years under study. Scale estimators: sample standard deviation ( $s_{x}$ , mean absolute deviation ( $M A D_{m e a n})$ , median absolute deviation ( $M A D)$ , one-half of the fourth-spread ( $S R H$ ), least median squares ( $L M S$ ), estimator based on a subrange ( $C_{n}^{α}$ ), winsorized standard error ( $s^{W} (0.2)$ ), Andrew’s wave ( $s_{ω a} (2.4 π)$ ), and biweight ( $S_{b i} (c)$ ).

Figure 15 shows the location estimates, which indicate that NO2 concertation levels are very stable across the years. Furthermore, there is a maximum in 2008 and another in 2018. In addition, there is a minimum in 2012 and another in 2017, and the oscillation occurs between 20 $(μ g / m^{3})$ and 30 $(μ g / m^{3})$ . Note that, in general, all measures of centralization for each variable fluctuate between the mean and 0.3-trimmed mean.

Figure 16 shows that all the scale estimators are very similar to each other, varying 3 $(μ g / m^{3})$ up or down. In addition, the behavior of all estimates is very uniform, growing and decreasing in the same periods. Moreover, it is observed that the standard deviation is the highest scale estimate and that the other scale estimates are bounded lower by the scale estimator 0.2-winsorized standard deviation.

At this point, it is important to note that these scale estimates are relatively high compared to the location estimates. The foregoing indicates that the variability in the NO₂ concentration measurements is high. Furthermore, it indicates that although there are few outliers, there are many observations with high values, when these observations are compared with the center of the distribution. This had already been seen before when observing the amplitude of the box plot diagrams in Figure 1.

Figure 17 shows the location estimates by month, day and hour, where the variables are as follows: $Y_{1}$ (January), $Y_{2}$ (February), $Y_{3}$ (March), $Y_{4}$ (April), $Y_{5}$ (May), $Y_{6}$ (June), $Y_{7}$ (July), $Y_{8}$ (August), $Y_{9}$ (September), $Y_{10}$ (October), $Y_{11}$ (November), and $Y_{12}$ (December); $Z_{1}$ (Monday), $Z_{2}$ (Tuesday), $Z_{3}$ (Wednesday), $Z_{4}$ (Thursday), $Z_{5}$ (Friday), $Z_{6}$ (Saturday), and $Z_{7}$ (Sunday); an $W_{1}$ (0:00–1:00), $W_{2}$ (2:00–3:00), $W_{3}$ (4:00–5:00), $W_{4}$ (6:00–7:00), $W_{5}$ (8:00–9:00), $W_{6}$ (10:00–11:00), $W_{7}$ (12:00–13:00), $W_{8}$ (14:00–15:00), $W_{9}$ (16:00–17:00), $W_{10}$ (18:00–19:00), $W_{11}$ (20:00–21:00), and $W_{12}$ (22:00–23:00). In addition, for the abovementioned variables, the scale estimates are shown in Figure 18.

Estimates of location by month, day and hours.

Estimates of scale by month, day and hours.

In Figure 17, the estimates of location by months, days of the week and hours of the day in groups of two hours are very similar to each other, with the difference of 1 $μ g / m^{3}$ from one type of estimator to another. Therefore, the difference between the estimators is negligible.

Figure 17a shows that the location estimates again reflect the same characteristics seen in the nonparametric estimates. In the analysis by months, the concentration of NO₂ goes from having a minimum at approximately 22 $μ g / m^{3}$ , in the middle of summer, to having a maximum at the beginning of autumn, showing a growth of 50% in the concentration of NO₂. Afterwards, there is a decrease in the concentration of NO₂ tending towards 25 $μ g / m^{3}$ , where the NO₂ concentration remains in a steady state until the arrival of the following summer.

In Figure 17b, the days of the week show two states, that on working days and that on weekends. On weekends, the NO₂ concentration drops considerably compared to the value it reaches during working days. Lastly, Figure 17c shows that the hours of the day have two very pronounced minimums and maximums, in which the value of the NO₂ concentration of the minimums is almost doubled.

Comparing Figure 17 and Figure 18, it is observed that the scale estimates are very high with respect to the location estimates, which suggests high variability in the data, although with few outliers. In addition, Figure 18 shows that the scale estimates are grouped into three steps: (1) the step formed by the highest estimates, $s_{x}$ , $s_{w a} (2.4 π)$ , $s_{b i} (9)$ and $C_{n}^{0.2}$ ; (2) the step formed by $M A D_{m e a n}$ ; and (3) the step formed by lowest estimates, $L M S$ , $S R H$ , $M A D$ and 0.2-winsorized standard deviation.

Figure 18 also shows that the variables representing the months, days and hours seem to follow the same pattern. Specifically, these variables increase and decrease at the same times, for the group that represents the months (see Figure 18a), the one that represents the days of the week (see Figure 18b) and the one that represents the hours of the day (see Figure 18c).

Finally, it is observed that there is a concordance between location and scale estimates. Specifically, the increase in the NO₂ concentration leads to an increase in its variability.

4.2. Confidence Intervals

Following the methodology used in [5,20,21], in this section confidence intervals will be constructed to classify the variables under study, categorize said variables, and establish similarities and differences between these variables that bring out possible patterns of behavior of the concentration of NO₂ at Belisario station. In addition, the location and scale estimators presented in Section 4.1 were used to build the following confidence intervals.

$(\bar{X}, s_{x})$ , where $\bar{X}$ stands for the mean.
$(M_{e}, M A D)$ , where $M_{e}$ stands for the median.
$(M_{e}, I Q R)$ , where $I Q R$ stands for the interquartile range.
$(T (α), s^{W} (α))$ .
$(T_{w a} (c), s_{w a} (c))$ .
$(T_{b i} (c), s_{b i} (c))$ .

Furthermore, a bootstrap method was used to build confidence intervals [15,20,21]. With all the above, nine confidence intervals were built for all variables: one classic confidence interval, one classic confidence interval based on data inversion of the transformed data using the function $\sqrt{X_{i} + 1}$ , being $X_{i}$ the i-th variable to be transformed, one confidence interval based on a bootstrap method, one nonparametric confidence interval, and five robust confidence intervals.

Figure 19 shows the confidence intervals built for the years 2008, 2014 and 2019. Showing the confidence intervals for more variables does not provide significant information, because these intervals present the same characteristics for all the variables.

95% confidence intervals ( $C I_{0.95}$ ): classic, classic based on data inversion of transformed data, robust, boostrap, and nonparametric confidence intervals.

Specifically, the classic confidence intervals are those that are the most displaced towards the highest values, while the confidence intervals based on the Andrew’s wave and the biweight are analogous in all variables. Furthermore, these two types of intervals cover lower values than those mentioned above, with a difference equal to $1 μ g / m^{3}$ . The rest of the confidence intervals cover the median of the data and differ from the first ones by $1 μ g / m^{3}$ . Therefore, the difference between the highest estimates and the lowest estimates is equal to $2 μ g / m^{3}$ , which is practically negligible.

Taking into account all the above, the pairs of estimators $(T (α), s^{W} (α))$ and $(T_{b i} (c), s_{b i} (c))$ , for $α = 0.2$ and $c = 9$ , will be used to carry out the comparison of the variables. The reasons that justify having made this decision are the following.

First, the classical confidence intervals are based on the underlying distribution of the data being approximately normal, which is not true in this research.

Second, the results obtained with the estimators $(M_{e}, M A D)$ , $(M_{e}, I Q R)$ and bootstrap are analogous to those obtained by the nonparametric estimators. Although, it must be said that the pair of estimators $(M_{e}, M A D)$ allows us to better observe the differences in the grouping of variables. Therefore, the possibility of carrying out the grouping of similar behaviors in the NO₂ concentration by years has been lost.

Third, and finally, as the results obtained with the Andrew’s wave are similar to those obtained with the biweight, any one of these two estimators can be chosen to perform the analysis.

Table 5 shows the limits and lengths of the 95% confidence intervals for $(T (0.2), s^{W} (0.2))$ ) and $(T_{b i} (9), s_{b i} (9))$ . In addition, a graphical representation of the confidence intervals is shown in Figure 20. In this figure, horizontal dashed lines have been included to classify the variables. Note that this was done previously when the classification of the medians provided by the Wilcoxon rank sum test was performed in Section 3, when building the nonparametric confidence intervals. When looking at Figure 20, it is important to mention that with the estimators (T(0.2),s^W (0.2)) and (T_bi (9),s_bi (9)) the classifications of the variables are more refined than those obtained by the nonparametric estimators. Nevertheless, it must be realized that the differences between one another do not reach 1 $(μ g / m^{3})$ , which represents a very insignificant difference.

Table 5.

95% confidence intervals ( $C I_{0.95}$ and confidence interval lengths: (T(0.2),s^W (0.2)) and (T_bi (9),s_bi (9)).

Variable	$C I_{95}$	$Lower Limit (μ g / m^{3})$	$Upper Limit (μ g / m^{3})$	$Length (μ g / m^{3})$
$X_{1} (2008)$	$(T (0.2), s^{W} (0.2))$	29.8938	30.4876	0.5938
$X_{1} (2008)$	$(T_{b i} (9), s_{b i} (9))$	29.6949	30.2693	0.5745
$X_{2} (2009)$	$(T (0.2), s^{W} (0.2))$	27.1675	27.7502	0.5827
$X_{2} (2009)$	$(T_{b i} (9), s_{b i} (9))$	28.0091	28.5581	0.5490
$X_{3} (2010)$	$(T (0.2), s^{W} (0.2))$	29.6042	30.1670	0.5628
$X_{3} (2010)$	$(T_{b i} (9), s_{b i} (9))$	28.2554	28.7891	0.5337
$X_{4} (2011)$	$(T (0.2), s^{W} (0.2))$	26.7233	27.2422	0.5189
$X_{4} (2011)$	$(T_{b i} (9), s_{b i} (9))$	26.6642	27.1629	0.4987
$X_{5} (2012)$	$(T (0.2), s^{W} (0.2))$	22.5709	23.0936	0.5227
$X_{5} (2012)$	$(T_{b i} (9), s_{b i} (9))$	23.8990	24.3935	0.4945
$X_{6} (2013)$	$(T (0.2), s^{W} (0.2))$	27.6921	28.2360	0.5440
$X_{6} (2013)$	$(T_{b i} (9), s_{b i} (9))$	27.1572	27.6682	0.5110
$X_{7} (2014)$	$(T (0.2), s^{W} (0.2))$	25.6012	26.1627	0.5616
$X_{7} (2014)$	$(T_{b i} (9), s_{b i} (9))$	27.0618	27.5874	0.5256
$X_{8} (2015)$	$(T (0.2), s^{W} (0.2))$	25.0507	25.6287	0.5780
$X_{8} (2015)$	$(T_{b i} (9), s_{b i} (9))$	25.9647	26.5093	0.5446
$X_{9} (2016)$	$(T (0.2), s^{W} (0.2))$	24.2941	24.8755	0.5814
$X_{9} (2016)$	$(T_{b i} (9), s_{b i} (9))$	26.0332	26.5726	0.5395
$X_{10} (2017)$	$(T (0.2), s^{W} (0.2))$	23.0855	23.5669	0.4814
$X_{10} (2017)$	$(T_{b i} (9), s_{b i} (9))$	22.6477	23.1040	0.4562
$X_{11} (2018)$	$(T (0.2), s^{W} (0.2))$	28.0815	28.6580	0.5765
$X_{11} (2018)$	$(T_{b i} (9), s_{b i} (9))$	27.9544	28.4953	0.5410
$X_{12} (2019)$	$(T (0.2), s^{W} (0.2))$	24.5174	25.0405	0.5231
$X_{12} (2019)$	$(T_{b i} (9), s_{b i} (9))$	25.1326	25.6244	0.4918

Open in a new tab

95% confidence intervals by using the pairs of estimators (T(0.2),*s^W* (0.2)) and (T_bi (9),s_bi (9)).

Regarding the amplitudes of the confidence intervals, Figure 20 shows that the confidence intervals found with the pair of estimators $(T_{b i} (9), s_{b i} (9))$ are narrower than the confidence intervals found with the pair of estimators $(T (0.2), s^{W} (0.2))$ . Specifically, the width of the confidence intervals found with $(T_{b i} (9), s_{b i} (9))$ is between 2.8% and 7.2% narrower than the width of the intervals found with $(T (0.2), s^{W} (0.2))$ . Furthermore, it is important to mention that the amplitudes of these intervals evolve in parallel to the median of the data.

Figure 21 shows the confidence intervals based on $(T (0.2), s^{W} (0.2))$ and $(T_{b i} (9), s_{b i} (9))$ for the analysis by months, days and hours. In Figure 21a,d, it is observed that the lowest NO₂ concentration values are reached in the month of July, with values that are approximately equal to $20 μ g / m^{3}$ .

95% confidence intervals, (T(0.2),*s^W* (0.2)) and (T_bi (9),s_bi (9)), by month (Y₁ = January, …, Y₁₂ = December), day of the week (Z₁ = Monday, …, Z₇ = Sunday), and every two hours (W₁ = 0:00–1:00, …, W₁₂ = 22:00–23:00).

From that moment, the NO₂ concentration rises until reaching to its maximum value, which is approximately equal to $33 μ g / m^{3}$ and which is reached in October. In other words, in the summer months the NO₂ concentration drops to a little more than half its maximum value. Furthermore, the drop in NO₂ concentration levels appears to have two steps. The first of these steps is reached at the end of the year, at which point the NO₂ concentration stabilizes and remains stable until April. Then, the NO₂ concentration drops again until it reaches its minimum value in July. These results are similar to those obtained in Figure 12. Furthermore, the amplitudes of the confidence intervals seem, in general, to be smaller than for the analysis by years. Likewise, there is also the effect that the higher the median value, the greater the width of the intervals.

Regarding the analysis by days of the week (see Figure 21b,e), it can be said that this is in every way analogous to that obtained with nonparametric estimators (see Figure 13). On weekends, the NO₂ concentration reaches the minimum, reducing to 25% of the value reached on weekdays. In addition, there are three categories: one for Sundays, another for weekdays, and then the transition categories. Although, it should be noted that on Fridays the highest NO₂ concentration value of the entire week is reached.

Finally, with respect to the analysis by hours of the day (see Figure 21c,f), it is important to say that the results are also very similar to those found with nonparametric estimators (see Figure 14). For both types of confidence intervals, the NO₂ concentration reaches maximum values at approximately 9:00 and 19:00. That is, the maximum is reached in the first hours of the beginning of the working day and at the end of the working day. In addition, the concentration of NO₂ suffers very pronounced falls, reaching two relative minimums: one at approximately 2:00 and the other at approximately 14:00, the minimum reached at 2:00 being the deepest. Among the abovementioned minimums and maximums, there are only transitioning levels, because there is no any value at which the NO₂ concentration remains stable for several hours.

5. Discussion

From a statistical summary of the data it could be observed that the vast majority of the observations are at a desirable level of air pollution, due to the levels of NO₂ concentrations at the Belisario air-quality monitoring station [9]. Furthermore, those observations that were not at a desirable level of pollution were few extreme values that were at an acceptable level of air pollution. The criteria used in Quito to establish desirable and acceptable levels of air pollution are defined by QAQI [8]. Although, it is worth mentioning that each urban city in the world sets its own criteria. Therefore, it could happen that in other cities of the world, the desirable level of air pollution is only used to say that the level of air pollution due to the concentration of air pollution variables is 0 and that the rest of the levels range from being not harmful to humans to the danger of death.

To the aforementioned, it must be added that, after a preliminary analysis of the data, it was observed that the samples came from heavy-tailed distributions. But, it was possible to discover transformations that allowed to carry out the transformation of the original variables into other variables from Gaussian distributions. However, this was only possible for some types of variables. Therefore, it became necessary to combine various types of statistical analysis to be able to explain in a precise, exhaustive and comprehensive way the behavior of the NO₂ concentration at the Belisario air-quality monitoring station.

Within the part of the research aimed at carrying out the descriptive analysis of the data, the original data were also smoothed to try to reduce the influence that each datum has on the rest of the data. This process revealed that observations show low air pollution values, below half the desirable level of pollution. Therefore, exceeding this level of air pollution is something very specific and does not sustain over time. Thus, at first glance, this confirms that the level of air pollution at Belisario station is not harmful to humans. This result is in agreement with what was said in [8].

In general, this type of comprehensive preliminary analysis of measurements of NO₂ concentrations in urban cities is not very common. For example, in [16,17,18,19] researchers tend to eliminate outliers following certain criteria and then use artificial intelligence or machine learning tools to carry out data analysis. In addition, after the data cleaning process, including the elimination of extreme values, researchers tend to look for linear dependencies between the variables in order to apply linear analysis tools and interpret the data in this way.

However, unlike what was said previously, in this research it was decided to analyze the data taking into account the contribution of extreme values, because these values are the response of the dynamic system under study to certain types of inputs. Therefore, in this research the extreme values were considered as carriers of useful information and allowed to justify the robust analysis of the data.

Nonparametric statistical inference tools have also been used in the analysis of air pollution variables [25,26,27,28,29,30,31,32]. This has been important, because if the data do not follow parametric distributions, then classical statistical inference is unfounded, which is the case of study in this research. In this sense, in [25,26,27,28,29,30,31,32] nonparametric analysis tools have been used to study several variables of air pollution. Nevertheless, nonparametric analysis is much more sensitive to the influence of extreme values than robust analysis. But it also provides relevant information that allows to classify data and determine similarities and differences between them, which also allows the researcher to categorize variables. All of this has been done in this research and has paved the way for applying robust data analysis tools.

In this paper, nonparametric and robust statistical hypothesis testing and nonparametric and robust confidence intervals were also used to compare the results with the results obtained using classical techniques. Furthermore, this was used to justify why not all the variables supported the analyzes using classical techniques. In addition, using the Wilcoxon rank sum test, it was verified that the variables could be grouped into different groups.

Here, the results obtained by using nonparametric confidence intervals for the median were very similar to those obtained by using the classical methods applied to the transformed observations. In short, for the analysis by years, five strata were established: four formed by a single year, 2008, 2012, 2017 and 2019, and another formed by the other years. Each stratum differs from the others by $1 μ g / m^{3}$ .

The classes for the months were five, but could be reduced to four: one class for the maximum, $Y_{10}$ (October), another class for the minimum, $Y_{7}$ (July), the third class for the end of the year and the first quarter of each year, and the last class formed by moments of transition between the three abovementioned classes.

On the other hand, the days of the week are grouped as follows. First, on the central three days of the work week (that is, Tuesday, Wednesday, and Thursday). Second, on Fridays, this part being the highest in terms of NO₂ concentration levels. Third, on Sundays, this part being the one with the lowest concentration values. Fourth, and finally, on Saturdays and Mondays, which are part of the transition from some levels to other levels. Furthermore, the value of the reduction of the NO₂ concentration levels from Friday to Sunday is approximately equal to one third of the NO₂ concentration levels on Friday.

Regarding the analysis by hours, several groupings were also obtained. These groups have a maximum at 9:00 and another at 19:00, and two minimums, one at the end of business hours and another in the early hours of the morning.

Next, as was done in [5,20,21], different location and scale statistics were found, which were used to build robust confidence intervals. At this point, it is worth mentioning that all location estimates were bounded by the median and the 0.2-trimmed mean. In addition, it is clearly observed that the concentration of NO₂ has all its values in the range of desirable values, decreases in the years 2012, 2017 and 2019, and is higher in 2008. But, the difference between the greatest concentration value and the smallest value on each curve representing the location estimates by year is less than $9 μ g / m^{3}$ .

Regarding the scale statistics used, these were the same as those used in [5,20,21]. In addition, for the design of the families of estimators, values that are mentioned in the specialized literature as suitable values were chosen.

All the scale estimates are in one band, where the standard deviation is well above all of them and the rest of the estimates have the least median of squares as the lower bound. Here, there is a parallel between the location and scale estimates, in the sense that the rise (respectively decrease) in the value of the location estimate produces a rise (respectively decrease) in the scale estimates. All this leads to the conclusion that extreme observations, both in quantity and value, are the observations that determine the values of the location and scale estimates.

After analyzing the robust confidence intervals, it was concluded that the most appropriate pairs of estimators for the analysis based on confidence intervals were $(T (0.2), s^{W} (0.2))$ and $(T_{b i} (9), s_{b i} (9))$ . The classifications of the variables made with these pairs of estimators were very similar to those obtained with the nonparametric estimators and with the classical methods applied to the transformed variables, in the case of the years, but with small differences.

On the other hand, in the analysis of the variables by months, days and hours, it is observed that there is a possible periodicity. Specifically, the analysis by months shows notable rises at the beginning of autumn and falls in April and July. Furthermore, stability is observed in the first quarter and in the last months of each year. In the analysis by days of the week, there is a notable difference between the NO₂ concentration on weekdays, with a higher concentration on Fridays, and on weekends. Finally, in the hourly analysis, maximum values are also seen at 9:00 and 19:00. The minimums are found in the early afternoon and early morning, while the rest of the categories are transition categories from one level to another, without periods of stable concentrations.

With all the above, groupings of variables were obtained by comparing the results obtained by years, months, days of the week, and hours. In addition, the differences between categories of variables were found and these differences were quantified using both robust and nonrobust confidence intervals.

A comprehensive study of NO₂ concentration trends at Belisario station has been carried out in this paper. The results of this study showed that the NO₂ concentration at Belisario station is stable when is analyzed by years, it is highly variable when is analyzed by months and hours of the day, and it is slightly changeable when is analyzed by days of the week.

Regarding the behavior of the NO₂ concentration shown in this paper, it is important to say that this behavior bears some similarity to that of other pollution variables in urban cities. For example, in [21] maximums and minimums were also observed in the CO concentration in hours of the day close to those shown in the present paper. The aforementioned is important, because the study carried out in [21] was also carried out at Belisario station during the same time interval that was taken into account in the present paper. Another example that shows the existence of maximums and minimums at different times of the day in the behavior of the concentration of pollutants in urban cities, is the one shown in [24]. Specifically, in [24] it is observed how the concentration of black carbon varies throughout the hours of the day in two air quality monitoring stations located at Mäkelänkatu and Kumpula in Helsinki.

The behavior of the concentration of pollutants observed both in this paper and in the research carried out in [21,24], among others, shows that anthropogenic emissions play a fundamental role in the concentration levels of pollutants in urban cities.

Before finishing this section, it is important to say that the abovementioned possible periodic behavior has also been observed in the behavior of other air pollution variables [21]. However, data analysis showed that this behavior was only observed when the data were analyzed for certain time scales. Therefore, a transient phenomenon or a variable frequency signal could be observed, among other things. Here, this possible periodic behavior was shown by using robust confidence intervals. Additionally, the values of the possible periodic wave were categorized and the differences between categories were found with the measurement precision provided by robust statistical methods. All this constitutes another contribution of this research. Nevertheless, the analysis of the possible periodicity of this signal is beyond the scope of this paper, but it remains as a future research topic.

6. Conclusions

The objective of this paper was to carry out the robust analysis of the behavior of the NO₂ concentration at the Belisario air-quality monitoring station, Quito, Ecuador from 1 January 2008 to 31 December 2019. Here, the NO₂ concentration was decomposed into variables whose behavior was analyzed by years, months, days and hours of the day. Furthermore, after verifying that no set of separate variables came from the same distribution, the differences between parameters that characterized these variables were determined.

This is the first time that an exhaustive statistical analysis of the NO₂ concentration at Belisario station has been carried out. In the report presented in [8], the concentration of several variables of air pollution in Quito was analyzed in a general way, but a robust statistical analysis was not carried out. Specifically, the analysis carried out in [8] only took into account the mean and maximum values of the concentration of the air pollution variables. Therefore, the research presented in this paper could serve as a reference material to comprehensively analyze the NO₂ concentration at Belisario station in the last 12 years. This highlights some of the possible uses of the results of this research.

The results of the study conducted here robustly proved that the NO₂ concentration levels at Belisario station are not harmful to humans. In addition, it was also shown that the behavior of this concentration tends to be stable across the years, changes slightly during the days of the week, and varies greatly when analyzed by months and hours of the day.

In the report presented in [8], the main sources of air pollution in Quito are mentioned, highlighting among them the means of transport, large traffic jams, and all the industries that are located in the capital and its surroundings that use bunker and fuel oil. Additionally, since Quito is a long, narrow city, there are many traffic jams in the city center when traveling from one end of the city to the other. To all this it must be added that the city center is located on the slopes of the Pichincha volcano, and volcanic eruptions are air pollution sources [8].

What has been said in the previous paragraph highlights the need to improve the urban dynamics of the city, in order to contribute to the reduction of NO₂ concentration levels across the city. In this sense, the authors of this research suggest the need to carry out the set of proposals made by themselves in [4,21]. For example, it is important to improve the quality of the main means of transport that are used in the city, build more green areas that serve as effective filters for air pollution, and establish air pollution criteria that are adaptive, being more severe in the regions of the city where there is a greater concentration of people. For more information on things that can be done to improve air quality in urban cities, it is recommended to see [4,21], for the case of Quito, and other scientific publications dedicated only to this topic worldwide.

Finally, depending on the time interval used to represent the data, a possible periodicity in the NO₂ concentration measurements at Belisario station was observed. However, the in-depth mathematical modeling of this signal is a complex issue that falls outside the scope of this research, but remains pending to be performed in a future research of the authors.

Acknowledgments

The authors thank retired chemical engineer Juan Hernandez, former professor at Instituto Politecnico de Quimica e Informatica Martires de Giron, Havana, Cuba, for his valuable comments on Nitrogen dioxide.

Author Contributions

W.H. and A.M. created the methodology of formal data analysis and the tools to implement this methodology. In addition, W.H. and A.M. performed the statistical analysis of the data, the validation of the results, and the writing of the article. It is important to say that the authorship was limited to those who have contributed substantially to the work reported. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been supported by CEDIA-Ecuador (under the research project CEPRA XII-2018-13), Universidad de Las Americas, Quito, Ecuador (under the research project ERa.ERI.WHP.18.01), and Universidad Politecnica de Madrid, Spain.

Conflicts of Interest

The authors declare no conflict of interest.

Footnotes

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1.Brown T.L., LeMay H.E., Jr., Bursten B.E., Murphy C.J., Woodward P.M., Stoltzfus M.W., Lufaso M.W. Chemestry: The Central Science. 14th ed. Pearson Education Limited; London, UK: 2018. [Google Scholar]
2.Centers for Disease Control and Prevention The National Institute for Occupational Safety and Health (NIOSH). Nitrogen Dioxide. [(accessed on 31 August 2020)]; Available online: https://www.cdc.gov/niosh/npg/npgd0454.html.
3.Utell M.J., Frampton M.W., Roberts N.J., Jr., Finkelstein J.N., Cox C., Morrow P.E. Mechanisms of Nitrogen Dioxide Toxicity in Humans. Health Effect Institute; Monpelier, VT, USA: 1991. Research Report Number 43. [PubMed] [Google Scholar]
4.Hernandez W., Mendez A., Diaz-Marquez A.M., Zalakeviciute R. Robust Analysis of PM2.5 Concentration Measurements in the Ecuadorian Park La Carolina. Sensors. 2019;19:4648. doi: 10.3390/s19214648. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Hernandez W., Mendez A., Zalakeviciute R., Diaz-Marquez A.M. Robust Confidence Intervals for PM2.5 Concentration Measurements in the Ecuadorian Park La Carolina. Sensors. 2020;20:654. doi: 10.3390/s20030654. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Delgado-Saborit J.M. Ph.D. Thesis. Universitat Jaume I; Castelló, Spain: 2005. Validación e Implementación de Técnicas de Captación Pasiva Para el Estudio de los Niveles y Efectos de Ozono Troposférico y Dióxido de Nitrógeno en un Área Costera Mediterránea. [Google Scholar]
7.WHO Air Quality Guidelines for Particulate Matter, Ozone, Nitrogen Dioxide and Sulfur Dioxide Global Update 2005. Summary of Risk Assessment. [(accessed on 31 August 2020)]; Available online: https://apps.who.int/iris/bitstream/handle/10665/69477/WHO_SDE_PHE_OEH_06.02_eng.pdf?sequence=1&isAllowed=y.
8.Díaz V. “Informe Calidad del Aire 2017,” Secretaría de Ambiente del Distrito Metropolitano de Quito. [(accessed on 26 March 2020)]; Available online: http://www.quitoambiente.gob.ec/ambiente/index.php/informes#informecalidad-del-aire-017.
9.Belisario, Quito Air Pollution: Real-time Air Quality Index (AQI) [(accessed on 6 October 2020)]; Available online: https://aqicn.org/city/ecuador/quito/belisario/www.quitoambiente.gob.ec.
10.Datos Históricos REMMAQ Descarga 2004–2020. [(accessed on 9 October 2020)]; Available online: http://www.quitoambiente.gob.ec/ambiente/index.php/descarga-datos-historicos.
11.Hollander M., Wolfe D.A., Chicken E. Nonparametric Statistical Methods. 3rd ed. John Wiley & Sons, Inc.; Hoboken, NJ, USA: 2014. [Google Scholar]
12.Gibbons J.D., Chakraborti S. Nonparametric Statistical Inference. 5th ed. Chapman & Hall/CRC Press, Taylor & Francis Group; Boca Raton, FL, USA: 2011. [Google Scholar]
13.Hoaglin D.C., Mosteller F., Tukey J.W. Understanding Robust and Exploratory Data Analysis. John Wiley & Sons; Hoboken, NJ, USA: 2000. [Google Scholar]
14.Maronna R.A., Martin R.D., Yohai V.J. Robust Statistics: Theory and Methods. John Wiley & Sons; Chichester, UK: 2006. [Google Scholar]
15.Wilcox R. Introduction to Robust Estimation and Hypothesis Testing. 3rd ed. Academic Press; Waltham, MA, USA: 2012. [Google Scholar]
16.Reece S., Williams R., Colón M., Southgate D., Huertas E., O’Shea M., Iglesias A., Sheridan P. Spatial-Temporal Analysis of PM2.5 and NO2 Concentrations Collected Using Low-Cost Sensors in Peñuelas, Puerto Rico. Sensors. 2018;18:4314. doi: 10.3390/s18124314. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Arnaudo E., Farasin A., Rossi C. A Comparative Analysis for Air Quality Estimation from Traffic and Meteorological Data. Appl. Sci. 2020;10:4587. doi: 10.3390/app10134587. [DOI] [Google Scholar]
18.Wang L., Wang J., Tan X., Fang C. Analysis of NOx Pollution Characteristics in the Atmospheric Environment in Changchun City. Atmosphere. 2020;11:30. doi: 10.3390/atmos11010030. [DOI] [Google Scholar]
19.Kadiyala A., Kumar A. An Examination of the Sensitivity of Sulfur Dioxide, Nitric Oxide, and Nitrogen Dioxide Concentrations to the Important Factors Affecting Air Quality Inside a Public Transportation Bus. Atmosphere. 2012;3:266–287. doi: 10.3390/atmos3020266. [DOI] [Google Scholar]
20.Hernandez W., Mendez A., González-Posadas V., Jiménez-Martín J.L. Robust Analysis of the Information Obtained from a Set of 12 Years of SO2 Concentration Measurements. IEEE Access. 2020;8:144976–144992. doi: 10.1109/ACCESS.2020.3015038. [DOI] [Google Scholar]
21.Hernandez W., Mendez A.M. Robust Estimation of Carbon Monoxide Measurements. Sensors. 2020;20:4958. doi: 10.3390/s20174958. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Cavaliere A., Carotenuto F., Di Gennaro F., Gioli B., Gualtieri G., Martelli F., Matese A., Toscano P., Vagnoli C., Zaldei A. Development of Low-Cost Air Quality Stations for Next Generation Monitoring Networks: Calibration and Validation of PM2.5 and PM10 Sensors. Sensors. 2018;18:2843. doi: 10.3390/s18092843. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Rodríguez-Martín M., Rodríguez-Gonzálvez P. Suitability of Automatic Photogrammetric Reconstruction Configurations for Small Archeological Remains. Sensors. 2020;20:2936. doi: 10.3390/s20102936. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Lun Fung P., Zaidan M.A., Sillanpää S., Kousa A., Niemi J.V., Timonen H., Kuula J., Saukko E., Luoma K., Petäjä T., et al. Input-Adaptive Proxy for Black Carbon as a Virtual Sensor. Sensors. 2020;20:182. doi: 10.3390/s20010182. [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Hernandez W., Mendez A., Diaz-Marquez A.M., Zalakeviciute R. PM2.5 concentration measurement analysis by using nonparametric statistical inference. IEEE Sens. J. 2020;20:1084–1094. doi: 10.1109/JSEN.2019.2945581. [DOI] [Google Scholar]
26.Hernandez W., Mendez A., Zalakeviciute R., Diaz-Marquez A.M. Analysis of the information obtained from PM2.5 concentration measurements in an urban park. IEEE Trans. Instrum. Meas. 2020;69:6296–6311. doi: 10.1109/TIM.2020.2966360. [DOI] [Google Scholar]
27.Mukherjee A., Brown S.G., McCarthy M.C., Pavlovic N.R., Stanton L.G., Lam Snyder J., D′Andrea S., Hafner H.R. Measuring spatial and temporal PM2.5 variations in Sacramento, California, communities using a network of low-cost sensors. Sensors. 2019;19:4701. doi: 10.3390/s19214701. [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Borghi F., Spinazzè A., Campagnolo D., Rovelli S., Cattaneo A., Cavallo D.M. Precision and accuracy of a direct-reading miniaturized monitor in PM2.5 exposure assessment. Sensors. 2018;18:3089. doi: 10.3390/s18093089. [DOI] [PMC free article] [PubMed] [Google Scholar]
29.Wang S., Van der A R.J., Stammes P., Wang W., Zhang P., Lu N., Fang L. Carbon dioxide retrieval from TanSat observations and validation with TCCON measurements. Remote Sens. 2020;12:2204. doi: 10.3390/rs12142204. [DOI] [Google Scholar]
30.Shokr M., El-Tahan M., Ibrahim A., Steiner A., Gad N. Long-term, high-resolution survey of atmospheric aerosols over egypt with NASA’s MODIS data. Remote Sens. 2017;9:1027. doi: 10.3390/rs9101027. [DOI] [Google Scholar]
31.Baire M., Melis A., Lodi M.B., Tuveri P., Dachena C., Simone M., Fanti A., Fumera G., Pisanu T., Mazzarella G. A wireless sensors network for monitoring the carasau bread manufacturing process. Electronics. 2019;8:1541. doi: 10.3390/electronics8121541. [DOI] [Google Scholar]
32.Tang C.-S., Wu T.-Y., Chuang K.-J., Chang T.-Y., Chuang H.-C., Candice Lung S.-C., Chang L.-T. Impacts of in-cabin exposure to size-fractionated particulate matters and carbon monoxide on changes in heart rate variability for healthy public transit commuters. Atmosphere. 2019;10:409. doi: 10.3390/atmos10070409. [DOI] [Google Scholar]
33.Model 42C NO-NO2-NOx Analyzer. [(accessed on 28 July 2020)]; Available online: http://www.thermo.com.cn/Resources/200802/productPDF_20998.pdf.
34.Model 42i NO-NO2-NOx Analyzer. [(accessed on 28 July 2020)]; Available online: https://www.thermofisher.com/order/catalog/product/42I#/42I.
35.U.S. Environmental Protection Agency . List of Designated Reference and Equivalent Methods. United States Environmental Protection Agency; Washington, DC, USA: 2020. [Google Scholar]
36.EPA-454/B-17-001 . Ambient Air Quality Monitoring Program. Quality Assurance Handbook for Air Pollution Measurement Systems. Volume 2 Environmental Protection Agency; Washington, DC, USA: 2017. [Google Scholar]
37.Box G.E.P., Jenkins G.M., Reinsel G.C., Ljung G.M. Time Series Analysis: Forecasting and Control. 5th ed. Jhon Wiley & Sons; Hoboken, NJ, USA: 2016. [Google Scholar]
38.Brockwell P.J., Davis R.A. Introduction to Time Series and Forecasting. 2nd ed. Springer; New York, NY, USA: 2002. [Google Scholar]
39.Bryson M.C. Heavy-Tailed Distributions: Properties and Tests. Technometrics. 1974;16:61–68. doi: 10.1080/00401706.1974.10489150. [DOI] [Google Scholar]
40.Papoulis A., Unnikrishna Pillai S. Probability, Random Variables, and Stochastic Processes. 4th ed. McGraw-Hill Higher Education; New York, NY, USA: 2002. [Google Scholar]
41.Hampel F.R. The Influence Curve and its Role in Robust Estimation. J. Am. Stat. Assoc. 1974;69:383–393. doi: 10.1080/01621459.1974.10482962. [DOI] [Google Scholar]
42.Tukey J.W. Exploratory Data Analysis. Addison-Wesley; Reading, MA, USA: 1977. [Google Scholar]
43.Rock N.M.S. ROBUST: An Interactive FORTRAN-77 Package for Exploratory Data Analysis Using Parametric, Robust and Nonparametric Location and Scale Estimates, Data Transformations, Normality Tests, and Outlier Assessment. Comput. Geosci. 1987;13:463–494. doi: 10.1016/0098-3004(87)90051-3. [DOI] [Google Scholar]
44.Croux C., Rouseeuw P.J. A Class of High-Breakdown Scale Estimators Based on Subranges. Commun. Stat. Theory Methods. 1992;21:1935–1951. doi: 10.1080/03610929208830889. [DOI] [Google Scholar]

[B1-sensors-20-05831] 1.Brown T.L., LeMay H.E., Jr., Bursten B.E., Murphy C.J., Woodward P.M., Stoltzfus M.W., Lufaso M.W. Chemestry: The Central Science. 14th ed. Pearson Education Limited; London, UK: 2018. [Google Scholar]

[B2-sensors-20-05831] 2.Centers for Disease Control and Prevention The National Institute for Occupational Safety and Health (NIOSH). Nitrogen Dioxide. [(accessed on 31 August 2020)]; Available online: https://www.cdc.gov/niosh/npg/npgd0454.html.

[B3-sensors-20-05831] 3.Utell M.J., Frampton M.W., Roberts N.J., Jr., Finkelstein J.N., Cox C., Morrow P.E. Mechanisms of Nitrogen Dioxide Toxicity in Humans. Health Effect Institute; Monpelier, VT, USA: 1991. Research Report Number 43. [PubMed] [Google Scholar]

[B4-sensors-20-05831] 4.Hernandez W., Mendez A., Diaz-Marquez A.M., Zalakeviciute R. Robust Analysis of PM2.5 Concentration Measurements in the Ecuadorian Park La Carolina. Sensors. 2019;19:4648. doi: 10.3390/s19214648. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B5-sensors-20-05831] 5.Hernandez W., Mendez A., Zalakeviciute R., Diaz-Marquez A.M. Robust Confidence Intervals for PM2.5 Concentration Measurements in the Ecuadorian Park La Carolina. Sensors. 2020;20:654. doi: 10.3390/s20030654. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B6-sensors-20-05831] 6.Delgado-Saborit J.M. Ph.D. Thesis. Universitat Jaume I; Castelló, Spain: 2005. Validación e Implementación de Técnicas de Captación Pasiva Para el Estudio de los Niveles y Efectos de Ozono Troposférico y Dióxido de Nitrógeno en un Área Costera Mediterránea. [Google Scholar]

[B7-sensors-20-05831] 7.WHO Air Quality Guidelines for Particulate Matter, Ozone, Nitrogen Dioxide and Sulfur Dioxide Global Update 2005. Summary of Risk Assessment. [(accessed on 31 August 2020)]; Available online: https://apps.who.int/iris/bitstream/handle/10665/69477/WHO_SDE_PHE_OEH_06.02_eng.pdf?sequence=1&isAllowed=y.

[B8-sensors-20-05831] 8.Díaz V. “Informe Calidad del Aire 2017,” Secretaría de Ambiente del Distrito Metropolitano de Quito. [(accessed on 26 March 2020)]; Available online: http://www.quitoambiente.gob.ec/ambiente/index.php/informes#informecalidad-del-aire-017.

[B9-sensors-20-05831] 9.Belisario, Quito Air Pollution: Real-time Air Quality Index (AQI) [(accessed on 6 October 2020)]; Available online: https://aqicn.org/city/ecuador/quito/belisario/www.quitoambiente.gob.ec.

[B10-sensors-20-05831] 10.Datos Históricos REMMAQ Descarga 2004–2020. [(accessed on 9 October 2020)]; Available online: http://www.quitoambiente.gob.ec/ambiente/index.php/descarga-datos-historicos.

[B11-sensors-20-05831] 11.Hollander M., Wolfe D.A., Chicken E. Nonparametric Statistical Methods. 3rd ed. John Wiley & Sons, Inc.; Hoboken, NJ, USA: 2014. [Google Scholar]

[B12-sensors-20-05831] 12.Gibbons J.D., Chakraborti S. Nonparametric Statistical Inference. 5th ed. Chapman & Hall/CRC Press, Taylor & Francis Group; Boca Raton, FL, USA: 2011. [Google Scholar]

[B13-sensors-20-05831] 13.Hoaglin D.C., Mosteller F., Tukey J.W. Understanding Robust and Exploratory Data Analysis. John Wiley & Sons; Hoboken, NJ, USA: 2000. [Google Scholar]

[B14-sensors-20-05831] 14.Maronna R.A., Martin R.D., Yohai V.J. Robust Statistics: Theory and Methods. John Wiley & Sons; Chichester, UK: 2006. [Google Scholar]

[B15-sensors-20-05831] 15.Wilcox R. Introduction to Robust Estimation and Hypothesis Testing. 3rd ed. Academic Press; Waltham, MA, USA: 2012. [Google Scholar]

[B16-sensors-20-05831] 16.Reece S., Williams R., Colón M., Southgate D., Huertas E., O’Shea M., Iglesias A., Sheridan P. Spatial-Temporal Analysis of PM2.5 and NO2 Concentrations Collected Using Low-Cost Sensors in Peñuelas, Puerto Rico. Sensors. 2018;18:4314. doi: 10.3390/s18124314. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B17-sensors-20-05831] 17.Arnaudo E., Farasin A., Rossi C. A Comparative Analysis for Air Quality Estimation from Traffic and Meteorological Data. Appl. Sci. 2020;10:4587. doi: 10.3390/app10134587. [DOI] [Google Scholar]

[B18-sensors-20-05831] 18.Wang L., Wang J., Tan X., Fang C. Analysis of NOx Pollution Characteristics in the Atmospheric Environment in Changchun City. Atmosphere. 2020;11:30. doi: 10.3390/atmos11010030. [DOI] [Google Scholar]

[B19-sensors-20-05831] 19.Kadiyala A., Kumar A. An Examination of the Sensitivity of Sulfur Dioxide, Nitric Oxide, and Nitrogen Dioxide Concentrations to the Important Factors Affecting Air Quality Inside a Public Transportation Bus. Atmosphere. 2012;3:266–287. doi: 10.3390/atmos3020266. [DOI] [Google Scholar]

[B20-sensors-20-05831] 20.Hernandez W., Mendez A., González-Posadas V., Jiménez-Martín J.L. Robust Analysis of the Information Obtained from a Set of 12 Years of SO2 Concentration Measurements. IEEE Access. 2020;8:144976–144992. doi: 10.1109/ACCESS.2020.3015038. [DOI] [Google Scholar]

[B21-sensors-20-05831] 21.Hernandez W., Mendez A.M. Robust Estimation of Carbon Monoxide Measurements. Sensors. 2020;20:4958. doi: 10.3390/s20174958. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B22-sensors-20-05831] 22.Cavaliere A., Carotenuto F., Di Gennaro F., Gioli B., Gualtieri G., Martelli F., Matese A., Toscano P., Vagnoli C., Zaldei A. Development of Low-Cost Air Quality Stations for Next Generation Monitoring Networks: Calibration and Validation of PM2.5 and PM10 Sensors. Sensors. 2018;18:2843. doi: 10.3390/s18092843. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B23-sensors-20-05831] 23.Rodríguez-Martín M., Rodríguez-Gonzálvez P. Suitability of Automatic Photogrammetric Reconstruction Configurations for Small Archeological Remains. Sensors. 2020;20:2936. doi: 10.3390/s20102936. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B24-sensors-20-05831] 24.Lun Fung P., Zaidan M.A., Sillanpää S., Kousa A., Niemi J.V., Timonen H., Kuula J., Saukko E., Luoma K., Petäjä T., et al. Input-Adaptive Proxy for Black Carbon as a Virtual Sensor. Sensors. 2020;20:182. doi: 10.3390/s20010182. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B25-sensors-20-05831] 25.Hernandez W., Mendez A., Diaz-Marquez A.M., Zalakeviciute R. PM2.5 concentration measurement analysis by using nonparametric statistical inference. IEEE Sens. J. 2020;20:1084–1094. doi: 10.1109/JSEN.2019.2945581. [DOI] [Google Scholar]

[B26-sensors-20-05831] 26.Hernandez W., Mendez A., Zalakeviciute R., Diaz-Marquez A.M. Analysis of the information obtained from PM2.5 concentration measurements in an urban park. IEEE Trans. Instrum. Meas. 2020;69:6296–6311. doi: 10.1109/TIM.2020.2966360. [DOI] [Google Scholar]

[B27-sensors-20-05831] 27.Mukherjee A., Brown S.G., McCarthy M.C., Pavlovic N.R., Stanton L.G., Lam Snyder J., D′Andrea S., Hafner H.R. Measuring spatial and temporal PM2.5 variations in Sacramento, California, communities using a network of low-cost sensors. Sensors. 2019;19:4701. doi: 10.3390/s19214701. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B28-sensors-20-05831] 28.Borghi F., Spinazzè A., Campagnolo D., Rovelli S., Cattaneo A., Cavallo D.M. Precision and accuracy of a direct-reading miniaturized monitor in PM2.5 exposure assessment. Sensors. 2018;18:3089. doi: 10.3390/s18093089. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B29-sensors-20-05831] 29.Wang S., Van der A R.J., Stammes P., Wang W., Zhang P., Lu N., Fang L. Carbon dioxide retrieval from TanSat observations and validation with TCCON measurements. Remote Sens. 2020;12:2204. doi: 10.3390/rs12142204. [DOI] [Google Scholar]

[B30-sensors-20-05831] 30.Shokr M., El-Tahan M., Ibrahim A., Steiner A., Gad N. Long-term, high-resolution survey of atmospheric aerosols over egypt with NASA’s MODIS data. Remote Sens. 2017;9:1027. doi: 10.3390/rs9101027. [DOI] [Google Scholar]

[B31-sensors-20-05831] 31.Baire M., Melis A., Lodi M.B., Tuveri P., Dachena C., Simone M., Fanti A., Fumera G., Pisanu T., Mazzarella G. A wireless sensors network for monitoring the carasau bread manufacturing process. Electronics. 2019;8:1541. doi: 10.3390/electronics8121541. [DOI] [Google Scholar]

[B32-sensors-20-05831] 32.Tang C.-S., Wu T.-Y., Chuang K.-J., Chang T.-Y., Chuang H.-C., Candice Lung S.-C., Chang L.-T. Impacts of in-cabin exposure to size-fractionated particulate matters and carbon monoxide on changes in heart rate variability for healthy public transit commuters. Atmosphere. 2019;10:409. doi: 10.3390/atmos10070409. [DOI] [Google Scholar]

[B33-sensors-20-05831] 33.Model 42C NO-NO2-NOx Analyzer. [(accessed on 28 July 2020)]; Available online: http://www.thermo.com.cn/Resources/200802/productPDF_20998.pdf.

[B34-sensors-20-05831] 34.Model 42i NO-NO2-NOx Analyzer. [(accessed on 28 July 2020)]; Available online: https://www.thermofisher.com/order/catalog/product/42I#/42I.

[B35-sensors-20-05831] 35.U.S. Environmental Protection Agency . List of Designated Reference and Equivalent Methods. United States Environmental Protection Agency; Washington, DC, USA: 2020. [Google Scholar]

[B36-sensors-20-05831] 36.EPA-454/B-17-001 . Ambient Air Quality Monitoring Program. Quality Assurance Handbook for Air Pollution Measurement Systems. Volume 2 Environmental Protection Agency; Washington, DC, USA: 2017. [Google Scholar]

[B37-sensors-20-05831] 37.Box G.E.P., Jenkins G.M., Reinsel G.C., Ljung G.M. Time Series Analysis: Forecasting and Control. 5th ed. Jhon Wiley & Sons; Hoboken, NJ, USA: 2016. [Google Scholar]

[B38-sensors-20-05831] 38.Brockwell P.J., Davis R.A. Introduction to Time Series and Forecasting. 2nd ed. Springer; New York, NY, USA: 2002. [Google Scholar]

[B39-sensors-20-05831] 39.Bryson M.C. Heavy-Tailed Distributions: Properties and Tests. Technometrics. 1974;16:61–68. doi: 10.1080/00401706.1974.10489150. [DOI] [Google Scholar]

[B40-sensors-20-05831] 40.Papoulis A., Unnikrishna Pillai S. Probability, Random Variables, and Stochastic Processes. 4th ed. McGraw-Hill Higher Education; New York, NY, USA: 2002. [Google Scholar]

[B41-sensors-20-05831] 41.Hampel F.R. The Influence Curve and its Role in Robust Estimation. J. Am. Stat. Assoc. 1974;69:383–393. doi: 10.1080/01621459.1974.10482962. [DOI] [Google Scholar]

[B42-sensors-20-05831] 42.Tukey J.W. Exploratory Data Analysis. Addison-Wesley; Reading, MA, USA: 1977. [Google Scholar]

[B43-sensors-20-05831] 43.Rock N.M.S. ROBUST: An Interactive FORTRAN-77 Package for Exploratory Data Analysis Using Parametric, Robust and Nonparametric Location and Scale Estimates, Data Transformations, Normality Tests, and Outlier Assessment. Comput. Geosci. 1987;13:463–494. doi: 10.1016/0098-3004(87)90051-3. [DOI] [Google Scholar]

[B44-sensors-20-05831] 44.Croux C., Rouseeuw P.J. A Class of High-Breakdown Scale Estimators Based on Subranges. Commun. Stat. Theory Methods. 1992;21:1935–1951. doi: 10.1080/03610929208830889. [DOI] [Google Scholar]

PERMALINK

Twelve-Year Analysis of NO2 Concentration Measurements at Belisario Station (Quito, Ecuador) Using Statistical Inference Techniques

Wilmar Hernandez

Alfredo Mendez

Abstract

1. Introduction

2. Summary Statistics of 12 Years of NO2 Concentration Measurements at Belisario Station

Table 1.

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Figure 7.

Figure 8.

Figure 9.

Figure 10.

3. Analysis of NO2 Concentration Measurements Using Nonparametric Methods

Table 2.

Figure 11.

Figure 12.

Figure 13.

Figure 14.

4. Robust Analysis of the NO2 Concentration Measurements

4.1. Estimators of Central Tendency and Scale

Table 3.

Table 4.

Figure 15.

Figure 16.

Figure 17.

Figure 18.

4.2. Confidence Intervals

Figure 19.

Table 5.

Figure 20.

Figure 21.

5. Discussion

6. Conclusions

Acknowledgments

Author Contributions

Funding

Conflicts of Interest

Footnotes

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

Twelve-Year Analysis of NO₂ Concentration Measurements at Belisario Station (Quito, Ecuador) Using Statistical Inference Techniques

2. Summary Statistics of 12 Years of NO₂ Concentration Measurements at Belisario Station

3. Analysis of NO₂ Concentration Measurements Using Nonparametric Methods

4. Robust Analysis of the NO₂ Concentration Measurements