Evolution and forecasting of PM10 concentration at the Port of Gijon (Spain)

Fernando Sánchez Lasheras; Paulino José García Nieto; Esperanza García Gonzalo; Laura Bonavera; Francisco Javier de Cos Juez

doi:10.1038/s41598-020-68636-5

Scientific Reports. 2020 Jul 16;10:11716.

PMCID: PMC7366928 PMID: 32678178

Abstract

The term PM₁₀ refers to small particles with a diameter of less than 10 µm. The present research analyses different models capable of predicting PM₁₀ concentration using the previous values of PM₁₀, SO₂, NO, NO₂, CO and O₃ as input variables. The models were trained with data from January 2010 to December 2017. The models compared were autoregressive integrated moving average (ARIMA), vector autoregressive moving average (VARMA), multilayer perceptron neural networks (MLP), support vector machines as regressor (SVMR) and multivariate adaptive regression splines (MARS). Forecasts were made from 1 to 6 months in advance, and the performance of the different models was measured in terms of the root mean squared error (RMSE). For forecasting 1 month ahead, the best results were obtained with a six-variable SVMR model, which gave an RMSE of 4.2649, although the MLP results were very close, with an RMSE of 4.3402. For forecasts 6 months in advance, the best results corresponded to a six-variable MLP model with an RMSE of 6.0873, followed by a six-variable SVMR with an RMSE of 6.1010. For forecasts both 1 and 6 months ahead, ARIMA outperformed the VARMA models.

Subject terms: Environmental impact, Environmental sciences

Introduction

The town of Gijón and its Port

Gijón is a town on the north coast of Spain, in the Principality of Asturias. It is the most populated municipality of this region, with a total of 273,422 inhabitants according to the 2016 census. Together with Oviedo (220,648 inhabitants), Avilés (79,514 inhabitants) and other smaller towns, it forms a metropolitan area of more than 850,000 inhabitants. It was founded in the fifth century B.C. During the twentieth century it underwent significant development driven by industry, which is still of great importance to the local economy.

The weather in Gijón is shaped by its proximity to the sea and its low mean altitude. Annual precipitation is quite high, totalling 920 L per square meter per year. The coldest month is January, with an average temperature of 8.9 °C, while the hottest is August, with 19.7 °C; the average annual temperature is 13.8 °C. Winds are sporadic and seasonal, and the wind regime is dominated by two main components¹: during winter the wind blows from W-WSW, while in summer it comes from E-ENE on the coast.

The Port of Gijón, named El Musel, is one of the main ports of the Atlantic Arc and the leading port in the movement of solid bulk in Spain. It is located in the Cantabrian Sea (43°34′N, 5°41′W). Figure 1a shows its position on the North Atlantic Spanish coast and Fig. 1b is an aerial picture of the town, where the location of the port can be observed.

Figure 1. (a) Position of the Port of Gijón on the North Atlantic coast of Spain; (b) aerial picture of Gijón and its Port (inside the red line), including the position of the weather station.

*Source*: Google Maps, Map data©2019 Google; https://www.google.es/maps/@43.5547854,-5.6995551,9849m/data=!3m1!1e3. The map was edited with PowerPoint version: 16.0.12527.20260.

Commercial operation of this port started in 1907. In the 1990s a development plan doubled its area and led to a significant increase in its activity. Its infrastructure is adapted to modern market requirements in terms of drafts, springs and storage areas, and it offers a range of services that meet high quality standards. It has 415 hectares of land surface and 7,000 linear meters of dock, organised into areas suited to each kind of traffic, i.e. specialized terminals for solid bulk, liquids and containers, and multi-purpose facilities for other types of traffic.

Initially, exports were mainly iron ore and coal. The port subsequently expanded its breakwaters and piers, and in the 1940s it became the leading Spanish port in terms of traffic. The Port of Gijón is the main ally of the industrial activity of the Principality of Asturias. It is currently the main bulk port in Spain and one of the most important ports of the Atlantic Arc. According to the traffic statistics of the 2018 Annual Report, a total of 18,226 ships entered the port that year; the port handled 79,294 containers and 12.7 million tonnes in the dry bulk terminal, of which 6.4 million corresponded to iron ore, 3.4 million to iron and steel and 2.8 million to steam coal. Net revenue in 2018 was 42.2 million euros.

Pollution and particulate matter studies

The World Health Organisation has reported that air pollution has an adverse effect on people’s health and development². Long-term exposure to high levels of air pollution is known to be linked to reduced lung function in children³. A Swiss study found increased levels of allergic sensitisation in adults who had lived close to busy roads for more than 10 years⁴. PM₁₀ is also among the pollutants regulated under the Air Quality Framework Directive on ambient air quality assessment and management⁵.

Continuous exposure to pollutants such as carbon monoxide (CO), carbon dioxide (CO₂), nitrogen oxides (NO_x) and particulate matter is reported to cause health problems in the population living in the affected areas^6,7. Particulate matter consists of a variety of chemical products with widely varying diameters, mostly produced by anthropogenic processes⁶. Because of this anthropogenic origin, particulate matter is more abundant in urban areas⁸ than in unpopulated ones.

Air quality issues are relevant in ports and nearby areas. In general, the duty cycle of marine vessels, i.e. how long they remain in service, is longer than that of road vehicles. As a result, ship engines generally use older technology than cars and, given their engine power, they are also much more polluting⁹. Previous studies have analysed PM₁₀ concentrations in ports and coastal areas such as the Bay of Algeciras in southern Spain¹⁰. Another study analysed the impact of PM_2.5 particles from ship emissions in southern California¹¹. In Turkey, shipping emissions were investigated in the Candarli Gulf¹² and at Ambarli Port¹³, both areas with heavy shipping traffic. Research carried out in the port of Tarragona¹⁴, Spain, used multi-linear regression models to study the contribution of different harbour activities to the levels of particulate matter in its area. Along the same lines, another study¹⁵ performed in Barcelona’s harbour, also in Spain and about 80 km in a straight line from Tarragona, estimated that around 50–55% of the PM₁₀ and PM_2.5 concentrations measured at the port could be attributed to harbour activities, and that such activities contribute about 9–12% of the total PM₁₀ concentration and about 11–15% of the PM_2.5 concentration in the metropolitan area of this city. Another interesting and innovative study¹⁶ dealing with particulate matter in ports was performed in the port of Zhejiang. With the help of an unmanned aerial vehicle carrying different sensors, the authors obtained a profile of the vertical distribution of PM_2.5, PM₁₀ and total suspended particles from ground level to a height of 120 m. A study made at the port of Volos¹⁷, in Greece, found that the highest PM₁₀ concentrations were associated with days of calm winds, i.e. wind speeds below 0.5 m/s.

The only previous port study that made use of a machine learning methodology was the one concerning the port of Koper¹⁸. Koper is the only port in Slovenia and is located at the northern tip of the Adriatic Sea. The researchers used hourly PM₁₀ concentrations and applied k-means clustering, an unsupervised method, with Euclidean and city-block distances to cluster days. The results showed the influence of rain intensity and wind speed on the clusters obtained, but the influence of other pollutants was not studied. Finally, another study of interest was performed at the Port of Cork, which, like Gijón, is located on the Atlantic coast¹⁹.
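The clustering step used in the Koper study can be illustrated with a minimal sketch of k-means (Lloyd's algorithm) that accepts either the Euclidean or the city-block distance. This is not the code used in that study; the toy data (daily mean PM₁₀ and wind speed) and the choice of initial centroids are hypothetical. With the city-block distance, the sketch updates each centroid with the coordinate-wise median, the centre that minimises that distance, so in that mode it is strictly a k-medians variant.

```python
import math
import statistics


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans(points, init_centroids, metric="euclidean", max_iter=100):
    """Lloyd-style k-means; with metric="cityblock" centroids are medians."""
    dist = euclidean if metric == "euclidean" else city_block
    # The mean minimises squared Euclidean distance; the median minimises city-block distance.
    centre = statistics.mean if metric == "euclidean" else statistics.median
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                  for p in points]
        # Update step: recompute each centroid coordinate by coordinate.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append([centre(col) for col in zip(*members)])
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical days described by (mean PM10, mean wind speed in m/s).
days = [[20, 5.0], [22, 4.5], [21, 5.5], [45, 0.4], [48, 0.3], [46, 0.6]]
labels_e, _ = kmeans(days, [days[0], days[3]], metric="euclidean")
labels_c, cents_c = kmeans(days, [days[0], days[3]], metric="cityblock")
```

On these toy data both distances separate the windy, low-PM₁₀ days from the calm, high-PM₁₀ ones; on real data the two metrics can produce different partitions, which is why comparing them is informative.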

Use of machine learning techniques to forecast pollutant concentrations

In general, machine learning can be understood as the subset of artificial intelligence methodologies that are able to learn automatically; in other words, they learn from data and can then predict future events. Nowadays, machine learning methodologies are used in almost all branches of science, including environmental studies. One of the main reasons for using machine learning approaches in air quality forecasting is their ability to capture non-linear relationships among variables.

Interest in forecasting air pollution in urban areas dates back more than a century, to when large cities began to have problems with pollution²⁰. In the 1970s, several statistical models for pollution forecasting were proposed^21,22. The first applications of machine learning methodologies in this field appeared in the 1990s, and most of the research performed at that time made use of artificial neural networks^23,24.

Since then, studies have made use of other techniques such as genetic algorithms²⁵, hierarchical agglomerative clustering²⁶, k-means²⁷ and support vector machines as regressors²⁸.

Genetic algorithms have been employed as a supporting methodology for selecting the input variables and designing the high-level architecture of neural network models. In one such study²⁵, they were used to select the architecture and input variables of a multilayer perceptron model for forecasting hourly concentrations of NO. One limitation of this approach is that training each candidate neural network is time-consuming, so the number of parameters to be tuned must be limited.

Hierarchical agglomerative clustering groups similar objects into subsets called clusters. The agglomerative methodology starts with many small clusters and successively merges them to create larger ones. It has been successfully applied to study ozone exposure and cardiovascular-related mortality in Canada²⁶, where the results showed that this methodology is useful for studying the long-term effects of air pollution on cardiovascular diseases.

A recent study has shown how k-means clustering can be employed to categorize different locations in a large, densely populated city according to the variability of pollution in the variables employed for the study²⁷. Finally, the use of support vector machines as regressors has also been reported in some studies^28,29. In one of these²⁸, the support vector machine is employed as a regressor for forecasting the daily Beijing air quality index from 1 January 2014 to mid-2016, while in others^29,30 it is employed to forecast the daily average PM₁₀ concentration.

The aim of the present research is to forecast air quality in a port area, specifically that of the city of Gijón. For this purpose, this article applies different machine learning models (multilayer perceptron neural networks, support vector machines as regressor and MARS) and compares the performance of their predictions over different time horizons with that of two time series methodologies, one univariate (ARIMA) and the other multivariate (VARMA). The performance of five methods is thus compared exhaustively for forecasts made from 1 to 6 months in advance, which provides an interesting framework for the comparison of methodologies. All these methods have been employed in the past for pollution forecasting but, as far as the authors know, never all in the same study. The relevance of the present research therefore lies in addressing air quality monitoring in a city by comparing different machine learning methodologies applied to the same data set.

The database

The information employed for this research was obtained from one of the meteorological stations of the Air Quality Monitoring Network of the Government of the Principality of Asturias, specifically the one closest to the Port of Gijón, located on Argentina Avenue. This station records environmental measurements hourly. As is usual in this kind of database, about 0.23% of the raw observations (taken every 15 min) were missing across all variables. The missing values were imputed with the Multivariate Imputation by Chained Equations (MICE) algorithm³¹.
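The chained-equations idea behind MICE can be illustrated with a deliberately simplified sketch for two variables: gaps are first filled with the mean of the observed values, and each incomplete variable is then repeatedly re-predicted from the other by ordinary least squares until the imputations settle. This is a deterministic, single-imputation simplification written for illustration, not the MICE algorithm itself (which draws imputations stochastically, can use other per-variable models and produces several completed data sets); the column names and values are hypothetical.

```python
import statistics


def ols_fit(x, y):
    """Least-squares line y = intercept + slope * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def chained_impute(col_a, col_b, n_iter=10):
    """Fill None entries of two aligned columns by alternating regressions.

    Deterministic, single-imputation simplification of the chained-equations
    idea; real MICE draws stochastic imputations and returns several data sets.
    """
    miss_a = [v is None for v in col_a]
    miss_b = [v is None for v in col_b]
    # Initialise every gap with the mean of the observed values.
    mean_a = statistics.mean(v for v in col_a if v is not None)
    mean_b = statistics.mean(v for v in col_b if v is not None)
    a = [mean_a if m else v for v, m in zip(col_a, miss_a)]
    b = [mean_b if m else v for v, m in zip(col_b, miss_b)]
    for _ in range(n_iter):
        # Regress A on B over rows where A was observed, then refill A's gaps.
        rows = [i for i, m in enumerate(miss_a) if not m]
        ia, sa = ols_fit([b[i] for i in rows], [a[i] for i in rows])
        a = [ia + sa * b[i] if miss_a[i] else a[i] for i in range(len(a))]
        # Regress B on the updated A over rows where B was observed.
        rows = [i for i, m in enumerate(miss_b) if not m]
        ib, sb = ols_fit([a[i] for i in rows], [b[i] for i in rows])
        b = [ib + sb * a[i] if miss_b[i] else b[i] for i in range(len(b))]
    return a, b


# Hypothetical monthly NO and NO2 values with one gap each (here NO2 = 2 * NO + 5).
no = [4.0, 6.0, None, 10.0, 12.0]
no2 = [13.0, 17.0, 21.0, None, 29.0]
no_filled, no2_filled = chained_impute(no, no2)
```

Because the toy columns are exactly linearly related, the alternating regressions converge to 8 for the missing NO value and 25 for the missing NO₂ value; observed entries are never modified.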

Table 1 shows the minimum, mean, maximum and standard deviation of the pollutants measured at Gijón Port over the study period. The values used in the present research were monthly average measurements from January 2010 to June 2018: information from January 2010 to December 2017 was employed to forecast the values from January to June 2018. The pollutants measured at the Port of Gijón were SO₂, NO, NO₂, CO, O₃ and PM₁₀.

Table 1.

Port of Gijón. Minimum, mean, maximum and standard deviation of the variables of the study: sulfur dioxide (SO₂), nitrogen monoxide (NO), nitrogen dioxide (NO₂), carbon monoxide (CO), ozone (O₃) and particulate matter with a diameter less than 10 µm (PM₁₀).

	Minimum	Mean	Maximum	Standard deviation
SO₂ (µg/m³)	4.0000	7.9706	20.0000	3.2379
NO (µg/m³)	4.0000	10.9510	30.0000	6.8091
NO₂ (µg/m³)	7.0000	26.1471	46.0000	8.9159
CO (mg/m³)	0.1800	0.4023	0.8600	0.1362
O₃ (µg/m³)	13.0000	38.0000	64.0000	9.9980
PM₁₀ (µg/m³)	18.0000	31.5196	50.0000	7.6271


Materials and methods

The present research builds predictive models of PM₁₀ concentration by means of autoregressive integrated moving average (ARIMA), vector autoregressive moving-average (VARMA), multilayer perceptron neural network (MLP), support vector machine as regressor (SVMR) and multivariate adaptive regression splines (MARS) models. Except for ARIMA, which uses only past PM₁₀ concentrations, every model was built in two ways: first using the concentrations of all six available pollutants as input variables, and then using only four of them: SO₂, NO, NO₂ and PM₁₀. The four-variable models were also trained and validated because many monitoring stations, including some in the Air Quality Monitoring Network of the Government of the Principality of Asturias, can only measure these four variables. Using only these four variables therefore allows model performance to be compared according to the input variables employed, and serves as a reference for future studies. In all cases, the minimum, mean, maximum and standard deviation of the continuous variables were calculated.

Forecasts were made from 1 to 6 months in advance. Forecasting up to 6 months ahead is of interest for two reasons: on the one hand, high PM₁₀ concentrations have adverse effects on human health; on the other, such a forecast would help in taking measures to comply with European air quality standards. According to the results obtained, the best 1-month-ahead forecast of PM₁₀ concentration is given by the six-variable SVMR model, while for the 6-month-ahead forecast the six-variable MLP results are slightly better. In other words, SVMR gives the best short-term forecasts, but it is outperformed by MLP in the long term.

Autoregressive integrated moving average (ARIMA)

ARIMA models can be considered an extension of ARMA (autoregressive moving average) models, which are known for their ability to provide a parsimonious description of a stationary stochastic process³². ARMA models are composed of two polynomial terms, one for autoregression (AR) and another for the moving average (MA). Given a time series of data $X_{t}$, the ARMA model can be expressed as:

X_{t} = c + ε_{t} + \sum_{i = 1}^{p} φ_{i} X_{t - i} + \sum_{i = 1}^{q} σ_{i} ε_{t - i}

where $c$ is a constant, $ε_{t}$ are white noise error terms, $\sum_{i = 1}^{p} φ_{i} X_{t - i}$ is the autoregressive term, in which $φ_{i}$ are parameters and $X_{t - i}$ is the value of variable $X$ at time $t - i$, and $\sum_{i = 1}^{q} σ_{i} ε_{t - i}$ is the moving-average term, in which $σ_{i}$ are the parameters of the model.
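As an illustration of the ARMA equation above, the following sketch computes the one-step-ahead conditional mean of $X_{t}$ from past values and past errors; the unknown current error $ε_{t}$ is replaced by its expected value, zero. The coefficients are arbitrary, hypothetical values, not a model fitted in this study, and the MA coefficients are called `sigma` to match the notation $σ_{i}$ used here.

```python
def arma_one_step(c, phi, sigma, past_x, past_eps):
    """Conditional mean of X_t for an ARMA(p, q) model.

    phi[i - 1] multiplies X_{t-i} and sigma[i - 1] multiplies eps_{t-i};
    past_x and past_eps are ordered from oldest to newest.
    """
    ar = sum(phi[i] * past_x[-1 - i] for i in range(len(phi)))
    ma = sum(sigma[i] * past_eps[-1 - i] for i in range(len(sigma)))
    return c + ar + ma  # eps_t is unknown and has expectation zero


# Hypothetical ARMA(2, 1): X_t = 5 + 0.6 X_{t-1} + 0.2 X_{t-2} + eps_t + 0.3 eps_{t-1}
pred = arma_one_step(5.0, phi=[0.6, 0.2], sigma=[0.3],
                     past_x=[30.0, 32.0], past_eps=[0.0, 2.0])
# pred = 5 + 0.6 * 32 + 0.2 * 30 + 0.3 * 2 = 30.8 (up to floating-point rounding)
```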

ARIMA models are appropriate for observation sets that are not necessarily stationary, as is the case in the present problem, and they considerably improve the empirical description of non-stationary time series²⁹. A stochastic process can be characterized as an ARIMA model if the d-th difference of $X_{t}$ constitutes a stationary and invertible ARMA process of orders $p$ and $q$.

In this case, $p$ is the order of the autoregressive part of the model, $q$ is the order of the moving-average part, and a third parameter, $d$, is the number of differences required to reach stationarity³³. If the differencing operator is denoted by $\nabla$ and the backshift operator by $B$ (so that $B X_{t} = X_{t - 1}$ and $\nabla = 1 - B$), the general ARIMA equation can be written as follows³⁰:

φ_{p} (B) \nabla^{d} (X_{t} - L) = θ_{q} (B) ε_{t}

where $φ_{p} (B)$ and $θ_{q} (B)$ are, respectively, the autoregressive and moving-average polynomials in the backshift operator $B$, and $ε_{t}$ is the model perturbation:

\begin{matrix} φ_{p} (B) = 1 - φ_{1} B - φ_{2} B^{2} - \dots - φ_{p} B^{p} \\ θ_{q} (B) = 1 - θ_{1} B - θ_{2} B^{2} - \dots - θ_{q} B^{q} \end{matrix}
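The differencing operator $\nabla = 1 - B$ (so that $\nabla X_{t} = X_{t} - X_{t - 1}$) can be illustrated with a short sketch that applies it $d$ times and then inverts it. The inversion is what maps a forecast made on the differenced, stationary scale back to the original scale. The series is a hypothetical toy example, and the function names are illustrative, not those of any R library.

```python
def difference(x, d=1):
    """Apply (1 - B)^d: each pass replaces X_t by X_t - X_{t-1}."""
    for _ in range(d):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return list(x)


def undifference(diffs, initial):
    """Invert (1 - B)^d, where d = len(initial).

    `initial` holds the first d values X_0..X_{d-1} of the original series and
    `diffs` is the d-times differenced series (possibly extended by forecasts).
    """
    series = list(diffs)
    for k in range(len(initial) - 1, -1, -1):
        # First value of the k-times differenced series, from the initial values.
        first = difference(initial[: k + 1], k)[0]
        rebuilt = [first]
        for v in series:
            rebuilt.append(rebuilt[-1] + v)  # cumulative sum undoes one difference
        series = rebuilt
    return series


x = [3, 5, 9, 15, 23]            # hypothetical series; second differences are constant
d2 = difference(x, 2)            # [2, 2, 2]
restored = undifference(d2, x[:2])
# Appending a forecast of the differenced series (here 2) extends the original one.
extended = undifference(d2 + [2], x[:2])
```

Here `extended` ends in 33: the next first difference is 8 + 2 = 10, added to the last observed value 23.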

A more in-depth explanation of ARIMA models goes beyond the scope of this research and can be found elsewhere³⁴. All the models employed in the present research were calculated with the help of the statistical software R³⁵. ARIMA models were calculated with the help of the series library³⁶.

Vector autoregressive moving-average (VARMA)

The vector autoregressive moving-average (VARMA) method models the next step of each time series with an ARMA-type model in which the past values of all the series enter; in other words, it is the generalization of ARMA to multivariate time series. This kind of model makes it possible to model a set of time series jointly, capturing both their within-series correlations and their cross-correlations³². For these models, computations were performed with the MTS library³⁷.

If a k-dimensional time series is represented by $z_{t}$ , the vector autoregressive moving-average VARMA $(p, q)$ process can be expressed as:

ϕ (B) z_{t} = ϕ_{0} + θ (B) a_{t}

where $ϕ_{0}$ is a constant vector,

\begin{matrix} ϕ (B) = I_{k} - \sum_{i = 1}^{p} ϕ_{i} B^{i} \\ θ (B) = I_{k} - \sum_{i = 1}^{q} θ_{i} B^{i} \end{matrix}

are two matrix polynomials, and $a_{t}$ is a sequence of independent and identically distributed random vectors with mean zero and positive-definite covariance matrix $Σ_{a}$.

A general VARMA $(p, q)$ model is represented as follows³⁷:

z_{t} = ϕ_{0} + \sum_{i = 1}^{p} ϕ_{i} z_{t - i} + a_{t} - \sum_{j = 1}^{q} θ_{j} a_{t - j}

In this equation $p$ and $q$ are nonnegative integers, $ϕ_{0}$ is a $k$-dimensional vector of constants, $ϕ_{i}$ and $θ_{j}$ are $k \times k$ constant matrices and $\{a_{t}\}$ is a sequence of independent and identically distributed random vectors with mean zero and positive-definite covariance matrix.
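Using the general form above, a one-step-ahead VARMA forecast is the constant vector plus the AR matrices applied to past observation vectors, minus the MA matrices applied to past innovations. The sketch below computes it in pure Python for an illustrative bivariate case; the coefficient matrices and past values are hypothetical, not the models fitted with MTS in this study.

```python
def mat_vec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def varma_one_step(phi0, phis, thetas, past_z, past_a):
    """E[z_t | past] = phi0 + sum_i phi_i z_{t-i} - sum_j theta_j a_{t-j}.

    phis/thetas are lists of k x k matrices (lag 1 first); past_z and past_a
    are lists of k-vectors ordered from oldest to newest.
    """
    pred = list(phi0)
    for i, phi in enumerate(phis, start=1):
        pred = [p + x for p, x in zip(pred, mat_vec(phi, past_z[-i]))]
    for j, theta in enumerate(thetas, start=1):
        pred = [p - x for p, x in zip(pred, mat_vec(theta, past_a[-j]))]
    return pred  # a_t has mean zero, so it does not appear in the forecast


# Hypothetical bivariate VARMA(1, 1) for (PM10, NO2); the 0.1 entry lets
# last month's NO2 feed into this month's PM10 forecast.
phi0 = [2.0, 1.0]
phi1 = [[0.5, 0.1], [0.0, 0.4]]
theta1 = [[0.2, 0.0], [0.0, 0.1]]
forecast = varma_one_step(phi0, [phi1], [theta1],
                          past_z=[[30.0, 25.0]], past_a=[[1.0, -2.0]])
```

The off-diagonal entries of the $ϕ_{i}$ matrices are what a univariate ARIMA cannot represent: they carry the cross-correlations between pollutants into the forecast.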

According to Tsay and Wood³⁷, the general VARMA $(p, q)$ model given above can be rewritten in a more convenient form as follows:

z_{t} = ϕ_{0} + \sum_{i = 1}^{p} ϕ_{i} z_{t - i} + L b_{t} - \sum_{j = 1}^{q} θ_{j}^{*} b_{t - j}

where $θ_{j}^{*} = θ_{j} L$ and $L$ is a lower triangular matrix with ones on the diagonal. The values of $p$ and $q$ were determined following a methodology suggested in previous research³⁸. The Akaike information criterion³⁹ (AIC) and the Schwarz information criterion⁴⁰ (SIC) were employed to balance the improvement in the log-likelihood against the loss of degrees of freedom that results from increasing the lag order of a time series model. Both criteria were used to determine the maximum values of $p$ and $q$; all models with $p$ and $q$ less than or equal to these maxima were then calculated, and those with the best RMSE are presented in this paper.
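The trade-off encoded by the two criteria can be sketched as follows: both start from −2 times the maximised log-likelihood and add a penalty for the number of estimated parameters $k$, equal to $2k$ for AIC and $k \ln n$ for SIC, where $n$ is the number of observations. The log-likelihoods and parameter counts below are invented purely to show the mechanics; in the procedure described above the criteria only bounded $p$ and $q$, and the final choice among candidate models was made by RMSE.

```python
import math


def aic(loglik, n_params):
    return -2.0 * loglik + 2.0 * n_params


def sic(loglik, n_params, n_obs):
    # Schwarz information criterion, also known as BIC
    return -2.0 * loglik + n_params * math.log(n_obs)


n_obs = 96  # monthly observations from January 2010 to December 2017
# Hypothetical (log-likelihood, number of estimated parameters) per (p, q) order.
candidates = {(1, 1): (-300.0, 4), (2, 1): (-295.0, 5), (4, 2): (-290.0, 8)}

best_by_aic = min(candidates, key=lambda pq: aic(*candidates[pq]))
best_by_sic = min(candidates, key=lambda pq: sic(*candidates[pq], n_obs))
# AIC accepts the larger (4, 2) model; SIC's heavier penalty keeps (2, 1).
```

Since $\ln 96 \approx 4.56 > 2$, SIC penalises each extra parameter more than twice as heavily as AIC, which is why it tends to select smaller orders.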

Multilayer perceptron neural networks (MLP)

One of the first bio-inspired machine learning models was the single-layer perceptron, proposed by Rosenblatt⁴¹ as a possible model of a neuron of the human brain. The perceptron learning rule is a supervised iterative method that modifies the neuron weights. The multilayer perceptron is useful for modelling functions. In a neural network, the outcome is modelled through an intermediate set of unobservable variables called hidden variables, which are linear combinations of the original predictors, typically transformed by a nonlinear function.

Kolmogorov⁴² demonstrated that a two-layer network (one hidden layer and one output layer) with a non-linear differentiable activation function is able to approximate any smooth mapping, provided that the number of neurons in the hidden layer is high enough. For a two-layer network like the one employed in the present research, with $p$ input variables, one output variable and $q$ neurons in the hidden layer, the operations can be expressed as:

y (n) = σ (w^{y} \cdot φ (w^{h} \cdot x (n)))

where $y (n)$ and $x (n)$ are the output and input of the network; $σ$ is the activation function of the output layer; $φ$ is the activation function of the hidden layer; and $w^{y}$ and $w^{h}$ are the weight matrices of the output and hidden layers, respectively.

A key requirement for training an MLP⁴³ is that $σ$ and $φ$ be continuously differentiable functions. Training is performed with the backpropagation method, which applies the chain rule recursively from the output layer back to the input layer to obtain the gradients used by gradient descent. For the purposes of this research, the neural network models were trained and validated with the R library neuralnet⁴⁴, using the logistic function as activation function. A more in-depth explanation of the foundations of neural networks can be found elsewhere⁴⁵.
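A minimal sketch of the two-layer network in the equation above, with logistic activations as in this study, is shown below. It evaluates $y (n) = σ (w^{y} \cdot φ (w^{h} \cdot x (n)))$ and takes one backpropagation step on the squared error of a single sample. Bias terms are omitted to mirror the equation, the weights, data and learning rate are hypothetical, and this plain gradient-descent update is only an illustration, not the training routine of the neuralnet package.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w_h, w_y, x):
    """y = sigma(w_y . phi(W_h x)) with logistic phi and sigma; no bias terms."""
    hidden = [logistic(sum(w * xi for w, xi in zip(row, x))) for row in w_h]
    y = logistic(sum(w * h for w, h in zip(w_y, hidden)))
    return hidden, y


def backprop_step(w_h, w_y, x, target, lr=0.5):
    """One gradient-descent step on E = 0.5 * (y - target)^2 for one sample."""
    hidden, y = forward(w_h, w_y, x)
    # Output delta dE/dz_out, using logistic'(z) = y * (1 - y).
    delta_out = (y - target) * y * (1.0 - y)
    # Hidden deltas: the output delta sent back through w_y, times phi'(z_h).
    delta_h = [delta_out * wy * h * (1.0 - h) for wy, h in zip(w_y, hidden)]
    new_w_y = [wy - lr * delta_out * h for wy, h in zip(w_y, hidden)]
    new_w_h = [[w - lr * dh * xi for w, xi in zip(row, x)]
               for row, dh in zip(w_h, delta_h)]
    return new_w_h, new_w_y


# Hypothetical inputs and target, scaled to [0, 1] as a logistic output requires.
x = [0.3, 0.5, 0.2]
target = 0.8
w_h = [[0.1, -0.2, 0.3], [0.4, 0.1, -0.1]]  # 2 hidden neurons x 3 inputs
w_y = [0.5, -0.3]
_, y_before = forward(w_h, w_y, x)
w_h, w_y = backprop_step(w_h, w_y, x, target)
_, y_after = forward(w_h, w_y, x)
```

After one step the output moves towards the target; repeating the step over all training samples for many epochs is what "training" means in practice. Note that the hidden deltas are computed with the weights from before the update.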

Support vector machines as regressor (SVMR)

Support vector machines were introduced by Vapnik⁴⁶. Although they were created for binary classification, they are nowadays used for different kinds of problems; those employed for regression problems are called SVMR²⁹.

Given a training data set $S = \{(x_{1}, y_{1}), \dots, (x_{n}, y_{n})\}$, where $x_{i} \in ℜ^{d}$ and $y_{i} \in ℜ$, the regression task involves finding the parameters $w = (w_{1}, \dots, w_{d})$ and $b$ of the following linear function²⁷:

f (x) = w_{1} x_{1} + \dots + w_{d} x_{d} + b

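As in practice it is not possible to find parameters that give a prediction error equal to zero, a soft margin is employed: deviations smaller than $ε$ are tolerated, and larger ones are measured by the slack variables $ξ_{i}^{+}$ and $ξ_{i}^{-}$. The optimization problem is written as follows:

\min \frac{1}{2} \langle w, w \rangle + C \sum_{i = 1}^{n} (ξ_{i}^{+} + ξ_{i}^{-}) \quad \text{subject to} \quad f (x_{i}) - y_{i} \leq ε + ξ_{i}^{+}, \; y_{i} - f (x_{i}) \leq ε + ξ_{i}^{-}, \; ξ_{i}^{+}, ξ_{i}^{-} \geq 0

Please note that $ξ_{i}^{+} > 0$ only when the forecast of the model $f (x_{i})$ exceeds the real value $y_{i}$ by more than $ε$, and $ξ_{i}^{+} = 0$ otherwise; $ξ_{i}^{-}$ plays the symmetric role when the forecast falls below the real value.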

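With the help of the Lagrangian function and the Karush–Kuhn–Tucker conditions, the solution can be expressed as follows:

f (x) = \sum_{i = 1}^{n} (α_{i}^{-} - α_{i}^{+}) \langle x_{i}, x \rangle + b^{*}

where

\begin{matrix} α_{i}^{+} = C - β_{i}^{+} \\ α_{i}^{-} = C - β_{i}^{-} \\ b^{*} = y_{i} - \langle w^{*}, x_{i} \rangle \pm ε \end{matrix}

in which $β_{i}^{+}$ and $β_{i}^{-}$ are the Lagrange multipliers associated with the constraints $ξ_{i}^{+} \geq 0$ and $ξ_{i}^{-} \geq 0$, and $w^{*}$ is the optimal weight vector.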

In those cases where the data cannot be fitted with a linear function, kernels are employed⁴⁷. Kernels implicitly map the data into a new space called the feature space.

The regressor associated with the linear function in the new space is as follows:

f (x) = \sum_{i = 1}^{n} (α_{i}^{-} - α_{i}^{+}) K (x, x_{i})

Please note that $b^{*}$ is not included in the function, as it can be absorbed as a constant inside the kernel. The kind of kernel function to be employed depends on the problem to be solved: for example, the radial basis function kernel has been shown to be very effective in general, but when the data come from a linear relationship the linear kernel obtains better results⁴⁸. The SVMR models were implemented with the R library e1071⁴⁹. A good explanation of the use of support vector machines as regressors can be found in the work of Drucker et al.⁵⁰.
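Once a solver has produced the dual coefficients, evaluating the kernel expansion is straightforward. The sketch below uses the radial basis function kernel $K (x, x_{i}) = \exp(-γ \lVert x - x_{i} \rVert^{2})$, keeps the intercept $b^{*}$ explicit instead of absorbing it into the kernel, and also shows the ε-insensitive loss that the slack variables measure. The support vectors, coefficients, $γ$ and $b^{*}$ are hypothetical; in practice they come from fitting, for example with the `svm()` function of e1071, not from hand-chosen values.

```python
import math


def rbf_kernel(u, v, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """f(x) = sum_i (alpha_i^- - alpha_i^+) K(x, x_i) + b."""
    return sum(c * rbf_kernel(x, sv, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + b


def eps_insensitive(y_true, y_pred, eps):
    """Errors inside the epsilon tube cost nothing; beyond it they grow linearly."""
    return max(0.0, abs(y_true - y_pred) - eps)


# Hypothetical fitted model: two support vectors with scaled inputs.
svs = [[0.2, 0.4], [0.6, 0.8]]
coefs = [1.5, -0.5]  # alpha_i^- - alpha_i^+; each lies in [-C, C]
pred = svr_predict([0.2, 0.4], svs, coefs, b=0.1, gamma=1.0)
# pred = 1.5 * 1 - 0.5 * exp(-0.32) + 0.1
```

Only the support vectors (points with non-zero dual coefficients) enter the sum, which keeps prediction cheap even when the training set is larger.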

Multivariate adaptive regression splines (MARS)

MARS is a non-parametric modelling method described by the following equation⁵¹:

y_{t} = f (x_{t}) = β_{0} + \sum_{i = 1}^{k} β_{i} \cdot B (x_{it})

where $y_{t}$ is the output variable for each time $t$ and $β_{i}$ are the model parameters for the different $x_{it}$ . $β_{0}$ is the intercept and $B$ represents the model basis functions.

One of the main characteristics of the MARS models is that they do not make use of any a priori hypothesis concerning the relationships among the variables⁵². The basis functions are defined as follows:

\begin{matrix} B^{-} = \begin{cases} (t - x)^{q} & \text{if } x < t \\ 0 & \text{otherwise} \end{cases} \\ B^{+} = \begin{cases} (x - t)^{q} & \text{if } x \geq t \\ 0 & \text{otherwise} \end{cases} \end{matrix}

where $t$ is the knot and $q \geq 0$ is the power of the basis functions ($q = 1$ gives the piecewise-linear hinge functions most commonly used). To fit a MARS model and decide which basis functions are to be included, MARS uses generalized cross-validation (GCV), which is the mean squared residual of the model divided by a penalty factor that depends on the model complexity $C (M)$⁵³:

\mathrm{GCV} (M) = \frac{\frac{1}{N} \sum_{i = 1}^{N} (y_{i} - f_{M} (x_{i}))^{2}}{(1 - C (M) / N)^{2}}

where $N$ is the number of observations, $f_{M}$ is the model with $M$ basis functions, and the complexity is computed as follows:

C (M) = M + 1 + d \cdot M

where $M$ is the number of basis functions in the model and $d$ is the penalty per basis function included. For this research, $d$ was set to 2, while the maximum degree of interaction between basis functions was restricted to 3. The MARS models employed in this research are based on those implemented in the R library earth⁵⁴. A complete explanation of MARS models can be found in the original work of Friedman⁵¹, and an easy-to-read introduction to this methodology in the work of Put et al.⁵⁵.
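The sketch below evaluates a small MARS model built from the truncated basis functions above with $q = 1$ and scores it with GCV, using $C (M) = M + 1 + d \cdot M$ with $d = 2$ as in this study. The knot, coefficients and data are hypothetical, and the earth package's internal GCV bookkeeping may differ in detail; this only illustrates the formulas.

```python
def hinge_minus(x, t, q=1):
    """B^-: (t - x)^q for x < t, else 0."""
    return (t - x) ** q if x < t else 0.0


def hinge_plus(x, t, q=1):
    """B^+: (x - t)^q for x >= t, else 0."""
    return (x - t) ** q if x >= t else 0.0


def mars_predict(x, beta0, terms):
    """terms: list of (beta_i, basis_function) pairs."""
    return beta0 + sum(beta * basis(x) for beta, basis in terms)


def gcv(y, y_hat, n_basis, d=2.0):
    """Mean squared residual divided by (1 - C(M)/N)^2, with C(M) = M + 1 + d*M."""
    n = len(y)
    mse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n
    c_m = n_basis + 1 + d * n_basis
    return mse / (1.0 - c_m / n) ** 2


# Hypothetical one-knot model: y = 30 + 0.8*max(0, x - 10) + 0.5*max(0, 10 - x)
terms = [(0.8, lambda x: hinge_plus(x, 10.0)), (0.5, lambda x: hinge_minus(x, 10.0))]
xs = [4.0, 8.0, 10.0, 14.0, 20.0, 25.0, 30.0, 35.0]
ys = [33.0, 31.0, 30.0, 33.0, 38.0, 42.0, 46.0, 50.0]
fitted = [mars_predict(x, 30.0, terms) for x in xs]
score = gcv(ys, fitted, n_basis=2)  # 0.005 / (1 - 7/8)^2 = 0.32
```

With only eight observations and $C (M) = 7$, the penalty factor $(1 - 7/8)^{2} = 1/64$ inflates the mean squared residual 64-fold, which is how GCV discourages adding basis functions that the data cannot support.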

Results and discussion

Table 2 shows the Pearson correlation coefficients of all the variables in the study. The largest coefficient in absolute value corresponds to NO and NO₂ (0.8626), followed by NO and O₃ (−0.7593, an inverse relationship) and by SO₂ and NO₂ and SO₂ and NO (0.7160 and 0.7090, respectively). The correlation coefficients of SO₂, NO, NO₂, CO and O₃ with PM₁₀ can be considered moderate in absolute value, ranging from 0.4320 (CO and PM₁₀) to 0.5251 (NO₂ and PM₁₀).

Table 2.

Pearson’s correlation coefficients of the variables of the study.

	NO	NO₂	CO	O₃	PM₁₀
SO₂	0.7090	0.7160	0.6503	− 0.5483	0.4923
NO		0.8626	0.6587	− 0.7593	0.5068
NO₂			0.6755	− 0.5475	0.5251
CO				− 0.4823	0.4320
O₃					− 0.4663
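Each entry of Table 2 is a Pearson coefficient, i.e. the covariance of two variables divided by the product of their standard deviations. A minimal sketch follows; the short series are hypothetical and are not the study's data. (Python 3.10 and later also provide `statistics.correlation` for the same computation.)

```python
import math


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monthly means; NO and O3 tend to move in opposite directions.
no = [8.0, 12.0, 15.0, 9.0, 20.0]
o3 = [45.0, 38.0, 33.0, 44.0, 25.0]
r = pearson(no, o3)  # strongly negative for these made-up values
```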


Table 3 shows the results of the ARIMA model using the previous values of PM₁₀ as the input variable. Tables 4, 5, 6, 7 and 8 show the results obtained with the different models of four (SO₂, NO, NO₂ and PM₁₀) and six variables (SO₂, NO, NO₂, CO, O₃ and PM₁₀) employed in the present research. In all cases, the results are presented in the same way. The first row is the forecast made for the following 6 months using the information from January 2010 to December 2017 as training values. The second row is the forecast made using the information from January 2010 to January 2018 as training values, for February 2018 (1 month ahead) to June 2018 (5 months ahead), and so on for the remaining rows. To make the comparison of real values with forecasts easy, the root mean squared errors (RMSE) of the forecasts from 1 to 6 months ahead and 1 month ahead for all models are presented in Table 9.

For the ARIMA model (Table 3), which only uses past PM₁₀ concentrations to predict their future values, the RMSE of the forecasts made 1 month ahead was 6.3163, while the RMSE of the forecasts made from 1 to 6 months ahead was 7.6312. Please note that the RMSE for forecasts made 1 month ahead refers to the values on the diagonal of the table (in the case of Table 3: 22.2217, 32.0564, 19.7957, 22.9000, 34.6428 and 29.6487), as these are the ones calculated 1 month ahead. For the forecast from 1 to 6 months ahead, the real values are compared with the forecasts in the first row of the table, from January 2018 to June 2018 (in the case of Table 3: 22.2217, 31.5194, 19.8269, 20.1082, 37.0095 and 31.8833). The real monthly averaged values from January to June 2018 were 29, 27, 26, 31, 29 and 24, respectively; these values are included in the last row (Avg) of Tables 3, 4, 5, 6, 7 and 8 to make comparisons more direct.
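The two RMSE definitions can be reproduced directly from the values printed in Table 3: the 1-month-ahead RMSE uses the diagonal of the table, and the 1-to-6-months-ahead RMSE uses its first row, both compared with the observed monthly averages for January to June 2018. The snippet below does exactly that. It agrees with the reported 6.3163 and 7.6312 to within about 0.0002; small differences in the last decimal place can arise from the four-decimal rounding of the tabulated forecasts.

```python
import math


def rmse(forecast, observed):
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed))


observed = [29, 27, 26, 31, 29, 24]  # monthly mean PM10, January to June 2018
# Table 3 (ARIMA): row r (0-based) uses data up to the month before column r,
# so its first entry is a 1-month-ahead forecast.
table3 = [
    [22.2217, 31.5194, 19.8269, 20.1082, 37.0095, 31.8833],
    [None, 32.0564, 21.4559, 22.9140, 32.8949, 29.8194],
    [None, None, 19.7957, 23.4279, 34.4069, 30.4898],
    [None, None, None, 22.9000, 33.2961, 29.6945],
    [None, None, None, None, 34.6428, 29.3402],
    [None, None, None, None, None, 29.6487],
]

one_month_ahead = [table3[i][i] for i in range(6)]  # diagonal
one_to_six_ahead = table3[0]                         # forecasts issued with data to December 2017

rmse_1 = rmse(one_month_ahead, observed)     # reported: 6.3163
rmse_1_6 = rmse(one_to_six_ahead, observed)  # reported: 7.6312
```

The same two lines applied to Tables 4 to 8 give the per-model figures that Table 9 summarises.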

Table 3.

Port of Gijón. Results of the ARIMA models using variable PM₁₀.

	Jan-18	Feb-18	Mar-18	Apr-18	May-18	Jun-18
	22.2217	31.5194	19.8269	20.1082	37.0095	31.8833
		32.0564	21.4559	22.9140	32.8949	29.8194
			19.7957	23.4279	34.4069	30.4898
				22.9000	33.2961	29.6945
					34.6428	29.3402
						29.6487
Avg	29	27	26	31	29	24


Table 4.

Port of Gijón. Results of the VARMA models using variables SO₂, NO, NO₂ and PM₁₀.

p	q	Jan-18	Feb-18	Mar-18	Apr-18	May-18	Jun-18
4	2	39.6108	42.0017	21.0807	39.9202	21.7535	40.9830
4	2		43.1236	24.5948	39.9053	22.4729	41.6383
4	2			21.4770	40.4344	23.8317	34.4105
4	2				40.8564	24.0001	35.0403
4	2					21.8562	33.3678
4	2						32.4333
4	1	40.4407	42.6933	21.5210	40.0030	22.7195	41.5106
4	1		43.8059	24.7208	40.2494	23.4349	41.8195
4	1			21.9345	41.2505	23.9012	34.6527
4	1				41.7434	24.9623	35.1930
4	1					21.9919	33.8758
4	1						32.5900
2	1	40.1493	43.0113	21.3623	39.8731	23.0342	41.7924
2	1		43.9369	25.4061	40.0493	23.9055	41.4342
2	1			22.2033	41.6250	24.4226	34.7485
2	1				41.8263	24.8753	34.8986
2	1					21.7138	34.4839
2	1						32.9979
1	2	39.6597	43.3128	20.8822	39.8591	22.3459	41.5172
1	2		44.7840	25.2376	40.2805	24.0309	40.5961
1	2			21.5377	41.7663	24.4224	34.6140
1	2				41.8746	25.0023	34.8847
1	2					21.5321	34.5682
1	2						32.5428
Avg		29	27	26	31	29	24


Table 5.

Port of Gijón. Results of the VARMA models using variables SO₂, NO, NO₂, CO, O₃ and PM₁₀.

p	q	Jan-18	Feb-18	Mar-18	Apr-18	May-18	Jun-18
4	2	37.8290	44.0486	25.6528	40.2896	25.9201	39.4256
4	2		42.8897	29.4979	36.6604	29.4432	39.6890
4	2			24.6193	36.6633	30.2776	31.0831
4	2				37.7013	29.0341	33.1441
4	2					27.4516	33.9081
4	2						31.8559
4	1	38.7335	43.5777	25.5632	40.6248	26.2742	40.3105
4	1		43.6040	28.9298	37.4593	28.6250	39.3706
4	1			24.2618	37.2412	30.0657	31.7601
4	1				38.8584	28.0238	33.7027
4	1					27.9027	35.1452
4	1						32.9623
2	1	39.0075	44.2329	24.5744	41.0882	26.1770	40.7826
2	1		44.0753	28.3803	39.1346	29.0462	38.8954
2	1			24.7798	37.6661	28.4491	32.0509
2	1				39.2261	27.1545	34.1648
2	1					26.8998	34.9791
2	1						32.8426
1	2	39.6361	44.3558	25.1907	41.3853	25.3394	41.8425
1	2		43.5959	27.5150	39.5201	27.3642	39.6072
1	2			24.0832	39.2072	28.1761	33.7224
1	2				40.5151	26.5538	34.6911
1	2					27.1467	34.9188
1	2						33.0177
Avg		29	27	26	31	29	24

Table 6.

Port of Gijón. Results of the MLP models with variables SO₂, NO, NO₂ and PM₁₀ and with variables SO₂, NO, NO₂, CO, O₃ and PM₁₀.

	Jan-18	Feb-18	Mar-18	Apr-18	May-18	Jun-18
Model with variables SO₂, NO, NO₂ and PM₁₀
	22.8982	28.3153	26.0853	20.4793	35.1029	30.9771
		30.2043	26.3742	23.2469	30.1220	31.0470
			25.1822	24.2686	32.2976	29.8388
				25.4043	31.2319	27.8622
					33.6755	27.8836
						29.1743
Model with variables SO₂, NO, NO₂, CO, O₃ and PM₁₀
	23.9208	29.9514	21.0074	22.0775	34.2918	31.4352
		30.3128	19.3318	24.6153	30.3982	30.5587
			24.2857	25.6399	31.9302	29.8496
				26.3989	30.9463	29.5339
					33.3901	29.3019
						29.7334
Avg	29	27	26	31	29	24

Table 7.

Port of Gijón. Results of the SVMR models with variables SO₂, NO, NO₂ and PM₁₀ and with variables SO₂, NO, NO₂, CO, O₃ and PM₁₀.

	Jan-18	Feb-18	Mar-18	Apr-18	May-18	Jun-18
Model with variables SO₂, NO, NO₂ and PM₁₀
	22.5224	29.4214	21.7977	21.7465	35.8175	29.2032
		31.2132	19.4056	23.6688	32.2857	28.5922
			24.9580	25.5375	32.5812	29.4920
				26.0547	34.0507	28.3719
					34.4723	28.9101
						29.5071
Model with variables SO₂, NO, NO₂, CO, O₃ and PM₁₀
	23.6383	30.0879	21.4260	21.5572	34.4579	30.7213
		30.9299	19.7200	24.4384	32.0464	29.6021
			25.1539	25.5781	33.4582	30.0453
				26.6899	32.6893	29.9452
					32.9072	28.2090
						29.5126
Avg	29	27	26	31	29	24

Table 8.

Port of Gijón. Results of the MARS models with variables SO₂, NO, NO₂ and PM₁₀ and with variables SO₂, NO, NO₂, CO, O₃ and PM₁₀.

	Jan-18	Feb-18	Mar-18	Apr-18	May-18	Jun-18
Model with variables SO₂, NO, NO₂ and PM₁₀
	31.7247	26.2609	27.9016	21.0456	21.7012	41.1329
		25.9874	28.0799	22.6011	22.2278	39.4284
			29.0392	24.4142	23.5083	39.5599
				24.4118	23.5069	39.5599
					23.4797	39.5665
						41.8199
Model with variables SO₂, NO, NO₂, CO, O₃ and PM₁₀
	29.3314	25.8768	32.2319	30.7817	27.1559	39.4833
		25.8768	31.3865	29.9750	26.4461	39.4833
			31.4188	30.0142	26.5028	39.4833
				30.7211	27.1826	39.4833
					27.0815	39.4833
						39.4833
Avg	29	27	26	31	29	24

Table 9.

RMSE values of all the models employed in the present study for forecasts 1 month ahead and up to 6 months ahead.

Model and number of variables	RMSE, one month ahead	RMSE, up to 6 months ahead
ARIMA	6.3162	7.6312
VARMA (p = 4, q = 2) 4 variables	10.1021	11.4189
VARMA (p = 4, q = 1) 4 variables	10.5529	11.7214
VARMA (p = 2, q = 1) 4 variables	10.6211	11.7832
VARMA (p = 1, q = 2) 4 variables	10.7767	11.8007
VARMA (p = 4, q = 2) 6 variables	8.5767	10.8202
VARMA (p = 4, q = 1) 6 variables	9.2802	11.0743
VARMA (p = 2, q = 1) 6 variables	9.5173	11.4786
VARMA (p = 1, q = 2) 6 variables	9.7252	11.9347
MLP 4 variables	4.6209	6.2661
MLP 6 variables	4.3402	6.0873
SVMR 4 variables	4.9249	6.1191
SVMR 6 variables	4.2649	6.1010
MARS 4 variables	8.2575	8.7319
MARS 6 variables	6.7605	6.8725

Table 9 shows the RMSE values achieved 1 month and up to 6 months ahead by all the models trained in the present research. For forecasting 1 month ahead, the best results are obtained by the six-variable SVMR and MLP models (RMSE of 4.2649 and 4.3402, respectively), followed by the same models with only four variables. These results suggest that all the variables included in the study have some relevance for an accurate PM₁₀ prediction. After the MLP and SVMR models, the next best model for forecasting 1 month ahead according to RMSE is ARIMA, the only one that makes exclusive use of past PM₁₀ values to forecast future concentrations. ARIMA is followed by MARS with six and four variables, while the VARMA models give the worst performance.
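The one-month-ahead RMSE values in Table 9 can be recomputed from the triangular forecast tables. The minimal Python sketch below rests on an inferred reading of those tables, not one stated in this section: each row of a triangle is one forecast run, its first value (the diagonal) is the one-month-ahead forecast, and the Avg row holds the observed monthly PM₁₀ means, rounded to integers. The constant names and the `rmse` helper are illustrative.

```python
import math

# Observed monthly PM10 means, January to June 2018 (the "Avg" row of Tables 4-8).
OBSERVED = [29, 27, 26, 31, 29, 24]

# One-month-ahead forecasts: the first value of every row of the
# six-variable triangles in Tables 6 (MLP) and 7 (SVMR).
MLP_6VAR_DIAGONAL = [23.9208, 30.3128, 24.2857, 26.3989, 33.3901, 29.7334]
SVMR_6VAR_DIAGONAL = [23.6383, 30.9299, 25.1539, 26.6899, 32.9072, 29.5126]


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


mlp_1m = rmse(MLP_6VAR_DIAGONAL, OBSERVED)
svmr_1m = rmse(SVMR_6VAR_DIAGONAL, OBSERVED)
print(f"MLP 6 variables, 1 month ahead:  {mlp_1m:.4f}")
print(f"SVMR 6 variables, 1 month ahead: {svmr_1m:.4f}")
```

Under these assumptions both results agree with the 4.3402 and 4.2649 reported in Table 9 to within about 0.0001; the small residual is consistent with the rounding of the Avg row.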

For forecasts up to 6 months ahead, the best performance according to RMSE is again achieved by the six-variable MLP and SVMR models (6.0873 and 6.1010, respectively), followed by the same models with only four variables. A remarkable change with respect to the 1-month-ahead forecasts is that the six-variable MARS model (6.8725) performs better than the ARIMA model (7.6312). Finally, as with the 1-month-ahead forecasts, the VARMA models show the worst performance.
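Under the same inferred reading of the triangular tables, the "up to 6 months ahead" column corresponds to the first row of each triangle, where a single forecast run covers horizons of 1 to 6 months. The sketch below is a hypothetical reconstruction rather than the authors' code; it recomputes that column for the six-variable MLP and SVMR models from Tables 6 and 7.

```python
import math

# Observed monthly PM10 means, January to June 2018 (the "Avg" row).
OBSERVED = [29, 27, 26, 31, 29, 24]

# First row of each six-variable triangle: one forecast run issued before
# January 2018, covering horizons of 1 to 6 months.
MLP_6VAR_FIRST_ROW = [23.9208, 29.9514, 21.0074, 22.0775, 34.2918, 31.4352]
SVMR_6VAR_FIRST_ROW = [23.6383, 30.0879, 21.4260, 21.5572, 34.4579, 30.7213]


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


mlp_6m = rmse(MLP_6VAR_FIRST_ROW, OBSERVED)
svmr_6m = rmse(SVMR_6VAR_FIRST_ROW, OBSERVED)
print(f"MLP 6 variables, up to 6 months ahead:  {mlp_6m:.4f}")
print(f"SVMR 6 variables, up to 6 months ahead: {svmr_6m:.4f}")
```

The results match the 6.0873 and 6.1010 of Table 9. Because every forecast in the first row is issued from the same origin, errors at the longer horizons weigh on this figure, which may help explain why it is larger than the one-month-ahead RMSE for every model in Table 9.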

In our view, it is remarkable that model performance in terms of RMSE, for both 1- and 6-month-ahead forecasts, depends not only on the number of variables considered but also on the type of model selected. In other words, a model with only one variable (ARIMA) can perform better than models that include six variables (VARMA) in both 1- and 6-month-ahead predictions. Finally, the importance of each variable is easy to assess with a MARS model. The importance order found for the prediction of PM₁₀ was as follows: previous PM₁₀ values, followed by the previous measurements of CO, NO, O₃, SO₂ and NO₂.
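The observation that the model family matters as much as the number of inputs can be checked directly against Table 9. The Python sketch below stores the table (the tuple layout and the `best_by_family` helper are illustrative choices, not from the study) and reports, for each horizon, the lowest RMSE reached by each family, together with whether the one-variable ARIMA beats every six-variable VARMA model.

```python
import math

# Table 9: model -> (family, number of input variables,
#                    RMSE one month ahead, RMSE up to 6 months ahead)
TABLE9 = {
    "ARIMA": ("ARIMA", 1, 6.3162, 7.6312),
    "VARMA (p=4, q=2), 4 variables": ("VARMA", 4, 10.1021, 11.4189),
    "VARMA (p=4, q=1), 4 variables": ("VARMA", 4, 10.5529, 11.7214),
    "VARMA (p=2, q=1), 4 variables": ("VARMA", 4, 10.6211, 11.7832),
    "VARMA (p=1, q=2), 4 variables": ("VARMA", 4, 10.7767, 11.8007),
    "VARMA (p=4, q=2), 6 variables": ("VARMA", 6, 8.5767, 10.8202),
    "VARMA (p=4, q=1), 6 variables": ("VARMA", 6, 9.2802, 11.0743),
    "VARMA (p=2, q=1), 6 variables": ("VARMA", 6, 9.5173, 11.4786),
    "VARMA (p=1, q=2), 6 variables": ("VARMA", 6, 9.7252, 11.9347),
    "MLP, 4 variables": ("MLP", 4, 4.6209, 6.2661),
    "MLP, 6 variables": ("MLP", 6, 4.3402, 6.0873),
    "SVMR, 4 variables": ("SVMR", 4, 4.9249, 6.1191),
    "SVMR, 6 variables": ("SVMR", 6, 4.2649, 6.1010),
    "MARS, 4 variables": ("MARS", 4, 8.2575, 8.7319),
    "MARS, 6 variables": ("MARS", 6, 6.7605, 6.8725),
}


def best_by_family(column):
    """Lowest RMSE reached by each model family.

    column: 0 for one month ahead, 1 for up to 6 months ahead.
    """
    best = {}
    for family, _n_vars, one_month, up_to_six in TABLE9.values():
        value = (one_month, up_to_six)[column]
        best[family] = min(value, best.get(family, math.inf))
    return best


for column, label in ((0, "1 month ahead"), (1, "up to 6 months ahead")):
    best = best_by_family(column)
    ranked = sorted(best, key=best.get)
    print(label + ":", ", ".join(f"{f} {best[f]:.4f}" for f in ranked))

# Does the one-variable ARIMA beat every six-variable VARMA at both horizons?
_, _, arima_1m, arima_6m = TABLE9["ARIMA"]
arima_beats_varma6 = all(
    arima_1m < one_month and arima_6m < up_to_six
    for family, n_vars, one_month, up_to_six in TABLE9.values()
    if family == "VARMA" and n_vars == 6
)
print("ARIMA beats every six-variable VARMA:", arima_beats_varma6)
```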

The main limitation of this study is that, although the original data are recorded every 15 min, forecasts are made for monthly average values. Monthly averages were forecast because the results obtained by the authors for daily or hourly forecasts were not as stable as those for monthly averages. This is due to the influence of port traffic on pollution in the area, which does not follow a fixed cycle as urban traffic does. Another limitation, to be addressed in future studies, is the absence of meteorological variables; introducing variables such as temperature, humidity, pressure, solar radiation, rainfall, and wind speed and direction could improve the results.
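Aggregating the 15-minute records into the monthly averages that were forecast is a simple group-by over calendar months. The sketch below uses only the Python standard library; the `monthly_means` name and the synthetic series are hypothetical, and skipping missing readings is an assumption of this sketch, since this section does not describe how gaps in the raw data were handled.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def monthly_means(readings):
    """Average (timestamp, value) readings by calendar month.

    Missing values (None) are skipped; months without any valid reading
    are omitted from the result.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in readings:
        if value is None:
            continue
        key = (timestamp.year, timestamp.month)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


# Hypothetical 15-minute series for January and February 2018: a constant
# 20 ug/m3 in January and 30 ug/m3 in February.
step = timedelta(minutes=15)
readings = []
t = datetime(2018, 1, 1)
while t < datetime(2018, 3, 1):
    readings.append((t, 20.0 if t.month == 1 else 30.0))
    t += step
readings[10] = (readings[10][0], None)  # simulate one missing measurement

print(monthly_means(readings))
```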

Conclusions

The results obtained in this research show that PM₁₀ concentration can be predicted from past values of this variable and the concentrations of other pollutants by means of statistical and machine learning models. Moreover, as had already been found in previous studies,⁵⁶ using the concentrations of other pollutants helps to obtain more accurate predictions. In fact, the most accurate results were obtained with two kinds of machine learning models, SVMR and MLP, when they made use of all six available variables. The regression-based models SVMR and MLP outperform both the univariate and the multivariate time series-based models (ARIMA and VARMA) at both horizons, while MARS outperforms VARMA at both horizons and, with six variables, also outperforms ARIMA in forecasts up to 6 months ahead. According to the findings of this paper and previous ones,²⁹ this is attributed to the short-term relationships among pollutants being stronger than the temporal relationships of PM₁₀ concentration values with themselves and with other variables. In other words, although certain seasonal patterns can be found in monthly average pollutant values, the relationship of PM₁₀ with the concentrations of other pollutants is more important than the seasonal pattern.
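The regression-based models treat forecasting as supervised learning: the target is the monthly mean PM₁₀ and the inputs are previous values of the available pollutants. The sketch below builds such a lagged design matrix from monthly series. The function name `make_lagged_dataset`, the number of lags and the toy values are assumptions for illustration; this section does not state the exact lag structure the authors used.

```python
def make_lagged_dataset(series, target, n_lags=1):
    """Build (X, y) pairs for one-step-ahead regression.

    series: dict mapping pollutant name -> list of monthly means (same length).
    target: name of the series to predict (e.g. "PM10").
    Each row of X holds the n_lags previous values of every pollutant,
    grouped by pollutant name (sorted) and ordered from oldest to most recent.
    """
    names = sorted(series)
    length = len(series[target])
    if any(len(series[name]) != length for name in names):
        raise ValueError("all series must have the same length")
    X, y = [], []
    for t in range(n_lags, length):
        row = []
        for name in names:
            row.extend(series[name][t - n_lags:t])
        X.append(row)
        y.append(series[target][t])
    return names, X, y


# Hypothetical monthly means for three of the six pollutants.
toy = {
    "PM10": [30.0, 28.0, 26.0, 31.0],
    "NO2": [20.0, 22.0, 21.0, 19.0],
    "SO2": [5.0, 6.0, 4.0, 5.0],
}
names, X, y = make_lagged_dataset(toy, "PM10", n_lags=2)
print(names)
print(X)
print(y)
```

Rows of `X` could then be passed to any regressor (SVR, MLP or MARS). Forecasting several months ahead would additionally require either recursive forecasting or a shifted target; which approach the authors used is not stated here.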

Finally, this research allows the reader to compare different machine learning and time series methodologies applied to the same data set and to establish whether they are useful for PM₁₀ concentration forecasting. As a naive benchmark, if the average monthly PM₁₀ values from January to June 2018 are forecast with those of the same months of the previous year, the RMSE is 6.8557. Consequently, in forecasts 1 month ahead, the ARIMA model, the MLP and SVMR models with four and six variables and the MARS model with six variables outperform this benchmark. When forecasts are performed up to 6 months ahead, only the MLP and SVMR models with four and six variables outperform it. Although the proposed methodologies do not always outperform the mere use of the average PM₁₀ values of the same months of the previous year, they are a useful complementary tool for planning and taking decisions in advance.
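Checking which models beat the same-month-of-last-year benchmark amounts to comparing each Table 9 entry with 6.8557. The 2017 monthly means are not listed in this section, so the `seasonal_naive_rmse` helper below (a hypothetical name) is included only to show how such a benchmark is computed; the comparison itself uses only values reported in the text.

```python
import math


def seasonal_naive_rmse(previous_year, current_year):
    """RMSE of forecasting each month with the same month of the previous year."""
    if len(previous_year) != len(current_year) or not current_year:
        raise ValueError("need two non-empty series of equal length")
    errors = [p - c for p, c in zip(previous_year, current_year)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Benchmark RMSE reported for January-June 2018 forecast with January-June 2017.
BENCHMARK_RMSE = 6.8557

# Table 9: model -> (RMSE one month ahead, RMSE up to 6 months ahead)
TABLE9 = {
    "ARIMA": (6.3162, 7.6312),
    "VARMA (p=4, q=2) 4 variables": (10.1021, 11.4189),
    "VARMA (p=4, q=1) 4 variables": (10.5529, 11.7214),
    "VARMA (p=2, q=1) 4 variables": (10.6211, 11.7832),
    "VARMA (p=1, q=2) 4 variables": (10.7767, 11.8007),
    "VARMA (p=4, q=2) 6 variables": (8.5767, 10.8202),
    "VARMA (p=4, q=1) 6 variables": (9.2802, 11.0743),
    "VARMA (p=2, q=1) 6 variables": (9.5173, 11.4786),
    "VARMA (p=1, q=2) 6 variables": (9.7252, 11.9347),
    "MLP 4 variables": (4.6209, 6.2661),
    "MLP 6 variables": (4.3402, 6.0873),
    "SVMR 4 variables": (4.9249, 6.1191),
    "SVMR 6 variables": (4.2649, 6.1010),
    "MARS 4 variables": (8.2575, 8.7319),
    "MARS 6 variables": (6.7605, 6.8725),
}

beats_1m = sorted(m for m, (one, _six) in TABLE9.items() if one < BENCHMARK_RMSE)
beats_6m = sorted(m for m, (_one, six) in TABLE9.items() if six < BENCHMARK_RMSE)
print("Beat the benchmark 1 month ahead:", beats_1m)
print("Beat the benchmark up to 6 months ahead:", beats_6m)
```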

Author contributions

F.S.L. conceived the ideas; F.S.L. and P.J.G.N. designed the study and retrieved the information. F.S.L. and F.J.C.J. trained and validated the machine learning models. F.S.L., L.B. and E.G.G. wrote the draft of the manuscript. L.B. revised the manuscript.

Competing interests

Laura Bonavera acknowledges the PGC 2018 project PGC2018-101948-B-I00 (MINECO/FEDER) and PAPI-19-EMERG-11 (UNIOVI). The remaining authors declare no other competing financial interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. González-Marco D, Pau Sierra J, Fernández de Ybarra O, Sánchez-Arcilla A. Implications of long waves in harbour management. Ocean Coast. Manag. 2018;51:180–201. doi: 10.1016/j.ocecoaman.2007.04.001.
2. World Health Organization. Effects of air pollution on children’s health and development: a review of the evidence. 2005.
3. Gauderman WJ, Avol E, Gilliland F, Vora H, Thomas D, Berhane K, et al. The effect of air pollution on lung development from 10 to 18 years of age. New Engl. J. Med. 2004;351(11):1057–1067. doi: 10.1056/NEJMoa040610.
4. Wyler C, Braun-Fahrländer C, Künzli N, Schindler C, Ackermann-Liebrich U, Perruchoud AP. Exposure to motor vehicle traffic and allergic sensitization. Epidemiology. 2000;11(4):450–456. doi: 10.1097/00001648-200007000-00015.
5. European Commission. Council Directive 96/62/EC of 27 September 1996 on ambient air quality assessment and management. Official Journal of the European Communities. 1996:55–63.
6. Ganguly R, Sharma D, Kumar P. Trend analysis of observational PM10 concentrations in Shimla city, India. Sustain. Cities Soc. 2019;51:101719. doi: 10.1016/j.scs.2019.101719.
7. Grange SK, Salmond JA, Trompetter WJ, Davy PK, Ancelet T. Effect of atmospheric stability on the impact of domestic wood combustion to air quality of a small urban township in winter. Atmos. Environ. 2013;70:28–38. doi: 10.1016/j.atmosenv.2012.12.047.
8. Yadav R, Sahu LK, Jaaffrey SNA, Beig G. Temporal variation of particulate matter (PM) and potential sources at an urban station of Udaipur in Western India. Aerosol Air Qual. Res. 2014;14:1613–1629. doi: 10.4209/aaqr.2013.10.0310.
9. Mueller D, Uibel S, Takemura M, Klingelhoefer D, Groneberg DA. Ships, ports and particulate air pollution: an analysis of recent studies. J. Occup. Med. Toxicol. 2011;5:6–31. doi: 10.1186/1745-6673-6-3.
10. Pandolfi M, Gonzalez-Castanedo Y, Alastuey A, de la Rosa JD, Mantilla E, de la Campa AS, Querol X, Pey J, Amato F, Moreno T. Source apportionment of PM10 and PM2.5 at multiple sites in the strait of Gibraltar by PMF: impact of shipping emissions. Environ. Sci. Pollut. Res. Int. 2011;18(2):260–269. doi: 10.1007/s11356-010-0373-4.
11. Agrawal H, Eden R, Zhang X, Fine PM, Katzenstein A, Miller JW, Ospital J, Teffera S, Cocker DR. Primary particulate matter from ocean-going engines in the Southern California Air Basin. Environ. Sci. Technol. 2009;43:5398–5402. doi: 10.1021/es8035016.
12. Deniz C, Kilic A, Civkaroglu G. Estimation of shipping emissions in Candarli Gulf, Turkey. Environ. Monit. Assess. 2010;17(1–4):219–228. doi: 10.1007/s10661-009-1273-2.
13. Deniz C, Kilic A. Estimation and assessment of shipping emissions in the region of Ambarli Port, Turkey. Environ. Prog. Sustain. 2009;29(1):107–115.
14. Alastuey A, Moreno N, Querol X, Viana M, Artíñano B, Luaces JA, Basora J, Guerra A. Contribution of harbour activities to levels of particulate matter in a harbour area: Hada Project-Tarragona Spain. Atmos. Environ. 2007;41(30):6366–6378. doi: 10.1016/j.atmosenv.2007.03.015.
15. Pérez N, Pey J, Reche C, Cortés J, Alastuey A, Querol X. Impact of harbour emissions on ambient PM10 and PM2.5 in Barcelona (Spain): evidences of secondary aerosol formation within the urban area. Sci. Total Environ. 2016;571:237–250. doi: 10.1016/j.scitotenv.2016.07.025.
16. Shen J, Feng X, Zhuang K, Lin T, Zhang Y, Wang P. Vertical distribution of particulates within the near-surface layer of dry bulk port and influence mechanism: a case study in China. Sustainability. 2019;11(24):1–16. doi: 10.3390/su11247135.
17. Manoli E, Chelioti-Chatzidimitriou A, Karageorgou K, Kouras A, Voutsa D, Samara C, Kampanos I. Polycyclic aromatic hydrocarbons and trace elements bounded to airborne PM10 in the harbor of Volos, Greece: implications for the impact of harbor activities. Atmos. Environ. 2017;167:61–72. doi: 10.1016/j.atmosenv.2017.08.001.
18. Žibert J, Pražnikar J. Cluster analysis of particulate matter (PM10) and black carbon (BC) concentrations. Atmos. Environ. 2012;57:1–12. doi: 10.1016/j.atmosenv.2012.04.034.
19. Healy RM, O’Connor IP, Hellebust S, Allanic A, Sodeau JR, Wenger JC. Characterisation of single particles from in-port ship emissions. Atmos. Environ. 2009;43:6408–6414. doi: 10.1016/j.atmosenv.2009.07.039.
20. Meisner Rosen C. Businessmen against pollution in late nineteenth century Chicago. Bus. Hist. Rev. 1995;69(3):351–397. doi: 10.2307/3117337.
21. Desalu A, Gould L, Schweppe F. Dynamic estimation of air pollution. IEEE Trans. Automat. Contr. 1974;19(6):904–910. doi: 10.1109/TAC.1974.1100742.
22. Lamb RG, Neiburger M. An interim version of a generalized urban air pollution model. Atmos. Environ. 1971;5:239–264. doi: 10.1016/0004-6981(71)90093-X.
23. Roadknight CM, Balls GR, Mills GE, Palmer-Brown D. Modeling complex environmental data. IEEE Trans. Neural Netw. 1997;8(4):852–862. doi: 10.1109/72.595883.
24. Spellman G. An application of artificial neural networks to the prediction of surface ozone concentrations in the United Kingdom. Appl. Geogr. 1999;19(2):123–136. doi: 10.1016/S0143-6228(98)00039-3.
25. Niska H, Hiltunen T, Karppinen A, Ruuskanen J, Kolehmainen M. Evolving the neural network model for forecasting air pollution time series. Eng. Appl. Artif. Intell. 2004;17(2):159–167. doi: 10.1016/j.engappai.2004.02.002.
26. Cakmak S, Hebbern C, Vanos J, Crouse DL, Burnett R. Ozone exposure and cardiovascular-related mortality in the Canadian Census Health and Environment Cohort (CANCHEC) by spatial synoptic classification zone. Environ. Pollut. 2016;214:589–599. doi: 10.1016/j.envpol.2016.04.067.
27. Govender P, Sivakumar V. Application of k-means and hierarchical clustering techniques for analysis of air pollution: a review (1980–2019). Atmos. Pollut. Res. 2020;11(1):40–56. doi: 10.1016/j.apr.2019.09.009.
28. Liu BC, Binaykia A, Chang PC, Tiwari MK, Tsao CC. Urban air quality forecasting based on multi-dimensional collaborative support vector regression (SVR): a case study of Beijing–Tianjin–Shijiazhuang. PLoS ONE. 2017;12(7):1–17. doi: 10.1371/journal.pone.0179763.
29. García Nieto PJ, Sánchez Lasheras F, García-Gonzalo E, de Cos Juez FJ. Estimation of PM10 concentration from air quality data in the vicinity of a major steelworks site in the metropolitan area of Avilés (Northern Spain) using machine learning techniques. Stoch. Env. Res. Risk A. 2018;32(11):3287–3298. doi: 10.1007/s00477-018-1565-6.
30. Riesgo García MV, Krzemień A, del Campo M, García-Miranda CE, Sánchez Lasheras F. Rare earth elements price forecasting by means of transgenic time series developed with ARIMA models. Resour. Policy. 2018;59:95–102. doi: 10.1016/j.resourpol.2018.06.003.
31. Van Buuren S, Groothuis-Oudshoorn K. mice: multivariate imputation by chained equations in R. J. Stat. Softw. 2011;45:1–67. doi: 10.18637/jss.v045.i03.
32. Tsay RS. Multivariate Time Series Analysis with R and Financial Applications. New York: Wiley; 2014.
33. Ordóñez C, Sánchez Lasheras F, Roca-Pardiñas J, de Cos Juez FJ. A hybrid ARIMA–SVM model for the study of the remaining useful life of aircraft engines. J. Comput. Appl. Math. 2019;346:184–191. doi: 10.1016/j.cam.2018.07.008.
34. Brockwell PJ, Davis RA. Introduction to Time Series and Forecasting. New York: Springer; 2002.
35. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2019. https://www.R-project.org/.
36. Trapletti A, Hornik K. tseries: Time Series Analysis and Computational Finance. R package version 0.10-47.
37. Tsay RS, Wood D. MTS: All-Purpose Toolkit for Analyzing Multivariate Time Series (MTS) and Estimating Multivariate Volatility Models. R package version 1.0. 2018. https://CRAN.R-project.org/package=MTS.
38. Martin V, Hurn S, Harris D. Econometric Modelling with Time Series: Specification, Estimation and Testing. Cambridge: Cambridge University Press; 2013.
39. Akaike H. A new look at the statistical model identification. IEEE Trans. Autom. Control. 1974;19:716–723. doi: 10.1109/TAC.1974.1100705.
40. Schwarz G. Estimating the dimension of a model. Ann. Stat. 1978;6:461–464. doi: 10.1214/aos/1176344136.
41. Rosenblatt F. Principles of Neurodynamics. Washington: Spartan Books; 1962.
42. Kolmogorov AN. On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. Dokl. Akad. Nauk SSSR. 1957;114(5):953–956.
43. García-Nieto PJ, Martínez Torres J, de Cos Juez FJ, Sánchez Lasheras F. Using multivariate adaptive regression splines and multilayer perceptron networks to evaluate paper manufactured using Eucalyptus globulus. Appl. Math. Comput. 2012;219(2):755–763.
44. Fritsch S, Guenther F, Wright MN. neuralnet: Training of Neural Networks. R package version 1.44.2. 2019. https://CRAN.R-project.org/package=neuralnet.
45. Haykin S. Neural Networks: A Comprehensive Foundation. Upper Saddle River: Prentice Hall; 1998.
46. Vapnik V. The Nature of Statistical Learning Theory. Berlin: Springer; 2000.
47. Suárez Sánchez A, Riesgo Fernández P, Sánchez Lasheras F, de Cos Juez FJ, García Nieto PJ. Prediction of work-related accidents according to working conditions using support vector machines. Appl. Math. Comput. 2011;218(7):3539–3552.
48. Kuhn M, Johnson K. Applied Predictive Modeling. New York: Springer; 2013.
49. Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F. e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.7-2. 2019. https://CRAN.R-project.org/package=e1071.
50. Drucker H, Burges C, Kaufman L, Smola A, Vapnik V. Support vector regression machines. Adv. Neural Inf. 1997;9:155–161.
51. Friedman JH. Multivariate adaptive regression splines. Ann. Stat. 1991;19(1):1–67. doi: 10.1214/aos/1176347963.
52. Sánchez Lasheras F, García Nieto PJ, de Cos Juez F, Mayo Bayón R, González Suárez V. A hybrid PCA-CART-MARS-based prognostic approach of the remaining useful life for aircraft engines. Sensors. 2015;15(3):7062–7083. doi: 10.3390/s150307062.
53. de Andrés Suárez J, Lorca Fernández P, Sánchez Lasheras F. Bankruptcy forecasting: a hybrid approach using Fuzzy c-means clustering and Multivariate Adaptive Regression Splines (MARS). Expert Syst. Appl. 2011;38(3):1866–1875. doi: 10.1016/j.eswa.2010.07.117.
54. Milborrow S. earth: Multivariate Adaptive Regression Splines. Derived from mda:mars by Trevor Hastie and Rob Tibshirani; uses Alan Miller's Fortran utilities with Thomas Lumley's leaps wrapper. R package version 5.1.1. 2019. https://CRAN.R-project.org/package=earth.
55. Put R, Xu QS, Massart DL, Vander Heyden Y. Multivariate adaptive regression splines (MARS) in chromatographic quantitative structure–retention relationship studies. J. Chromatogr. A. 2004;1055(1–2):11–19. doi: 10.1016/j.chroma.2004.07.112.
56. García Nieto PJ, Sánchez Lasheras F, García-Gonzalo E, de Cos Juez FJ. PM10 concentration forecasting in the metropolitan area of Oviedo (Northern Spain) using models based on SVM, MLP, VARMA and ARIMA: a case study. Sci. Total Environ. 2018;621:753–761. doi: 10.1016/j.scitotenv.2017.11.291.

[CR1] 1.González-Marco D, Pau Sierra J, Fernández de Ybarra O, Sánchez-Arcilla A. González-Marco, D., Pau Sierra, J., Fernández de Ybarra, O. & Sánchez-Arcilla, A. Implications of long waves in harbour management. Ocean Coast. Manag. 2018;51:180–201. doi: 10.1016/j.ocecoaman.2007.04.001. [DOI] [Google Scholar]

[CR2] 2.World Health Organization. Effects of air pollution on children’s health and development: A review of the evidence. (2005).

[CR3] 3.Gauderman WJ, Avol E, Gilliland F, Vora H, Thomas D, Berhane K, et al. The effect of air pollution on lung development from 10 to 18 years of age. New Engl. J. Med. 2004;351(11):1057–1067. doi: 10.1056/NEJMoa040610. [DOI] [PubMed] [Google Scholar]

[CR4] 4.Wyler C, Braun-Fahrländer C, Künzli N, Schindler C, Ackermann-Liebrich U, Perruchoud AP. Exposure to motor vehicle traffic and allergic sensitization. Epidemiology. 2000;11(4):450–456. doi: 10.1097/00001648-200007000-00015. [DOI] [PubMed] [Google Scholar]

[CR5] 5.European Commission. Council Directive 1996/62/EC of 27 September 1996 on ambient air quality assessment and management. Official Journal of the European Communities, 55–63 (1996).

[CR6] 6.Ganguly R, Sharma D, Kumar P. Trend analysis of observational PM10 concentrations in Shimla city, India. Sustain. Cities Soc. 2019;51:101719. doi: 10.1016/j.scs.2019.101719. [DOI] [Google Scholar]

[CR7] 7.Grange SK, Salmond JA, Trompetter WJ, Davy PK, Ancelet T. Effect of atmospheric stability on the impact of domestic wood combustion to air quality of a small urban township in winter. Atmos. Environ. 2013;70:28–38. doi: 10.1016/j.atmosenv.2012.12.047. [DOI] [Google Scholar]

[CR8] 8.Yadav R, Sahu LK, Jaaffrey SNA, Beig G. Temporal variation of particulate matter (PM) and potential sources at an urban station of Udaipur in Western India. Aerosol. Air Qual. Res. 2014;14:1613–1629. doi: 10.4209/aaqr.2013.10.0310. [DOI] [Google Scholar]

[CR9] 9.Mueller D, Uibel S, Takemura M, Klingelhoefer D, Groneberg DA. Ships, ports and particulate air pollution—an analysis of recent studies. J. Occup. Med. Toxicol. 2011;5:6–31. doi: 10.1186/1745-6673-6-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR10] 10.Pandolfi, M., Gonzalez-Castanedo, Y., Alastuey, A., de la Rosa, J. D., Mantilla, E., de la Campa, A. S., Querol, X., Pey, J., Amato, F. & Moreno, T. Source apportionment of PM(10) and PM(2.5) at multiple sites in the strait of Gibraltar by PMF: impact of shipping emissions. Environ. Sci. Pollut. R. Int. 18(2), 260–269. doi: 10.1007/s11356–010–0373–4 (2011). [DOI] [PubMed]

[CR11] 11.Agrawal H, Eden R, Zhang X, Fine PM, Katzenstein A, Miller JW, Ospital J, Teffera S, Cocker DR. Primary particulate matter from ocean-going engines in the Southern California Air Basin. Environ. Sci. Technol. 2009;43:5398–5402. doi: 10.1021/es8035016. [DOI] [PubMed] [Google Scholar]

[CR12] 12.Deniz C, Kilic A, Civkaroglu G. Estimation of shipping emissions in Candarli Gulf, Turkey. Environ. Monit. Assess. 2010;17(1–4):219–228. doi: 10.1007/s10661-009-1273-2. [DOI] [PubMed] [Google Scholar]

[CR13] 13.Deniz C, Kilic A. Estimation and assessment of shipping emissions in the region of Ambarli Port, Turkey. Environ. Prog. Sustain. 2009;29(1):107–115. [Google Scholar]

[CR14] 14.Alastuey A, Moreno N, Querol X, Viana M, Artíñano B, Luaces JA, Basora J, Guerra A. Contribution of harbour activities to levels of particulate matter in a harbour area: Hada Project-Tarragona Spain. Atmos. Environ. 2007;41(30):6366–6378. doi: 10.1016/j.atmosenv.2007.03.015. [DOI] [Google Scholar]

[CR15] 15.Pérez N, Pey J, Reche C, Cortés J, Alastuey A, Querol X. Impact of harbour emissions on ambient PM10 and PM2.5 in Barcelona (Spain): evidences of secondary aerosol formation within the urban area. Sci. Total Environ. 2016;571:237–250. doi: 10.1016/j.scitotenv.2016.07.025. [DOI] [PubMed] [Google Scholar]

[CR16] 16.Shen J, Feng X, Zhuang K, Lin T, Zhang Y, Wang P. Vertical distribution of particulates within the near-surface layer of dry bulk port and influence mechanism: a case study in China. Sustainability. 2019;11(24):1–16. doi: 10.3390/su11247135. [DOI] [Google Scholar]

[CR17] 17.Manoli E, Chelioti-Chatzidimitriou A, Karageorgou K, Kouras A, Voutsa D, Samara C, Kampanos I. Polycyclic aromatic hydrocarbons and trace elements bounded to airborne PM10 in the harbor of Volos, Greece: Implications for the impact of harbor activities. Atmos. Environ. 2017;167:61–72. doi: 10.1016/j.atmosenv.2017.08.001. [DOI] [Google Scholar]

[CR18] 18.Žibert J, Pražnikar J. Cluster analysis of particulate matter (PM10) and black carbon (BC) concentrations. Atmos. Environ. 2012;57:1–12. doi: 10.1016/j.atmosenv.2012.04.034. [DOI] [Google Scholar]

[CR19] 19.Healy RM, O’Connor IP, Hellebust S, Allanic A, Sodeau JR, Wenger JC. Characterisation of single particles from in-port ship emissions. Atmos. Environ. 2009;43:6408–6414. doi: 10.1016/j.atmosenv.2009.07.039. [DOI] [Google Scholar]

[CR20] 20.Meisner Rosen C. Businessmen against pollution in late nineteenth century Chicago. Bus. Hist. Rev. 1995;69(3):351–397. doi: 10.2307/3117337. [DOI] [Google Scholar]

[CR21] 21.Desalu A, Gould L, Schweppe F. Dynamic estimation of air pollution. IEEE Trans. Automat. Contr. 1974;19(6):904–910. doi: 10.1109/TAC.1974.1100742. [DOI] [Google Scholar]

[CR22] 22.Lamb RG, Neiburger M. An interim version of a generalized urban air pollution model. Atmos. Environ. 1971;5:239–264. doi: 10.1016/0004-6981(71)90093-X. [DOI] [Google Scholar]

[CR23] 23.Roadknight CM, Balls GR, Mills GE, Palmer-Brown D. Modeling complex environmental data. IEEE Trans. Neural Netw. 1997;8(4):852–862. doi: 10.1109/72.595883. [DOI] [PubMed] [Google Scholar]

[CR24] 24.Spellman G. An application of artificial neural networks to the prediction of surface ozone concentrations in the United Kingdom. Appl. Geogr. 1999;19(2):123–136. doi: 10.1016/S0143-6228(98)00039-3. [DOI] [Google Scholar]

[CR25] 25.Niska H, Hiltunen T, Karppinen A, Ruuskanen J, Kolehmainen M. Evolving the neural network model for forecasting air pollution time series. Eng. Appl. Artif. Intell. 2004;17(2):159–167. doi: 10.1016/j.engappai.2004.02.002. [DOI] [Google Scholar]

[CR26] 26.Cakmak S, Hebbern C, Vanos J, Crouse DL, Burnett R. Ozone exposure and cardiovascular-related mortality in the Canadian Census Health and Environment Cohort (CANCHEC) by spatial synoptic classification zone. Environ. Pollut. 2016;214:589–599. doi: 10.1016/j.envpol.2016.04.067. [DOI] [PubMed] [Google Scholar]

[CR27] 27.Govender P, Sivakumar V. Application of k-means and hierarchical clustering techniques for analysis of air pollution: a review (1980–2019) Atmos. Pollut. Res. 2020;11(1):40–56. doi: 10.1016/j.apr.2019.09.009. [DOI] [Google Scholar]

[CR28] 28.Liu BC, Binaykia A, Chang PC, Tiwari MK, Tsao CC. Urban air quality forecasting based on multi-dimensional collaborative support vector regression (SVR): a case study of Beijing–Tianjin–Shijiazhuang. PLoS ONE. 2017;12(7):1–17. doi: 10.1371/journal.pone.0179763. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR29] 29.García Nieto PJ, Sánchez Lasheras F, García-Gonzalo E, de Cos Juez FJ. Estimation of PM10 concentration from air quality data in the vicinity of a major steelworks site in the metropolitan area of Avilés (Northern Spain) using machine learning techniques. Stoch. Env. Res. Risk A. 2018;32(11):3287–3298. doi: 10.1007/s00477-018-1565-6. [DOI] [Google Scholar]

[CR30] 30.Riesgo García MV, Krzemień A, del Campo M, García-Miranda CE, Sánchez Lasheras F. Rare earth elements price forecasting by means of transgenic time series developed with ARIMA models. Resour. Policy. 2018;59:95–102. doi: 10.1016/j.resourpol.2018.06.003. [DOI] [Google Scholar]

[CR31] 31.Van Buuren S, Groothuis-Oudshoorn K. Mice: multivariate imputation by chained equations in R . J. Stat. Softw. 2011;45:1–67. doi: 10.18637/jss.v045.i03. [DOI] [Google Scholar]

[CR32] 32.Ruey ST. Multivariate Time Series Analysis with R and Financial Applications. New York: Wiley; 2014. [Google Scholar]

[CR33] 33.Ordóñez C, Sánchez Lasheras F, Roca-Pardiñas J, de Cos Juez FJ. A hybrid ARIMA–SVM model for the study of the remaining useful life of aircraft engines. J. Comput. Appl. Math. 2019;346:184–191. doi: 10.1016/j.cam.2018.07.008. [DOI] [Google Scholar]

[CR34] 34.Peter JB, Davis RA. Introduction to Time Series and Forecasting. New York: Springer; 2002. [Google Scholar]

[CR35] 35.R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing (Vienna, Austria, 2019). https://www.R-project.org/.

[CR36] 36.Trapletti, A, & Hornik, K. tseries: Time Series Analysis and Computational Finance. R package version 0.10-47.

[CR37] 37.Ruey, S.T. & Wood, D. MTS: All-Purpose Toolkit for Analyzing Multivariate Time Series (MTS) and Estimating Multivariate Volatility Models. R package version 1.0. https://CRAN.R-project.org/package=MTS (2018).

[CR38] 38.Martin V, Hurn S, Harris D. Econometric Modelling with Time Series. Specification, Estimation and Testing. Cambridge: Cambridge University Press; 2013. [Google Scholar]

[CR39] 39.Akaike H. A new look at the statistical model identification. IEEE Trans. Autom. Control. 1974;19:716–723. doi: 10.1109/TAC.1974.1100705. [DOI] [Google Scholar]

[CR40] 40.Schwarz G. Estimating the dimension of a model. Ann. Stat. 1978;6:461–464. doi: 10.1214/aos/1176344136. [DOI] [Google Scholar]

[CR41] 41.Rosenblatt F. Principles of Neurodynamics. Washington: Spartan Books; 1962. [Google Scholar]

[CR42] 42.Kolmogorov AN. On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. Dokl. Akad. Nauk SSSR. 1957;114(5):953–956. [Google Scholar]

[CR43] 43.García-Nieto PJ, Martínez Torres J, de Cos Juez FJ, Sánchez Lasheras F. Using multivariate adaptive regression splines and multilayer perceptron networks to evaluate paper manufactured using Eucalyptus globulus. Appl. Math. Comput. 2012;219(2):755–763. [Google Scholar]

[CR44] 44.Fritsch, S., Guenther, F. & Wright, M.N. neuralnet: Training of Neural Networks. R package version 1.44.2. https://CRAN.R-project.org/package=neuralnet (2019).

[CR45] 45.Haykin S. Neural Networks: A Comprehensive Foundation. Upper Saddle River: Prentice Hall; 1998. [Google Scholar]

[CR46] 46.Vapnik V. The Nature of Statistical Learning Theory. Berlin: Springer; 2000. [Google Scholar]

[CR47] 47.Suárez Sánchez A, Riesgo Fernández P, Sánchez Lasheras F, de Cos Juez FJ, García Nieto PJ. Prediction of work-related accidents according to working conditions using support vector machines. Appl. Math. Comput. 2011;218(7):3539–3552. [Google Scholar]

[CR48] 48.Kuhn M, Johnson K. Applied Predictive Modeling. New York: Springer; 2013. [Google Scholar]

[CR49] 49.Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A. & Leisch, F. e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.7-2. https://CRAN.R-project.org/package=e1071 (2019).

[CR50] 50.Drucker H, Burges C, Kaufman L, Smola A, Vapnik V. Support Vector Regression Machines. Adv. Neural Inf. 1997;9:155–161. [Google Scholar]

51. Friedman JH. Multivariate adaptive regression splines. Ann. Stat. 1991;19(1):1–67. doi: 10.1214/aos/1176347963.

52. Sánchez Lasheras F, García Nieto PJ, de Cos Juez F, Mayo Bayón R, González Suárez V. A hybrid PCA-CART-MARS-based prognostic approach of the remaining useful life for aircraft engines. Sensors. 2015;15(3):7062–7083. doi: 10.3390/s150307062.

53. de Andrés Suárez J, Lorca Fernández P, Sánchez Lasheras F. Bankruptcy forecasting: a hybrid approach using Fuzzy c-means clustering and Multivariate Adaptive Regression Splines (MARS). Expert Syst. Appl. 2011;38(3):1866–1875. doi: 10.1016/j.eswa.2010.07.117.

54. Milborrow S. earth: Multivariate Adaptive Regression Splines. Derived from mda:mars by Trevor Hastie and Rob Tibshirani; uses Alan Miller's Fortran utilities with Thomas Lumley's leaps wrapper. R package version 5.1.1; 2019. https://CRAN.R-project.org/package=earth.

55. Put R, Xu QS, Massart DL, Vander Heyden Y. Multivariate adaptive regression splines (MARS) in chromatographic quantitative structure–retention relationship studies. J. Chromatogr. A. 2004;1055(1–2):11–19. doi: 10.1016/j.chroma.2004.07.112.

56. García Nieto PJ, Sánchez Lasheras F, García-Gonzalo E, de Cos Juez FJ. PM10 concentration forecasting in the metropolitan area of Oviedo (Northern Spain) using models based on SVM, MLP, VARMA and ARIMA: a case study. Sci. Total Environ. 2018;621:753–761. doi: 10.1016/j.scitotenv.2017.11.291.
