Abstract
A simple data-based advection-reaction (reactive transport) model applicable to both river and aquifer monitoring networks is proposed. It is built on (a) available monitoring data, and (b) graph-theoretical concepts, specifically making use of the Laplacian matrix to capture the network topology and the advection process. The method yields useful information regarding the dynamic spatial behavior of the variables monitored, expressed in terms of quantitative parameters such as characteristic length, entropy, first-order decay constants, synchronization between sites, and the external inputs/outputs to the system.
The model was tested in an unconfined shallow aquifer located in the lower Besòs River (Spain), in which 37 pharmaceutical compounds were monitored at 7 sites over two campaigns (February and May 2021). Characteristic lengths were, on average, of the same order (24.5 m) as the mean distance between consecutive monitoring sites (33.6 m), thus reflecting an adequate monitoring network design. From an estimated mean advection velocity (0.24 m·h−1), first-order decay constants were calculated for each compound and campaign, with mean values of 0.025 h−1 (February) and 0.005 h−1 (May). Whereas entropy was generally slightly larger in February than in May (mean values of 1.02 and 0.9 entropy units, respectively), synchronization showed the opposite trend (mean values of 62.4% and 68.8%, respectively). The input/output profiles were generally site-dependent, regardless of the compound and campaign considered.
• A new advection-reaction modeling approach directly based on experimental data obtained from monitoring campaigns together with the network topology is proposed.
• The method yields new quantitative information regarding the dynamic behavior of the variables monitored, useful for both research and management purposes.
Keywords: Monitoring network, Emerging contaminants modelling, Graph theory, Reactive transport, Ground/surface water
Method name: Data-driven advection-reaction modeling based on chemical monitoring data and network topology
Graphical abstract
List of Notations and Abbreviations
- A: Adjacency matrix
- D: Degree matrix
- L: Laplacian matrix
- x: n-dimensional vector $(x_1, x_2, \ldots, x_i, \ldots, x_n)^T$ of measurements of variable x (n: number of sites)
- ẋ: n-dimensional vector of velocities dx/dt of variable x (n: number of sites)
- $x_i$: variable x measured at site i
- v: mean advection velocity (m·h−1)
- ℓ: characteristic length (m)
- ρ: Rayleigh quotient of x with respect to matrix L. It is equal to k/v and 1/ℓ
- k: first-order decay kinetic constant (h−1)
- S: entropy
- $\lambda_i$: ith eigenvalue of L
- $\lambda_1$: first eigenvalue of L. It is equal to 0
- $u_i$: ith eigenvector of L
- $u_1$: first eigenvector of L. It is equal to $(1, 1, \ldots, 1)^T$
- $c_i$: projection of x on eigenvector $u_i$. It is equal to $x^T u_i$
- $c_1^2$: synchronization contribution (%)
- ε: n-dimensional vector of external inputs/outputs to the system
- $r_{ij}$: distance between sites i and j
- LOD: limit of detection
- OLS: ordinary least squares
Specifications table
| Subject area | Environmental Science |
| More specific subject area | Surface and ground freshwater bodies |
| Name of your method | Data-driven advection-reaction modeling based on chemical monitoring data and network topology |
| Name and reference of original method | n.a. |
| Resource availability | n.a. |
Related research article
F. Labad, A. Ginebreda, L, R. Criollo, E. Vázquez-Suñé, S. Pérez, A. Jurado. Occurrence, data-based modelling and risk assessment of emerging contaminants in an alluvial aquifer polluted by river recharge. Environ. Pollut. 316 (2023) 120504.
Method details
Introduction
Water bodies, whether ground or surface, extend more or less continuously through space and time under the influence of their catchment area. In contrast, our knowledge of a water body's qualitative status typically relies on discrete spatial and temporal observations of a set of physical, chemical, or biological parameters, organized under what is commonly known as a “monitoring network”. Monitoring networks typically consist of several sites deployed throughout the surface or groundwater catchment area, which are sampled at a certain frequency. This is current practice for both research and management purposes and has given rise to large databases. Indeed, monitoring networks are a key aspect of the implementation of environmental regulations such as the European Water Framework Directive (Directive 2000/60/EC) [1]. The accuracy of the “picture” obtained depends on both the spatial (number of sites) and temporal (frequency) “resolution” of the network used, which is often constrained by the economic cost of the whole sampling and analytical process. This is particularly relevant for monitoring networks devoted to routine purposes, which makes their optimization a relevant management issue.
To cope with the inherent limitations of discrete monitoring networks, dynamic modeling of chemicals’ fate and transport processes has emerged as a complementary alternative [2]. Modeling allows predicting the spatio-temporal profile of a variable at any resolution, at an affordable computing cost compared to costly monitoring campaigns. However, it involves high uncertainty in the parametrization and ultimately requires experimental validation as well. Both monitoring and modeling have their respective pros and cons and must be regarded as complementary [3]. In this methodological contribution, we have sought to balance the two approaches. The objective of the present study was therefore to explore new possibilities for exploiting the experimental data available from water body monitoring networks and for interpreting them in the light of a simple steady-state reactive-transport (or advection-reaction) model that can be readily derived from them. To do so, we make use of simple graph-theoretical concepts, which take into consideration the topology of the water body monitored and, more specifically, that of the monitoring network deployed for that purpose. This is done through the Laplacian matrix, which captures the network topology and the interaction between adjacent sites through the advection process. The use of spatial (topological) relationships is not new. Notably, the well-known Moran [4] and Geary [5] spatial correlation coefficients, as well as their recent developments [6,7], have been widely used in many domains, including the aquatic environment [8], [9], [10], [11], [12], [13], [14]. Our approach yields relevant quantitative information on the dynamics of the measured contaminants, such as the relative importance of the two main processes involved, i.e., conservative transport across the aquifer vs. local depletion, which gives a glimpse into their persistence.
Notably, the methodology allows deriving quantities such as entropy or system synchronization, which offer new insights into aquatic systems through the variables measured in a monitoring network. It also provides criteria for the spatial design of the monitoring network sites that can be useful for management purposes. The model described below can be applied to any water body whose monitoring sites are linked by a known topology, such as aquifers [15,16] and rivers [17].
Method details
Network theory concepts
The studied aquifer can be described as a set of spatially distributed, connected sites (nodes) at which a variable x (say, the concentration of a contaminant) is measured at a time t; together, these sites constitute the monitoring network. Its structure is conveniently described by a graph G(V, E), where V = {1, …, n} denotes the set of n nodes representing the monitoring sites, and E the set of edges between nodes. The network structure of nodes and links is captured by the adjacency matrix A, an n × n square matrix defined as:
$$a_{ij} = \begin{cases} 1 & \text{if nodes } i \text{ and } j \text{ are connected} \\ 0 & \text{otherwise} \end{cases}$$
We define the degree $d_i$ of node i (i = 1, ..., n) as (Eq. 1):
| $d_i = \sum_{j \in N_i} a_{ij}$ | (1) |
where $N_i$ denotes the set of nodes adjacent (connected) to node i. The weighted degree matrix D can thus be defined as $D = \mathrm{diag}(d_1, \ldots, d_n)$.
The so-called graph Laplacian matrix (https://en.wikipedia.org/wiki/Laplacian_matrix) is then defined as (Eq. 2):
| $L = D - A$ | (2) |
The Laplacian matrix L is real, symmetric, and positive semi-definite, so all its eigenvalues are nonnegative. The first eigenvalue λ1 = 0 is always the smallest one, and its associated eigenvector is $u_1 = (1, 1, \ldots, 1)^T$. This vector corresponds to the fully synchronized state, in which the variable studied x has the same value in all the nodes (i.e., x1 = x2 = … = xi = … = xn-1 = xn).
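As a minimal sketch of the definitions above, the following Python snippet builds the degree and Laplacian matrices from an adjacency matrix and checks the stated properties (symmetry, nonnegative eigenvalues, and the constant vector as the eigenvector for λ1 = 0). The 4-site chain and the helper name `laplacian` are purely illustrative assumptions; they do not reproduce the topology of the case study.

```python
import numpy as np


def laplacian(adjacency: np.ndarray) -> np.ndarray:
    """Return L = D - A for a symmetric adjacency matrix A."""
    degree = np.diag(adjacency.sum(axis=1))  # D = diag(d_1, ..., d_n)
    return degree - adjacency


# Hypothetical chain of 4 sites, 1-2-3-4 (illustration only).
A = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

L = laplacian(A)

# eigh is intended for symmetric matrices; eigenvalues come back in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(L)
ones = np.ones(A.shape[0])

print("eigenvalues:", eigenvalues)
print("L @ (1,...,1)^T:", L @ ones)  # the zero vector: (1,...,1)^T belongs to lambda_1 = 0
```

For a connected network, only one eigenvalue is zero; a disconnected network would show one zero eigenvalue per connected component.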
The network advection-reaction model
The model furnishes a simple and general description that captures the network dynamics. Briefly, let the state of node i be $x_i$, so that $\mathbf{x} = (x_1, \ldots, x_n)^T$ is the n-dimensional state vector of measurements of variable x for all the nodes. It is assumed that the time evolution of $x_i$ can be described by the following simple kinetic equation (Eq. 3):
| $\dfrac{dx_i}{dt} = -k\,x_i + v \sum_{j \in N_i} a_{ij}\,(x_i - x_j) + \delta_i$ | (3) |
where the first term on the right side of the equation reflects a first-order decay process with a rate constant k, and the second term captures the advection process between connected sites, characterized by a mean advection velocity v (see comment below). The terms $\delta_i$ are local external inputs or sinks at each site i. We assume that, for every compound x, the rate constant k and the advection coefficient v are equal throughout the space monitored (i.e., across the measurement sites), though they may change at each sampling time. These are common assumptions in many modeling approaches.
Written in compact vector form, Eq. 3 yields (Eq. 4):
| $\dot{\mathbf{x}} = -k\,\mathbf{x} + v\,L\,\mathbf{x} + \boldsymbol{\delta}$ | (4) |
Assuming that the measurements correspond to a stationary state (ẋ = 0), rearranging the above equation, and dividing both sides by k, it becomes (Eq. 5):
| $\mathbf{x} = \dfrac{v}{k}\,L\,\mathbf{x} + \dfrac{\boldsymbol{\delta}}{k}$ | (5) |
Numerically, the parameter v/k can be assimilated to the slope of the regression line of Lx over x with the intercept set equal to 0. Its calculation can be readily done by ordinary least squares (OLS), and the vector ε = δ/k of errors, ε = x ‒ (v/k)·Lx (Eq. 6), captures the local input/output variations (under the OLS assumptions).
It can be shown that the inverse of the OLS calculated slope (k/v) is equal to (Eq. 7):
| $\rho = \dfrac{k}{v} = \dfrac{\mathbf{x}^T L\,\mathbf{x}}{\mathbf{x}^T \mathbf{x}}$ | (7) |
Note that the right-hand expression in the above equation is the Rayleigh quotient (https://en.wikipedia.org/wiki/Rayleigh_quotient) of the Laplacian matrix L. For any vector x, it is bounded between the minimum and maximum eigenvalues of L, so that $0 = \lambda_1 \le \rho \le \lambda_n$.
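The calculation described above can be sketched in a few lines of Python. The sketch takes ρ as the Rayleigh quotient xᵀLx / xᵀx (Eq. 7), sets v/k = 1/ρ, and evaluates the residual vector ε = x − (v/k)·Lx (Eq. 6). The function names, the 4-site chain, and the concentration values are hypothetical and serve only as an illustration.

```python
import numpy as np


def rayleigh_quotient(L: np.ndarray, x: np.ndarray) -> float:
    """rho = x^T L x / x^T x (Eq. 7)."""
    return float(x @ L @ x) / float(x @ x)


def slope_and_inputs(L: np.ndarray, x: np.ndarray):
    """Return (rho, v/k, epsilon) for measurements x on a network with Laplacian L."""
    rho = rayleigh_quotient(L, x)
    if np.isclose(rho, 0.0):
        # Fully synchronized state: L x = 0, so v/k = 1/rho is unbounded.
        raise ValueError("rho is zero; v/k is not finite")
    v_over_k = 1.0 / rho
    epsilon = x - v_over_k * (L @ x)  # Eq. 6
    return rho, v_over_k, epsilon


# Hypothetical unweighted chain 1-2-3-4 and hypothetical concentrations.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
x = np.array([10.0, 8.0, 5.0, 3.0])

rho, v_over_k, epsilon = slope_and_inputs(L, x)
print("rho:", rho)
print("v/k:", v_over_k)
print("epsilon:", epsilon)
```

With an unweighted adjacency matrix, ρ and v/k are dimensionless; they acquire units of inverse length and length, respectively, only when the edges are weighted by inverse distances.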
Furthermore, expanding x in terms of the orthonormalized eigenvectors $u_i$ of L (graph spectral decomposition of L), and considering that $c_i = u_i^T \mathbf{x}$, we obtain the following expression, which quantifies the contribution of the different eigenstates $\lambda_i$ to ρ (Eq. 8):
| $\rho = \sum_{i=1}^{n} \lambda_i\,c_i^2$ | (8) |
with $\sum_{i=1}^{n} c_i^2 = 1$ (x scaled to unit norm, which leaves ρ unchanged). This allows defining an entropy S (Eq. 9):
| $S = -\sum_{i=1}^{n} c_i^2 \ln c_i^2$ | (9) |
Entropy S provides insight into how ρ is allocated among the different eigenstates. In turn, the terms $c_i^2$ (Eq. 8) capture the respective contribution of each eigenstate i. For our purposes, the first one is particularly relevant, namely $c_1^2$, associated with the first eigenvalue λ1 = 0, which quantifies the weight of the synchronized state (note, however, that the contribution of the synchronized state to ρ is zero since λ1 = 0).
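The spectral quantities can be illustrated with the short Python sketch below. It assumes that x is scaled to unit norm before projection, so that the squared projections sum to one, and that the entropy is the Shannon entropy of those squared projections; both are assumptions made for this illustration. The network is taken to be connected, so the first eigenvector returned corresponds to the synchronized state. The function name, the chain, and the data are hypothetical.

```python
import numpy as np


def spectral_summary(L: np.ndarray, x: np.ndarray):
    """Return (rho, entropy, synchronization in %) from the eigen-decomposition of L."""
    eigenvalues, U = np.linalg.eigh(L)       # columns of U: orthonormal eigenvectors
    c = U.T @ (x / np.linalg.norm(x))        # projections of the unit-norm x
    weights = c**2                           # sum to 1 by orthonormality
    rho = float(np.sum(eigenvalues * weights))
    positive = weights[weights > 0]          # skip zero weights (0 * ln 0 taken as 0)
    entropy = float(-np.sum(positive * np.log(positive)))
    synchronization = 100.0 * float(weights[0])  # weight of the lambda_1 = 0 state
    return rho, entropy, synchronization


# Hypothetical unweighted chain 1-2-3-4 and hypothetical concentrations.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
x = np.array([10.0, 8.0, 5.0, 3.0])

rho, entropy, synchronization = spectral_summary(L, x)
print("rho:", rho)
print("entropy:", entropy)
print("synchronization (%):", synchronization)
```

In this sketch the value of ρ obtained from the eigen-decomposition coincides with the Rayleigh quotient xᵀLx / xᵀx, which offers a convenient consistency check.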
Definition of the “weighted” Laplacian for practical use
Without loss of generality, the foregoing definitions of the adjacency and Laplacian matrices can be extended to “weighted” analogs, which better capture real problems [4]. Here, the weight associated with the edge (i, j), i.e., the corresponding element of the aquifer weighted adjacency matrix, was set equal to the inverse of the distance $r_{ij}$ between connected sites i and j: $a_{ij} = 1/r_{ij}$ (dimension: inverse length). Defined in this way, it is worth noting that the terms $(x_i - x_j)/r_{ij}$ in the advection term of Eq. 3 can be regarded as the discrete counterparts of a gradient $\nabla x$. Hence, the term v/k has the dimension of length and can be interpreted as a characteristic length (referred to hereafter as ℓ).
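A weighted Laplacian of this kind could be assembled as in the following sketch, where each edge weight is the inverse of the distance between the two connected sites and the characteristic length is then obtained as ℓ = 1/ρ. The helper name `weighted_laplacian`, the edge list, the distances, and the concentrations are all hypothetical; the real distances are those shown in Fig. 1.

```python
import numpy as np


def weighted_laplacian(n_sites: int, edges) -> np.ndarray:
    """Build L = D - A with a_ij = 1 / r_ij.

    edges: iterable of (i, j, r_ij), with 0-based site indices and r_ij in metres.
    """
    A = np.zeros((n_sites, n_sites))
    for i, j, r in edges:
        if r <= 0:
            raise ValueError("distances must be positive")
        A[i, j] = A[j, i] = 1.0 / r  # symmetric weight, units of m^-1
    return np.diag(A.sum(axis=1)) - A


# Hypothetical chain of 4 sites with hypothetical inter-site distances (m).
edges = [(0, 1, 20.0), (1, 2, 40.0), (2, 3, 25.0)]
L = weighted_laplacian(4, edges)

x = np.array([10.0, 8.0, 5.0, 3.0])    # hypothetical concentrations
rho = float(x @ L @ x) / float(x @ x)  # Rayleigh quotient, m^-1
ell = 1.0 / rho                        # characteristic length, m

print("rho (1/m):", rho)
print("characteristic length (m):", ell)
```

Because the weights carry units of inverse length, ρ is expressed in m−1 and ℓ in m, so ℓ can be compared directly with the distances between adjacent sites.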
Case Study
The above method can be applied to any surface or ground freshwater body, provided that its topology, i.e., the location of the sampling points and their interconnections, is well defined. As an example, it is illustrated with data on pharmaceutical compounds monitored in an unconfined shallow alluvial aquifer located in the lower part of the Besòs River Delta (Barcelona, NE Spain) (Fig 1) [15,16]. This aquifer is hydraulically connected to a polluted river (Besòs River), which is its major recharge source and controls the chemical characteristics of the groundwater. The concentrations of 37 pharmaceutical compounds were measured at 7 sites (Fig 1) in February and May 2021 (Table S1, Supplementary material). Values below the limit of detection (<LOD) were considered equal to 0 for calculation purposes.
Fig. 1.
Overview of the studied aquifer and the monitoring network showing sampling sites and topology (distances between adjacent sites are in m).
Using the method outlined above, we calculated the following for each compound and sampling campaign: (a) the characteristic length; (b) the first-order decay kinetic constant; (c) the entropy; (d) the percentage of synchronization; and (e) the local input/output from external sources/sinks. Though these quantitative indicators are not fully independent, they furnish complementary information useful for different interpretation purposes. Results are shown in Figs 2A-B and 3, and are briefly discussed below. A more detailed discussion of the present example can be found in [15].
Fig. 2.
Distribution of values per compound over the 2 sampling campaigns monitored (February and May 2021): characteristic length (m); first-order decay constant k (h−1); entropy; and the contribution of the synchronized state (%). (A) Boxplots showing the distribution of values for the whole set of compounds per campaign; (B) values per single compound.
Characteristic length
Characteristic length (ℓ) may be interpreted as a measure of the relative relevance of the two processes involved, i.e., advection (between neighboring nodes) and decay (local at each node), and provides a quantitative measure of the distance over which the advection process is effective with respect to the local decay. In general, values in both campaigns were similar for all the compounds, except for alprazolam and lamotrigine-N2-oxide, whose values in February were unusually high. Considering both campaigns, the mean characteristic length of the whole compound set was 24.7 m, with a median of 11 m. These values are of the same order as the distance between adjacent monitoring sites (mean: 33.6 m; max: 68.4 m; min: 0.8 m; median: 27.4 m), meaning that the monitoring network design fits the requirements reasonably well.
First-order decay kinetic constant
From the definition of characteristic length given above as the ratio ℓ = v/k, if the numerator (mean advection velocity v) is known, it is possible to estimate the value of k, the first-order decay constant, as k = v/ℓ. Considering that the total distance traveled was 200 m and a previously known residence time of ca. 35 days, a mean advection velocity v of 0.24 m·h−1 was estimated. Using this value, k (h−1) was obtained for each compound and campaign. In general k values were larger in May than in February (average values 0.05 and 0.025 h−1 respectively) (Fig. 2A).
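The arithmetic behind these estimates can be reproduced with the short sketch below: the velocity follows from the travel distance and residence time quoted in the text, and k follows from k = v/ℓ. The function names are hypothetical, and the characteristic length of 11 m (the median reported above) is used only as an illustrative input, so the resulting k is not one of the reported values.

```python
def mean_advection_velocity(distance_m: float, residence_time_days: float) -> float:
    """Mean advection velocity in m/h from travel distance and residence time."""
    return distance_m / (residence_time_days * 24.0)


def decay_constant(velocity_m_per_h: float, characteristic_length_m: float) -> float:
    """First-order decay constant k = v / l, in 1/h."""
    return velocity_m_per_h / characteristic_length_m


# 200 m travelled in ca. 35 days, as stated in the text.
v = mean_advection_velocity(200.0, 35.0)

# Illustrative input only: the median characteristic length of 11 m.
k_example = decay_constant(v, 11.0)

print(f"v = {v:.2f} m/h")
print(f"k = {k_example:.4f} 1/h for a characteristic length of 11 m")
```

Since k is inversely proportional to ℓ, compounds with short characteristic lengths correspond to fast local decay relative to advection, and vice versa.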
Entropy
Entropy captures the system ‘complexity’, offering a quantitative measure of how the different eigenstates of the Laplacian matrix contribute to the description of the system (Eqs. 8 and 9). Entropy takes its maximum value when all the states contribute equally. High entropy values thus reflect a heterogeneous hydrological environment and would be indicative of a ‘fragmented’ situation, as happens under water scarcity, typical of Mediterranean basins. In the present case study, with few exceptions (i.e., sulfathiazole, sulfamethazine, metoprolol, furosemide, and acetaminophen), entropy values for the different compounds monitored appear systematically lower in May than in February, though the mean values of the whole set of compounds are similar (1.02 and 0.9 entropy units for February and May, respectively) (Fig. 2A).
Synchronization
Specifically, the contribution of the synchronized state (i.e., equal values of x in all nodes) vs. higher states is relevant, considering that a fully synchronized state would have ρ = 0 (Eq. 7), and thus an infinite characteristic length. In that sense, as one may expect, synchronization was aligned with characteristic length. Synchronization was slightly higher in May than in February, with mean values of 68.8% and 62.4% respectively (Fig. 2A). Concerning the specific compounds, synchronization ranged from ca. 98% to 15%, with alprazolam, carbamazepine, and desamino-oxo-lamotrigine showing the highest values (>90%).
Local input/output from external sources
Relevant information regarding the input/output pollution load throughout the aquifer can be obtained from the vector ε = δ/k [concentration dimension] (Eq. 6), whose components’ signs provide insight into each sampling site's behavior, indicating whether it is a net receiver (positive) or a sink (negative) of pollution. Fig 3 shows two compounds as representative examples, namely carbamazepine and diclofenac, which differ in their persistence along the water flow, as reflected in their low and medium-high first-order decay constants, respectively (Fig. 2B). Regardless of the compound and campaign considered, the input/output profiles per site were similar, with SAP-2b being the only local net output site (negative value) and the remaining sites acting as input sites (positive values), notably site SAP-1.
Fig. 3.
Net input/output (expressed in ng·L−1·h−1) of carbamazepine and diclofenac along the 2 sampling campaigns (February and May) at each sampling site (note that the y-axis has different scales).
Final Comments
In the present contribution, a data-based advection-reaction (reactive transport) model is described, applicable to aquifer and river monitoring networks with known topology. The methodology is directly built on (a) available monitoring data, and (b) graph-theoretical concepts, specifically making use of the Laplacian matrix to capture the network topology and the advection process taking place between adjacent sites. The method yields useful information regarding the dynamic spatial behavior of the variables monitored, expressed in terms of quantitative parameters such as the distance of influence of the neighboring sites (quantified as a characteristic length), the external inputs/outputs, the relative contribution of the different network modes or states (quantified as an entropy), and that of the fully synchronized state (i.e., the state in which the concentration of the measured variable is equal at all sites). Specifically, the latter two, which were derived from the spectral analysis of the Laplacian matrix, offer new interpretations in the hydrological context. Finally, the methodology provides criteria useful for the assessment (design and optimization) of monitoring networks that may be relevant for management use.
Method Limitations
The main limitations of our model are related to the use of some approximations, which are commonly found in many modeling approaches. These assumptions include the steady state, as well as equal values of the local first-order decay constant k and of the advection velocity v for all the sites monitored. A further limitation of the model is that, in its present form, it is univariate, meaning that every variable measured has to be modeled separately, which can be tedious when many variables (e.g., many measured contaminants) need to be treated at a time.
Future work
Even though the method and results presented in this study were specifically concerned with pharmaceuticals in groundwater, they can be easily extended to (a) other water body monitoring networks with known topology (e.g., rivers [17]); (b) other site-measured variables such as nutrients, inorganic constituents, heavy metals, microplastics, or even biological (microbiological and ecological) parameters; and (c) other environmental matrices, such as solid transport in rivers. Finally, a further research line is envisaged to extend the method from univariate to multivariate, in order to deal with many simultaneously measured variables. To do so, a preliminary data treatment step using chemometric multivariate methods (e.g., Principal Component Analysis, Multivariate Curve Resolution) is foreseen, so that new composite variables (linear combinations of single variables) explaining most of the variance can be used instead of many individual ones. Further work in these research lines is in progress.
Ethics statements
NA.
CRediT authorship contribution statement
Antoni Ginebreda: Conceptualization, Methodology, Writing – review & editing. Anna Jurado: Data curation, Writing – review & editing. Estanislao Pujades: Data curation, Writing – review & editing. Damià Barceló: Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research has been supported through the grant CEX2018-000794-S funded by MCIN/AEI/ 10.13039/501100011033.
Footnotes
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.mex.2022.101948.
Appendix. Supplementary materials
Data availability
Data will be made available on request.
References
- 1. European Commission. Directive 2000/60/EC of the European Parliament and of the Council of 23 October 2000 establishing a framework for community action in the field of water policy. Off. J. Eur. Communities. 2000.
- 2. Carrera J, Saaltink MW, Soler-Sagarra J, Wang J, Valhondo C. Reactive Transport: A Review of Basic Concepts with Emphasis on Biochemical Processes. Energies. 2022;15:925. doi: 10.3390/en15030925.
- 3. Johnson AC, Ternes T, Williams RJ, Sumpter JP. Assessing the concentrations of polar organic microcontaminants from point sources in the aquatic environment: measure or model? Environ. Sci. Technol. 2008;42:5390–5399. doi: 10.1021/es703091r.
- 4. Moran PAP. The interpretation of statistical maps. J. R. Stat. Soc. Ser. B. 1948;37(2):243–251.
- 5. Geary RC. The contiguity ratio and statistical mapping. Inc. Stat. 1954;5(3):115–145. doi: 10.2307/2986645.
- 6. Chen Y. New Approaches for Calculating Moran's Index of Spatial Autocorrelation. PLoS ONE. 2013;8(7):e68336. doi: 10.1371/journal.pone.0068336.
- 7. Yamada H. Geary's c and Spectral Graph Theory. Mathematics. 2021;9:2465. doi: 10.3390/math9192465.
- 8. Ginebreda A, Sabater-Liesa L, Rico A, Focks A, Barceló D. Reconciling monitoring and modeling: An appraisal of river monitoring networks based on a spatial autocorrelation approach – emerging pollutants in the Danube River as a case study. Sci. Total Environ. 2018;618:323–335. doi: 10.1016/j.scitotenv.2017.11.020.
- 9. Ort C, Hollender J, Schaerer M, Siegrist H. Model-Based Evaluation of Reduction Strategies for Micropollutants from Wastewater Treatment Plants in Complex River Networks. Environ. Sci. Technol. 2009;43:3214–3220. doi: 10.1021/es802286v.
- 10. Sabater-Liesa L, Ginebreda A, Barceló D. Shifts of environmental and phytoplankton variables in a regulated river: A spatial driven analysis. Sci. Total Environ. 2018;642:968–978. doi: 10.1016/j.scitotenv.2018.06.096.
- 11. Mainali J, Chang H, Chun Y. A Review of Spatial Statistical Approaches to Modeling Water Quality. Prog. Phys. Geogr. Earth Environ. 2019;43:801–826. doi: 10.1177/0309133319852003.
- 12. Rashid A, Amin M, Li Y, Ashfaq M, Zeng Q, Hu A, Li S, Sun Q. Reconciliation of Spatiotemporal Influences on Two-Dimensional Distribution and Fate of Emerging Contaminants in a Subtropical River. EST Water. 2021;1:2305–2317. doi: 10.1021/acsestwater.1c00153.
- 13. Sebestyén V, Czvetkó T, Abonyi J. Network-Based Topological Exploration of the Impact of Pollution Sources on Surface Water Bodies. Front. Environ. Sci. 2021;9. doi: 10.3389/fenvs.2021.723997.
- 14. Zhang Y, Rashid A, Guo S, Jing Y, Zeng Q, Li Y, Adyari B, Yang J, Tang L, Yu C-P, Sun Q. Spatial autocorrelation and temporal variation of contaminants of emerging concern in a typical urbanizing river. Water Research. 2022;212. doi: 10.1016/j.watres.2022.118120.
- 15. Labad F, Ginebreda A, Criollo LR, Vázquez-Suñé E, Pérez S, Jurado A. Occurrence, data-based modelling, and risk assessment of emerging contaminants in an alluvial aquifer polluted by river recharge. Environ. Pollut. 2023;316. doi: 10.1016/j.envpol.2022.120504.
- 16. Jurado A, Labad F, Scheiber L, Criollo R, Pérez S, Ginebreda A. Occurrence and fate of pharmaceuticals in groundwater. EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7595. doi: 10.5194/egusphere-egu22-7595.
- 17. Ginebreda A, Barceló D. Data-based interpretation of emerging contaminants occurrence in rivers using a simple advection-reaction model. Water Emerg. Contam. Nanoplast. 2022;1:12. doi: 10.20517/wecn.2022.07.