Water Research X. 2024 Nov 22;26:100280. doi: 10.1016/j.wroa.2024.100280

Leak detection and localization in water distribution systems via multilayer networks

Daniel Barros a, Ariele Zanfei b, Andrea Menapace c, Gustavo Meirelles d, Manuel Herrera e, Bruno Brentan b
PMCID: PMC11647635  PMID: 39687507

Highlights

  • Multilayer network analysis applied for leak detection and localization.

  • Detecting leaks by creating a temporal graph based on pressure data and classifying vertices with PageRank.

  • Leak location based on monitored and simulated data similarity behavior via Dynamic Time Warping algorithm.

Keywords: Water distribution networks, Graph theory, Leak detection, Leak localization

Abstract

The continuous increase of water distribution networks (WDNs) in size and complexity poses significant management challenges, including a high risk of failures. Because water flow, including losses, is intrinsically interconnected, this study proposes a methodology based on graph correlation and multilayer network analysis for leak detection and localization in WDNs with multiple components (infrastructure, control devices, hydraulic sensors). The detection process involves correlating monitored data to create a temporal graph and classify vertices. The classification values are then analyzed by the z-score and interquartile range algorithms to detect anomalies. The localization process uses a multi-graph approach that combines sensor data and network topology to determine the sensor coverage area. The Dynamic Time Warping algorithm calculates the similarity between monitored and simulated leak data, identifying likely leak locations. The results demonstrate the methodology’s effectiveness, detecting anomalies 15 minutes after the start of the leak and locating them within a 50-meter range from the actual location of the leak. Furthermore, the research highlights the advantages of using a method based on multilayer networks, which offers insights into leak location, sensor coverage, and reduction of the network’s sample space. The approach also offers a way to reduce exhaustive hydraulic simulations.

1. Introduction

Water distribution networks (WDNs), essential for supplying water to urban areas, are gradually becoming larger and more complex, making management challenging and increasing the likelihood of failures. It is worth noting that globally, an estimated 126 billion cubic meters of unbilled water are lost yearly, representing significant financial and water resources losses (Liemberger and Wyatt, 2019). For instance, a survey conducted by the Brazilian National Sanitation Information System showed that approximately 38% of treated water is lost due to leaks, measurement errors, and frauds (Oliveira et al., 2020). These estimates highlight the need to implement and develop measures to mitigate these losses, focusing on efficient and accurate approaches.

Strategies to reduce losses, especially those related to leaks, have been subject to constant development (De Vries, Groenestein, Schröder, Hoogmoed, Sukkel, Koerkamp, De Boer, 2015, Rajabi, Komeilian, Wan, Farmani, 2023, Vairavamoorthy, Lumbers, 1998). Conventional approaches often involve the use of acoustic devices to detect the noises associated with leaks (Hunaidi et al., 2004). However, the effectiveness of these methods has been increasingly compromised due to the presence of underground utilities, such as gas, electricity, and internet cables, as well as the need for technical teams to physically inspect the networks, making the process more time-consuming and costly (Sagnard et al., 2016). Therefore, the automation of leak detection and localization processes has become a frequent research topic. In this context, hydraulic data acquired by sensors are analyzed using mathematical and statistical approaches to water demand prediction (Pu et al., 2024), and to identify anomalies and potential leak locations (Choudhary et al., 2021).

Monitored pressure and flow data are applied in different leak detection and localization approaches. In particular, genetic algorithms and artificial intelligence have gained attention in recent decades (Liu, Ma, Li, Tie, Zhang, Gao, 2019, Mashhadi, Shahrour, Attoue, El Khattabi, Aljer, 2021). Perez et al. (2009) use pressure data, a calibrated network model, and a genetic algorithm to identify leaks based on the discrepancy between the monitored and simulated data. Meanwhile, Romano et al. (2010) use pressure and flow data to detect leaks in real-time, employing Artificial Neural Network (ANN) techniques to predict the values of the monitored data and subsequently analyze the difference between the observed and predicted data. However, these approaches often require extensive data for training and validation, which may not be readily available for all WDNs. Furthermore, they may be sensitive to noise and uncertainties in the monitored data, potentially leading to inaccurate leak detection and localization.

The results of machine learning and optimization algorithms applied to leak detection and localization can bring a set of uncertainties not only associated with the measurements but also with the representation of errors related to the model conceptualization and approximation errors related to early stopping on the optimization process. To overcome those limitations, the application of methods purely based on monitoring data processing is a growing trend. Usually, those methods aim to identify failures and represent the network based on mathematical abstractions (Kirstein et al., 2019). Kaghazchi et al. (2021), for example, model a WDN for irrigation using Hybrid Bayesian Networks for hydraulic simulations and operational performance evaluation. Wu et al. (2021) represent a WDN through game theory in order to consider the water network’s operating characteristics and seek to minimize the worst-case disruption impacts. Yu et al. (2023) model a WDN as a graph to evaluate the resilience of networks through the individual importance of nodes and the proportion of indispensable nodes.

Graph theory has been used in studies focused on WDN as a method for representing and analyzing the relationships among infrastructure assets. Graph theory is a branch of mathematics that explores connections between objects and plays an important role in network analysis in multiple research domains (Beeler and Beeler, 2015). A graph contains a set of vertices that represent objects and these vertices are connected by edges indicating the relationship between the objects. The relationship between vertices can embed different features, such as physical (Munikoti et al., 2021), temporal (Shinkuma et al., 2019), and similarity on their relationships (Xu et al., 2019), among others. Edge weights are, then, responsible for representing the intensity of relationships between vertices. In a water distribution system analysis context, graph theory usually has been applied to better understand the behavior of water systems with little hydraulic information (Giudicianni et al., 2020). In this field, the vertices represent the nodes (demand, tanks, and reservoirs) of the networks, and the edges represent the pipes, pumps, and valves, while the weight follows different approaches, such as using the flow rate, diameters, and length of pipes (Sitzenfrei, 2021, Tzatchkov, Alcocer-Yamanaka, Bourguett Ortíz, 2008).

The application of graph theory allows the evaluation of WDN through approaches linked to complex network analysis, such as, for example, identifying critical vertices through centrality metrics (Agathokleous et al., 2017) or evaluating the relationship between vertices to detect leaks (Barros et al., 2023b). Furthermore, this theory allows the representation of correlations in monitored data (Kalofolias, 2016) and in the networks themselves as graphs (Giudicianni, Herrera, Di Nardo, Oliva, Scala, 2021, Sitzenfrei, 2021). Recently, some works in the literature have focused on applied graph theory methodologies for leak detection. For instance, this is the case of Shekofteh et al. (2020), who employed the Girvan-Newman algorithm for graph partitioning based on edge weights. However, a single-layer network representation often fails to add into the model the relationship between physical infrastructure and monitored data. This limitation can affect the accuracy of leak detection processes, especially considering changes in demand or pressure.

Graph-based methods are increasingly being used to manage real-world networks and minimize leaks across different sectors. For instance, Zhang et al. (2023) developed a technique to detect natural gas leaks in complex pipeline systems using a deep probabilistic graph neural network. This approach combines variational Bayesian inference with an attention-based graph neural network, allowing them to identify leaks by observing changes in the dependency weights between sensors. Similarly, Zanfei et al. (2022) introduced a new method for detecting ruptures in water distribution networks using graph convolutional neural networks. Their model achieves high accuracy and is capable of real-time rupture detection. Guan et al. (2024) developed a technique to identify anomalies in business processes using graph neural networks, specifically graph attention networks. They transform each sequence of events into multiple graphs, with each event attribute represented by its own graph. These graphs are built from a global graph of the entire event log.

In a monitored WDN, the representation of the network can consider both the physical infrastructure network and the correlation network between the monitored data. This implies the existence of two distinct graph representations sharing some of the vertices. This situation opens possibilities for the application of approaches related to multilayer networks or multi-graphs, which represent graphs sharing vertices that consider multiple types of relationships, and each relationship between vertices results in a new perspective or aspect of the graph (Liu et al., 2022). In a general context of network management, Herrera et al. (2023) explored performance indicators of a communication network as layers in a multi-graph, allowing the analysis and evaluation of network performance in different aspects. Additionally, the authors highlighted how inter-layer information can be helpful in monitoring, control, and problem classification within the network.

Multilayer networks are increasingly used to represent complex systems with diverse relationships, going beyond simple one-to-one interactions. One example is the work of Stahl et al. (2019), planning the trajectory of racing vehicles where the layers are related to the actions to be taken, the costs of taking different paths, and the speed ratio between competitors in the race. This approach allows for the simultaneous consideration of multiple factors and their interdependencies, mapping the relationships between layers (Bredereck et al., 2019). Although this approach has been widely explored in other research areas, the use of multilayer networks to analyze water distribution systems is still emerging. For example, it is possible to explore the relationship between monitored data and infrastructure system relationships, or to explore subgraphs by relating sensors to their coverage areas as layers in a multigraph. Consequently, adopting a multigraph representation offers a robust and flexible approach for classical water engineering challenges such as leak detection and localization.

This paper proposes the use of multilayer networks, or multi-graphs, to address the complexities of WDNs by representing not only the topological features of water systems but also the connectivity among monitored data and physical infrastructure. The proposal is motivated by the importance of developing more accurate and efficient methods for leak detection that can work in near real-time, and by the need to reduce the inspection area in the field. In addition, multi-graphs aid in representing the regionalization of water assets while taking into account their interconnectivity. Given the lack of literature on modeling WDNs as multi-graphs and the possibility of exploring this approach for improving the management of water systems, the main objective of this research is to develop a methodology for leak detection and localization that utilizes multiple graphs within a multilayer structure. To achieve this purpose, the paper proposes two main processes. (1) The creation of a graph through the correlation between monitored data that classifies the vertices of this graph and uses the classification information to detect leaks. (2) The construction of a multi-graph in which a topological graph of the entire network is considered a layer, the correlation graph of monitored data is another layer, while an interlayer relationship is given by the sensors’ coverage area. The benchmark presented in The Battle of Leakage Detection and Isolation Methods (BattLeDIM) is used to evaluate the methodology.

2. Methodology

This work proposes a novel methodology for modeling a WDN using three interconnected graphs to detect and localize leaks. A first graph captures the network’s topology and elements based on existing hydraulic model data. A second graph represents the temporal variations in network topology derived from sensor data, focusing on the strength of correlations between monitoring points. Finally, a third graph integrates the first two into a multilayer structure, reflecting sensor coverage and interactions between elements and data. This multilayer structure aids in ascertaining potential leak locations by combining information from both sensor readings and the network’s physical layout. After constructing the three graphs, the leak detection process begins. First, we utilize the graph built from hydraulic data to identify vertices exhibiting the most significant temporal changes. These changes are quantified using two statistical techniques: z-score (Altman et al., 2017) and Interquartile Range (IQR) (Wan et al., 2014). Both measurement data and graph-related metrics are evaluated using these techniques.

Vertices with the highest anomalies are then incorporated into the network topology graph, forming distinct layers of a network. This multi-graph serves as the foundation for the leak localization process. Finally, the sensor coverage information embedded within the multilayer structure is leveraged. Hyper-edge analysis identifies edges that are shared by multiple vertices, which delimits the area where the leak occurs. Leak simulations are conducted in this area, and the similarity between the monitored and simulated data serves as an indicator of the location of the leak. Fig. 1 illustrates the steps involved in applying this methodology.

Fig. 1. Flowchart of the methodology for the leak detection and localization processes.

2.1. Graph theory application

Graph theory is a branch of discrete mathematics that focuses on studying mathematical structures called graphs. These graphs are often used to model relationships and interactions between objects. A graph (G) can be expressed through a square matrix, called an adjacency matrix (A). This matrix denotes the interconnections between vertices ($v \in V$) through edges ($e \in E$). However, the adjacency matrix does not represent the strength of these connections. To model the intensity of connections, we can use a weight matrix (W). This matrix associates specific information with connected vertices through edges.

Based on graphs, Complex Network Theory can support tools for leak detection. Among them, this work proposes building a graph based on correlation analysis between monitored data (2.1.1), creating a temporal graph. This graph reveals anomalies that might signal a leak. Another method models the Water Distribution Network (WDN) topology as a weighted graph (2.1.2), with edge weights representing typical pipe flows. Deviations from these expected flows can then pinpoint potential leak locations. Finally, a more comprehensive approach combines these methods into a multi-graph (2.1.3). This multi-graph integrates a correlation graph based on sensor data and a topological graph reflecting the network structure. By considering both sensor data and physical layout, this interconnected graph allows for a more precise leak detection strategy.

2.1.1. Monitored data based graph

Each monitoring point can be represented as a vertex in a graph. The relationships among these vertices, along with their corresponding connection weights, can be established using statistical analysis of the time series data collected from each sensor. Kalofolias (2016) presents the concepts for creating a sparse graph, that is, a graph with a reduced number of vertices that maintains the structural properties of the complete graph. This method is only partially adopted in this work, as the reduction of the number of vertices is not considered herein.

Following the proposed methodology of Kalofolias (2016), a graph can be built from a matrix of monitoring data X, where the columns represent different signal sources (nodes monitored in a WDN), and the rows correspond to temporal data t. Each column of X is modeled as a vertex in the graph ($x_{v_1}, x_{v_2}, \ldots, x_{v_i}$), and edges are defined based on the distances between data pairs. The method proposed by Kalofolias (2016) calculates a pairwise distance matrix (Z) for each t of the data and between all vertices, which also results in a square matrix, and it is used to create the graph. This matrix Z is determined by Eq. (1),

$$Z_{ij} = \lVert x_i - x_j \rVert^2, \tag{1}$$

where $x_i$ and $x_j$ represent the data vectors associated, respectively, with vertices i and j in the graph, corresponding to columns of the matrix X. Thus, creating matrix Z results in a distance matrix for all vertices. This amount of information can hinder the capability to understand the graph’s behavior and the underlying tasks (Yan et al., 2006). This limitation is demonstrated by Bezerra et al. (2022), where a similar process for building matrix Z does not prioritize strong connections. To address this issue, our work proposes a threshold analysis to focus only on strong edges between vertices. This analysis generates matrix Ws, derived from a threshold-based filtering process applied to the original Z, as Eq. (2) shows.

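$$\text{if } W_{ij} \le \operatorname{mean}(Z) \text{ then } W_{ij} = 0; \qquad \text{if } W_{ij} > \operatorname{mean}(Z) \text{ then } W_{ij} = Z_{ij} \tag{2}$$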

The Ws matrix is used to create a graph (Gs), which is later used for anomaly detection on the monitored data. Given the WDN dynamics, the most representative graph from monitored data can be achieved only if the topology of Gs is updated every time step when new information arrives. To this end, the calculation of Ws matrix is repeated every t. This results in an application that can be used in a database and can also be applied to real-time data.

In this context, each vertex in Gs represents a monitoring point or signal source (sensor) in a WDN, and the edges represent connections or relationships between these monitoring points based on the similarity (or distance) of their respective time-series data. The edges in Gs are determined by keeping only the strong connections between vertices, that is, those whose pairwise distance Zij is greater than the mean of the distances in matrix Z. Weaker connections, where Zij is less than or equal to the mean, are removed by setting the corresponding Wij value to zero.
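As a minimal sketch of Eqs. (1) and (2), the snippet below builds Z and the filtered matrix Ws for one block of readings. The function names, the toy pressure values, and the choice to take mean(Z) over all entries (diagonal included) are assumptions made for illustration; the threshold is applied to Zij, as described in the paragraph above.

```python
def pairwise_sq_distances(X):
    """X is a list of rows (time steps); each row holds one value per sensor.
    Returns Z with Z[i][j] = ||x_i - x_j||^2, where x_i is column i of X."""
    n = len(X[0])
    cols = [[row[k] for row in X] for k in range(n)]
    return [[sum((a - b) ** 2 for a, b in zip(cols[i], cols[j]))
             for j in range(n)] for i in range(n)]


def threshold_weights(Z):
    """Keep entries strictly above mean(Z); set all others to zero."""
    n = len(Z)
    mean_z = sum(sum(row) for row in Z) / (n * n)  # assumption: mean over all entries
    return [[z if z > mean_z else 0.0 for z in row] for row in Z]


# Toy data: 3 time steps (rows) for 3 hypothetical sensors (columns).
X = [[50.0, 50.5, 42.0],
     [51.0, 51.5, 43.0],
     [49.0, 49.5, 41.0]]
Z = pairwise_sq_distances(X)
Ws = threshold_weights(Z)
```

In this toy case the first two sensors follow each other closely, so their small distance falls below the mean and the corresponding edge is removed, while the edges to the third sensor are kept. Repeating the two calls at every time step t gives the temporal graph described in the text.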

2.1.2. Topological based graph

A second graph is created based on the topological information used to create hydraulic models. This topological-based graph (GT) represents the N model’s junctions (demand nodes, tanks, and reservoirs) as a set of vertices (V). The elements interconnecting junctions (pipes, pumps, and valves) are represented by a set of edges (E). These connections are mathematically represented by an adjacency matrix ($A_{N \times N}$), where the elements $a_{i,j}$ describe the network topology (Sitzenfrei, 2021, Tzatchkov, Alcocer-Yamanaka, Bourguett Ortíz, 2008).

Each edge $e \in E$ of the graph GT can have different weights, according to the analysis to be performed. In this study of leak localization, the edge weights are determined by the maximum pipe flows in the WDN. To obtain them, a simulation of the network is performed using the Water Network Tool for Resilience (WNTR) library (Klise et al., 2017). Through this simulation, information about the amount of water flowing through each pipe of the network is obtained, allowing for appropriate weights to be assigned to the GT edges.

The approach makes the representation more realistic by incorporating information about the flow capacity of each pipe. When considering the edge weights in the construction of the WGT matrix, the specific transport capacity along the connections is reflected, making the modeling more reliable to real network conditions. This inclusion of details about the flow capacity of the pipes enhances the analysis by considering the physical constraints of the system, which in turn contributes to more accurate and applicable results in identifying potential leak locations. In essence, when considering the practical aspect of flow capacity, the graph representation comes closer to the effective operational dynamics of the water network, providing more realistic and practically useful results (Anchieta et al., 2023).
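The weighting step can be sketched as follows, assuming the simulated link flows have already been exported to plain Python containers. The input layout (`links`, `flows`), the function name, and the use of the flow magnitude (so that flow direction does not affect the weight) are illustrative assumptions, not details given in the text; the hydraulic simulation itself is not reproduced here.

```python
def build_topological_graph(links, flows):
    """Build a weighted, undirected adjacency map for the topological graph.

    links: {link_name: (start_node, end_node)}      hypothetical input layout
    flows: {link_name: [flow at each time step]}    e.g. taken from a simulation
    """
    adj = {}
    for name, (u, v) in links.items():
        # Assumption: the edge weight is the largest flow magnitude in the series.
        weight = max(abs(q) for q in flows[name])
        adj.setdefault(u, {})[v] = weight
        adj.setdefault(v, {})[u] = weight
    return adj


# Toy network: one reservoir (r1) feeding three junctions through three pipes.
links = {"p1": ("r1", "n1"), "p2": ("n1", "n2"), "p3": ("n1", "n3")}
flows = {"p1": [10.0, 12.5, 11.0],
         "p2": [4.0, 6.0, 5.0],
         "p3": [-6.5, -3.0, 6.0]}
GT = build_topological_graph(links, flows)
```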

2.1.3. Multilayer network construction

A multilayer network or multi-graph is a structure in which each layer represents different types of interactions or relationships between system elements. These layers can be interconnected through hyper-edges, which can be weighted by the strength of the interactions, and can also be considered directed or undirected, depending on their nature (Kivelä et al., 2014). This approach also allows the individual analysis of each layer, understanding the specific dynamics between the elements in each context. Overall, the joint analysis of a multilayer network provides a holistic view of the system, revealing patterns of interaction between and within the layers (Liu et al., 2022).

This proposal leverages a dual-layered graph structure to analyze WDNs. The first layer, represented by graph GT, captures the network’s topological characteristics, while the second layer, graph Gs, incorporates hydraulic data. The proximity between nodes and sensors is quantified by Dijkstra’s algorithm using edge weights that represent the maximum pipe flow rates. Proximity refers to the sensor’s ability to detect changes in flow or pressure within a node’s operational range. Sensor coverage typically varies between 8% and 25% of the total nodes (Bezerra, Souza, Meirelles, Brentan, 2022, Zhao, Zhang, Liu, Fu, Wang, 2020), depending on budget and sensor reachability. This allows operators to define a realistic proximity threshold for leak detection. The integration of these layers is achieved by establishing connections based on sensor coverage areas, where the proximity between nodes and sensors determines the strength of these connections. A ‘network regionalization’ process is implemented to assess and define these coverage areas. An antecedent of this approach can be found in Gao et al. (2017), where the Dijkstra algorithm (Dijkstra, 1959) is employed to compute shortest paths between nodes, effectively partitioning the WDN. However, in this context, sensors are the primary focus, as they define coverage regions on the WDN. Dijkstra’s algorithm is used to calculate the shortest paths between sensors and all other nodes, which is useful for measuring the distance between a sensor and a node and the corresponding sensitivity to detect flow changes that may happen at that node. The distance update is defined by Eq. (3), the relaxation step at the core of Dijkstra’s algorithm:

$$\text{if } d[i] + w(i,j) < d[j] \text{ then } d[j] = d[i] + w(i,j), \tag{3}$$

where d[i] represents the estimated distance between vertex i and the source vertex (a sensor in this case), d[j] is the corresponding estimate for a neighboring vertex j, and w(i,j) is the weight of the edge connecting vertices (i,j). These weights are the maximum daily flows acquired through the hydraulic simulation of the analyzed network. The application of this algorithm results in the construction of matrix WCr with dimensions N×Xs, where N corresponds to the number of vertices in the graph GT and Xs to the number of sensors. The WCr matrix holds the distances between each sensor and all network nodes, and this information is used to determine the sensor coverage by defining a threshold distance. Nodes with a distance to a sensor below this threshold are considered to be within the sensor’s coverage area.
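The regionalization step can be illustrated with a short sketch: Dijkstra’s algorithm is run from each sensor, and the nodes whose distance falls below a chosen threshold form that sensor’s coverage area. The toy graph, its weights, the threshold value, and the function names are assumptions for illustration only.

```python
import heapq


def dijkstra(adj, source):
    """Shortest weighted distance from `source` to every node in `adj`."""
    dist = {v: float("inf") for v in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d_i, i = heapq.heappop(heap)
        if d_i > dist[i]:
            continue  # stale heap entry, a shorter path was already found
        for j, w in adj[i].items():
            if d_i + w < dist[j]:  # relaxation step of Eq. (3)
                dist[j] = d_i + w
                heapq.heappush(heap, (dist[j], j))
    return dist


def sensor_coverage(adj, sensors, threshold):
    """Map each sensor to the set of nodes closer than `threshold`."""
    return {s: {v for v, d in dijkstra(adj, s).items() if d < threshold}
            for s in sensors}


# Toy undirected graph with hypothetical edge weights.
adj = {"n1": {"n2": 2.0},
       "n2": {"n1": 2.0, "n3": 1.0, "n4": 4.0},
       "n3": {"n2": 1.0, "n5": 3.0},
       "n4": {"n2": 4.0, "n5": 1.0},
       "n5": {"n3": 3.0, "n4": 1.0}}
cover = sensor_coverage(adj, sensors=["n1", "n5"], threshold=5.0)
```

Here nodes n2 and n3 fall inside both coverage areas, which is the kind of overlap the multilayer structure later exploits to narrow the search space. Stacking the per-sensor distance dictionaries column by column would give a matrix with the N×Xs shape described for WCr.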

The matrices WGT, WCr and Ws are used to build the extended matrix (Wex), from which the multilayer network (Gml) is created (Sunita and Garg, 2021). Fig. 2 presents the ordering of the matrices for the creation of matrix Wex.

Fig. 2. Extended matrix (Wex) composition based on topological data, data collection, and the relationship measured between measurement points and the remaining nodes of the network.

Fig. 2 shows the matrix WGT, which represents the graph of the first layer, based on the topological information of the network; Ws, which represents the graph of the second layer, based on the correlation among acquired data; and WCr, which holds the hyper-edges representing the inter-layer connections. These inter-layer connections are the search spaces for the anomaly location, so they are considered only for the coverage area of the most affected sensors.
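One plausible way to assemble the extended matrix is a symmetric block layout, with WGT in the upper-left block, Ws in the lower-right block, and WCr (and its transpose) in the off-diagonal blocks. This layout is an assumption made for illustration, since the exact ordering is given only in Fig. 2; the function name and the toy values are likewise hypothetical.

```python
def build_extended_matrix(W_gt, W_cr, W_s):
    """Assemble W_ex = [[W_gt, W_cr], [W_cr^T, W_s]] (assumed block layout).

    W_gt: N x N topological weights, W_s: Xs x Xs data-graph weights,
    W_cr: N x Xs node-to-sensor distances (inter-layer connections).
    """
    n, m = len(W_gt), len(W_s)
    W_ex = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            W_ex[i][j] = W_gt[i][j]          # first layer
        for k in range(m):
            W_ex[i][n + k] = W_cr[i][k]      # inter-layer block
            W_ex[n + k][i] = W_cr[i][k]      # mirrored inter-layer block
    for a in range(m):
        for b in range(m):
            W_ex[n + a][n + b] = W_s[a][b]   # second layer
    return W_ex


# Toy sizes: N = 3 network nodes, Xs = 2 sensors.
W_gt = [[0.0, 5.0, 0.0], [5.0, 0.0, 3.0], [0.0, 3.0, 0.0]]
W_cr = [[0.0, 8.0], [5.0, 3.0], [8.0, 0.0]]
W_s = [[0.0, 192.0], [192.0, 0.0]]
W_ex = build_extended_matrix(W_gt, W_cr, W_s)
```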

2.2. Leak detection approach

The methodology for detecting leaks follows three consecutive steps. Initially, the temporal graph Gs is constructed, followed by the classification of the vertices using the graph presented in Section 2.1.1. The processed signal on the graph is then analyzed by the z-score and Interquartile Range (IQR) algorithms, which search for anomalous features in the data. During this search step, the algorithms not only detect anomalies but also point out which vertices of the monitored data graph present the most changes in the behavior of centrality values. Finally, the vertices identified with the greatest changes are then used in the second stage of the process, consisting of leak localization.

2.2.1. PageRank

PageRank was originally developed to rank web pages, but it can also be useful for detecting leaks by evaluating how important each sensor is within a network. The metric works based on a random walk, meaning that sensors with higher PageRank scores are more likely to be visited during the walk. This reflects their influence on the overall flow of the network. In leak detection, if a sensor’s PageRank suddenly changes, it could indicate something unusual is happening, such as a leak in the system. The PageRank of a vertex, $PR(v_i)$, is calculated using the recursive expression in Eq. (4),

$$PR(v_i) = \frac{1-d}{n} + d \left( \frac{PR(v_1)}{L(v_1)} + \frac{PR(v_2)}{L(v_2)} + \cdots \right), \tag{4}$$

where n is the number of vertices in the graph, $v_1, v_2, \ldots$ are the vertices connected to $v_i$, $L(v_j)$ represents the number of edges outgoing from vertex $v_j$, and d is a damping factor (usually set to 0.85) (Yan and Ding, 2011). Initially, each vertex is assigned an equal PageRank value. Through an iterative equation, the relative relevance of the vertices is successively updated by considering the importance of their neighbors. This process is repeated until the scores converge, indicating that the vertex relevance has stabilized. The steady state reflects the final rankings of vertices based on their PageRank scores, evidencing their importance and influence on the structure of the graph. Vertices with higher PageRank scores are considered more important or influential in the network (Gu et al., 2022). The iterative process simulates a random walk, with the damping factor representing the probability of continuing the walk. Changes in PageRank values over time can be used to identify sensors that exhibit unusual behavior, potentially indicating the presence of a leak.

To detect leaks, the process keeps track of how each sensor’s PageRank changes over time. If a sensor’s PageRank suddenly rises or drops significantly, it might reflect a shift in the network’s flow patterns, potentially signaling a leak. A fixed threshold is then applied to identify these anomalies: when a sensor’s PageRank deviates from its normal range by more than this threshold, a leak alert is triggered. The selection of the right threshold depends on the network’s specific characteristics and on how sensitive the detection needs to be. In the tests carried out herein, a threshold of 15% effectively identified leaks while minimizing false alarms.
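The iterative update and the threshold check can be sketched as below. Several choices are assumptions rather than details from the text: the first term of Eq. (4) is taken with the common normalization (1 - d)/n, where n is the number of vertices; edge weights are ignored; isolated sensors spread their score uniformly; and the 15% threshold is read as a relative change with respect to a baseline PageRank. Function names and the toy graphs are hypothetical.

```python
def pagerank(adj, d=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank on an unweighted graph.
    adj maps each vertex to the set of vertices it links to."""
    nodes = list(adj)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}  # equal initial scores
    for _ in range(max_iter):
        # Assumption: vertices without edges share their score with everyone.
        dangling = sum(pr[v] for v in nodes if not adj[v])
        new = {}
        for v in nodes:
            incoming = sum(pr[u] / len(adj[u]) for u in nodes if v in adj[u])
            new[v] = (1.0 - d) / n + d * (incoming + dangling / n)
        converged = sum(abs(new[v] - pr[v]) for v in nodes) < tol
        pr = new
        if converged:
            break
    return pr


def flag_changes(baseline, current, threshold=0.15):
    """Return vertices whose relative PageRank change exceeds the threshold."""
    return [v for v in baseline
            if abs(current[v] - baseline[v]) / baseline[v] > threshold]


# Baseline data graph: sensor s1 strongly linked to s2, s3 and s4.
before = {"s1": {"s2", "s3", "s4"}, "s2": {"s1"}, "s3": {"s1"}, "s4": {"s1"}}
# Later time step: the edge between s1 and s4 has been filtered out.
after = {"s1": {"s2", "s3"}, "s2": {"s1"}, "s3": {"s1"}, "s4": set()}

pr_before = pagerank(before)
pr_after = pagerank(after)
flagged = flag_changes(pr_before, pr_after)
```

In this toy case the sensor that lost its connection is flagged, while the central sensor s1 changes by only a few percent and stays below the threshold.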

2.2.2. Degree centrality

Degree centrality measures the number of connections that a vertex has with other vertices (Yustiawan et al., 2015). In the context of leak detection, a sensor with a high degree centrality might be more sensitive to changes in network flow patterns caused by a leak. This is because it receives information from a larger number of neighboring sensors, increasing its ability to detect anomalies. This metric is based on the number of edges that a vertex has and is normally used as an indicator of the relevance and importance of a vertex within the graph. The degree centrality value, Dc, is determined by Eq. (5),

$$D_c(v_i) = d(v_i), \tag{5}$$

where $d(v_i)$ is the number of edges incident on the vertex $v_i$ (Zhang and Luo, 2017). Tracking the degree centrality of sensors enables the detection of potential leak locations by revealing changes in the network’s connectivity patterns. When the connections of certain sensors undergo significant modifications, it may indicate disruptions caused by a leak. This method facilitates the identification of affected areas by highlighting alterations in the network structure, thereby improving the accuracy and effectiveness of leak detection.
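As a small illustration of Eq. (5), the degree of each sensor can be read directly from a thresholded weight matrix by counting its non-zero off-diagonal entries, and compared between two time steps. The matrices below are toy values and the function name is hypothetical.

```python
def degree_centrality(W):
    """Number of non-zero off-diagonal entries in each row of W."""
    n = len(W)
    return [sum(1 for j in range(n) if j != i and W[i][j] != 0)
            for i in range(n)]


# Thresholded data-graph weights at two consecutive time steps (toy values).
W_t0 = [[0.0, 0.0, 192.0],
        [0.0, 0.0, 216.75],
        [192.0, 216.75, 0.0]]
W_t1 = [[0.0, 150.0, 192.0],
        [150.0, 0.0, 216.75],
        [192.0, 216.75, 0.0]]

dc_t0 = degree_centrality(W_t0)
dc_t1 = degree_centrality(W_t1)
# Sensors whose connectivity changed between the two steps.
changed = [i for i in range(len(dc_t0)) if dc_t0[i] != dc_t1[i]]
```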

2.2.3. Betweenness centrality

Betweenness centrality, Bc, leverages the mediating role that the vertices play in a graph (Brandes, 2001). Hence, this metric assigns importance values to a vertex located on communication paths between the other vertices of the graph. A sensor with high betweenness centrality plays an important role in information flow within the network. Regarding leak detection, a change in the betweenness centrality of a sensor could indicate a disruption in the expected, regular flow patterns that may be caused by a leak. The reason is that a leak can alter the shortest paths between sensors (the basis of the betweenness algorithm), as it changes distances and paths between nodes. The value of Bc for a vertex $v_i$ is given by the sum, over all pairs of vertices, of the fraction of shortest paths that pass through $v_i$, and is expressed by Eq. (6),

$$B_c(v_i) = \sum_{v_s, v_t \in V} \frac{\sigma(v_s, v_t \mid v_i)}{\sigma(v_s, v_t)}, \tag{6}$$

where $\sigma(v_s, v_t)$ is the number of shortest $(v_s, v_t)$-paths, and $\sigma(v_s, v_t \mid v_i)$ is the number of those paths passing through the node $v_i$, with $v_i$ other than $v_s$ and $v_t$. If $v_s = v_t$, $\sigma(v_s, v_t) = 1$ (Zhang and Luo, 2017).
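A compact way to evaluate Eq. (6) is the accumulation scheme of Brandes (2001). The sketch below is a simplified version under stated assumptions: the graph is undirected and unweighted (path length is the hop count), and each unordered pair of endpoints is counted once, so the accumulated totals are halved. The function name and the toy path graph are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # vertices in order of discovery
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # back-propagate path dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Assumption: each unordered pair counted once, so halve the totals.
    return {v: b / 2.0 for v, b in bc.items()}


# Toy path graph a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
bc_path = betweenness(path)
```

For the path graph, the two inner vertices each lie on the shortest paths of two pairs, while the end vertices lie on none.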

2.2.4. Communicability betweenness centrality

Communicability betweenness centrality, CBc, is an extension of betweenness centrality that considers not only the shortest paths between nodes but also all paths between the set of vertices (Estrada et al., 2009). This is particularly relevant for leak detection in WDNs, where water can flow through multiple paths. A high CBc value for a sensor indicates its importance in facilitating communication and flow throughout the network. A leak can disrupt these communication patterns, leading to changes in CBc values. This metric provides a more comprehensive view of the importance of a node in facilitating general communication, and is typically used in graphs in which multiple paths are relevant. To determine the CBc values, a subgraph G(r) is obtained by removing all edges connected to the vertex $v_i$. Thus, G(r)=(V,E(r)) is the subgraph in which E(r) is a matrix that has non-zero elements only in the row and column of $v_i$. Eq. (7) shows the mathematical expression of the CBc,

CBc(vi)=vi,vj(eA)vi,vj(eA+E(r)))vi,vj(n1)2(n1), (7)

where (eA)vi,vj is the number of paths from vi to vj in the adjacency matrix A. (eA+E(r)))vi,vj is the number of path from vi to vj in the modify adjacency matrix A+E(r). n is the total number of vertices in G. The results in CBc are between 0 and 1, with 0 being an unreachable limit since this would mean that removing the edges of vi would not affect any path. On the other hand, limit 1 is only reached in star graphs, which have a central node connected to all other nodes, and its removal would disconnect the entire graph (Estrada et al., 2009). Changes in CBc values can reveal sensors that are significantly affected by a leak, as the leak alters the network’s overall flow and communication patterns.

2.2.5. Vertex ranking process

The vertex ranking process is conducted sequentially, using the best-performing centrality metric among those presented. Beyond mapping the signals acquired from the network onto graphs, the centrality metric plays a key role in this methodology by assessing the importance and influence of each vertex within the network. For leakage detection, the monitored data-based graph (Gs) can change at each time step t because the correlation between the monitored data is recalculated, and the importance of the vertices can change accordingly. Therefore, this dynamic application of the centrality metrics quantifies the importance of vertices as new data arrive, ensuring a continuous and adaptive evaluation of the structural importance of the graph. This process helps reduce noise in the data and reveals more significant changes when anomalies occur, making the detection algorithm more effective (Barros et al., 2023a).

Despite the data noise reduction obtained through the calculation of centrality metrics, an automated process is necessary to reliably point out the anomalies and which sensors have the greatest anomalies. Therefore, the centrality values for each sensor over time are analyzed using two approaches, which are jointly applied to detect anomalies and identify the most affected sensors. This process identifies anomalies in the data, which suggests potential problems in the system, and pinpoints the sensors with the most significant deviations. The first stage uses a statistical measure known as the z-score (Zs), which quantifies how deviant a data point is from the mean in terms of standard deviations. The z-score is calculated following Eq. (8).

$$Z_s=\frac{C(v_i,t)-\mu}{\sigma}, \tag{8}$$

where $C(v_i,t)$ is the centrality value of vertex $v_i$ at time $t$, $\mu$ is the average value of the variable, and $\sigma$ is the standard deviation of the variable (Kızılöz et al., 2022).

The second stage of this process uses the IQR (Wan et al., 2014) to evaluate the centrality of each sensor. The IQR is a statistical measure that indicates the dispersion of values around the median. It is calculated by subtracting the first quartile (Q1) from the third quartile (Q3) and represents a measure of data variability. Some authors consider values lying more than 1.5 times the IQR beyond the quartiles as possible outliers (Fraser et al., 2015; Kim et al., 2016), a criterion also adopted in this methodology. This can be expressed as in Eq. (9),

$$IQR=Q_3-Q_1, \tag{9}$$

and the outliers are consequently those elements lying beyond the IQR distance from the quartiles, that is, values above $Q_3+1.5\times IQR$ or below $Q_1-1.5\times IQR$. The factor 1.5 is an empirical choice that aims to balance the detection of significant outliers without removing many values that can be normal variations. This approach allows the identification of anomalies that deviate significantly from the expected signal values. By automating this process, the anomaly detection methodology based on IQR can be applied to real-time data analysis without relying on operators’ manual inspection.

The z-score helps identify data points that deviate significantly from the mean, while the IQR method assists in detecting outliers in the dataset (Chikodili et al., 2020). The z-score considers all values in the calculation, making it sensitive to outliers, while the IQR focuses on quartiles and data variability, making it more robust against the influence of such values (Chikodili et al., 2020). By using a combination of z-score and IQR, the process leverages the sensitivity of the z-score in identifying deviations in average and standard deviation while benefiting from the robustness of the IQR in handling extreme values.

Leak detection takes place when the data exhibit significant changes with respect to the thresholds defined for the z-score and IQR. The time-varying centrality data for each vertex is evaluated, and if a data point exceeds the criteria set by the z-score and IQR, it is considered a deviation, and the vertex is flagged with a potential anomaly. Once all the vertices with deviations are identified, they are ranked using the maximum z-score value and the IQR value as the primary criteria. In this way, a set of vertices Xs is generated, representing the potential nodes where anomalies are detected. The vertices belonging to Xs also belong to the graph based on the topology of the network (Gt) and to the graph built from sensor correlations (Gs). The coverage areas of the sensors belonging to Xs are then used in a leak simulation process carried out with the WNTR package. Then, the behavior of the simulated data is compared with the monitored data, and a similarity method is applied to indicate the location of the leak.
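The joint z-score and IQR test of Eqs. (8) and (9) can be sketched as follows for the centrality series of a single sensor. This is a minimal illustration under stated assumptions: the z-score threshold of 3, the use of the whole supplied window to estimate the mean, standard deviation, and quartiles, and the toy series are all hypothetical choices and are not taken from the study.

```python
import statistics


def detect_anomalies(series, z_threshold=3.0, iqr_factor=1.5):
    """Return the time indices flagged by BOTH the z-score and IQR criteria."""
    mu = statistics.mean(series)
    sigma = statistics.pstdev(series)
    # Quartiles of the window; only Q1 and Q3 are needed for the IQR.
    q1, _, q3 = statistics.quantiles(series, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - iqr_factor * iqr, q3 + iqr_factor * iqr
    flagged = []
    for t, value in enumerate(series):
        z = (value - mu) / sigma if sigma > 0 else 0.0
        if abs(z) > z_threshold and (value < low or value > high):
            flagged.append(t)
    return flagged


# Hypothetical normalized centrality values with one abrupt change at step 15.
series = [0.50, 0.51] * 10
series[15] = 0.90
anomalous_steps = detect_anomalies(series)
```

Requiring both criteria keeps the sensitivity of the z-score while using the IQR fences as a more robust second check, which mirrors the combination described above.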

2.3. Leak regionalization approach

After the preliminary identification of the potential leakage region, i.e., sensors with the most significant alterations, this work proposes a refinement of the localization process. This refining process has two main objectives: locating the leak by comparing monitored and simulated data and narrowing down the search space, as simulating and comparing all possible points demands substantial computational effort.

In the leakage simulation stage, a flow of approximately 3% of the network’s total demand is individually added to each demand node within the coverage area of sensors Xs. This percentage is based on the recommendation by Quiñones-Grueiro et al. (2019) for simulating large leaks. This leakage magnitude is significant enough to cause detectable changes in the sensor data, enabling an accurate assessment of the similarity between the simulated and monitored data. Although the 3% threshold serves as a reference value, the simulated leakage magnitude can be adjusted according to the network’s specific characteristics and the sensors’ sensitivity. In our study, a 3% flow was deemed appropriate for detection by the network’s sensors. The chosen flow rate is intended to have a strong impact on the sensors, revealing patterns in the monitored data that may indicate leaks. During the simulation, only data from sensors Xs are recorded and compared to the monitored values (presented in the monitored data matrix X) to evaluate the similarity between the temporal data sequences of both monitored and simulated data.

To conduct this comparison, the dynamic time warping (DTW) algorithm (Sakoe and Chiba, 1978) is employed. DTW compares time series with different lengths or temporal shifts (Burstyn et al., 2021), so data sequences can be compared even when they are not temporally aligned. Here, it is used to calculate the similarity between time series and to identify anomaly points based on discrepancies in behavior patterns between simulated and monitored data. The first step is the creation of a cost matrix (or distance matrix) of size h × g, where h is the size of the simulated data sequence and g is the size of the monitored data sequence. The matrix is filled iteratively, with each value determined by Eq. (10).

$$D(x,z)=|s(x)-m(z)|^{2}, \tag{10}$$

where D(x,z) represents the cost of aligning point x in the simulated sequence with point z in the monitored sequence, s(x) is the value at position x in the simulated sequence, and m(z) is the value at position z in the monitored sequence.

After this step, the values in the matrix D are updated iteratively, based on the costs of adjacent positions, to find the minimum cost from the starting position (0,0) to the ending position (h,g) (Keogh and Ratanamahatana, 2005). To this end, the recurrence in Eq. (11) is applied.

$$D(x,z)=|s(x)-m(z)|^{2}+\min\{D(x-1,z),\,D(x,z-1),\,D(x-1,z-1)\}, \tag{11}$$

The total alignment cost between the data sequences is given by the value in the last cell of the cost matrix, i.e., D(h,g). The subsequent step reconstructs the minimum-cost path through the matrix by tracking the positions that minimized the cost in the previous step. This path indicates the alignment between simulated and monitored data. Finally, the algorithm assigns a similarity measure to each simulated node, using the average similarity measure calculated by Eq. (12).

$$Sim(n)=1-\frac{D(n,g)}{g}, \tag{12}$$

where Sim(n) is the similarity measure of the nth simulated node, D(n,g) is the alignment cost of the nth simulated node with the last position of the monitored sequence (g), and g is the size of the monitored data sequence. Thus, nodes with higher percentage values of similarity are identified as potential leak locations. The effectiveness and accuracy of the proposed methodology are determined by comparing the identified leakage locations with the actual leakage site. Reference data and networks are employed to validate and assess the methodology, as outlined in the subsequent sections of the study.
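The DTW steps of Eqs. (10) to (12) can be sketched in a few lines. The following is a minimal illustration rather than the code used in this study: the function names and the toy sequences are hypothetical, and the series are assumed to be normalized so that the similarity stays in a comparable range.

```python
def dtw_cost(simulated, monitored):
    """Total DTW alignment cost D(h, g) following Eqs. (10) and (11)."""
    h, g = len(simulated), len(monitored)
    inf = float("inf")
    # Extra border row and column so that the path starts from D[0][0] = 0.
    D = [[inf] * (g + 1) for _ in range(h + 1)]
    D[0][0] = 0.0
    for x in range(1, h + 1):
        for z in range(1, g + 1):
            local = (simulated[x - 1] - monitored[z - 1]) ** 2
            D[x][z] = local + min(D[x - 1][z], D[x][z - 1], D[x - 1][z - 1])
    return D[h][g]


def similarity(simulated, monitored):
    """Similarity of Eq. (12): one minus the cost averaged over g."""
    return 1.0 - dtw_cost(simulated, monitored) / len(monitored)


# Hypothetical candidate nodes, each with a simulated pressure sequence.
monitored = [1.0, 1.0, 2.0, 3.0]
candidates = {"nA": [1.0, 2.0, 3.0, 3.0],   # same shape, shifted in time
              "nB": [2.0, 3.0, 4.0, 4.0]}   # offset by one unit
ranking = sorted(candidates,
                 key=lambda n: similarity(candidates[n], monitored),
                 reverse=True)
```

The time-shifted candidate aligns with zero cost, so it ranks first; this tolerance to temporal misalignment is the reason DTW is preferred here over a point-by-point distance.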

3. Application: battle of leakage detection and isolation methods (BattLeDIM)

The proposed methodology is evaluated using the benchmark problem presented in the Battle of Leakage Detection and Isolation Methods (BattLeDIM), described in detail by Vrachimis et al. (2022). The benchmark utilizes the L-Town WDN, which is designed to resemble a real network. The L-Town network is characterized by variations in flows and pressures over time, including changes in demand, pump and valve operations, as well as the occurrence of multiple leaks introduced at different moments, intensities, and durations. The L-Town network consists of 905 pipes, totaling approximately 42 km in length, 782 junctions, 3 pressure-reducing valves, 2 reservoirs, and 1 pump. Additionally, the network has 33 pressure monitoring points distributed at different locations. The data provided for evaluation spans a period of two years, with accurate measurements taken every 5 minutes without delays.

Fig. 3 presents the topology of the L-Town network, including its structural elements such as pipes, junctions, reservoirs, and the tank. This visual representation provides an overview of the complexity and size of the network, which serves as a challenging scenario for leak detection and localization. The L-Town benchmark scenario offers a realistic and comprehensive benchmark for evaluating the effectiveness and performance of leakage detection and localization methods.

Fig. 3. Topology of L-Town water network highlighting pressure sensors and district metered areas.

Based on the data provided with the WDN, simulations can be conducted, leakage detection and localization algorithms can be applied, and the obtained results can be compared to the actual locations of leaks in the benchmark. This analysis validates the methodology’s ability to correctly identify the leak locations and assesses its precision in relation to the expected results. Furthermore, the data is processed per control area, as already defined in the case study, because the data within each area follow similar patterns. In addition, since the areas in the L-Town network are separated by a pressure control valve (Area B in Fig. 3) and a tank (Area C in Fig. 3), the pressures in these areas exhibit distinct characteristics. Hence, the methodology is applied individually to each control area.

3.1. BattLeDIM evaluation metrics

To evaluate the performance of the methods proposed for detecting and locating leaks, the BattLeDIM case study provides a database in which the start time, duration, and size of leaks are known. Four initial criteria are considered in the evaluation: detection time after the start of the leak, the flow rate of the leak at the time of detection, the maximum leak flow, and the distance between the indicated location and the actual leak location. In addition to these criteria, the assessment approaches presented by BattLeDIM are also used in the present study (Vrachimis et al., 2022).

The first evaluation metric is the True Positive: a detection is counted as a true positive when the condition of Eq. (13) is met.

$$t_{st}^{l}\le t_{d}^{h}\le t_{end}^{l}, \tag{13}$$

where $t_{d}^{h}$ is the detection time, and $t_{st}^{l}$ and $t_{end}^{l}$ are the start and end times of leakage $l$. The organizers also present the Profit from Water Saved, the value $p_{w}^{h}$ (in euros) of the water saved due to a correct leak detection, described by Eq. (14).

$$p_{w}^{h}=\left(\sum_{k=t_{d}^{h}}^{t_{end}^{l}} q_{l}(k)\,\Delta t\right)c_{w}, \tag{14}$$

where $q_{l}(k)$ is the flow rate of leakage $l$ at each discrete time step $k$, $\Delta t$ is the duration of the discrete time step, and $c_{w}$ is the cost (in euros) of water per cubic meter. Finally, a Total Score $T_s$ is determined to evaluate the entire detection and localization set, as given by Eq. (15).

$$T_{s}=\sum_{h\in D}s^{h}=\sum_{h\in D}\left(p_{w}^{h}+c_{r}^{h}\right), \tag{15}$$

where $s^{h}$ is the score of a given detection $h$ and $c_{r}^{h}$ is the repair crew cost (Vrachimis et al., 2022).
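A minimal sketch of the evaluation metrics in Eqs. (13) to (15) is given below. The function names, the integer time-step indexing, the flow units (m³/h), the water cost of 0.8 euros per cubic meter, and the example numbers are all hypothetical and serve only to show how the terms combine; the repair crew term is simply summed as supplied, so its sign convention is left to the caller.

```python
def is_true_positive(t_detect, t_start, t_end):
    """Eq. (13): the detection falls inside the active window of the leak."""
    return t_start <= t_detect <= t_end


def water_saved_profit(leak_flow, k_detect, k_end, dt_hours, cost_per_m3):
    """Eq. (14): volume leaked from detection to leak end, times water cost.

    leak_flow[k] is the leak flow rate in m3/h at discrete time step k.
    """
    volume_m3 = sum(leak_flow[k] * dt_hours
                    for k in range(k_detect, k_end + 1))
    return volume_m3 * cost_per_m3


def total_score(detections):
    """Eq. (15): sum of (p_w + c_r) over all detections, terms as supplied."""
    return sum(p_w + c_r for p_w, c_r in detections)


# Hypothetical leak: constant 3.6 m3/h over twelve 5-minute steps,
# detected at step 3 and ending at step 11.
flow = [3.6] * 12
profit = water_saved_profit(flow, k_detect=3, k_end=11,
                            dt_hours=5 / 60, cost_per_m3=0.8)
```

Because the sum in Eq. (14) starts at the detection time, an earlier detection includes more time steps and therefore yields a larger profit for the same leak.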

4. Results and discussion

The proposed methodology enables real-time analysis of monitoring data from BattLeDIM, which are evaluated as soon as they are issued at each monitoring interval. The initial step in applying this methodology is representing the L-Town network as a weighted graph, where the weights correspond to the maximum throughput of the pipes (Fig. 4).

Fig. 4. L-Town model as a weighted graph based on maximum flow rate.

The weighted graph presented in Fig. 4 serves as a layer in the multi-graph. In the first methodological step, we create graph Gs to explore correlations between monitored pressure data and evaluate centrality metrics. Matrix X, representing the monitoring data, is utilized in the second step. The entire database is analyzed sequentially, time step by time step. As the case study comprises three distinct areas, the proposed methodology is applied to each area individually.

The first analysis focuses on applying the methodology using sensors in Area A and highlights the first 24 days of 2019 from the monitored data. Additional analyses are performed with three sensors in Area C and the sole sensor in Area B. However, the first stage of the methodology is not applicable to Area B because the presence of only one sensor precludes the creation of the Z matrix for this area.

The dynamic graph Gs is evaluated at each time step, and a centrality value is assigned to each vertex. Among the analyzed centrality measures, PageRank performed best in terms of sensitivity to leaks and associated small changes in graph structure. Fig. 5 shows the normalized PageRank values for the first 24 days of data, specifically for the 29 sensors in Area A. Other centrality measures can be found in the supplemental material.

Fig. 5. PageRank values - Area A.

Fig. 5 illustrates that PageRank values for certain sensors fluctuate after a few days. The PageRank metric demonstrates greater sensitivity to the presence of leaks compared to pressure signals, capturing subtle changes in the graph structure. This improved sensitivity stems from the nature of PageRank, which considers the global connectivity of the network, while pressure signals reflect only the local state at a specific point. Leaks, even of small magnitude, alter the flow in the network, impacting the connectivity between nodes and, consequently, PageRank. In contrast, pressure signals may not undergo significant changes in the early stages of a leak, especially if the leak is far from the measurement point.

The variation in PageRank values is more pronounced in some instances, and the anomaly detection method employing the z-score and IQR algorithms highlighted anomalous points and identified the sensors exhibiting the most significant behavioral changes within the data. Figs. 6 and 7 present the five sensors with the most substantial anomalies identified, along with the corresponding pressure readings monitored by these sensors.

Fig. 6. Data behaviour.

Fig. 7. Sensors coverage rate.

Fig. 6a presents an interval of monitored pressure data provided by BattLeDIM for Area A. Visual inspection reveals similarities in the data patterns. However, starting on January 15, 2019, a leak in pipe p523 emerged in the network. The resulting changes in pressure are not readily apparent from the monitored data alone. Conversely, PageRank values exhibit similar repetitive behavior, but this behavior changes noticeably when the leak begins (Fig. 6b). Our detection method successfully identifies moments of anomaly and pinpoints the leak location in each case. All detection instances are detailed in Table 1. In addition, the z-score and IQR methods highlight the sensors with the most significant changes for each case, initiating the leak localization process. To assess the methodology’s ability to detect anomalies in graph-based process data, the IQR and z-score approaches are also applied directly to the pressure signal. For this analysis, we consider Area A with graph Gt and select the 5 sensors exhibiting the highest anomalies to create graph Gs. This number of sensors is chosen because these five show significantly larger discrepancies than the remaining sensors. The Dijkstra algorithm is applied, considering approximately 60 nodes as covered by the sensors, representing roughly 8% of the vertices in graph Gt as the sensor coverage area.
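As a rough illustration of how a sensor coverage area can be derived with the Dijkstra algorithm, the sketch below collects every vertex whose shortest-path distance to a sensor does not exceed a cutoff. The cutoff criterion, the toy graph, the edge weights, and the function names are assumptions made for this example and do not reproduce the coverage rule adopted in the methodology.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from source; graph[u] is a list of (v, weight)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def coverage_area(graph, sensors, cutoff):
    """Union of the vertices lying within `cutoff` of at least one sensor."""
    covered = set()
    for sensor in sensors:
        dist = dijkstra(graph, sensor)
        covered.update(v for v, d in dist.items() if d <= cutoff)
    return covered


# Hypothetical weighted graph with four vertices in a line: a - b - c - d
graph = {"a": [("b", 1.0)],
         "b": [("a", 1.0), ("c", 2.0)],
         "c": [("b", 2.0), ("d", 5.0)],
         "d": [("c", 5.0)]}
covered = coverage_area(graph, sensors=["a"], cutoff=3.0)
```

Only the covered vertices would then be used as candidate leak nodes, which is how the search space for the hydraulic simulations is reduced.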

Table 1. Detection and localization results.

| Pipe | Detec. (h:m) | Flow detec. (L/s) | Max flow (L/s) | Local. dist. (m) | Report time | Pressure detec. (h:m) | Percentage (%) |
|---|---|---|---|---|---|---|---|
| p123* | 1016:20 | 3.2 | 9.19 | 237 | 2019-10-20 12:25 | 1069:45 | 1.47 |
| p142 | 04:40 | 26.88 | 27.04 | 132 | 2019-06-13 05:45 | 05:15 | 14.13 |
| p193* | 869:30 | 3.31 | 10.36 | 378 | 2019-07-08 20:10 | 912:05 | 2.74 |
| p257 | Undetected | | | | | | |
| p277* | 1023:00 | 2.74 | 7.36 | 290 | 2019-07-20 21:55 | 1101:15 | 0.86 |
| p280 | 00:20 | 5.16 | 5.26 | 46 | 2019-02-10 13:25 | 00:35 | 2.89 |
| p331 | 00:35 | 10.65 | 10.93 | 327 | 2019-04-20 10:45 | 00:45 | 4.25 |
| p426 | 08:40 | 13.25 | 13.56 | 224 | 2019-10-26 22:05 | 10:15 | 4.21 |
| p427 | Undetected | | | | | | |
| p455* | 885:00 | 3.1 | 11.05 | 75 | 2019-11-17 05:00 | 920:25 | 1.2 |
| p514 | 00:25 | 15.38 | 15.58 | 192 | 2019-04-02 21:05 | 00:40 | 5.35 |
| p523 | 00:15 | 28.6 | 28.39 | 43 | 2019-01-15 23:15 | 00:15 | 12.64 |
| p586* | 296:35 | 3.13 | 20.52 | 127 | 2019-08-22 07:15 | 346:55 | 0.95 |
| p653* | 48:30 | 3.29 | 18.28 | 164 | 2019-03-05 13:40 | 50:05 | 0.02 |
| p654 | Undetected | | | | | | |
| p680 | 00:45 | 5.36 | 5.37 | 204 | 2019-07-10 09:30 | 01:25 | 1.48 |
| p710 | 02:55 | 5.56 | 5.58 | 42 | 2019-03-24 17:15 | 03:45 | 2.44 |
| p721* | 763:15 | 5.12 | 13.18 | 147 | 2019-09-08 22:15 | 901:45 | 1.59 |
| p762* | 314:50 | 1.03 | 15.71 | 238 | 2019-11-14 01:05 | 352:35 | 0.31 |
| p800* | 193:20 | 3.11 | 21.95 | 48 | 2019-08-25 03:20 | 251:10 | 1.36 |
| p810 | Undetected | | | | | | |
| p827 | 01:35 | 26.05 | 26.46 | 152 | 2019-01-24 20:05 | 02:55 | 9.14 |
| p879* | 501:30 | 3.02 | 10.93 | 286 | 2019-12-17 19:25 | 580:20 | 1.14 |

* Leaks with a gradual increase in flow.

The nodes within the sensors’ coverage area are used as simulated leak sources. This primarily reduces the search space and processing time of the simulations. The flow rate used in the simulations was approximately 3 L/s, and simulated data was saved only from the monitoring points. This leakage value is selected based on the mode of leakage flow from the tested dataset. The DTW algorithm then determines the similarity value between the monitored and simulated data. The results of these processes, considering the leak in pipe p523, can be seen in Fig. 8.

Fig. 8. Methodology application - Area A.

Fig. 8a shows the resulting multi-graph from the proposed methodology. It can be observed that there is a dispersion among the sensors with the highest detected anomalies. However, the three most indicated sensors (green, blue, and grey edges) show higher alterations compared to the other sensors. Therefore, the vertices that have edges with these sensors are used to simulate leaks, and the similarity between the simulated and monitored data is determined. Fig. 8b shows the vertices with the highest similarity values determined by the DTW method. It is worth noting that the vertices with the highest similarities surround the actual leak location (black box) within about 50 m. This distance is calculated using the coordinates of the node identified as the leak source and the average of the coordinates of the initial and final nodes of the leaking pipe.
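The distance calculation described above can be written as a short function: the leaking pipe is represented by the midpoint of its two end nodes, and the Euclidean distance to the indicated node is returned. The function name and the coordinates are hypothetical and are chosen only so that the result is easy to check by hand.

```python
import math


def localization_distance(node_xy, pipe_start_xy, pipe_end_xy):
    """Distance from the indicated node to the midpoint of the leaking pipe."""
    mid_x = (pipe_start_xy[0] + pipe_end_xy[0]) / 2.0
    mid_y = (pipe_start_xy[1] + pipe_end_xy[1]) / 2.0
    return math.hypot(node_xy[0] - mid_x, node_xy[1] - mid_y)


# Hypothetical coordinates in meters: the pipe midpoint is the origin.
distance_m = localization_distance((30.0, 40.0), (-10.0, 0.0), (10.0, 0.0))
```

With these toy coordinates the midpoint is (0, 0), so the distance reduces to the 3-4-5 triangle scaled by ten.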

The application of the methodology to Area C revealed a leak in pipe p280. The leak, which began on February 10, 2019, is detailed in Fig. 9, along with the monitored data and PageRank values for the sensors in this area. To automatically detect anomalies in the data, the z-score and IQR methods are used, which flag significant changes when the PageRank values of each sensor deviate substantially from typical behavior.

Fig. 9. Data behaviour.

In Fig. 9a, it can be observed that the behavior of the monitored pressures remains largely unchanged after the leak begins. Although less pronounced, PageRank values do change over time, and this change was detected by the proposed algorithms (Fig. 9b). Consequently, the leak localization process was initiated using the three sensors in the area to create the Gs graph, primarily due to the limited number of sensors in this region. The coverage area was then determined using Dijkstra’s algorithm, and the nodes within this area were used in the leak simulation process.

The resulting multi-graph and the locations with the highest probability of leaks are shown in Fig. 10.

Fig. 10. Methodology application - Area C.

Fig. 10a illustrates the multi-graph associated with Area C. An anomaly is detected, and sensors at nodes n31 (red) and n4 (green) are identified as exhibiting the most significant changes. The nodes with the highest behavioral similarity values to the real data (Fig. 10b) are located approximately 60 meters from the actual leak location. Fig. 10b reveals that all nodes exhibit high similarity values, potentially due to the similar pressures controlled by the tank in this network. Consequently, a leak could impact the water level in the tank, and this change in level could affect the pressure at all nodes within Area C.

The third application of the methodology is performed in Area B (Fig. 11), with the localization process based solely on the determination of similarity values.

Fig. 11. Leak Localization - Area B p680.

The accuracy of locating the leak in the p680 pipe, shown in Fig. 11, is not as high as for other leaks. However, this approach can identify anomalies in the data within 45 minutes of the leak starting. The methodology is applied to all leaks identified by BattLeDIM. Table 1 shows the following for each leak:

  • Time elapsed from the actual leak start to detection.

  • Leak flow at the time of detection.

  • Maximum leak flow rate.

  • Distance between the indicated location and the reported detection location.

  • Percentage of detected leak flow rate compared to total inlet flow rate in the network (pipe 227).

Table 1 presents the results of the application, prompting relevant discussions. The effectiveness of the detection process is immediately evident, with anomalies in the data identified just 15 minutes after the start of the leak in pipe p523. Leaks in pipes p280, p514, p331, and p680 were detected within 45 minutes of their onset. All of these cases involve leaks that began abruptly. However, it is important to note that in some cases, detection occurred hundreds of hours after the leak started. These cases involve leaks characterized by a gradual increase in flow, such as those observed in pipes p123, p277, p455, and p721, where detection occurs more than 700 hours after the start of the leak. Because these leaks have a gradual increase in flow rates, the monitored data also change gradually. Such behavior introduces complexity into data analysis, as minimal temporal changes affect the data uniformly. This complexity is even more apparent in the localization process, where establishing similarities between data points yields less pronounced results. Consequently, there are increasing discrepancies between the location suggested by the method and the actual location of the leak.

Table 1 additionally presents the flow rate of the leak at the time of detection, highlighting the difficulty in promptly detecting leaks with gradual flow increases. It is observed that in most cases of leaks with gradual growth, detection occurs when the flow reaches approximately 3 L/s (p123, p193, p277, p455, p586, p653, p800, p879). Two exceptions are leak p721, where detection occurs when the flow reaches 5.12 L/s, and leak p762, which is detected when the flow is about 1 L/s. However, it is important to note that these detections occur before the maximum flow rates are reached. For example, the leak in p762 was detected at 1.03 L/s, while its maximum flow value reached 15.71 L/s. It is worth noting that applying anomaly detection algorithms, such as IQR and z-score, directly to the pressure signal acquired by the sensor resulted in longer detection times. The best detection time when applying IQR and z-score to pressure data matched that of the proposed methodology, 15 minutes after the start of the leakage. Nevertheless, this occurs for the largest leakage in the tested dataset, around 28 L/s. For the smallest leak, around 1 L/s, the detection time increased by 38 hours. This difference corresponds to almost 140 m³, a significant amount of water that could supply around 950 consumers for one day.
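The volume quoted above can be checked with a short calculation based on the Table 1 entries for leak p762 (detection at 314:50 with the proposed method and at 352:35 from the pressure signal, with a flow of 1.03 L/s at detection). The sketch assumes, as a simplification, that the leak flow stays constant at 1.03 L/s during the extra delay, although this leak grows over time; the function name is hypothetical.

```python
def lost_volume_m3(flow_l_per_s, delay_hours):
    """Volume leaked at a constant flow rate during the extra detection delay."""
    return flow_l_per_s * delay_hours * 3600 / 1000


# Extra delay of pressure-based detection for leak p762 (h:m from Table 1).
delay_h = (352 + 35 / 60) - (314 + 50 / 60)
volume_m3 = lost_volume_m3(1.03, delay_h)

# Share of that volume per consumer if it supplied 950 consumers for one day.
liters_per_consumer = volume_m3 * 1000 / 950
```

Under this assumption, the delay is 37.75 hours and the volume is just under 140 m³, which corresponds to roughly 147 liters per consumer per day for 950 consumers.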

As mentioned previously, the process of locating some leaks is not as precise, but in some cases, the distance is less than 100 meters (p455, p800). Of particular note are the leaks in pipes p523 and p710, with distances to the exact leak location of 43 and 42 meters, respectively. Conversely, the leak in pipe p193 results in the greatest distance (378 meters), but this particular leak exhibits a gradual increase in flow. Additionally, the leak in pipe p331 stands out, as it has an abrupt start and was indicated 327 meters from the actual leak location. This situation occurs in a region where the pressures show little variation and do not significantly impact the data during the simulation of leaks. These greater distances are also reflected in the evaluation process presented by BattLeDIM. The approach identifies 17 leaks as True Positives and 2 as False Positives. This means that out of the 23 leaks, 17 are detected correctly, and two others are detected but in the wrong locations. These values corroborate the results presented in Table 1 in two respects: the 4 leaks that are not detected (p257, p427, p654, and p810), and the leaks in pipes p193 and p331, where the indicated distance from the actual leak location is greater than 300 meters. Localizations farther than 300 meters from the leak are considered failures by the BattLeDIM organizers. Even so, the Total Score achieved by the methodology proposed in this work is 43,491, a value close to the highest scores achieved by BattLeDIM solutions (64,873 and 60,562). The perfect Total Score, if all leaks were detected and located immediately, would be 23,154. The Report file indicating the locations and times of the reports is presented as an appendix; this file can be used in the BattLeDIM evaluation software.

Furthermore, Table 1 presents the percentage of the leak flow at detection in relation to the total inlet flow in the network, which is used as the reference because only two pipes have flow monitoring. Based on the percentage column, we can see that, in general, detections occur when the leak flow reaches a significant value in relation to the total inlet flow. In some cases, this percentage exceeds 12% (p142 and p523 in Table 1), highlighting the importance of early detection to avoid significant water losses. It is essential to note that early detection of leaks, even when the flow has not yet reached its maximum value, is crucial to avoid water waste and minimize operating costs.

Other research has explored leak detection and localization using the BattLeDIM benchmark. Mohan Doss et al. (2024) report near-instantaneous detection for some abrupt leaks, similar to Table 1, but also mention cases with longer detection times, especially for incipient leaks. Moreover, these researchers use deep neural networks, which require a high processing time and a large amount of data for training. Cholewa et al. (2024) focus on localization with minimal sensors and achieve good accuracy (average distance of 180 m) with only three sensors, which is comparable to the localization accuracy achieved in some cases by the methodology proposed here. However, their methodology relies on the relocation of a mobile sensor to achieve good accuracy in localizing the leak. This can be challenging in real-world scenarios, where sensor relocation can be complex or time-consuming. Unlike these studies, the present methodology combines leak detection and localization into a single integrated framework, simplifying the process and potentially reducing the time to identify and treat leaks. Furthermore, the methodology strategically narrows the leak search space by considering sensor coverage areas and correlations, leading to fewer hydraulic simulations and faster processing compared to methods that require extensive simulations or sensor relocation.

5. Conclusions

The paper presents a novel methodology based on multilayer network analysis for leak detection and localization in WDNs that demonstrates promising results when applied to the benchmark problem presented in BattLeDIM. In particular, the work bases its methodological fundamentals on graph analysis and monitoring data to detect anomalies in the network pressure data, and it is capable of indicating the presence of leaks within 15 minutes of their onset. However, it is important to highlight that the effectiveness of leak detection and localization varies depending on the nature of the leaks. Leaks with a gradual flow increase present additional challenges, as changes in data over time are smaller, resulting in delayed detection and less precise localization. Despite these limitations, the overall score achieved by the proposed methodology is competitive compared to other participants in BattLeDIM, showing its efficiency and performance in comparison to alternative approaches. The joint approach to detecting and localizing leaks in a single framework based on a multilayer graph is an important advance in the water field, allowing time reduction in the maintenance process for leaks. While the leak detection step is entirely free of hydraulic models, bringing the advantage of working directly with databases, the localization step is partially dependent on hydraulic simulation. Even so, the regionalization of the leak is entirely based on data sets, also allowing a reduction in the search process in the field for maintenance. The proposed approach stands out mainly for favoring a broad view of the processes for detecting and locating leaks, making a correlated analysis between the monitored data, the sensor coverage areas, and the locations with leaks.

In this regard, the presented methodology constitutes a significant advancement in leak detection and localization in water distribution networks, particularly in scenarios where changes in data are smoother. Relevant points of this research include the approaches that can be considered when working with multilayer graphs, as they offer several advantages, such as determining the sensor coverage area by different methods and reducing the graph using subgraphs. Another important point is the determination of similarity through the application of DTW. While it is a fast process, it becomes impractical when the sample space is too large. This problem was addressed by reducing the sample space to only the nodes covered by the sensors indicated in the simulation process. However, even with positive results, there is still room for improvement, especially because this process identifies the most impacted sensors, which influences the search space for the leak. Future work can explore new complex network theory metrics to further refine the data processing on graphs, improving results mainly for small leaks or those that grow over time. Furthermore, related systems can also be analyzed jointly by considering other layers, for example, the energy, internet, and road networks, which bring new information to water companies for maintenance campaigns.

CRediT authorship contribution statement

Daniel Barros: Conceptualization, Methodology, Software. Ariele Zanfei: Writing – review & editing, Validation, Supervision, Funding acquisition, Conceptualization. Andrea Menapace: Writing – review & editing, Supervision, Funding acquisition, Conceptualization. Gustavo Meirelles: Conceptualization, Methodology, Software. Manuel Herrera: Conceptualization, Methodology, Software. Bruno Brentan: Conceptualization, Methodology, Software.

Declaration of competing interest

The authors declare that there are no conflicts of interest.

Contributor Information

Daniel Barros, Email: dbezerra@unicamp.br.

Ariele Zanfei, Email: a.zanfei@aiaqua.tech.

Andrea Menapace, Email: Andrea.Menapace@unibz.it.

Gustavo Meirelles, Email: gustavo.meirelles@ehr.ufmg.br.

Manuel Herrera, Email: manuel.herrera@newcastle.ac.uk.

Bruno Brentan, Email: brentan@ehr.ufmg.br.

Appendix A. Additional results

Appendix A comprises the following contents:

  • Report file resulting from the proposed methodology in Table 2.

    Table 2. Report file.
    # linkID startTime
    p257 2019-12-31 00:05
    p427 2019-12-31 00:05
    p810 2019-12-31 00:05
    p654 2019-12-31 00:05
    p523 2019-01-15 23:15
    p824 2019-01-24 20:05
    p251 2019-02-10 13:25
    p633 2019-03-05 13:40
    p710 2019-03-24 17:15
    p553 2019-04-02 21:05
    p325 2019-04-20 10:45
    p845 2019-07-08 20:10
    p16 2019-07-20 21:55
    p139 2019-06-13 05:45
    p147 2019-07-10 09:30
    p124 2019-08-22 07:15
    p161 2019-09-08 22:15
    p179 2019-08-25 03:20
    p578 2019-10-20 12:25
    p453 2019-11-17 05:00
    p750 2019-11-14 01:05
    p683 2019-10-26 22:05
    p884 2019-12-17 19:25
  • Normalized pressure for Area A in Fig. 12.

    Fig. 12. Normalized pressure for Area A.

Data Availability

  • Data will be made available on request.



Articles from Water Research X are provided here courtesy of Elsevier
