HAO-AVP: An Entropy-Gini Reinforcement Learning Assisted Hierarchical Void Repair Protocol for Underwater Wireless Sensor Networks

Lijun Hao; Chunbo Ma; Jun Ao

doi:10.3390/s26020684

2026 Jan 20;26(2):684. doi: 10.3390/s26020684

Editor: Wei Yi¹

PMCID: PMC12846151 PMID: 41600479

Abstract

Wireless Sensor Networks (WSNs) are pivotal for data acquisition, yet reliability is severely constrained by routing voids induced by sparsity, uneven energy, and high dynamicity. To address these challenges, the Hybrid Acoustic-Optical Adaptive Void-handling Protocol (HAO-AVP) is proposed to satisfy the requirements for highly reliable communication in complex underwater environments. First, targeting uneven energy, a reinforcement learning mechanism utilizing Gini coefficient and entropy is adopted. By optimizing energy distribution, voids are proactively avoided. Second, to address routing interruptions caused by the high dynamicity of topology, a collaborative mechanism for active prediction and real-time identification is constructed. Specifically, this mechanism integrates a Markov chain energy prediction model with on-demand hop discovery technology. Through this integration, precise anticipation and rapid localization of potential void risks are achieved. Finally, to recover damaged links at the minimum cost, a four-level progressive recovery strategy, comprising intra-medium adjustment, cross-medium hopping, path backtracking, and Autonomous Underwater Vehicle (AUV)-assisted recovery, is designed. This strategy is capable of adaptively selecting recovery measures based on the severity of the void. Simulation results demonstrate that, compared with existing mainstream protocols, the void identification rate of the proposed protocol is improved by approximately 7.6%, 8.4%, 13.8%, 19.5%, and 25.3%, respectively, and the void recovery rate is increased by approximately 4.3%, 9.6%, 12.0%, 18.4%, and 24.2%, respectively. In particular, enhanced robustness and a prolonged network life cycle are exhibited in sparse and dynamic networks.

Keywords: underwater wireless sensor networks, routing void, hybrid acoustic-optical communication, reinforcement learning, Gini coefficient

1. Introduction

As a key technology for large-scale information acquisition, Wireless Sensor Networks (WSNs) possess immense potential in fields such as resource exploration and security monitoring [1,2,3]. However, in practical applications, characteristics such as node sparsity and dynamic topological changes frequently induce the severe problem of Routing Voids [4]. This phenomenon occurs when a node fails to locate a next-hop relay, leading to link interruption and the formation of communication “dead zones.” This not only causes data loss and a drastic increase in latency but also triggers routing holes or even network paralysis [5,6]. It is noteworthy that although this issue has been studied in traditional static networks, due to fundamental differences in communication media and mobility patterns, existing schemes based on static topologies cannot be directly applied to highly dynamic and complex environments [7,8,9]. Furthermore, most current research fails to adequately address the reliability requirements of heterogeneous networks under resource-constrained conditions [10,11]. Therefore, the design of a specialized and efficient routing void handling mechanism for such complex dynamic networks is urgently required [12].

Specifically, as a critical extension of WSNs into the marine domain, Underwater Wireless Sensor Networks (UWSNs) have garnered significant attention, yet their harsh communication environment renders the void problem even more critical [13]. Unlike terrestrial environments, the underwater channel is characterized by high latency, severe attenuation, and continuous node mobility driven by ocean currents [14]. These unique constraints, coupled with the limited energy of underwater nodes that are difficult to replace, make the network topology highly dynamic and unstable [15]. Consequently, UWSNs face more severe challenges in connectivity and void handling compared to terrestrial networks [16]. In recent years, substantial progress has been achieved in research concerning UWSN routing voids. Early research on pure acoustic networks was mostly based on improved vector forwarding mechanisms. Shi and Zhang et al. [17] proposed the Energy-Aware Vector-Based Forwarding (EA-VBF) protocol, in which the next-hop selection was optimized by introducing a residual energy factor, effectively balancing node energy consumption and relay counts. Subsequently, Saleh et al. [18] proposed an Enhanced Vector-Based Forwarding (VBF) protocol, where a void avoidance mechanism was integrated to bypass communication blind spots by adjusting the routing vector. Although network lifetime has been extended and packet loss reduced to a certain extent by these methods, they were designed primarily for relatively ideal static or low-speed networks. The challenges of dynamic voids in complex underwater environments were not fully considered, leading to performance degradation when nodes are sparse or topology changes frequently.

To overcome the limitations of basic vector forwarding in complex topologies, various avoidance strategies based on density perception and intelligent optimization have been proposed. Ullah et al. [19] proposed a routing scheme based on node density, utilizing local density information to dynamically adjust forwarding strategies, thereby improving the survival rate of data packets in sparse regions. Yang et al. [20] applied a greedy discrete Particle Swarm Optimization (PSO) algorithm for cluster-based routing planning, avoiding potential routing traps through global optimization. Similarly, Mahdi et al. [21] proposed a traffic-aware routing protocol (PG-RES) based on pressure gradient and resistance functions. By comprehensively evaluating node depth, load, and residual energy to dynamically adjust transmission paths, this method effectively alleviates network hotspot issues and ensures the reliability of data forwarding. Meanwhile, Liu et al. [22] pointed out the issue of propagation errors in underwater mobile target localization, indirectly corroborating the difficulty of precise location perception. Although these methods improve avoidance through intelligent algorithms, their complex calculations and frequent signaling interactions incur high overhead and latency under drastically changing topologies, and the success rate of avoidance is difficult to guarantee.

However, preventative avoidance is not always sufficient; when avoidance fails, effective repair techniques become the key to ensuring communication. Khan et al. [23] designed an energy-efficient clustering protocol based on arithmetic progression, in which link robustness was enhanced by optimizing cluster head selection. Ahmad et al. [24] proposed a cooperative routing protocol, while Ye et al. [25] designed a routing strategy based on a multi-dimensional trust model and a void-avoidance algorithm. By leveraging node cooperation and the capability to identify malicious and void nodes, respectively, these methods significantly enhance the efficiency of path repair and the security of data forwarding. Furthermore, for severe network segmentation, the utilization of AUVs for auxiliary repair has become a research hotspot. Wang et al. [26] and Chu et al. [27] explored load balancing based on Q-learning and adaptive reward shaping mechanisms, respectively, providing intelligent support for AUV path planning. The effectiveness of utilizing AUVs as mobile relays or for path repair was further demonstrated by Zhu et al. [28], who proposed the Two-Step Adaptive Path Repair (T-SAPR) protocol, and by the research of Kaiser et al. [29,30,31]. However, these methods are mostly passively triggered or rely on expensive external resources; consequently, they incur high scheduling costs and slow responses, making it difficult to meet the rigorous real-time requirements of highly dynamic environments.

With the increasing demand for network intelligence and high bandwidth, the research paradigm has gradually shifted towards the integration of Reinforcement Learning (RL) technology and hybrid acoustic-optical architectures. RL technology has been widely introduced to endow networks with intelligent decision-making capabilities [32]. He et al. [33] proposed a trust update mechanism based on RL, enhancing the security adaptability of the network. Gao et al. [34] designed a Q-learning load balancing protocol that effectively prolonged the network life cycle. Wang et al. [35] and Nandyala et al. [36] proposed depth-information-based RL opportunistic routing and the Q-learning based Trust Aware Routing (QTAR) protocol, respectively, where multi-dimensional state information was further integrated to realize adaptive perception of dynamic topologies [37]. Although the intelligence level of networks has been significantly improved by these RL protocols, most focus on the optimization of conventional routing metrics, and there is still room for exploration in active void prediction and avoidance. Moreover, in the field of hybrid acoustic-optical networks, relevant research is still in its infancy. Zhu et al. [38] proposed the Priority-based Hybrid Void Prediction (PHVP) protocol, which is considered a representative work in this field, attempting to solve the void problem through packet grading and acoustic-optical collaborative processing. Although the potential of cross-medium transmission was preliminarily demonstrated, a prediction mechanism for void formation is lacking in this method, and the repair means are relatively single-modal, failing to fully leverage the intelligent collaborative advantages of hybrid networks [39].

Despite multi-perspective explorations in existing research, three severe challenges are still faced when applied to complex hybrid acoustic-optical networks: (1) Existing routing protocols are limited to local energy sensing when solving the routing void problem, lacking macro control over overall network load balancing. Metrics from economics that measure fairness are rarely introduced into routing decisions by traditional schemes, making it difficult to effectively quantify the equilibrium and fairness of network energy distribution. Consequently, the network is forced to passively endure the formation of routing void rather than actively avoiding them from the source. (2) The applicability of prediction and identification mechanisms for routing voids in dynamic underwater environments remains to be improved. Existing methods are mostly oriented towards static or low-speed pure acoustic networks. They often rely on periodic interaction or complex calculations, which limits their application in hybrid networks. Furthermore, passive identification mechanisms lead to significant latency and overhead. Currently, a collaborative mechanism fusing energy trend prediction with lightweight real-time detection is lacking. This deficiency makes it difficult to precisely anticipate and lock onto voids in the early stages of their formation. (3) Existing void repair strategies are relatively single-modal in their approaches, making it difficult to balance repair costs and success rates. A one-size-fits-all approach, such as single-medium detouring or premature invocation of expensive auxiliary resources, is adopted by current protocols. In hybrid acoustic-optical networks, further in-depth research is required on how to synergistically utilize the high speed of optical communication and the long distance of acoustic communication to construct a graded progressive repair system ranging from low-cost adjustment to high-cost intervention.

To address the aforementioned challenges, the HAO-AVP protocol is proposed in this paper. The core of this protocol lies in constructing a comprehensive scheme that integrates active avoidance, intelligent decision-making, and graded repair. At the routing decision level, a Reinforcement Learning (RL) framework is introduced. Within this framework, Information Entropy and the Gini Coefficient are incorporated into the reward function. This design actively avoids routing voids from the source by quantifying and optimizing the equilibrium of energy distribution and the fairness of load allocation. Meanwhile, at the void handling level, a collaborative mechanism for prediction, identification, and repair is established. A Markov chain energy prediction model is fused with on-demand hop discovery identification technology. Furthermore, a four-level progressive strategy is designed, comprising intra-medium adjustment (optical path adjustment), cross-medium hopping (acoustic-optical switching), path backtracking, and AUV assisted repair. This ensures that high-cost means are enabled strictly level-by-level only after low-cost schemes prove ineffective, thereby achieving efficient resource utilization. The main research contributions of this paper are as follows:

(1) An intelligent routing decision algorithm based on reinforcement learning is proposed. The Gini Coefficient and Information Entropy are introduced into the reward function to quantify the fairness of network energy allocation, which serves to guide load balancing, actively avoid void formation from the source, and prolong the network life cycle.
(2) To address the difficulty of underwater void prediction and the reliance on single-modal repair means, a prediction-identification-repair collaborative mechanism is proposed. By fusing Markov prediction with on-demand hop discovery technology, node failure risks are anticipated and void regions are located.
(3) A four-level progressive repair strategy synergizing acoustic and optical advantages is proposed. Means such as intra-medium adjustment and cross-medium hopping are adaptively matched based on the principle of minimum cost, jointly achieving low overhead, a high delivery rate, and strong robustness.
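
To make the two fairness metrics named in the first contribution concrete, the following minimal sketch computes the Shannon entropy of residual-energy shares and the Gini coefficient of forwarding loads. It uses the standard textbook definitions, which may differ in detail from the reward terms the protocol defines later; the function names `energy_entropy` and `gini` are hypothetical and introduced only for illustration.

```python
import math


def energy_entropy(energies):
    """Shannon entropy (bits) of the energy shares p_k = E_k / sum(E).

    Higher values mean residual energy is spread more evenly.
    """
    total = sum(energies)
    if total <= 0:
        return 0.0
    shares = [e / total for e in energies if e > 0]
    return -sum(p * math.log2(p) for p in shares)


def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(values)
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(ordered, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical neighborhood: one node carries all forwarding load.
balanced_load = [10, 10, 10, 10]
skewed_load = [0, 0, 0, 40]
print(gini(balanced_load), gini(skewed_load))
print(energy_entropy(balanced_load), energy_entropy(skewed_load))
```

A balanced neighborhood yields a low Gini coefficient and a high entropy, whereas a neighborhood in which one relay carries most of the traffic yields the opposite, which is the kind of imbalance a fairness-aware reward would penalize.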

2. Problem Statement

In this work, an intuitive description and a formal definition of the “Routing Void” problem within the underwater environment are first presented. The fundamental differences between the characteristics of underwater and terrestrial networks are detailed to clarify the realistic scenarios and technical challenges addressed in this study. Subsequently, the network topology, communication, and energy consumption models are established.

In geographic-based underwater greedy routing strategies, neighbors that are closer to the Sink node than the forwarding node itself are invariably preferred as the next hop. However, due to the sparsity of underwater node deployment and complex submarine terrain obstacles, this forwarding strategy is prone to falling into a “local optimum” dilemma, known as a routing void. A typical routing void scenario is illustrated in Figure 1.

It is assumed that a data packet is emitted by a source node and successfully transmitted to Node B via relay Node A. At this juncture, a search for a next-hop relay node is initiated by Node B. However, within the communication range of Node B (indicated by the dashed circle), only a single neighbor, Node C, is located. Unfortunately, the Euclidean distance from Node C to the Sink node is significantly greater than that from Node B to the Sink (i.e., Node C is situated behind Node B). According to greedy forwarding rules, no next-hop node capable of providing positive geographic progress can be found by Node B. Consequently, the data transmission link is interrupted at this point, and Node B is identified as a void node. In the absence of an effective recovery mechanism, the data packet will be discarded, and subsequent data flows directed towards Node B will be trapped in a dead loop while consuming valuable energy, thereby severely compromising network performance.
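
The greedy rule described above can be sketched as a simple test: a node is a void node when none of its neighbors is strictly closer to the Sink than the node itself. The coordinates below are hypothetical values chosen to mirror the Node B / Node C scenario, and `is_void_node` is an illustrative name rather than part of the protocol.

```python
import math


def is_void_node(node, neighbors, sink):
    """Return True if no neighbor offers positive progress towards the sink."""
    own_distance = math.dist(node, sink)
    return not any(math.dist(nb, sink) < own_distance for nb in neighbors)


# Hypothetical 3D coordinates (x, y, depth) in meters.
sink = (0.0, 0.0, 0.0)
node_b = (100.0, 0.0, 50.0)
node_c = (130.0, 0.0, 60.0)   # farther from the sink than Node B

print(is_void_node(node_b, [node_c], sink))
```

With Node C as the only neighbor, the check reports a void at Node B; adding any neighbor closer to the Sink would clear it.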

It is noteworthy that although the routing void problem is also encountered in Terrestrial Wireless Sensor Networks (TWSNs), the direct application of terrestrial solutions (such as the Right-Hand Rule based on planar graph traversal) is rendered ineffective in underwater environments. This is attributed to the unique physical characteristics and network architecture inherent to the underwater environment. As illustrated in Table 1, a detailed comparison between TWSNs and UWSNs is conducted across multiple dimensions, including communication media, propagation characteristics, topological structures, and the causes of voids.

Table 1. Comparison of characteristic dimensions between TWSN and UWSN.

Characteristic Dimension	TWSN	UWSN
Communication Medium	RF	Acoustic waves, Underwater optical waves
Propagation Speed	Extremely high (~3 × 10⁸ m/s)	Extremely low (~1.5 × 10³ m/s, Acoustic)
Propagation Delay	Extremely low, negligible	Extremely high
Available Bandwidth	Wide (MHz~GHz)	Extremely narrow (kHz level, Acoustic)
Network Topology	Mostly 2D, relatively static	3D, highly dynamic due to water currents
Node Localization	GPS available	GPS unavailable
Node Cost & Density	Low cost; dense deployment achievable	High cost; typically sparse deployment
Channel Quality	Relatively reliable	Severely affected by multipath effects and noise
Void Characteristics	Mostly static; caused by obstacles or initial deployment	Dynamic and mobile; caused by node energy depletion and movement
Applicability of Void Handling Schemes	Based on 2D geometry; relying on static path discovery (e.g., planar graph traversal)	Requires 3D spatial awareness; adaptable to dynamic topology

As indicated in Table 1, the fundamental disparities between the two are primarily manifested in the following aspects: Regarding communication media and latency discrepancies, TWSNs rely on Radio Frequency (RF) waves, characterized by an extremely high propagation speed (≈3 × 10⁸ m/s), rendering link latency negligible. In contrast, underwater acoustic communication is predominantly adopted by UWSNs, where the propagation speed of acoustic waves in water is extremely low (≈1.5 × 10³ m/s), resulting in excessively high propagation delays. Consequently, void detection methods based on frequent handshakes or real-time topology sensing, which are utilized on land, would incur unacceptable temporal overheads in the underwater environment. In terms of spatial dimensionality, TWSNs are typically modeled as two-dimensional planar networks with relatively static nodes. Conversely, UWSNs are typical three-dimensional networks that exhibit high dynamicity due to the influence of ocean currents. This renders void recovery algorithms based on two-dimensional geometry (such as triangulation) difficult to directly extend to the three-dimensional dynamic space. Regarding void characteristics, voids in terrestrial networks are mostly caused by static obstacles or initial deployment, possessing relatively fixed morphologies. However, voids in underwater networks originate not only from sparse deployment but are also more frequently generated dynamically due to node energy depletion or movement with water currents. This phenomenon of “dynamic voids” necessitates stronger adaptability and predictive capabilities in handling mechanisms. In summary, traditional void handling strategies designed for terrestrial environments are unable to adapt to underwater scenarios characterized by high latency, high dynamicity, and three-dimensional features. 
Therefore, the design of a specialized routing void prediction and recovery mechanism for underwater hybrid acoustic-optical networks is deemed particularly urgent.

3. Underwater Hybrid Acoustic-Optical Network Model Building

The list of symbols is presented in Table 2, in which all key notations utilized in the equations and algorithms are defined to facilitate quick reference.

Table 2. List of Symbols and Notations.

Symbol	Description	Symbol	Description
Network & Channel Model (Equations (1)–(19))
$d_{ij}$	Transmission distance	$λ$	Optical wavelength
$f$	Frequency of acoustic wave	$c(λ)$	Beam attenuation coefficient
$\tilde{k}$	Spreading factor, typically 1.5 or 2	$R$	Responsivity
$α(f)$	Absorption coefficient	$σ_{shot}^{2}$	Shot noise variance
$SL$	Source level	$σ_{thermal}^{2}$	Thermal noise variance
$NL$	Noise level	$q$	Elementary charge
$DI$	Directivity index	$B$	Bandwidth
$N_{t}(f)$, $N_{s}(f)$	Turbulence/shipping noise	$I_{bg}$	Background light irradiance
$N_{w}(f)$, $N_{th}(f)$	Wave/thermal noise	$I_{d}$	Detector dark current
$s$	Shipping activity factor	$k_{B}$	Boltzmann constant
$w$	Wind speed	$T$	Absolute temperature
$P_{tx\_a}$, $P_{rx\_a}$	Acoustic transmission power; acoustic reception power consumption	$E_{tx\_a}(L, d_{ij})$, $E_{rx\_a}(L)$	Energy consumption for underwater acoustic transmission/reception
$SNR_{th}$	Minimum SNR threshold	$R_{L}$	Load resistance
$P_{elec}$	Fixed power consumption for circuit processing	$R_{a}$, $R_{o}$	Acoustic comm. rate; optical comm. rate
$P_{rx\_opt}$, $P_{tx\_opt}$	Received optical power; optical transmission power	$E_{tx\_o}(L, d_{ij})$, $E_{rx\_o}(L)$	Energy consumption for underwater optical transmission/reception
$η_{t}$, $η_{r}$	Optical efficiencies	$E_{init}$	Initial energy
$A_{rx}$	Receiver aperture area	$E_{rem}(v_{i}, t)$	Residual energy
$θ$	Beam divergence angle	$v_{i}$	Node
$A_{beam}(d_{ij})$	Beam area at distance $d_{ij}$	$N(v_{i})$	Neighbor nodes of $v_{i}$
$SNR_{opt}$	SNR at the receiver	$d(v_{i}, v_{j})$	Distance between $v_{i}$ and $v_{j}$
Proposed HAO-AVP Protocol (Equations (20)–(52))
$s_{i}$	State vector of $v_{i}$	$η$	Risk aversion coefficient
$E_{rem}(v_{i})$	Normalized residual energy	$σ$	Penalty range parameter
$D(v_{i})$	Euclidean distance from $v_{i}$ to the sink	$λ_{1}$	Penalty intensity
$A(s_{i})$	Action space of $v_{i}$	$w_{sev}$	Weight for severity
$N(v_{i})$	Neighbor nodes of $v_{i}$	$w_{aff}$	Weight for affected scope
$Q_{len}(v_{i})$	Packet queue length	$w_{qlen}$	Weight for queue length
$Q_{t}(s_{i}, a_{j})$	Q-value function	$w_{diet}$	Weight for distance
$α$	Learning rate	$w_{devv}$	Weight for deviation
$R(s_{i}, a_{j})$	Immediate reward	$Count_{bt}(k)$	Backtrack counter
$γ$	Discount factor	$|N_{aff}(k)|$	Number of affected neighbors
$\max_{a' \in A(s_{j})} Q_{t}(s_{j}, a')$	Maximum potential future Q-value	$\bar{Q}_{len}(k)$	Average queue length
$A(s_{j})$	Action space of node $v_{j}$	$d(P_{AUV}, Loc_{k})$	Navigation distance
$w_{1}, w_{2}, w_{3}$	Weighting coefficients	$D_{max}$	Maximum navigation range
$R_{prog}(v_{j})$	Progress reward of $v_{j}$	$E_{travel}(k)$	Travel energy consumption
$R_{energy}(v_{j})$	Energy reward of $v_{j}$	$E_{AUV\_rem}$	Remaining energy of AUV
$R_{balance}(v_{i}, v_{j})$	Equilibrium reward of $v_{i}$ and $v_{j}$	$PAPR(d(v_{i}, v_{j}), f)$	Predicted acoustic packet reception rate
$w_{b}(v_{i})$	Dynamic weighting factor	$Dev(k)$	Deviation degree
$H_{energy}(v_{i})$	Energy information entropy	$F(P)$	Fitness function
$k$	Adjustment parameter	$P$	Candidate position
$σ_{E}(v_{i})$	Standard deviation of the residual energy	$M_{conn}(P)$	Metric of connectivity
$G_{load}(v_{i})$	Forwarding load Gini coefficient	$w_{c1}$	Weight for connectivity
$p_{k}$	Energy proportion	$w_{d}$	Weight for distance
$\bar{E}_{rem}$	Average residual energy	$P_{AUV}$	Current position of AUV
$L_{k}$	Number of data packets forwarded	$Θ_{m}(t)$	Particle velocity
$\sum$	Summation symbol	$\tilde{ω}$	Inertia weight
$\bar{L}(v_{i})$	Average load	$c_{1}, c_{2}$	Learning factors
$\tilde{q}$	Random number	$r_{1}, r_{2}$	Random numbers
$ε$	Epsilon/exploration rate	$P_{best,m}$	Personal best position
$π_{j}^{(k)}$	State probability vector of node $n_{j}$ after $k$ time steps	$P_{m}(t)$	Current position
$π_{j}^{(0)}$	Initial state probability distribution vector of $n_{j}$	$G_{best}$	Global best position
$(P_{j})^{k}$	$k$-th power of the one-step state transition probability matrix $P_{j}$ of $n_{j}$	$P_{m}(t + 1)$	Next position
$k$	Predicted time step	$C_{relay}$	Cost of relay mode
$S_{crit}$	Critical/endangered state	$C_{deploy}$	Cost of deployment mode
$P_{void}$	Joint probability	$P^{*}$	Optimal position
$P_{th}$	Void warning threshold	$FP(v_{j})$	Forward potential of $v_{j}$
$N_{prog}'$	Set of uncovered progress neighbors	$N_{a}(v_{i})$	Acoustic neighbors of $v_{i}$
$N_{o}(v_{i})$	Optical neighbor set	$P_{b}$	Bit error rate (BER)
$N_{covered}$	Set of covered neighbors	$A'(s_{p})$	New action space
$θ_{new}$	Minimum required new divergence angle	$ϵ$	Minimal constant
$ϕ_{ij}$	Deviation angle	$s_{p}$	State of previous hop
$δ_{margin}$	Angular margin	$LS(v_{p}, v_{k})$	Link stability
$C_{1}(θ_{new})$	Energy cost	$C_{travel}(P^{*})$	Travel cost
$θ_{old}$	Original divergence angle	$E_{rem\_avg}(v_{j})$	Average residual energy of $v_{j}$
$w_{q}$	Potential energy term weighting	$C_{op}$	Operational cost
$w_{r}$	Reliability weighting	$T_{stay}$	Stay duration
$w_{c}$	Cost weighting	$C_{node}$	Node cost
CDUE	Utility value/score	$Action$	Decision result

3.1. Network Topology Model

In this paper, the UWSN is modeled as an undirected graph $G = (V, E)$ within a three-dimensional Euclidean space. All physical entities within the network are represented by the vertex set $V = \{v_{1}, v_{2}, \dots, v_{N}\} \cup \{v_{s}\}$, where the $i$-th ordinary sensor node is denoted by $v_{i}$, and its coordinate in the three-dimensional space is given by $(x_{i}, y_{i}, z_{i})$, where $z_{i}$ represents the deployment depth of the node. The unique Sink Node, situated at the center of the water surface, is represented by $v_{s}$, which is responsible for collecting all underwater data and communicating with the onshore data center. All potential communication links between nodes are represented by the edge set $E$. An edge $(v_{i}, v_{j}) \in E$ is considered to exist if and only if the distance between node $v_{i}$ and node $v_{j}$ is less than or equal to the maximum communication range of a specific communication mode. In the hybrid acoustic-optical network, the edge set $E$ can be further decomposed into the union of the acoustic communication link set $E_{a}$ and the optical communication link set $E_{o}$, which is expressed as $E = E_{a} \cup E_{o}$.

An acoustic communication link $(v_{i}, v_{j}) \in E_{a}$ is considered to exist provided that the condition $d (v_{i}, v_{j}) \leq R_{a}$ is satisfied, where $R_{a}$ denotes the maximum effective range of underwater acoustic communication. An optical communication link $(v_{i}, v_{j}) \in E_{o}$ is established if and only if two conditions are simultaneously met:

Distance Condition: The condition $d (v_{i}, v_{j}) \leq R_{o}$ must be satisfied, where $R_{o}$ represents the maximum effective range of underwater optical communication.
Angle Condition: Node $v_{j}$ must be located within the coverage range of the beam divergence angle of node $v_{i}$ , and vice versa.

In this topology model, $R_{o}$ is utilized as the upper distance threshold for defining the existence of potential links, while the specific issue of angular alignment is further considered in the routing decision process. Typically, it is observed that $R_{o} ≪ R_{a}$ .

A neighbor node set $N(v_{i}) = \{v_{j} \mid (v_{i}, v_{j}) \in E\}$ is maintained by each node $v_{i}$, which can be similarly subdivided into an acoustic neighbor set $N_{a}(v_{i})$ and an optical neighbor set $N_{o}(v_{i})$. The network model is illustrated in Figure 2.
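
As an illustration of the topology model, the sketch below builds the acoustic and optical neighbor sets from node coordinates using only the distance conditions, in line with the statement that angular alignment is handled later in the routing decision. The ranges (500 m acoustic, 50 m optical), the coordinates, and the function name `build_neighbor_sets` are hypothetical values chosen for the example.

```python
import math


def build_neighbor_sets(positions, r_acoustic, r_optical):
    """Return (N_a, N_o): node id -> set of neighbor ids within each range.

    Only the distance condition is checked; beam alignment is ignored here.
    """
    n_a = {node: set() for node in positions}
    n_o = {node: set() for node in positions}
    ids = list(positions)
    for idx, i in enumerate(ids):
        for j in ids[idx + 1:]:
            d = math.dist(positions[i], positions[j])
            if d <= r_acoustic:      # acoustic link (v_i, v_j) in E_a
                n_a[i].add(j)
                n_a[j].add(i)
            if d <= r_optical:       # potential optical link in E_o
                n_o[i].add(j)
                n_o[j].add(i)
    return n_a, n_o


# Hypothetical deployment: (x, y, depth) in meters.
positions = {
    "v1": (0.0, 0.0, 10.0),
    "v2": (30.0, 0.0, 10.0),
    "v3": (400.0, 0.0, 10.0),
}
n_a, n_o = build_neighbor_sets(positions, r_acoustic=500.0, r_optical=50.0)
# The full neighbor set N(v_i) is the union of both subsets.
n_all = {node: n_a[node] | n_o[node] for node in positions}
print(n_a["v1"], n_o["v1"], n_all["v3"])
```

Because the optical range is much shorter than the acoustic range, every optical neighbor in this example is also an acoustic neighbor, while distant nodes such as `v3` are reachable acoustically only.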

Each sensor node $v_{i}$ is equipped with independent acoustic and optical transceivers and therefore possesses dual-mode communication capabilities. The energy of nodes is constrained, with the initial energy denoted as $E_{init}$. It is assumed that the three-dimensional coordinates $(x_{i}, y_{i}, z_{i})$ of each node can be acquired via a specific localization algorithm, and that the residual energy $E_{rem}(v_{i}, t)$ can be sensed. Node positions are subject to dynamic changes due to the impact of ocean currents. The Sink Node is situated at the water surface and is unrestricted in energy; it is responsible for collecting data uploaded from all underwater nodes and communicating with the onshore data center via RF links. The AUV serves as a mobile auxiliary unit, equipped with acoustic-optical communication capabilities and rechargeable energy sources. Within this protocol, beyond routine cruising missions, the AUV can be dispatched to execute routing repair tasks.

3.2. Communication Model

1. Underwater Acoustic Communication Model

The energy consumption associated with underwater acoustic communication is primarily influenced by the propagation loss of signals within the aqueous medium. According to the Thorp model [40,41], the path loss $TL_{a}(d_{ij}, f)$ (unit: dB) from node $v_{i}$ to node $v_{j}$ can be calculated via the following formula:

$$TL_{a}(d_{ij}, f) = \tilde{k} \cdot 10 \log_{10}(d_{ij}) + d_{ij} \cdot a(f) \tag{1}$$

where $d_{ij}$ represents the distance between the transmitting and receiving nodes (in meters); $f$ denotes the frequency of the acoustic wave (in kHz); $\tilde{k}$ is the spreading factor, which is typically set to 1.5 (corresponding to practical spreading) or 2 (corresponding to spherical propagation); and $a(f)$ denotes the frequency-dependent absorption coefficient, the empirical formula of which is given by:

$$10 \log_{10} a(f) = \frac{0.11 f^{2}}{1 + f^{2}} + \frac{44 f^{2}}{4100 + f^{2}} + 2.75 \times 10^{-4} f^{2} + 0.003 \tag{2}$$
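
A minimal numerical sketch of Equations (1) and (2) is given below. Thorp's empirical expression is conventionally quoted in dB/km for $f$ in kHz; since the distance is given in meters here, the sketch converts the distance to kilometers for the absorption term. That unit handling is an assumption of this example rather than something stated in the text, and the function names are hypothetical.

```python
import math


def thorp_absorption_db_per_km(f_khz):
    """Right-hand side of Eq. (2); assumed to be in dB/km for f in kHz."""
    f2 = f_khz ** 2
    return (0.11 * f2 / (1 + f2)
            + 44 * f2 / (4100 + f2)
            + 2.75e-4 * f2
            + 0.003)


def acoustic_path_loss_db(d_m, f_khz, k=1.5):
    """Eq. (1): spreading loss plus absorption loss, in dB.

    Assumption: the absorption term uses the distance in km so that it
    matches the dB/km unit of the Thorp coefficient.
    """
    if d_m <= 0:
        raise ValueError("distance must be positive")
    spreading = k * 10 * math.log10(d_m)
    absorption = (d_m / 1000.0) * thorp_absorption_db_per_km(f_khz)
    return spreading + absorption


# Hypothetical link: 1 km at 10 kHz with practical spreading (k = 1.5).
print(round(acoustic_path_loss_db(1000.0, 10.0), 3))
```

For this hypothetical link the spreading term (45 dB) dominates the absorption term (about 1.2 dB); absorption grows in importance at higher frequencies and longer ranges.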

The Signal-to-Noise Ratio (SNR) at the receiving node can be expressed as:

$$SNR_{a} = SL - TL_{a}(d_{ij}, f) - NL + DI \tag{3}$$

where $SL$ represents the Source Level, $NL$ denotes the Ambient Noise Level, and $DI$ stands for the Directivity Index. The power spectral density of ambient noise, $N(f)$, is typically synthesized from four primary noise sources: turbulence noise $N_{t}(f)$, shipping noise $N_{s}(f)$, wave noise $N_{w}(f)$, and thermal noise $N_{th}(f)$. According to empirical formulas [4], these noise components (in dB) can be expressed as follows:

$$10 \log N_{t}(f) = 17 - 30 \log f \tag{4}$$

$$10 \log N_{s}(f) = 40 + 20 (s - 0.5) + 26 \log f - 60 \log (f + 0.03) \tag{5}$$

$$10 \log N_{w}(f) = 50 + 7.5 \sqrt{w} + 20 \log f - 40 \log (f + 0.4) \tag{6}$$

$$10 \log N_{th}(f) = -15 + 20 \log f \tag{7}$$

where $s$ denotes the shipping activity factor (ranging from 0 to 1), and $w$ represents the wind speed at the sea surface (in m/s). The total power spectral density of ambient noise is expressed as the sum of these four components:

$$N(f) = N_{t}(f) + N_{s}(f) + N_{w}(f) + N_{th}(f) \tag{8}$$
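
Because Equations (4)–(7) are given in dB while Equation (8) is a sum of power spectral densities, the components must be converted to linear scale before they are added. The sketch below shows one way to do this; the function name and the default parameter values are hypothetical.

```python
import math


def ambient_noise_psd_db(f_khz, s=0.5, w=0.0):
    """Total ambient noise PSD in dB, combining Eqs. (4)-(7) per Eq. (8)."""
    log_f = math.log10(f_khz)
    components_db = (
        17 - 30 * log_f,                                        # turbulence, Eq. (4)
        40 + 20 * (s - 0.5) + 26 * log_f
        - 60 * math.log10(f_khz + 0.03),                        # shipping, Eq. (5)
        50 + 7.5 * math.sqrt(w) + 20 * log_f
        - 40 * math.log10(f_khz + 0.4),                         # waves, Eq. (6)
        -15 + 20 * log_f,                                       # thermal, Eq. (7)
    )
    # Eq. (8): add the components in linear scale, then return to dB.
    linear_sum = sum(10 ** (c / 10) for c in components_db)
    return 10 * math.log10(linear_sum)


# Hypothetical conditions: 10 kHz, moderate shipping, calm vs. windy sea.
calm = ambient_noise_psd_db(10.0, s=0.5, w=0.0)
windy = ambient_noise_psd_db(10.0, s=0.5, w=10.0)
print(round(calm, 2), round(windy, 2))
```

At 10 kHz the wave term dominates the sum, so the total rises noticeably with wind speed while the turbulence term is negligible.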

The transmission power $P_{tx\_a}$ required for transmitting a data packet of length $L$ (bits), and the power $P_{rx\_a}$ consumed by the node for receiving said packet, are, respectively, given by:

$$P_{tx\_a} \propto 10^{(TL_{a}(d_{ij}, f) + NL - DI + SNR_{th}) / 10} \tag{9}$$

$$P_{rx\_a} = P_{elec} \tag{10}$$

where $SNR_{th}$ represents the minimum SNR threshold required for successful demodulation at the receiver, and $P_{elec}$ denotes the fixed power consumption for circuit processing.
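
The proportionality in Equation (9) follows from Equation (3) by requiring the received SNR to reach the demodulation threshold. The intermediate step can be written as below, where $SL_{\min}$ is a label introduced here only for the smallest admissible source level:

```latex
\begin{align*}
SNR_{a} \ge SNR_{th}
  &\iff SL - TL_{a}(d_{ij}, f) - NL + DI \ge SNR_{th} \\
  &\iff SL \ge SL_{\min} = SNR_{th} + TL_{a}(d_{ij}, f) + NL - DI, \\
P_{tx\_a} &\propto 10^{SL_{\min}/10}.
\end{align*}
```

Since the source level is a logarithmic (dB) measure of transmitted power, every additional 10 dB of path loss or noise multiplies the required transmission power by a factor of 10.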

2. Underwater Optical Communication Model

The path loss of underwater optical communication is primarily induced by absorption and scattering, following Beer-Lambert’s law. The optical power $P_{r x_o p t}$ received at the receiver can be expressed as:

P_{r x_o p t} (d_{i j}) = P_{t x_o p t} \cdot η_{t} η_{r} \cdot \frac{A_{r x}}{A_{b e a m} (d_{i j})} \cdot e^{- c (λ) d_{i j}}

(11)

where $P_{t x_o p t}$ denotes the optical transmission power. $η_{t}$ and $η_{r}$ represent the optical efficiencies of the transmitter and receiver, respectively. The aperture area of the receiver is denoted by $A_{r x}$ . The area of the optical beam at a distance $d_{i j}$ is indicated by $A_{b e a m} (d_{i j})$ , which is approximated as $π {(d_{i j} \tan (θ / 2))}^{2}$ , where $θ$ represents the beam divergence angle. Furthermore, $c (λ)$ denotes the beam attenuation coefficient, which is dependent on the optical wavelength $λ$ and the water quality.
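Equation (11) can be illustrated with a short numerical sketch. The clamp on the geometric factor (so that the receiver never collects more power than the beam carries at very short range) is an assumption added here for numerical safety and is not part of the original model; the function names and default efficiencies are likewise illustrative.

```python
import math


def beam_area(d: float, theta: float) -> float:
    """Beam cross-section at distance d (> 0) for divergence angle theta (rad)."""
    return math.pi * (d * math.tan(theta / 2)) ** 2


def received_optical_power(p_tx: float, d: float, theta: float, a_rx: float,
                           c: float, eta_t: float = 1.0, eta_r: float = 1.0) -> float:
    """Equation (11): geometric spreading loss times Beer-Lambert attenuation."""
    # Assumption: cap the captured fraction A_rx / A_beam at 1.
    geometric = min(1.0, a_rx / beam_area(d, theta))
    return p_tx * eta_t * eta_r * geometric * math.exp(-c * d)
```

For a fixed attenuation coefficient $c (λ)$ , widening the divergence angle enlarges the beam footprint but reduces the received power quadratically in $\tan (θ / 2)$ .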

Shot noise and thermal noise are identified as the primary sources of noise in communication. Consequently, the SNR at the receiver, denoted as $S N R_{o p t}$ , is expressed as:

S N R_{o p t} = \frac{{(R \cdot P_{t x_o p t})}^{2}}{σ_{s h o t}^{2} + σ_{t h e r m a l}^{2}}

(12)

where $P_{t x_o p t}$ represents the optical transmission power; $R$ denotes the responsivity of the photodetector, with a typical value of approximately 0.5 A/W; and $σ_{s h o t}^{2}$ and $σ_{t h e r m a l}^{2}$ represent the variances of shot noise and thermal noise, respectively, which can be specifically expressed as:

σ_{s h o t}^{2} = 2 q B (R \cdot P_{r x_o p t} + I_{b g} A_{r x} R + I_{d})

(13)

σ_{t h e r m a l}^{2} = \frac{4 k_{B} T}{R_{L}} B

(14)

where $q$ represents the elementary charge, with a value of $1.602 \times 10^{- 19}$ Coulombs; $P_{r x_o p t}$ denotes the received optical power, which is calculated via the path loss formula in Equation (11). $I_{b g}$ signifies the background light irradiance, which is influenced by factors such as sunlight and bioluminescence, exhibiting a wide variation range from $10^{- 5}$ W/m$^{2}$ (in the deep sea) to $10^{2}$ W/m$^{2}$ (near the water surface). $A_{r x}$ is the receiver aperture area, determined by the dimensions of the receiving lens; $B$ represents the bandwidth, which is associated with the target communication rate; $k_{B}$ is the Boltzmann constant, with a value of $1.38 \times 10^{- 23}$ J/K; $T$ denotes the absolute temperature, which can be set to 283 K; and $R_{L}$ is the load resistance. Furthermore, $I_{d}$ represents the detector dark current, which is dependent on the material and quality of the detector.

3. Energy Consumption Model

The energy consumption incurred by node $v_{i}$ in transmitting a data packet of length $L$ (bits) to node $v_{j}$ , denoted as $E_{t x} (L, d_{i j})$ , and the energy consumption for receiving said data packet, denoted as $E_{r x} (L)$ , are categorized into two modes: acoustic and optical.

Acoustic Mode:

E_{t x_a} (L, d_{i j}) = P_{t x_a} (d_{i j}) \cdot (L / R_{a})

(15)

E_{r x_a} (L) = P_{r x_a} \cdot (L / R_{a})

(16)

where $R_{a}$ denotes the data rate of acoustic communication.

Optical Mode:

E_{t x_o} (L, d_{i j}) = P_{t x_o p t} (d_{i j}) \cdot (L / R_{o})

(17)

E_{r x_o} (L) = P_{r x_o p t} \cdot (L / R_{o})

(18)

where $R_{o}$ denotes the data rate of optical communication. The initial energy of each node $v_{i}$ is denoted as $E_{i n i t}$ , and its residual energy at time $t$ is represented by $E_{r e m} (v_{i}, t)$ .
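Equations (15) to (18) all have the form "power times airtime", where the airtime is $L / R$ . As a small worked sketch, the data rates below are the ones listed later in the simulation parameters (5 kbps acoustic, 100 Mbps optical); the power figure used in the usage note is a hypothetical placeholder, not a value from this paper.

```python
def packet_energy(power_w: float, packet_bits: int, rate_bps: float) -> float:
    """Energy in joules: power (W) multiplied by airtime L / R (s)."""
    return power_w * packet_bits / rate_bps


L_BITS = 1000
acoustic_airtime = L_BITS / 5_000        # 0.2 s at 5 kbps
optical_airtime = L_BITS / 100_000_000   # 1e-05 s at 100 Mbps
```

With these rates the same packet occupies the acoustic channel 20,000 times longer than the optical channel, so for example `packet_energy(2.0, L_BITS, 5_000)` gives 0.4 J for an assumed 2 W acoustic transmitter.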

In UWSNs, the objective of routing is to identify a reliable and efficient path from the source node $v_{s r c}$ to the sink node $v_{s}$ . In greedy routing strategies based on geographic location or depth, a node $v_{j}$ that offers the maximum “progress” among the neighbors is selected by node $v_{i}$ as the next hop. Progress is typically defined as the difference between the distance from the current node to the destination and the distance from the next-hop node to the destination.

Routing Void Node: A node $v_{i}$ currently forwarding a data packet is designated as a Routing Void Node if, within its entire set of neighbor nodes $N (v_{i})$ , there exists no node $v_{j}$ such that $v_{j}$ is closer to the destination sink node than $v_{i}$ . This can be described mathematically as:

\forall v_{j} \in N (v_{i}), d (v_{j}, v_{s}) \geq d (v_{i}, v_{s})

(19)
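The void condition in Equation (19) translates directly into a neighbor filter. The sketch below assumes that node positions are available as 3-D coordinate tuples and uses Euclidean distance for $d (\cdot, \cdot)$ ; both are illustrative assumptions.

```python
import math


def progress_neighbors(node, neighbors, sink):
    """Neighbors that are strictly closer to the sink than the node itself."""
    d_i = math.dist(node, sink)
    return [n for n in neighbors if math.dist(n, sink) < d_i]


def is_routing_void(node, neighbors, sink) -> bool:
    """Equation (19): true when no neighbor offers positive progress."""
    return not progress_neighbors(node, neighbors, sink)
```

Note that a node with an empty neighbor set also satisfies Equation (19), so isolation caused by sparsity and isolation caused by depleted neighbors are handled by the same test.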

In addition to the inherent sparsity of the network, the formation of routing voids is significantly attributed to the premature energy depletion of certain nodes (such as nodes in “hotspot” regions near the Sink Node), which is caused by the assumption of excessive data forwarding tasks. Consequently, a “dead zone” composed of failed nodes is formed within the network topology, which similarly results in routing interruptions.

In summary, the hybrid acoustic-optical system model constructed above shows that the network environment is heterogeneous, highly dynamic, and energy-constrained. Underwater optical communication requires a strict Line-of-Sight (LoS) path, acoustic communication suffers from inherently high propagation latency, and ocean currents cause nodes to drift continuously; together, these factors create structural conflicts during actual operation. Specifically, because node deployment is sparse and topological changes are unpredictable, data packets are highly prone to being trapped in routing voids, with no available path, during greedy forwarding. At the same time, because underwater node batteries are difficult to replace, an uneven distribution of network load drains critical relay nodes rapidly and causes them to fail prematurely; this, in turn, induces routing voids or even large-scale network paralysis. Traditional single-medium routing and simple greedy forwarding strategies therefore struggle to keep the network operating efficiently in the face of these communication dead zones and energy-efficiency bottlenecks. To overcome these limitations, a data transmission mechanism that offers high reliability, low latency, and balanced energy consumption in a dynamic environment is needed.

4. Proposed HAO-AVP Protocol

In underwater hybrid acoustic-optical networks, limited node energy and dynamic topological changes readily induce routing voids and communication interruptions. To address these challenges, HAO-AVP is proposed in this paper. The overall architecture of the HAO-AVP protocol is illustrated in Figure 3. Two core mechanisms are integrated into this framework. First, an intelligent routing decision-making mechanism based on reinforcement learning proactively avoids voids through load balancing. Second, a collaborative “prediction-identification-repair” void handling mechanism reactively copes with existing or impending routing interruptions.

Figure 3. Overall Architecture of the HAO-AVP Protocol.

4.1. Reinforcement Learning-Based Routing Decision-Making via Entropy and Gini Coefficient

To address the high dynamicity of underwater network topology and uneven resource allocation, the Q-Learning algorithm from reinforcement learning (RL) is introduced in this paper to give nodes intelligent next-hop decision-making capabilities. The reward functions of traditional reinforcement learning routing algorithms, however, are confined to local communication gains and lack a view of the global network load status. Consequently, an improved reinforcement learning routing strategy is proposed, in which Information Entropy and the Gini Coefficient are integrated into the design of the reward function. This approach optimizes local transmission efficiency while also accounting for network-wide load balancing and fairness.

4.1.1. States, Actions, and Q-Value Functions

In this paper, the routing decision process is modeled as a Markov Decision Process (MDP).

State ( $s$ ): For any node $v_{i}$ holding a data packet, its state $s_{i}$ is defined as a vector containing critical local information that influences the decision-making process.

s_{i} = [E_{r e m} (v_{i}), D (v_{i}), Q_{l e n} (v_{i})]

(20)

where $E_{r e m} (v_{i})$ denotes the normalized residual energy of node $v_{i}$ ; $D (v_{i})$ represents the distance from node $v_{i}$ to the sink node $v_{s}$ ; and $Q_{l e n} (v_{i})$ indicates the packet queue length of node $v_{i}$ .

Action ( $a$ ): The action space $A (s_{i})$ of node $v_{i}$ is defined as the set of all its neighbor nodes $N (v_{i})$ . The execution of an action $a_{j} \in A (s_{i})$ corresponds to the selection of neighbor node $v_{j}$ as the next hop for forwarding the data packet.

Q-Value Function: The Q-value function $Q (s_{i}, a_{j})$ represents the long-term expected reward obtainable by selecting node $v_{j}$ as the next hop under state $s_{i}$ . The update of the Q-value is governed by the classic Bellman equation:

Q_{t + 1} (s_{i}, a_{j}) = (1 - α) \cdot Q_{t} (s_{i}, a_{j}) + α [R (s_{i}, a_{j}) + γ \max_{a^{'} \in A (s_{j})} Q_{t} (s_{j}, a^{'})]

(21)

where $α \in [0, 1]$ represents the learning rate; $R (s_{i}, a_{j})$ denotes the immediate reward obtained subsequent to the execution of action $a_{j}$ ; $γ \in [0, 1]$ is the discount factor; and $\max_{a^{'} \in A (s_{j})} Q_{t} (s_{j}, a^{'})$ signifies the maximum potential future Q-value obtainable in the subsequent state $s_{j}$ .
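The update in Equation (21) can be sketched as a tabular rule. The sketch assumes that the continuous state vector of Equation (20) has already been discretized into a hashable key (the paper does not specify the discretization), and it defaults to the learning rate 0.1 and discount factor 0.9 used later in the simulation settings.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Equation (21): blend the old estimate with the bootstrapped target."""
    # If the next hop has no neighbors, its future value is taken as 0 (assumption).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] = (1 - alpha) * q[(state, action)] + alpha * (reward + gamma * best_next)
    return q[(state, action)]


q_table = defaultdict(float)  # unseen (state, action) pairs start at 0
```

Here `state` plays the role of $s_{i}$ , `action` identifies the chosen neighbor $v_{j}$ , and `next_actions` enumerates $A (s_{j})$ as reported back by $v_{j}$ .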

4.1.2. Reward Function Design Based on Entropy and Gini Coefficient

The reward function $R (s_{i}, a_{j})$ is constituted by three components:

R (s_{i}, a_{j}) = w_{1} \cdot R_{p r o g} (v_{j}) + w_{2} \cdot R_{e n e r g y} (v_{j}) + w_{3} \cdot R_{b a l a n c e} (v_{i}, v_{j})

(22)

where $w_{1}$ , $w_{2}$ , and $w_{3}$ are defined as the weighting coefficients, where the condition $w_{1} + w_{2} + w_{3} = 1$ is satisfied.

Progress Reward $R_{p r o g} (v_{j})$ :

R_{p r o g} (v_{j}) = \frac{d (v_{i}, v_{s}) - d (v_{j}, v_{s})}{\max_{v_{k} \in N (v_{i})} \{d (v_{i}, v_{s}) - d (v_{k}, v_{s})\}}

(23)

Energy Reward $R_{e n e r g y} (v_{j})$ :

R_{e n e r g y} (v_{j}) = E_{r e m} (v_{j})

(24)

Equilibrium Reward $R_{b a l a n c e} (v_{i}, v_{j})$ :

This reward is calculated within the neighbor node set $N (v_{i})$ of node $v_{i}$ ; it is jointly determined by the energy information entropy and the forwarding load Gini coefficient, and is regulated by a dynamic weighting factor $ω_{b}$ .

R_{b a l a n c e} (v_{i}, v_{j}) = ω_{b} (v_{i}) \cdot (1 - G_{l o a d} (v_{i})) + (1 - ω_{b} (v_{i})) \cdot \frac{H_{e n e r g y} (v_{i})}{\log_{2} (| N (v_{i}) |)}

(25)

Dynamic Weighting Factor $ω_{b} (v_{i})$ : The emphasis placed on fairness (the Gini Coefficient) and equilibrium (Entropy) is dynamically adjusted by this factor, in accordance with the standard deviation $σ_{E}$ of the energy distribution among neighbor nodes.

ω_{b} (v_{i}) = \tanh (k_{1} \cdot σ_{E} (v_{i}))

(26)

where $k_{1}$ is defined as the adjustment parameter, and $σ_{E} (v_{i}) = \sqrt{\frac{1}{| N (v_{i}) |} \sum_{v_{k} \in N (v_{i})} {(E_{r e m} (v_{k}) - {\bar{E}}_{r e m})}^{2}}$ denotes the standard deviation of the residual energy of neighbor nodes. When a significant energy discrepancy is observed, $ω_{b}$ approaches 1, whereby greater emphasis is placed on load fairness; otherwise, $ω_{b}$ remains small and the equilibrium of the overall energy distribution is prioritized.

Energy Information Entropy $H_{e n e r g y} (v_{i})$ : The energy proportion $p_{k}$ is expressed as $p_{k} = E_{r e m} (v_{k}) / \sum_{v_{m} \in N (v_{i})} E_{r e m} (v_{m})$ .

H_{e n e r g y} (v_{i}) = - \sum_{v_{k} \in N (v_{i})} p_{k} \log_{2} (p_{k})

(27)

Forwarding Load Gini Coefficient ( $G_{l o a d} (v_{i})$ ): It is assumed that the number of data packets forwarded by each neighbor node $v_{k}$ within the past time window is denoted as $L_{k}$ .

G_{l o a d} (v_{i}) = \frac{\sum_{k = 1}^{|N (v_{i})|} \sum_{m = 1}^{|N (v_{i})|} |L_{k} - L_{m}|}{2 {|N (v_{i})|}^{2} \bar{L} (v_{i})}

(28)

where the average load of the neighbor nodes is represented by $\bar{L} (v_{i}) = \frac{1}{|N (v_{i})|} \sum_{k = 1}^{|N (v_{i})|} L_{k}$ .
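Equations (25) to (28) can be combined into a single routine. The following is a minimal sketch that takes the residual energies and recent forwarding loads of the neighbors of $v_{i}$ as plain lists; the handling of degenerate cases (a single neighbor, zero total energy, or zero total load) is an assumption made here, since the text does not define it.

```python
import math


def energy_entropy(energies) -> float:
    """Equation (27): Shannon entropy of the neighbors' energy shares."""
    total = sum(energies)
    if total == 0:
        return 0.0  # assumption: no residual energy carries no information
    return -sum((e / total) * math.log2(e / total) for e in energies if e > 0)


def load_gini(loads) -> float:
    """Equation (28): Gini coefficient of the forwarding loads."""
    n, mean = len(loads), sum(loads) / len(loads)
    if mean == 0:
        return 0.0  # assumption: no traffic is treated as perfectly even
    return sum(abs(a - b) for a in loads for b in loads) / (2 * n * n * mean)


def balance_reward(energies, loads, k1: float = 1.0) -> float:
    """Equations (25)-(26): tanh-weighted mix of (1 - Gini) and normalized entropy."""
    n = len(energies)
    mean_e = sum(energies) / n
    sigma_e = math.sqrt(sum((e - mean_e) ** 2 for e in energies) / n)
    w_b = math.tanh(k1 * sigma_e)
    # Assumption: with one neighbor, log2(1) = 0, so the entropy term is set to 1.
    h_norm = energy_entropy(energies) / math.log2(n) if n > 1 else 1.0
    return w_b * (1 - load_gini(loads)) + (1 - w_b) * h_norm
```

A neighborhood with identical energies and identical loads scores 1, while concentrating all traffic on one of four neighbors yields a Gini coefficient of 0.75.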

Design Principles of the Compound Reward Function and Theoretical Implications on Convergence Behavior

To ensure the convergence and robustness of the routing strategy within the highly dynamic underwater environment, a compound reward function comprising three complementary terms is proposed in this paper. Each term addresses a specific optimization objective, collectively reshaping the Q-value manifold within the Markov Decision Process (MDP), thereby influencing the convergence behavior of the agent:

Composition and Theoretical Role of Reward Terms

(1) Progress Reward $R_{p r o g}$ : Providing Directional Gradient.

This term serves as the fundamental driving force for routing convergence. By rewarding nodes capable of providing positive geographic progress toward the sink node, a gradient field with explicit directionality is constructed within the Q-value space by $R_{p r o g}$ . Theoretically, random walks or loops are prevented by this mechanism, ensuring that the selected path strictly converges geometrically toward the destination.

(2) Energy Reward $R_{e n e r g y}$ : Establishing Feasibility Constraints.

Although the shortest path is guaranteed by $R_{p r o g}$ , the survivability of nodes is neglected. The selection of nodes with low residual energy is penalized by $R_{e n e r g y}$ , introducing an energy-aware constraint. During the convergence process, the attractors of the Q-value landscape are corrected by this term, shifting the convergence point of the optimal policy from the geometric shortest path to an energy-sustainable path, thereby preventing premature link breakage caused by single-point depletion.

(3) Balance Reward ( $R_{b a l a n c e}$ ): Enhancing Robustness via Regularization.

This constitutes the core innovation of the proposed protocol. By integrating Information Entropy and the Gini coefficient, $R_{b a l a n c e}$ functions as a regularization term, imposing a penalty on behaviors where traffic is concentrated on a single optimal node. The peaks of the globally optimal solution in the Q-value function are smoothed, encouraging a probabilistic distribution of traffic among multiple healthy neighbors.

Impact on Convergence Behavior under Dynamic Topology

Under highly dynamic topology changes, traditional greedy strategies often induce winner-takes-all convergence. This state of convergence is fragile: once the optimal node fails due to movement or energy depletion, the Q-value table oscillates severely. With the introduction of the Entropy-Gini term $R_{b a l a n c e}$ , the convergence objective is transformed from a static single optimal solution into a dynamic multi-path routing set. This probabilistic convergence behavior enables the agent to maintain multiple high-value neighbors simultaneously. When the network topology changes drastically, this wide policy space provides natural redundancy, allowing the algorithm to switch rapidly to backup paths and thereby maintain the stability of the convergence state.

4.1.3. Learning and Decision-Making Process

A Q-table is maintained by each node $v_{i}$ . When a data packet is available for transmission, the next hop $v_{j}^{*}$ is selected by adopting the $ε$-greedy strategy.

v_{j}^{*} = \begin{cases} \arg \max_{v_{j} \in N (v_{i})} Q (s_{i}, a_{j}) & \text{if } \tilde{q} > ε \\ \text{random choice from } N (v_{i}) & \text{if } \tilde{q} \leq ε \end{cases}

(29)

where $\tilde{q}$ represents a random number uniformly distributed within the interval [0,1]. Upon the successful forwarding of a data packet, information regarding the next state is acquired by node $v_{i}$ via mechanisms such as listening or ACK. Subsequently, its Q-table is updated in accordance with the Bellman equation (Equation (21)).
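The selection rule in Equation (29) can be sketched as follows. The mapping from neighbor identifiers to their current Q-values and the injectable random generator are illustrative choices made here so that the behavior is reproducible.

```python
import random


def select_next_hop(q_values: dict, epsilon: float, rng=random):
    """Equation (29): explore when the random draw is <= epsilon, otherwise exploit."""
    if not q_values:
        return None  # no neighbors: the caller must handle the void case
    if rng.random() <= epsilon:
        return rng.choice(list(q_values))      # exploration
    return max(q_values, key=q_values.get)     # exploitation: arg max of Q
```

For instance, `select_next_hop({"v1": 0.2, "v2": 0.7}, epsilon=0.1)` returns `"v2"` in roughly 95% of calls (90% from exploitation plus half of the exploratory draws).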

The execution flow of the reinforcement learning-based next-hop decision-making process is summarized in Algorithm 1, which details how a node observes its local state and iteratively updates the Q-value by computing a composite reward function that incorporates energy entropy and the Gini coefficient.

Algorithm 1: RL-Based Routing with Entropy & Gini Reward

Input: Current Node $n_{i}$ , Neighbor Set $N_{i}$
Output: Next Hop $n_{next}$

1: Observe State $S_{t} = (E_{res}, Dist_{sink}, Q_{len})$
2: // Action Selection (Epsilon-Greedy)
3: IF $random() < ε$ THEN Select random $n_{next}$ from $N_{i}$
4: ELSE $n_{next} = \arg \max_{n_{j}} Q (S_{t}, n_{j})$
5: Forward Packet to $n_{next}$ and Observe $S_{t + 1}$
6: // Reward Calculation (Core Innovation)
7: Compute $R_{progress}$ and $R_{energy}$ using Equations (23) and (24)
8: Compute $R_{equilibrium}$ based on Entropy Equation (27) & Gini Equation (28)
9: $R_{total} = w_{1} \cdot R_{progress} + w_{2} \cdot R_{energy} + w_{3} \cdot R_{equilibrium}$
10: // Update Q-Value
11: $Q (S_{t}, n_{next}) = (1 - α) \cdot Q (S_{t}, n_{next}) + α \cdot (R_{total} + γ \cdot \max (Q (S_{t + 1})))$

4.2. Collaborative Routing Void Handling Mechanism

Because underwater routing voids form in a concealed manner and are difficult to capture in real time, a Markov chain model is introduced to proactively predict the energy depletion trends of nodes. However, a single probabilistic prediction model carries a risk of false alarms and cannot perceive sudden interruptions in the physical topology. A collaborative void prediction and identification mechanism is therefore proposed, in which background energy trend prediction is combined with foreground on-demand hop discovery, so that nodes perceive potential link-break risks in time and void regions are located precisely.

4.2.1. Markov Chain-Based Void Prediction

Potential routing voids are predicted by monitoring the residual energy variation trends of neighbor nodes. The energy state of each neighbor node $v_{j}$ is divided into $M$ discrete levels, denoted as $S = \{S_{1}, S_{2}, \dots, S_{M}\}$ . An $M \times M$ energy state transition probability matrix, $P_{j}$ , is constructed by observing the frequency of transitions between different energy levels over a long period.

Using this Markov model, the probability $π_{j}^{(k)} (S_{c r i t})$ that node $v_{j}$ transitions to the “critical” state $S_{c r i t}$ after $k$ time synchronization steps, given the current state $S_{x}$ , can be predicted as follows:

π_{j}^{(k)} = π_{j}^{(0)} \cdot {(P_{j})}^{k}

(30)

where $π_{j}^{(0)}$ is defined as the initial state probability distribution vector. Let $N_{p r o g} (v_{i}) = \{v_{j} \in N (v_{i}) \mid d (v_{j}, v_{s}) < d (v_{i}, v_{s})\}$ denote the set of “progress” neighbors of node $v_{i}$ . A routing void warning is triggered when the joint probability $P_{v o i d}$ that every node in $N_{p r o g} (v_{i})$ enters the “critical” state within the next $k$ steps exceeds the warning threshold $P_{t h}$ . The product form used below treats the energy states of the individual neighbors as mutually independent.

P_{v o i d} = \prod_{v_{j} \in N_{p r o g} (v_{i})} π_{j}^{(k)} (S_{c r i t}) > P_{t h}

(31)
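Equations (30) and (31) amount to propagating each neighbor's state distribution through its transition matrix and multiplying the resulting "critical" probabilities. The sketch below uses plain nested lists for the matrices; the three-level example in the usage note (high, low, critical) and its transition probabilities are hypothetical values chosen only for illustration.

```python
def k_step_distribution(pi0, P, k: int):
    """Equation (30): pi^(k) = pi^(0) * P^k, computed by repeated row-vector products."""
    dist = list(pi0)
    m = len(P)
    for _ in range(k):
        dist = [sum(dist[x] * P[x][y] for x in range(m)) for y in range(m)]
    return dist


def void_probability(progress_neighbors, k: int, crit_index: int) -> float:
    """Equation (31): product of the k-step critical-state probabilities.

    progress_neighbors is a list of (pi0, P) pairs, one per progress neighbor.
    An empty list yields 1.0, i.e. the node is already in a void.
    """
    prob = 1.0
    for pi0, P in progress_neighbors:
        prob *= k_step_distribution(pi0, P, k)[crit_index]
    return prob
```

With `P = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]]` and a neighbor currently in the "low" state (`pi0 = [0, 1, 0]`), the two-step critical probability is 0.36; two such progress neighbors give a joint value of 0.1296, which is then compared against $P_{t h}$ .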

4.2.2. Void Identification via On-Demand Hop Discovery

A lightweight, on-demand hop discovery mechanism is initiated when node $v_{i}$ is confirmed to be trapped in a routing void (i.e., $N_{p r o g} (v_{i}) = Ø$ ). A “VOID_DISCOVERY” request packet containing a Time-To-Live (TTL) limit is broadcast by $v_{i}$ to all its neighbors. Upon receipt of this request, the hop count to the sink node, $H (v_{j}, v_{s})$ , is returned by neighbor $v_{j}$ , provided that $v_{j}$ is not a void node itself. By collecting this feedback, whether an accessible path exists in the vicinity can be rapidly ascertained by $v_{i}$ , whereby a basis for selecting a recovery strategy is provided.

To achieve precise anticipation and rapid localization of voids, the detailed execution steps of the collaborative void prediction and identification algorithm are summarized in Algorithm 2. By coordinating background trend prediction with foreground real-time detection, this algorithm ensures that nodes perceive potential link-break risks in time.

Algorithm 2: Void Prediction and Identification

Input: Neighbors $N_{i}$ , Energy History $H_{E}$

1: // --- Proactive Prediction (Background Process) ---
2: Update Markov Transition Matrix based on $H_{E}$
3: Calculate Probability $P_{danger}$ of neighbors entering “Endangered State”
4: IF $Joint\_Probability (P_{danger}) > Threshold_{warning}$ THEN
5:     Trigger Void Alert and update Neighbor List
6: END IF
7: // --- Real-time Identification (Forwarding Process) ---
8: IF no neighbor provides progress to Sink THEN
9:     Broadcast VOID_DISCOVERY (TTL)
10:    Receive Hop_Count feedback from neighbors
11:    IF valid path exists THEN Update Routing Table
12:    ELSE Trigger Hierarchical Repair (Algorithm 3)
13: END IF

4.3. Graded Void Repair Mechanism

To address the link interruption and packet loss caused by underwater routing voids, conventional approaches rely on path backtracking or single-medium repair mechanisms to recover communication. In complex hybrid acoustic-optical environments, however, a single repair method struggles to balance repair success rate against network energy consumption (e.g., purely optical repair has limited coverage, and purely acoustic repair consumes excessive energy). Therefore, a graded progressive void repair strategy is proposed, in which intra-medium optical path adjustment, cross-medium acoustic hopping, path backtracking, and AUV-assisted repair are integrated. The strategy targets both a high delivery rate and low energy consumption by adaptively matching the lowest-cost solution to the severity of the void. Upon confirmation of a routing void, this four-level progressive repair strategy is initiated by the HAO-AVP protocol.

Level 1: Intra-medium Adjustment (Optical Path Adjustment)

If the data packet is currently being transmitted via optical communication and a void is encountered, repair is attempted by node $v_{i}$ within the optical communication medium. The objective of this strategy is to identify a minimum beam divergence angle increment $Δ θ$ . Let $θ_{o l d}$ be denoted as the initial divergence angle. The set of uncovered progress neighbors is defined as:

N_{p r o g}^{'} = \{v_{j} \in N_{o} (v_{i}) \setminus N_{c o v e r e d} \mid d (v_{j}, v_{s}) < d (v_{i}, v_{s})\}

(32)

The minimum required new divergence angle is given by:

θ_{n e w} = \min_{v_{j} \in N_{p r o g}^{'}} \{2 ϕ_{i j}\} + δ_{m a r g i n}

(33)

where $ϕ_{i j}$ denotes the deviation angle of $v_{j}$ , and $δ_{m a r g i n}$ represents the angular margin. The energy cost associated with increasing the divergence angle can be modeled as:

C_{1} (θ_{n e w}) = E_{t x_o} \cdot {(\frac{\tan (θ_{n e w} / 2)}{\tan (θ_{o l d} / 2)})}^{2}

(34)

Decision Rule: This repair is executed provided that $N_{p r o g}^{'} \neq Ø$ , $θ_{n e w} \leq θ_{m a x}$ , and $C_{1} (θ_{n e w}) < T_{1}$ (cost threshold). Otherwise, the process proceeds to the second level.
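The Level 1 decision rule can be sketched as a single function built from Equations (33) and (34). Angles are assumed to be in radians, and the argument names are illustrative; a return value of `None` stands for "escalate to the second level".

```python
import math


def level1_adjustment(phi_uncovered, theta_old: float, e_tx_o: float,
                      theta_max: float, margin: float, cost_threshold: float):
    """Return (theta_new, cost) if widening the beam is admissible, else None.

    phi_uncovered holds the deviation angles phi_ij of the uncovered progress neighbors.
    """
    if not phi_uncovered:
        return None  # the uncovered progress set is empty
    theta_new = min(2 * phi for phi in phi_uncovered) + margin            # Equation (33)
    cost = e_tx_o * (math.tan(theta_new / 2) / math.tan(theta_old / 2)) ** 2  # Equation (34)
    if theta_new <= theta_max and cost < cost_threshold:
        return theta_new, cost
    return None
```

Because the cost grows with the square of $\tan (θ_{n e w} / 2)$ , doubling a small divergence angle roughly quadruples the transmission energy, which is why the threshold $T_{1}$ quickly rules out wide beams.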

Level 2: Cross-medium Hopping (Acoustic-Optical Switching)

If optical path adjustment fails, “cross-medium hopping” is performed by utilizing the long-distance characteristics of acoustic communication. A Comprehensive Detour Utility Evaluation Function (CDUE), denoted as $U (v_{j})$ , is designed in this protocol to guide the decision-making process:

U (v_{j}) = w_{q} \cdot (\frac{F P (v_{j})}{\max_{v_{k} \in N_{a} (v_{i})} F P (v_{k}) + ϵ} \cdot e^{\frac{E_{r e m_a v g} (v_{j})}{E_{i n i t}}}) + w_{r} \cdot PAPR (d (v_{i}, v_{j}), f) - w_{c} \cdot \frac{E_{t x_a} (L, d (v_{i}, v_{j}))}{E_{r e m} (v_{i})}

(35)

where $w_{q}$ , $w_{r}$ , $w_{c}$ are defined as weighting coefficients, and $ϵ$ is a small constant used to prevent division by zero. $E_{r e m_a v g} (v_{j})$ denotes the average residual energy of the optical neighbors of $v_{j}$ . $F P (v_{j})$ represents the forward optical potential of node $v_{j}$ :

F P (v_{j}) = |\{v_{k} \in N_{o} (v_{j}) \mid d (v_{k}, v_{s}) < d (v_{j}, v_{s})\}|

(36)

$PAPR (d, f)$ is defined as the predicted packet reception rate of the acoustic link:

PAPR (d, f) = {(1 - P_{b} (d, f))}^{L}

(37)

where $P_{b} (d, f)$ denotes the Bit Error Rate (BER), and $L$ represents the packet length.

Decision Rule: The acoustic neighbor $v_{j}^{*}$ that maximizes the comprehensive detour utility is selected by node $v_{i}$ as the next hop:

v_{j}^{*} = \arg \max_{v_{j} \in N_{a} (v_{i})} U (v_{j})

(38)

The cost of this operation is defined as the energy consumption of acoustic communication, $C_{2} = E_{t x_a} (L, d (v_{i}, v_{j}^{*}))$ .
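The Level 2 selection in Equations (35) to (38) can be sketched as follows. The per-candidate dictionary layout, the default weights (0.4, 0.4, 0.2), and the use of a precomputed bit error rate as input are all assumptions made for illustration; the paper does not state the weight values or the BER model in this section.

```python
import math


def packet_reception_rate(ber: float, packet_bits: int) -> float:
    """Equation (37): probability that all L bits are received correctly."""
    return (1 - ber) ** packet_bits


def cdue(fp, fp_max, e_avg, e_init, ber, packet_bits, e_tx_a, e_rem_i,
         w_q, w_r, w_c, eps: float = 1e-9) -> float:
    """Equation (35): forward-potential term + link-quality term - energy-cost term."""
    quality = fp / (fp_max + eps) * math.exp(e_avg / e_init)
    return (w_q * quality
            + w_r * packet_reception_rate(ber, packet_bits)
            - w_c * e_tx_a / e_rem_i)


def pick_acoustic_next_hop(cands: dict, e_init, packet_bits, e_rem_i, w=(0.4, 0.4, 0.2)):
    """Equation (38): arg max of U over the acoustic neighbors; returns (id, utility)."""
    fp_max = max(c["fp"] for c in cands.values())
    scores = {vid: cdue(c["fp"], fp_max, c["e_avg"], e_init, c["ber"], packet_bits,
                        c["e_tx_a"], e_rem_i, *w)
              for vid, c in cands.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Each candidate entry carries its forward optical potential `fp` from Equation (36), the average residual energy of its optical neighbors `e_avg`, the link `ber`, and the acoustic transmission energy `e_tx_a` needed to reach it.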

Level 3: Path Backtracking

Path backtracking is initiated when node $v_{i}$ is trapped in a complete void. Upon receipt of the $BACKTRACK (v_{i}, H)$ message, a new next hop $v_{k}^{*}$ must be intelligently selected by the previous hop node $v_{p}$ to avoid falling into the void again. To this end, a Backtrack Utility (BU) function is designed:

v_{k}^{*} = \arg \max_{v_{k} \in A^{'} (s_{p})} \{Q (s_{p}, a_{k}) \cdot L S {(v_{p}, v_{k})}^{η} - λ_{1} \cdot \exp (- \frac{d {(v_{k}, v_{i})}^{2}}{2 σ^{2}})\}

(39)

where $A^{'} (s_{p}) = N (v_{p}) \setminus ({Blacklist}_{p} \cup H)$ is defined as the new action space.

Model Derivation and Analysis: The concept of risk aversion in backtracking scenarios is embodied by this decision model. $Q (s_{p}, a_{k})$ is maintained as the basis for decision-making, representing the long-term expected return of selecting node $v_{k}$ . $L S {(v_{p}, v_{k})}^{η}$ is defined as the link stability factor, where $L S (v_{p}, v_{k}) \in [0, 1]$ denotes the link quality evaluated based on historical communication success rates, and $η \geq 1$ represents the risk aversion coefficient. As $η$ increases, a stronger inclination is shown by the decision-making process toward selecting links with extremely stable historical performance.

The final term is a Gaussian penalty imposed on candidate nodes $v_{k}$ that are geographically close to the known void node $v_{i}$ . Here, $d (v_{k}, v_{i})$ denotes the distance between the two nodes, $σ$ controls the range of the penalty, and $λ_{1}$ represents the intensity of the penalty. This penalty rests on a reasonable assumption: voids are typically regional in nature, so nodes located near a known void point face a higher risk of being trapped in a void themselves. Through this penalty term, data packets are guided to actively bypass the explored void regions.

If $A^{'} (s_{p}) = Ø$ , node $v_{p}$ is considered to be trapped in a void as well, and backtracking is continued to its previous hop.
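The backtrack selection of Equation (39) can be sketched as below. The candidate dictionary is assumed to contain only nodes that remain after removing the blacklist and the set $H$ ; the default values for $η$ , $λ_{1}$ , and $σ$ are hypothetical, since the paper does not list them in this section.

```python
import math


def backtrack_choice(candidates: dict, void_pos, eta: float = 2.0,
                     lam: float = 1.0, sigma: float = 50.0):
    """Equation (39): maximize Q * LS^eta minus a Gaussian penalty near the void.

    candidates maps a node id to (q_value, link_stability, position).
    Returns None when the action space is empty, i.e. backtracking must continue.
    """
    def utility(entry):
        q, ls, pos = entry
        d = math.dist(pos, void_pos)
        return q * ls ** eta - lam * math.exp(-d * d / (2 * sigma ** 2))

    if not candidates:
        return None
    return max(candidates, key=lambda vid: utility(candidates[vid]))
```

With equal Q-values and link stability, a candidate 10 m from the void node is penalized by almost the full $λ_{1}$ , whereas one 500 m away is effectively unpenalized for $σ = 50$ m.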

Level 4: AUV-Assisted Repair

When backtracking is repeatedly observed in a region, indicating the existence of large-scale network partitioning, AUV-assisted repair is triggered.

Dynamic Joint Evaluation Model of Task Urgency and Repair Benefit: A comprehensive priority score $S_{k}$ is evaluated by the AUV for each repair request $k$ (originating from node $v_{k}$ ):

\begin{aligned} S_{k} = {} & (w_{s e v} \cdot \log (1 + C o u n t_{b t} (k)) + w_{a f f} \cdot |N_{a f f} (k)|) \cdot (1 + w_{q l e n} \cdot {\bar{Q}}_{l e n} (k)) \\ & - (w_{d i s t} \cdot \frac{d (P_{A U V}, L o c_{k})}{D_{m a x}} + w_{e n e r g y} \cdot \frac{E_{t r a v e l} (k)}{E_{A U V_r e m}} + w_{d e v} \cdot Dev (k)) \end{aligned}

(40)

where the $w$ series are defined as weighting coefficients. $C o u n t_{b t} (k)$ is denoted as the backtrack counter, $|N_{a f f} (k)|$ represents the number of affected neighbors, ${\bar{Q}}_{l e n} (k)$ is the regional average queue length, $d (P_{A U V}, L o c_{k})$ is the navigation distance, $E_{t r a v e l} (k)$ is the travel energy consumption, and $Dev (k)$ denotes the degree of deviation from the main mission route.
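The priority score of Equation (40) is a benefit term scaled by queue pressure, minus a cost term. The sketch below assumes the natural logarithm for $\log$ (the base is not stated) and passes the weights as a dictionary whose keys mirror the subscripts in the equation; both choices are illustrative.

```python
import math


def auv_priority(count_bt: int, n_aff: int, q_len_avg: float, dist: float,
                 d_max: float, e_travel: float, e_auv_rem: float, dev: float,
                 w: dict) -> float:
    """Equation (40): (severity + affected nodes) * queue factor - travel penalties."""
    benefit = (w["sev"] * math.log(1 + count_bt) + w["aff"] * n_aff) \
        * (1 + w["qlen"] * q_len_avg)
    penalty = (w["dist"] * dist / d_max
               + w["energy"] * e_travel / e_auv_rem
               + w["dev"] * dev)
    return benefit - penalty
```

The AUV would evaluate this score for every pending request and serve the one with the highest value; the logarithm keeps a single node that backtracks many times from dominating requests that affect more neighbors.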

Calculation of Optimal Repair Point (PSO): Upon determination of the task, the optimal repair position $P^{*}$ is solved by the AUV via the PSO algorithm. The fitness function is given by:

F (P) = w_{c 1} \cdot M_{c o n n} (P) - w_{d} \cdot d (P_{A U V}, P)

(41)

where $M_{c o n n} (P)$ represents the quantity of void-surrounding nodes that can be connected at position $P$ . Particle Velocity Update:

Θ_{m} (t + 1) = \tilde{ω} \cdot Θ_{m} (t) + c_{1} r_{1} (P_{b e s t, m} - P_{m} (t)) + c_{2} r_{2} (G_{b e s t} - P_{m} (t))

(42)

Particle Position Update:

P_{m} (t + 1) = P_{m} (t) + Θ_{m} (t + 1)

(43)

where $Θ_{m} (t)$ and $P_{m} (t)$ denote the velocity and position of particle $m$ at iteration $t$ , $\tilde{ω}$ is the inertia weight, $c_{1}$ and $c_{2}$ are learning factors, $r_{1}$ and $r_{2}$ are random numbers, $P_{b e s t, m}$ denotes the historical best position of particle $m$ , and $G_{b e s t}$ represents the global best position.
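Equations (41) to (43) can be sketched as a compact particle swarm search. Several choices below are assumptions rather than details from the paper: $M_{c o n n} (P)$ is taken as the number of void-surrounding nodes within a fixed communication range of $P$ , one particle is seeded at the current AUV position, the remaining particles start uniformly inside the bounding box of the nodes, and all hyperparameter defaults are hypothetical.

```python
import math
import random


def pso_repair_point(nodes, auv_pos, comm_range, w_c1=1.0, w_d=0.001,
                     n_particles=20, iters=50, inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for the position P that maximizes F(P) of Equation (41)."""
    rng = random.Random(seed)
    dim = len(auv_pos)

    def fitness(p):
        m_conn = sum(1 for n in nodes if math.dist(n, p) <= comm_range)
        return w_c1 * m_conn - w_d * math.dist(auv_pos, p)

    lo = [min(n[i] for n in nodes) for i in range(dim)]
    hi = [max(n[i] for n in nodes) for i in range(dim)]
    pos = [list(auv_pos)] + [[rng.uniform(lo[i], hi[i]) for i in range(dim)]
                             for _ in range(n_particles - 1)]
    vel = [[0.0] * dim for _ in pos]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(len(pos)), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for m in range(len(pos)):
            for i in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Equation (42): inertia + cognitive pull + social pull
                vel[m][i] = (inertia * vel[m][i]
                             + c1 * r1 * (pbest[m][i] - pos[m][i])
                             + c2 * r2 * (gbest[i] - pos[m][i]))
                pos[m][i] += vel[m][i]  # Equation (43)
            f = fitness(pos[m])
            if f > pbest_fit[m]:
                pbest[m], pbest_fit[m] = pos[m][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[m][:], f
    return gbest, gbest_fit
```

Seeding one particle at the AUV position guarantees that the returned fitness is never worse than staying in place, and the fixed seed makes repeated runs return the same repair point.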

Repair Mode Decision:

Temporary Relay Cost:

C_{r e l a y} = C_{t r a v e l} (P^{*}) + C_{o p} \cdot T_{s t a y}

(44)

New Node Deployment Cost:

C_{d e p l o y} = C_{t r a v e l} (P^{*}) + C_{n o d e}

(45)

Decision Rule:

Action = \begin{cases} \text{Deploy Node} & \text{if } C_{d e p l o y} < C_{r e l a y} \\ \text{Act as Relay} & \text{otherwise} \end{cases}

(46)
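The repair mode decision in Equations (44) to (46) is a direct cost comparison, sketched below with illustrative argument names. Since $C_{t r a v e l} (P^{*})$ appears in both costs, the rule reduces to comparing $C_{n o d e}$ with $C_{o p} \cdot T_{s t a y}$ .

```python
def repair_mode(c_travel: float, c_op: float, t_stay: float, c_node: float):
    """Return (action, c_relay, c_deploy) following Equations (44)-(46)."""
    c_relay = c_travel + c_op * t_stay   # Equation (44)
    c_deploy = c_travel + c_node         # Equation (45)
    action = "deploy_node" if c_deploy < c_relay else "act_as_relay"
    return action, c_relay, c_deploy
```

In other words, deploying a new node pays off once the expected stay time exceeds $C_{n o d e} / C_{o p}$ .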

Through this set of graded repair mechanisms, the most appropriate solution is flexibly and efficiently selected by the HAO-AVP protocol according to the severity of the void. Valuable network resources are conserved to the maximum extent while network reliability is ensured.

To minimize network energy consumption while guaranteeing the repair success rate, this cost-aware four-level progressive repair logic is summarized as Algorithm 3. The algorithm follows the principle of “low cost priority”: intra-medium adjustment, cross-medium hopping, path backtracking, and AUV assistance are attempted in sequence until a new forwarding path is successfully established.

Algorithm 3: Hierarchical Void Repair Strategy

Input: Void Node $n_{void}$ , Data Packet P

1: // Level 1: Intra-medium Adjustment (Optical)
2: IF ($Medium = Optical$) AND ($Cost (New_{Angle}) < Threshold$) THEN
3:     Adjust Divergence Angle and Retransmit P; RETURN Success
4: END IF
5: // Level 2: Cross-medium Hopping (Acoustic)
6: Calculate CDUE for all acoustic neighbors using Equation (35)
7: IF $\max (CDUE) > 0$ THEN
8:     Switch to Acoustic Mode; Forward P to best neighbor; RETURN Success
9: END IF
10: // Level 3: Path Backtracking
11: Send Backtrack_Msg to Previous_Hop
12: Previous_Hop selects new path using Backtrack Utility Equation (39)
13: IF Path_Found THEN RETURN Success
14: // Level 4: AUV-Assisted Repair
15: Request AUV assistance
16: AUV calculates Optimal Position using PSO Equations (41)–(43)
17: AUV performs Relay or Deployment based on Cost Equation (46); RETURN Success

5. Simulation Experiments and Performance Analysis

The performance of the proposed HAO-AVP is comprehensively evaluated in this section through simulation experiments. HAO-AVP is compared with five representative existing protocols, RLORP-DI, PHVP, ERR-UWSN, SOVHAR, and T-SAPR, under various network scenarios. Furthermore, the advantages of the proposed protocol in terms of reliability, efficiency, and load balancing are verified via a series of performance metrics.

5.1. Simulation Environment and Parameter Settings

Simulation Environment

The simulation experiments in this study are implemented on the MATLAB R2013b platform. A three-dimensional underwater network simulation environment is constructed to simulate the node deployment, mobility, communication, and energy consumption of hybrid acoustic-optical networks. The communication and energy consumption models described in Section 3 are integrated into this environment, whereby key processes such as packet generation, multi-hop forwarding, and void formation and repair can be simulated. The specific hardware configurations and software environment parameters adopted in the experiments are presented in Table 3.

Table 3. Software and Hardware Environment Configurations for Simulation Experiments.

Name	Setting
CPU	Intel(R) Core(TM) Ultra9-185H
Frequency	2.30 GHz
RAM	32.0 GB
Hard drive	1 TB
GPU	NVIDIA GeForce RTX 4060 Laptop GPU
VRAM	8.0 GB
Operating system	Windows 11
Language	MATLAB R2019b

2. Simulation Parameters

To align the simulation environment with realistic underwater scenarios and to keep the experimental results comparable, the simulation parameters were taken primarily from classic literature in the relevant fields. The detailed parameter settings are presented in Table 4.

Table 4. Simulation Parameters.

Parameter Category	Parameter Name	Value/Description
Network Parameters	Simulation Area	$1000 \times 1000 \times 1000\ \mathrm{m}^{3}$
	Number of Nodes	50–300
	Initial Energy of Nodes	1000 J
	Node Mobility Speed	0–5 m/s
Communication Parameters	Max Acoustic Range $R_{a}$	800 m
	Max Optical Range $R_{o}$	80 m
	Acoustic Data Rate	5 kbps
	Optical Data Rate	100 Mbps
Protocol Parameters	Learning Rate $\alpha$	0.1
	Discount Factor $\gamma$	0.9
	Constant $\epsilon$	Initially 1.0, decaying over time
	Reward Function Weights $w_{1}$, $w_{2}$, $w_{3}$	0.4, 0.3, 0.3
	AUV Speed	5 m/s

3. Performance Evaluation Metrics

To comprehensively evaluate protocol performance, the following five core metrics are adopted in this paper. All reported values are averages over multiple simulation runs, which reduces the effect of randomness.

Packet Delivery Ratio (PDR): This is defined as the ratio of the number of data packets that successfully reach the sink node to the total number of data packets sent by all source nodes. A higher PDR indicates stronger reliability and void-handling capability. It is calculated as:

$$PDR = \frac{P_{received}}{P_{sent}} \times 100\%$$

(47)

where $P_{received}$ denotes the total number of unique data packets successfully received by the sink node, and $P_{sent}$ represents the total number of data packets sent by all source nodes in the network.
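As a minimal sketch of Equation (47), the following Python function counts each packet identifier at the sink only once, reflecting the word "unique" in the definition. The function name and the use of integer packet identifiers are assumptions made for illustration.

```python
def packet_delivery_ratio(received_ids, sent_count):
    """PDR in percent, following Equation (47).

    received_ids: packet identifiers logged at the sink (may contain
                  duplicates caused by retransmissions).
    sent_count:   total number of packets sent by all sources.
    """
    if sent_count <= 0:
        raise ValueError("sent_count must be positive")
    # Duplicates are collapsed so that each packet is counted once.
    return 100.0 * len(set(received_ids)) / sent_count


# Packet 2 arrives twice and packet 4 is lost: 4 unique packets out of 5.
print(packet_delivery_ratio([1, 2, 2, 3, 5], 5))
```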

Average End-to-End Delay: This metric reflects the communication efficiency of the network. It is defined as the average time taken by all successfully delivered data packets from their generation at the source node to their final reception by the sink node. A lower delay indicates higher protocol efficiency. It is calculated as:

$$Delay_{avg} = \frac{\sum_{k = 1}^{P_{received}} (T_{arrival, k} - T_{generate, k})}{P_{received}}$$

(48)

where $T_{arrival, k}$ represents the time at which the $k$-th successfully received data packet arrives at the sink node, and $T_{generate, k}$ denotes the time at which that packet is generated at the source node.
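Equation (48) averages only over packets that actually reach the sink. The sketch below illustrates this with two dictionaries keyed by packet identifier; the data layout and the function name are assumptions, not part of the original simulation code.

```python
def average_delay(arrival, generate):
    """Mean end-to-end delay following Equation (48).

    arrival:  dict mapping packet id -> arrival time at the sink
              (delivered packets only).
    generate: dict mapping packet id -> generation time at the source
              (all packets).
    """
    if not arrival:
        return float("nan")  # no delivered packet, delay undefined
    total = sum(arrival[k] - generate[k] for k in arrival)
    return total / len(arrival)


generate = {1: 0.0, 2: 1.0, 3: 2.0}
arrival = {1: 0.5, 3: 3.5}  # packet 2 is lost and is excluded
print(average_delay(arrival, generate))
```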

Network Lifetime: This metric measures the durability and overall energy efficiency of the network. In this paper, it is defined as the duration from the start of network operation until the first sensor node fails (dies) due to energy depletion. A longer lifetime indicates stronger energy management and load balancing capabilities. Its mathematical definition is:

$$Lifetime = \min \{t \mid \exists v_{j} \in V, E_{rem} (v_{j}, t) \leq 0\}$$

(49)

where $V$ is the set of all nodes in the network, and $E_{rem} (v_{j}, t)$ represents the residual energy of node $v_{j}$ at time $t$.
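Under the first-node-death definition in Equation (49), the lifetime can be read off a sampled energy trace. The sketch below assumes that the trace is sampled at increasing time instants and stored as one row of residual energies per instant; these names and the layout are illustrative assumptions.

```python
def network_lifetime(times, residual):
    """Earliest sampled time at which any node is depleted (Equation (49)).

    times:    increasing list of sampling instants.
    residual: residual[i][j] is the energy of node j at times[i].
    Returns None if no node is depleted within the trace.
    """
    for t, row in zip(times, residual):
        if any(e <= 0 for e in row):
            return t
    return None


times = [0, 10, 20, 30]
residual = [
    [5, 5, 5],
    [3, 4, 2],
    [1, 3, 0],  # the third node is depleted at t = 20
    [0, 2, 0],
]
print(network_lifetime(times, residual))
```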

Number of Alive Nodes: This metric reflects how the overall health of the network evolves over time. It is defined as the number of nodes with residual energy greater than zero at a given instant $t$ (or at a specific round) during the simulation. Its value at time $t$ is:

$$N_{alive} (t) = |\{v_{j} \in V \mid E_{rem} (v_{j}, t) > 0\}|$$

(50)

where $| \cdot |$ denotes the cardinality of the set.
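Equation (50) is a simple count over the residual energies at one instant. A minimal sketch, with an assumed per-instant list of energies:

```python
def alive_nodes(residual_row):
    """Number of nodes with strictly positive residual energy (Equation (50))."""
    return sum(1 for e in residual_row if e > 0)


# One row per sampling instant; values are illustrative only.
trace = [[5, 5, 5], [3, 4, 2], [1, 3, 0], [0, 2, 0]]
print([alive_nodes(row) for row in trace])
```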

Gini Coefficient of Energy Consumption: This metric quantifies how evenly residual energy is distributed across all nodes in the network and serves as a key indicator of load balancing performance. The Gini coefficient lies in the range [0,1]. A value closer to 0 indicates more balanced energy consumption among nodes and a fairer load allocation, whereas a value closer to 1 indicates a larger disparity in energy consumption and severe “hotspot” issues. It is calculated as:

$$G = \frac{\sum_{i = 1}^{N} \sum_{j = 1}^{N} |E_{rem, i} - E_{rem, j}|}{2 N \sum_{k = 1}^{N} E_{rem, k}}$$

(51)

where $N$ represents the total number of nodes in the network, and $E_{rem, i}$ and $E_{rem, j}$ denote the residual energy of node $i$ and node $j$, respectively.
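Equation (51) can be evaluated directly with a double sum over node pairs. The sketch below is a straightforward O(N²) transcription; returning 0 when the total energy is zero is an assumption made here to avoid a division by zero and is not specified in the text.

```python
def gini(energy):
    """Gini coefficient of residual energy, following Equation (51)."""
    n = len(energy)
    total = sum(energy)
    if n == 0 or total <= 0:
        return 0.0  # assumption: treat an empty or fully depleted network as balanced
    pairwise = sum(abs(a - b) for a in energy for b in energy)
    return pairwise / (2 * n * total)


print(gini([10, 10, 10, 10]))  # perfectly balanced
print(gini([0, 0, 0, 40]))     # all energy held by one node
```

For the fully concentrated case the value is $(N - 1)/N$, so with four nodes it is 0.75 rather than exactly 1; the upper bound of 1 is approached only as $N$ grows.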

5.2. Simulation Results and Analysis

5.2.1. Performance Analysis with Varying Number of Nodes

This section analyzes the performance of the proposed HAO-AVP protocol under different node densities. Four core metrics (PDR, average end-to-end delay, network lifetime, and the energy consumption Gini coefficient) are evaluated together with the void identification and recovery rates, and the correlations among these metrics are explored. The comparative results under varying node densities are shown in Figures 4–9.

1. Number of Nodes and PDR:

Figure 4 shows the relationship between the number of nodes and the PDR. As the number of nodes increases, the PDR of all protocols rises, because network connectivity improves and path redundancy increases. HAO-AVP effectively bridges communication partitions in sparse networks through hybrid acoustic-optical communication, RL-based decision-making, and a robust graded repair mechanism (incorporating “cross-medium hopping” and “AUV assistance”). It maintains a consistently high PDR that approaches saturation as the number of nodes increases, reaching a maximum of approximately 96.8%. At 300 nodes, the PDR of HAO-AVP is 2.08%, 13.54%, 8.33%, 11.45%, and 18.08% higher than that of PHVP, T-SAPR, DROR, ERR-UWSN, and SOVHAR, respectively. The other protocols show limited PDR improvements because of limitations in their routing decisions or repair mechanisms. Once the density is high enough that the network is nearly “fully connected,” the PDR approaches its physical upper limit and the marginal benefit of additional nodes becomes negligible.

2. Node Quantity and Average End-to-End Delay:

Figure 5 depicts the relationship between the number of nodes and the average end-to-end delay. At 350 nodes, HAO-AVP has a significant advantage: compared with PHVP, DROR, T-SAPR, ERR-UWSN, and SOVHAR, its delay is lower by approximately 15.6%, 19.5%, 22.6%, 30.2%, and 33.5%, respectively. This advantage comes primarily from the prioritized use of high-speed optical links by HAO-AVP, combined with RL decision-making that actively avoids congestion, which reduces transmission and retransmission times. In contrast, pure acoustic protocols such as DROR and ERR-UWSN are constrained by the low propagation speed of sound, resulting in high base latency. PHVP uses hybrid acoustic-optical communication but lacks a global prediction mechanism, while T-SAPR incurs a larger overall latency because of the waiting times for AUV scheduling.

Average End-to-End Delay vs. Number of Nodes.

3. Node Quantity and Network Lifetime:

Figure 6 compares the network lifetime under different node densities. At 350 nodes, HAO-AVP extends the network lifetime by approximately 27.3%, 32.6%, 35.5%, 7.7%, and 50.0% compared with PHVP, T-SAPR, DROR, ERR-UWSN, and SOVHAR, respectively. The primary reason is that HAO-AVP incorporates Information Entropy and the Gini Coefficient into the Reinforcement Learning reward function. The fairness of the network energy distribution is thereby quantified and optimized, which avoids the premature failure of critical nodes under excessive load (i.e., the “energy hotspot” problem). In contrast, protocols such as SOVHAR and T-SAPR consider path connectivity but lack fine-grained control over global load balancing, leading to uneven energy consumption among nodes and a shortened network lifetime. ERR-UWSN has an energy-aware mechanism, but its balancing capability in high-density networks is still less precise than the entropy-weighted strategy of HAO-AVP.
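To make the entropy side of this fairness measure concrete, the following sketch computes a normalized Shannon entropy over each node's share of the total residual energy. It is an illustration only: the normalization by $\ln N$ and the function name are assumptions, and the exact way HAO-AVP combines entropy and the Gini coefficient in its reward function is defined earlier in the paper and is not reproduced here.

```python
import math


def normalized_entropy(energy):
    """Shannon entropy of residual-energy shares, scaled to [0, 1].

    A value of 1 means energy is spread evenly over all nodes; values
    near 0 mean it is concentrated on a few nodes. The scaling by ln(N)
    is an assumption made for this illustration.
    """
    n = len(energy)
    total = sum(energy)
    if n < 2 or total <= 0:
        return 0.0
    shares = [e / total for e in energy if e > 0]
    h = -sum(p * math.log(p) for p in shares)
    return h / math.log(n)


print(normalized_entropy([10, 10, 10, 10]))  # even distribution
print(normalized_entropy([37, 1, 1, 1]))     # one dominant node
```

A reward term built on such a quantity increases when traffic keeps residual energy evenly spread, which is the behavior the paragraph above attributes to the entropy-weighted strategy.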

4. Node Quantity and Energy Consumption Gini Coefficient:

Figure 7 shows the relationship between the number of nodes and the energy consumption Gini coefficient, where a lower value represents more balanced energy consumption. HAO-AVP maintains the lowest Gini coefficient across all node densities. This is directly attributed to the optimization of Information Entropy and the Gini Coefficient within its RL reward function, which achieves superior load balancing and avoids the overuse of hotspot nodes. For instance, at 200 nodes, the energy consumption Gini coefficient of HAO-AVP is 29.8% lower than that of ERR-UWSN. The other protocols exhibit higher Gini coefficients because they lack explicit guidance for global balancing. In general, as node density increases and path options multiply, the Gini coefficients of all protocols decrease slightly, which is consistent with the trend of extended network lifetime.

5. Node Quantity and Void Identification Rate and Void Recovery Rate:

Figure 8 compares the void identification rate of the proposed HAO-AVP protocol with that of five comparative protocols (PHVP, DROR, T-SAPR, ERR-UWSN, and SOVHAR) under different node densities (ranging from 50 to 350 nodes).

Gini Coefficient of Energy Consumption vs. Number of Nodes.

Hole Detection Results vs. Number of Nodes.

Overall, the identification rate of all protocols rises as node density increases. HAO-AVP maintains a consistently high void identification rate, reaching approximately 99.2% at 350 nodes. This is primarily attributed to the collaborative mechanism combining “Markov trend prediction” and “on-demand hop discovery,” which actively perceives potential link breaks and confirms void boundaries in real time. In contrast, although PHVP introduces hierarchical processing, its identification relies mainly on passive feedback, resulting in an identification rate of approximately 94.5%. Notably, in the low-density region (50–150 nodes), the curve of DROR is slightly higher than that of T-SAPR, but T-SAPR surpasses it as density increases. The depth-information-based greedy strategy of DROR rapidly discovers simple voids in sparse networks, whereas its lack of a global perspective creates identification bottlenecks in high-density complex topologies. Conversely, T-SAPR is initially constrained by the latency of AUV scheduling, but as the number of nodes increases, its trust-model-based routing decisions eliminate malicious or faulty nodes more accurately, improving the effective identification rate. ERR-UWSN and SOVHAR rely primarily on local neighbor information and therefore show relatively limited identification capabilities (84.4% and 80.5%, respectively) when facing large-scale or complex-shaped voids. Higher node density provides richer neighbor information. At 350 nodes, compared with PHVP, DROR, T-SAPR, ERR-UWSN, and SOVHAR, the average void identification rate of HAO-AVP is improved by approximately 7.6%, 8.4%, 13.8%, 19.5%, and 25.3%, respectively, validating the effectiveness of the proposed method across different network scales.

Figure 9 shows the effect of node quantity on the void recovery rate. As the number of nodes increases, network connectivity is enhanced and the recovery performance of every protocol rises steadily. HAO-AVP consistently performs best. This is attributed to its four-level progressive repair strategy, which builds multiple fault-tolerant lines of defense by flexibly combining optical adjustment, acoustic hopping, and AUV physical repair. PHVP ranks second, as its acoustic-optical dual-mode switching mechanism effectively alleviates single-medium interruptions. Notably, the recovery rate of DROR is slightly higher than that of T-SAPR, because DROR adopts depth-based multi-path opportunistic forwarding, which gives it a wider path search range. In contrast, although T-SAPR introduces AUV assistance, its core trust management mechanism may actively exclude nodes that are physically connected but have low reputation scores; this focus on security sacrifices some potential repair paths. ERR-UWSN and SOVHAR rely primarily on passive backtracking or local scanning, so they easily fall into local optima when facing complex dead-end topologies, which limits their recovery success rates. In the typical scenario of 350 nodes, compared with PHVP, DROR, T-SAPR, ERR-UWSN, and SOVHAR, the void recovery rate of HAO-AVP is improved by approximately 4.3%, 9.6%, 12.0%, 18.4%, and 24.2%, respectively, validating the effectiveness of the multi-modal collaborative repair mechanism.

Hole Repair Results vs. Number of Nodes.

5.2.2. Performance Analysis Under Different Mobility Speeds

In this section, the robustness of the protocol under varying network dynamics is evaluated. The number of nodes is fixed at 150 (representing medium density). By increasing the maximum mobility speed of nodes from 0 m/s (static) to 6 m/s (highly dynamic), the variations in PDR, Average End-to-End Delay, and Network Lifetime are observed.

1. Node Moving Speed and Network Lifetime:

Figure 10 shows the impact of node mobility speed on network lifetime. As the node mobility speed increases from 0 m/s to 6 m/s, the network topology becomes significantly more dynamic. This leads to frequent link interruptions and route reconstructions, which increase node energy consumption; consequently, the network lifetime of all protocols declines. However, HAO-AVP consistently maintains the longest network lifetime across all speed settings, demonstrating strong robustness. Its advantage is particularly evident in the high-dynamic scenario of 6 m/s, where the network lifetime is extended by approximately 2.1%, 14.1%, 9.2%, 12.6%, and 21.0% compared with PHVP, T-SAPR, DROR, ERR-UWSN, and SOVHAR, respectively.

This improvement is attributed to two factors. First, HAO-AVP introduces the Gini Coefficient and Information Entropy into the RL reward function. Even under rapid topological changes, this mechanism continuously steers data flows away from low-energy nodes, maintaining balanced energy consumption (load balancing) on a global scale. Second, the proposed prediction-identification-repair collaborative mechanism accurately anticipates the risk of link breaks and reduces the energy wasted on blind retransmissions and invalid path searches. In contrast, T-SAPR incurs excessive energy consumption because of the high computational and signaling overheads of its trust evaluation mechanism. Furthermore, the passive repair strategies of SOVHAR and PHVP are triggered frequently in high-dynamic environments, which accelerates node energy exhaustion.

2. Node Mobility Speed and Average End-to-End Delay:

Figure 11 illustrates the trends in average end-to-end delay for each protocol under different node mobility speeds. In high-dynamic scenarios with a node mobility speed of 6 m/s, the lowest delay performance is maintained by the HAO-AVP protocol. Compared with PHVP, T-SAPR, DROR, ERR-UWSN, and SOVHAR, the delay is reduced by approximately 11.3%, 21.4%, 27.6%, 30.3%, and 32.5%, respectively. The advantage of HAO-AVP is primarily attributed to the prioritized utilization of high-speed optical communication links and the rapid adaptation to topological changes achieved through RL, whereby routing oscillation and reconstruction time are reduced. In contrast, protocols such as SOVHAR and ERR-UWSN are constrained by the low propagation speed of underwater acoustic channels; furthermore, a drastic increase in latency is caused by the frequent triggering of time-consuming passive repair mechanisms under high-speed mobility. It is noteworthy that a crossover phenomenon is observed between the curves of ERR-UWSN and T-SAPR in the low-speed phase (0–3 m/s). At low speeds, paths are effectively found by the active avoidance strategy of ERR-UWSN, resulting in lower latency than T-SAPR. However, as speed increases, the avoidance mechanism of ERR-UWSN is rendered ineffective by overly rapid topological changes, necessitating frequent retransmissions. Consequently, a rapid rise in latency is exhibited, surpassing that of T-SAPR. Conversely, although high base overhead is incurred by T-SAPR, its sensitivity to dynamicity is relatively lower, resulting in a more moderate increase.

3. Node Mobility Speed and PDR:

Average End-to-End Delay vs. Node Moving Speed.

Figure 12 depicts the impact of node mobility speed on PDR. As the node mobility speed increases, changes in network topology become more frequent and drastic, making links harder to maintain and neighbor relationships unstable; consequently, the PDR of all protocols declines. However, the PDR curve of HAO-AVP declines relatively gently, demonstrating strong robustness. This is primarily attributed to its core reinforcement learning routing decision mechanism, which continuously learns and adapts to environmental changes and can rapidly adjust routing strategies even during fast topological shifts. In addition, its four-level graded repair mechanism, particularly path backtracking and AUV-assisted repair, provides a strong guarantee of network connectivity in high-dynamic environments. For instance, at a node mobility speed of 5 m/s, HAO-AVP achieves PDR improvements of 18.12%, 7.95%, 4.54%, 23.86%, and 29.54% relative to PHVP, T-SAPR, DROR, ERR-UWSN, and SOVHAR, respectively. Among the comparative algorithms, DROR and T-SAPR also perform relatively stably. DROR likewise uses reinforcement learning, which gives it some adaptability and partially mitigates the effects of mobility. Although the basic routing of T-SAPR is significantly affected by mobility, its AUV-assisted repair mechanism restores physical-layer connections during severe link interruptions and thereby maintains a relatively high PDR, at the cost of potential repair latency. In contrast, the PDR of PHVP, ERR-UWSN, and SOVHAR declines more sharply with increasing speed, because of their relatively slow response to topological changes or limited repair mechanisms. Although PHVP is a hybrid protocol, its routing decisions and repair mechanisms adapt less well to rapid topological changes than those of RL-based protocols. The active avoidance strategy of ERR-UWSN fails more frequently when the topology changes rapidly. Similarly, the local detouring strategy of SOVHAR struggles to cope with large-scale or frequently changing voids caused by rapid node movement.

4. Node Mobility Speed and Void Identification Rate/Void Recovery Rate:

Figure 13 presents the impact of node mobility speed on the void identification rate. As the node mobility speed increases from 1 m/s to 6 m/s, the network topology becomes more dynamic and the void identification performance of all protocols declines. However, through its collaborative Markov prediction mechanism, HAO-AVP actively anticipates node failure risks and maintains strong robustness in high-speed scenarios. In contrast, PHVP is limited by the maintenance lag of its packet grading structure, so its identification rate declines as speed increases. Notably, DROR outperforms T-SAPR. The depth-based opportunistic forwarding strategy of DROR adapts better to real-time topological changes, whereas T-SAPR relies on trust model updates; under high-speed mobility, the convergence of trust values lags behind topological changes, leading to the misjudgment of certain nodes. ERR-UWSN and SOVHAR show the most significant performance degradation because they rely on local passive sensing, which cannot keep pace with high-speed topological changes. In the high-dynamic limit scenario of 6 m/s, compared with PHVP, DROR, T-SAPR, ERR-UWSN, and SOVHAR, the void identification rate of HAO-AVP is improved by approximately 17.5%, 23.7%, 28.8%, 42.4%, and 54.1%, respectively, validating the effectiveness of the active prediction mechanism in dynamic networks.

Impact of Node Moving Speed on Hole Detection Results.

Figure 14 shows the impact of node mobility speed on the void recovery rate. As the node mobility speed increases from 1 m/s to 6 m/s, high-frequency variations in network topology reduce the void recovery rates of all protocols, but the magnitude of the decline differs significantly among them. HAO-AVP maintains the highest recovery performance throughout the entire testing interval. Its collaborative prediction mechanism perceives link ruptures in advance, and AUV assistance and acoustic-optical switching enable rapid active repair, which effectively offsets the negative impact of mobility.

Impact of Node Moving Speed on Hole Repair Results.

Notably, the curves of PHVP and DROR cross at a speed of approximately 3.1 m/s. In low-speed scenarios, the hierarchical structure established by PHVP is relatively stable and provides more deterministic recovery paths than DROR. However, once the speed exceeds about 3 m/s, high-frequency topological changes cause a maintenance lag in the hierarchical structure of PHVP, leading to a sharp decline in recovery performance. In contrast, DROR adopts a depth-based opportunistic forwarding strategy. This structure-free, loose routing method tolerates topological deformation better; therefore, DROR surpasses PHVP in high-speed mobility scenarios. T-SAPR performs slightly worse than DROR, primarily because its trust value updates lag under high-speed mobility, so nodes that are physically connected but whose trust values have not yet converged may be excluded from repair paths. Furthermore, ERR-UWSN and SOVHAR rely on static backtracking or local scanning and adapt poorly to the “path drift” caused by rapid movement, resulting in the most severe performance attenuation in high-speed scenarios. In the high-dynamic limit scenario with a node mobility speed of 6 m/s, HAO-AVP exhibits significant robustness: compared with DROR, PHVP, T-SAPR, ERR-UWSN, and SOVHAR, the void recovery rate is improved by approximately 18.8%, 22.0%, 27.1%, 40.8%, and 52.5%, respectively.

5.2.3. Analysis of Load Balancing Performance

Figure 15 compares the number of alive nodes across simulation rounds. For each protocol, it shows the average number of alive nodes over time (center line) and the performance stability (width of the shaded band). The mean curve of HAO-AVP declines relatively gently and maintains the highest number of alive nodes. More importantly, HAO-AVP has a narrower shaded band than all other algorithms, reflecting its higher operational stability. This combination of high performance and high stability derives from its load balancing mechanism based on Information Entropy and the Gini Coefficient, which distributes network energy consumption evenly and thereby significantly extends the network lifetime. In contrast, all comparative protocols show faster-declining curves and wider shaded bands, indicating higher performance volatility. Among them, ERR-UWSN and DROR have a certain degree of energy awareness, but their stability is still insufficient because they lack a global balancing objective. PHVP, SOVHAR, and T-SAPR have no built-in load balancing mechanism, so nodes are prone to randomly becoming hotspots; consequently, their mean performance is lower and their broad bands indicate the weakest stability. For instance, at the 2000th round, the numbers of alive nodes for HAO-AVP, T-SAPR, SOVHAR, PHVP, DROR, and ERR-UWSN are 129, 63, 75, 80, 88, and 92, respectively.

Comparison of the Number of Alive Nodes vs. Simulation Rounds (Number of Nodes = 150).
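The alive-node counts reported at the 2000th round can be expressed as survival fractions of the 150 deployed nodes, which makes the gap between protocols easier to compare. The short sketch below performs only this arithmetic on the figures quoted in the text; the variable names are illustrative.

```python
TOTAL_NODES = 150  # node count stated in the caption of Figure 15

# Alive-node counts at round 2000, as reported in the text.
alive_at_2000 = {
    "HAO-AVP": 129,
    "T-SAPR": 63,
    "SOVHAR": 75,
    "PHVP": 80,
    "DROR": 88,
    "ERR-UWSN": 92,
}

survival = {name: count / TOTAL_NODES for name, count in alive_at_2000.items()}

# Print protocols from highest to lowest survival fraction.
for name, frac in sorted(survival.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {frac:.1%}")
```

HAO-AVP retains 86% of its nodes at this round, while the next-best protocol, ERR-UWSN, retains about 61%.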

5.2.4. Analysis of the Effectiveness of the Graded Repair Mechanism

Figure 16 illustrates the comparison regarding the effectiveness and necessity of the graded repair mechanism. As indicated by the figure, the PDR performance of the protocol configured with three different settings is compared across varying numbers of nodes.

Effectiveness Analysis of the Hierarchical Repair Mechanism.

HAO-AVP-Base consists solely of reinforcement learning routing, with no repair mechanism; HAO-AVP-L2 augments the Base version with two levels of local repair (optical path adjustment and acoustic-optical switching); and HAO-AVP-Full includes the complete four-level repair mechanism, comprising optical path adjustment, acoustic-optical switching, path backtracking, and AUV assistance. HAO-AVP-Base exhibits the lowest PDR across all densities. In sparse networks (50–100 nodes) in particular, its PDR is below 0.45. This indicates that routing voids are a critical bottleneck constraining network reliability and demonstrates the necessity of repair mechanisms. With local repair activated in HAO-AVP-L2, the PDR at 100 nodes increases from 0.45 to 0.80, which verifies the efficiency of the optical path adjustment and acoustic-optical switching mechanisms.

It is noteworthy that a PDR superior to that of HAO-AVP-L2 and HAO-AVP-Base is exhibited by the proposed HAO-AVP-Full protocol across all node quantities. The advantage of the HAO-AVP-Full protocol is observed to be more pronounced in sparse networks (50–100 nodes). In such networks, topological voids are found to be more severe, and the local repair capabilities of optical path adjustment and acoustic-optical switching are found to have reached their upper limits. At this juncture, “hard voids” that cannot be resolved by HAO-AVP-L2 are effectively overcome by HAO-AVP-Full by virtue of its unique path backtracking and AUV assistance repair means. Consequently, a PDR of 0.80 is attained at 50 nodes, which is higher than the 0.65 achieved by HAO-AVP-L2. With an increase in the number of nodes (>200 nodes), the curves of HAO-AVP-L2 and HAO-AVP-Full are observed to converge. This is attributed to the fact that in dense networks, the problem is sufficiently addressed by optical path adjustment and acoustic-optical switching. Consequently, the high-cost path backtracking and AUV assistance mechanisms are adaptively reserved, whereby unnecessary resource overhead is avoided. In summary, high reliability and robustness of the protocol under various network densities are ensured by the four-level progressive mechanism of HAO-AVP-Full. It is further indicated that in practical deployments, the most appropriate repair strategy can be selectively configured by balancing specific requirements for PDR reliability against acceptable network overheads (such as the operational cost of AUVs).

5.2.5. Sensitivity Analysis of Reward Function Weights

The impact of the weight parameter $w_{3}$ on network performance, specifically the influence of the equilibrium reward term $R_{b a l a n c e}$ on the network lifetime, is illustrated in Figure 17. To simplify the analysis, the weights are constrained by $w_{1} = w_{2} = (1 - w_{3}) / 2$ , and $w_{3}$ is systematically varied from 0 to 0.6 to observe variations in network lifetime. As $w_{3}$ increases from 0, a significant extension in network lifetime is initially observed, reaching a peak within the interval of $w_{3} = 0.3 \sim 0.4$ . However, when $w_{3}$ is set excessively high (>0.4), a decline in network lifetime is observed.

Variation of Network Lifetime with Equilibrium Reward Weight $w_{3}$ .

This non-monotonic trend of network lifetime versus weight provides empirical verification for the theoretical analysis regarding reshaping convergence behavior via the reward function. The regularization strength of $R_{b a l a n c e}$ within the Q-value landscape is essentially controlled by the weight $w_{3}$ . The experimental results clearly demonstrate three distinct stages of convergence behavior:

1. Under-regularization Stage ( $w_{3} < 0.3$ ):

When the balance weight is low (or $w_{3} = 0$ ), the regularization constraint is insufficient. The algorithm is primarily driven by the progress reward $R_{p r o g}$ , exhibiting strong greedy characteristics. Agents tend to converge to a few nodes on the geometric shortest path (i.e., falling into local optima), leading to the premature emergence of energy hotspots. This validates the theoretical inference that the absence of $R_{b a l a n c e}$ leads to winner-takes-all convergence.

2. Optimal Balance Point ( $w_{3} \approx 0.3$ ):

At this point, the optimal trade-off between routing efficiency (gradient depth) and load fairness (gradient breadth) is achieved. The Q-value peaks are moderately smoothed by the regularization term provided by $R_{b a l a n c e}$ , extending the convergence target from a single optimal path to a routing set containing multiple healthy neighbors, thus realizing the maximization of global load balance and network lifetime.

3. Over-regularization Stage ( $w_{3} > 0.3$ ):

As $w_{3}$ increases further, a decline in network lifetime is observed. This is attributed to Gradient Dilution caused by excessive regularization. The directional objective is overwhelmed by the balance objective, causing agents to select inefficient paths with excessive hop counts and long physical distances in pursuit of extreme load fairness. The benefits of load balancing are negated by the additional transmission energy consumption. This confirms that excessive $R_{b a l a n c e}$ weakens the directional guidance of $R_{p r o g}$ .

In summary, the experimental data not only determines the optimal parameter configuration ( $w_{3} = 0.3$ ) but also confirms the theoretical mechanism of the Entropy-Gini term acting as a Q-value regularizer from an empirical perspective.
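The weight constraint used in this sweep, together with the kind of fairness statistics an Entropy-Gini balance term draws on, can be sketched as follows. This is a minimal illustration and not the paper's reward definition: the function names, the sample energy values, and the normalization of entropy by $\ln n$ are assumptions introduced here.

```python
import math

def weights_for(w3):
    """Return (w1, w2, w3) under the constraint w1 = w2 = (1 - w3) / 2."""
    if not 0.0 <= w3 <= 1.0:
        raise ValueError("w3 must lie in [0, 1]")
    w1 = w2 = (1.0 - w3) / 2.0
    return w1, w2, w3

def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n

def normalized_entropy(values):
    """Shannon entropy of the energy shares, scaled to [0, 1] (assumed form)."""
    n, total = len(values), sum(values)
    if n < 2 or total == 0:
        return 0.0
    shares = [v / total for v in values if v > 0]
    return -sum(p * math.log(p) for p in shares) / math.log(n)

# Hypothetical residual energies (J) of four candidate neighbors.
balanced = [10.0, 10.0, 10.0, 10.0]
hotspot = [1.0, 1.0, 1.0, 37.0]

for w3 in (0.0, 0.3, 0.6):
    print("weights for w3 =", w3, "->", weights_for(w3))
print("balanced:", gini(balanced), normalized_entropy(balanced))
print("hotspot: ", gini(hotspot), normalized_entropy(hotspot))
```

A low Gini coefficient and a high normalized entropy both indicate evenly spread energy, which is the condition the balance term is meant to reward; how the two statistics are combined into $R_{b a l a n c e}$ is defined earlier in the paper and is not reproduced here.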

5.2.6. Analysis of Computational Complexity and Engineering Feasibility

To validate the practical deployability of the proposed HAO-AVP protocol on resource-constrained underwater sensor nodes, quantitative evaluations of its computational complexity and memory footprint were conducted in this section. Additionally, lightweight strategies oriented towards engineering implementation were proposed.

1. Complexity Analysis and Resource Consumption Evaluation:

To evaluate the computational overhead of the proposed algorithm, the core functional modules were benchmarked on the simulation platform (Intel Core Ultra 9, 32 GB RAM). To reduce run-to-run variance, execution time and peak memory usage were averaged over 10 independent runs; the results are presented in Table 5. The steady-state components of the protocol (RL routing and Markov prediction) are lightweight: a single decision requires approximately 0.153 ms, with combined memory usage below 7 KB. Even the comparatively expensive PSO repair module runs in approximately 45 ms and uses less than 20 KB of memory. These results indicate that the algorithm executes efficiently on a general-purpose computing platform, which provides a foundation for subsequent porting to resource-constrained embedded nodes.

Table 5. Computational Complexity and Memory Footprint of the Algorithm.

Module	PC Runtime (Sim)	Memory (RAM)
RL Routing	0.008 ± 0.002 ms	2.45 KB
Markov Prediction	0.145 ± 0.012 ms	4.12 KB
PSO Repair (Lvl 4)	45.32 ± 2.15 ms	18.6 KB
Routine Total	≈0.153 ms	<10 KB

2. Deployment-Oriented Lightweight Strategies:

Although the aforementioned evaluations indicate that the baseline overhead of the protocol falls within an acceptable range, considering the severely constrained computational resources and battery capacities of underwater nodes (e.g., the STM32L4 series or MSP430), three specific engineering lightweight strategies are proposed in this paper to further reduce power consumption and optimize real-time performance:

Markov State Space Compression:

The computational complexity of matrix multiplication in the Markov prediction module is proportional to the square of the number of energy states. Although fine-grained discretization levels (S = 100) were utilized in the theoretical model, a coarse-grained scheme (S = 8) was implemented for practical deployment. From a mathematical perspective, floating-point operations are reduced by over 99% through this state reduction. Consequently, sufficient sensitivity for void warning is maintained while CPU cycles are significantly conserved.
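The "over 99%" figure follows directly from the quadratic cost stated above and can be checked with a few lines. The sketch below assumes one prediction step is a single $S \times S$ matrix by $S$-vector product and that energy levels are binned uniformly; both the function names and the uniform binning are illustrative assumptions.

```python
def matvec_multiplications(num_states):
    """Multiplications in one S x S transition-matrix by S-vector product."""
    return num_states * num_states

def energy_state(residual_fraction, num_states):
    """Map a residual-energy fraction in [0, 1] to a state index (uniform bins assumed)."""
    if not 0.0 <= residual_fraction <= 1.0:
        raise ValueError("residual_fraction must lie in [0, 1]")
    return min(int(residual_fraction * num_states), num_states - 1)

fine, coarse = 100, 8
reduction = 1.0 - matvec_multiplications(coarse) / matvec_multiplications(fine)
print(f"multiplications: {matvec_multiplications(fine)} -> {matvec_multiplications(coarse)}")
print(f"reduction: {reduction:.2%}")  # 99.36%

# The same residual energy lands in a much coarser bin when S = 8.
print(energy_state(0.5, fine), energy_state(0.5, coarse))
```

Going from 10,000 to 64 multiplications per step is a 99.36% reduction, which matches the claim in the text.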

Asynchronous Q-Value Update:

In traditional Q-Learning, a Q-value update is executed for every forwarded data packet, leading to frequent write operations on Flash memory. Therefore, an event-triggered asynchronous update mechanism is adopted: a Q-value iteration is triggered only when the magnitude of change in the cumulative reward exceeds a preset threshold or after a fixed time window has elapsed. In high-traffic scenarios, memory I/O frequency and the associated CPU overhead are expected to be reduced by approximately 40% via this strategy, thereby extending the hardware lifespan.
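One way to picture the event-triggered mechanism is a small buffer that absorbs per-packet rewards and commits a Q-value update only when one of the two triggers fires. The sketch below is an assumption-laden illustration: the class name, the use of the mean buffered reward in a standard Q-learning update, and the default parameter values are not taken from the paper, and the sketch does not attempt to reproduce the 40% figure.

```python
class DeferredQUpdater:
    """Buffers rewards and writes Q-values only on a threshold or time trigger."""

    def __init__(self, alpha=0.1, gamma=0.9, reward_threshold=1.0, window_s=5.0):
        self.alpha = alpha
        self.gamma = gamma
        self.reward_threshold = reward_threshold  # trigger on cumulative reward change
        self.window_s = window_s                  # trigger after this many seconds
        self.q = {}        # (state, action) -> Q-value
        self.pending = {}  # (state, action) -> [reward_sum, count, latest_next_best_q]
        self.accumulated = 0.0
        self.last_flush = 0.0
        self.writes = 0    # number of Q-table writes actually performed

    def observe(self, state, action, reward, next_best_q, now):
        """Record one forwarded packet; return True if an update was committed."""
        entry = self.pending.setdefault((state, action), [0.0, 0, 0.0])
        entry[0] += reward
        entry[1] += 1
        entry[2] = next_best_q
        self.accumulated += reward
        threshold_hit = abs(self.accumulated) >= self.reward_threshold
        window_hit = (now - self.last_flush) >= self.window_s
        if threshold_hit or window_hit:
            self.flush(now)
            return True
        return False

    def flush(self, now):
        """Apply one Q-learning update per buffered (state, action) pair."""
        for key, (reward_sum, count, next_q) in self.pending.items():
            mean_reward = reward_sum / count
            old = self.q.get(key, 0.0)
            self.q[key] = old + self.alpha * (mean_reward + self.gamma * next_q - old)
            self.writes += 1
        self.pending.clear()
        self.accumulated = 0.0
        self.last_flush = now

updater = DeferredQUpdater()
for t in (0.1, 0.2, 0.3):
    committed = updater.observe("n1", "n2", 0.4, 0.0, now=t)
print(committed, updater.writes, updater.q)
```

In this example three packets produce a single table write instead of three, which is the effect the strategy relies on to reduce Flash wear.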

Micro-PSO Implementation:

A Micro-PSO variant is employed for the computationally intensive Level-4 repair. The particle population is reduced from the standard 50 to 10 and the maximum number of iterations is capped at 20, so the algorithm returns a suboptimal but feasible solution within the strict real-time budget of an embedded system and does not trigger a watchdog timer timeout.
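The bounded budget can be made concrete with a small particle swarm that uses 10 particles and at most 20 iterations. The following is a generic PSO sketch under stated assumptions: the inertia and acceleration coefficients, the box-clamping of positions, and the toy cost function (squared distance to a hypothetical repair position) are illustrative and are not the paper's Level-4 objective.

```python
import random

def micro_pso(cost, bounds, n_particles=10, max_iters=20,
              w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize cost over a box; returns (best_position, best_cost, initial_best_cost)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    initial_best = gbest_cost

    for _ in range(max_iters):  # hard cap keeps the run time bounded
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp to the search box so every candidate stays feasible.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost, initial_best

def toy_cost(p):
    """Hypothetical objective: squared distance to a target repair position."""
    return (p[0] - 30.0) ** 2 + (p[1] + 20.0) ** 2

best, best_cost, initial = micro_pso(toy_cost, [(-100.0, 100.0)] * 2)
print(best, best_cost, initial)
```

With these settings the cost function is evaluated exactly 10 + 10 × 20 = 210 times, so the worst-case run time is known in advance, which is the property that matters for avoiding a watchdog reset.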

5.2.7. Multi-AUV Collaboration and Cost–Benefit Sensitivity Analysis

To address the practical constraints of underwater deployment, specifically the trade-offs between repair latency, energy consumption, and hardware costs, this paper investigates the performance of the HAO-AVP protocol under varying AUV densities. A task allocation model is also proposed to resolve scheduling conflicts, alongside an analysis of the cost–benefit ratio (CBR) to determine the optimal triggering threshold.

1. Multi-AUV Task Allocation Model:

When multiple AUVs ( $N_{a u v} > 1$ ) are available, a centralized bidding strategy is employed to prevent scheduling conflicts. The triggering of an AUV repair task is determined by the Void Severity Score ( $S_{k}$ ), which was previously defined in Equation (40) to quantify the urgency of the routing void based on local sparsity and traffic load. We introduce a variable Trigger Threshold ( $τ$ ). A physical repair request is broadcast only when the severity score satisfies $S_{k} > τ$ .

Upon receiving a request, the central controller (e.g., the Sink) evaluates the assignment cost $C_{a, v o i d 1}$ of each available AUV $a$ with respect to the target void $v o i d 1$ :

$$C_{a, v o i d 1} = ω_{t} \cdot \frac{d_{a, v o i d 1}}{v_{a u v}} + ω_{e} \cdot \left(1 - \frac{E_{r e m}^{a}}{E_{i n i t}}\right) \quad (52)$$

where $d_{a, v o i d 1}$ is the Euclidean distance between AUV $a$ and the void, $v_{a u v}$ is the moving velocity of the AUV, and $E_{r e m}^{a}$ and $E_{i n i t}$ are the residual energy of AUV $a$ and its initial energy capacity, respectively. $ω_{t}$ and $ω_{e}$ are weighting coefficients balancing the travel time and energy cost, satisfying $ω_{t} + ω_{e} = 1$ .

The task is assigned to the AUV with the minimum $C_{a, v o i d 1}$ , ensuring that resources are dispatched based on both proximity and remaining energy.
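The trigger test and the minimum-cost assignment of Equation (52) can be sketched as below. The `Auv` record, the function names, the equal weights, and the sample positions and energies are hypothetical values chosen for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Auv:
    name: str
    position: tuple  # (x, y, z) in metres
    speed: float     # v_auv in m/s
    e_rem: float     # residual energy
    e_init: float    # initial energy capacity

def assignment_cost(auv, void_pos, w_t=0.5, w_e=0.5):
    """Equation (52): weighted travel time plus weighted energy depletion."""
    if abs(w_t + w_e - 1.0) > 1e-9:
        raise ValueError("weights must satisfy w_t + w_e = 1")
    distance = math.dist(auv.position, void_pos)
    return w_t * distance / auv.speed + w_e * (1.0 - auv.e_rem / auv.e_init)

def assign(auvs, void_pos, severity, tau, w_t=0.5, w_e=0.5):
    """Return the cheapest AUV, or None when the severity does not exceed tau."""
    if severity <= tau or not auvs:
        return None
    return min(auvs, key=lambda a: assignment_cost(a, void_pos, w_t, w_e))

fleet = [
    Auv("A", (300.0, 0.0, 0.0), 2.0, 90.0, 100.0),  # far away, nearly full battery
    Auv("B", (0.0, 200.0, 0.0), 2.0, 20.0, 100.0),  # closer, mostly depleted
]
void = (0.0, 0.0, 0.0)
for auv in fleet:
    print(auv.name, round(assignment_cost(auv, void), 2))
chosen = assign(fleet, void, severity=0.9, tau=0.8)
print("assigned:", chosen.name if chosen else None)
```

Here AUV B is selected despite its low battery, because the travel-time term (in seconds) is numerically much larger than the dimensionless energy term. The source does not state how the two terms are scaled relative to each other, so a deployment would probably need to normalize the travel time or tune $ω_{t}$ and $ω_{e}$ accordingly.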

2. Sensitivity Analysis and Cost–Benefit Trade-offs:

This paper presents a sensitivity analysis by varying the AUV number ( $N_{a u v}$ ) and the trigger threshold ( $τ$ ). The Cost–Benefit Ratio (CBR) is defined as the percentage improvement in Packet Delivery Ratio (PDR) per unit of energy consumed (kJ). Table 6 presents the simulation results. The data reveals a clear trade-off between network reliability and operational cost:

Table 6. Sensitivity Analysis of Multi-AUV Configuration and Cost–Benefit Ratio.

$N_{a u v}$	$τ$	Task Allocation	Total Energy Cost (J)	CBR	PDR
0 (No AUV)	N/A	Fallback Scheme: Purely relying on acoustic backtracking (Level 3).	1520	N/A	82.5%
1 (Single)	High ( $τ$ = 0.8)	Greedy: Only triggers repair when void severity $S_{k} > 0.8$ (critical voids).	1850	Optimal	88.4%
2 (Multi)	Medium ( $τ$ = 0.6)	Collaborative: Triggers repair for moderate voids ( $S_{k} > 0.6$ ); partition-based coverage.	2140	Good	93.1%
3 (Multi)	Low ( $τ$ = 0.4)	Aggressive: Triggers repair even for minor voids ( $S_{k} > 0.4$ ); high mobility cost.	2780	Diminishing	94.5%

3. Discussion on Engineering Feasibility:

Fallback Scheme (AUV-Absent Scenario):

As shown in the first row of Table 6, when $N_{a u v} = 0$ , the protocol automatically falls back to the Level 3 mechanism (Path Backtracking). Although PDR drops to 82.5%, the network maintains basic connectivity without incurring mechanical movement costs, proving that the protocol is not solely dependent on AUVs.

Optimal Configuration:

The CBR peaks at $N_{a u v} = 1$ or 2 with a medium-to-high threshold ( $τ \geq 0.6$ ). This indicates that deploying a small number of AUVs to fix only critical “hard voids” yields the maximum return on investment.

Diminishing Returns:

With $N_{a u v} = 3$ and a low threshold ( $τ = 0.4$ ), the PDR improvement is marginal compared to the surge in energy consumption. This verifies that an aggressive repair strategy is economically inefficient.
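Table 6 reports the CBR only qualitatively. As a rough numerical check, the sketch below computes one possible reading of the definition: PDR gain in percentage points over the AUV-absent baseline, divided by the additional energy spent in kJ. This incremental interpretation is an assumption made here; the paper does not state which baseline or which form of "percentage improvement" was used.

```python
# (N_auv, tau, total energy in J, PDR in %), copied from Table 6.
ROWS = [
    (0, None, 1520.0, 82.5),
    (1, 0.8, 1850.0, 88.4),
    (2, 0.6, 2140.0, 93.1),
    (3, 0.4, 2780.0, 94.5),
]

def incremental_cbr(rows):
    """PDR gain (percentage points) per extra kJ, relative to the first row."""
    _, _, base_energy_j, base_pdr = rows[0]
    result = {}
    for n_auv, _tau, energy_j, pdr in rows[1:]:
        extra_kj = (energy_j - base_energy_j) / 1000.0
        result[n_auv] = (pdr - base_pdr) / extra_kj
    return result

cbr = incremental_cbr(ROWS)
for n_auv, value in cbr.items():
    print(f"N_auv={n_auv}: {value:.1f} percentage points per kJ")
```

Under this assumed definition the values are roughly 17.9, 17.1, and 9.5 percentage points per kJ for one, two, and three AUVs, an ordering that agrees with the "Optimal", "Good", and "Diminishing" labels in the table.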

5.2.8. Sensitivity Analysis of Optical Link Robustness Under Jerlov Water Types

To validate the adaptability of the optical communication model in realistic underwater environments, the performance of the proposed intra-medium adjustment (Level 2 repair) strategy is evaluated under varying water turbidity conditions.

1. Optical Attenuation Model:

The modeling of the optical attenuation coefficient, $c (λ)$ , adopts the classic Jerlov water type classification standard. According to the received optical power model defined previously (Equation (11)), the received power exhibits an exponential decay relationship with the attenuation coefficient. Specifically, the value range of $c (λ)$ is set from 0.056 m⁻¹ (Type I, clear ocean) to 0.55 m⁻¹ (Coastal, turbid). A repair attempt is deemed successful only when the adjusted beam SNR exceeds the decoding threshold ( $Γ_{t h}$ ).
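To give a feel for the exponential dependence, the sketch below evaluates a plain Beer-Lambert factor $e^{-c(λ) d}$ for the attenuation coefficients listed in Table 7. It is only the attenuation term, not the full received-power model of Equation (11): geometric spreading, pointing loss, and receiver parameters are left out, and the 10 m link length is a hypothetical value.

```python
import math

# Attenuation coefficients c(lambda) in 1/m, as listed in Table 7.
JERLOV_C = {
    "Type I": 0.056,
    "Type IA": 0.078,
    "Type IB": 0.096,
    "Type II": 0.16,
    "Type III": 0.39,
    "Coastal (C1)": 0.55,
}

def transmittance(c_per_m, distance_m):
    """Fraction of optical power surviving pure exponential attenuation."""
    return math.exp(-c_per_m * distance_m)

def attenuation_db(c_per_m, distance_m):
    """The same loss expressed in decibels."""
    return -10.0 * math.log10(transmittance(c_per_m, distance_m))

LINK_M = 10.0  # hypothetical link length in metres
for water_type, c in JERLOV_C.items():
    print(f"{water_type:13s} c={c:.3f}  T={transmittance(c, LINK_M):.4f}  "
          f"loss={attenuation_db(c, LINK_M):5.2f} dB")
```

Over the same 10 m path the loss grows from about 2.4 dB in Type I water to about 23.9 dB in the most turbid case, which illustrates why the SNR threshold $Γ_{t h}$ becomes hard to meet as turbidity rises.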

2. Impact of Turbidity on Repair Success Rate:

The optical repair success rates under different Jerlov water types are presented in Table 7.

Table 7. Impact of Jerlov Water Types on Optical Repair Success and Channel Switching.

Jerlov Water Type	Attenuation Coeff. $c (λ)$ (m⁻¹)	Optical Repair Success Rate	Acoustic Fallback Rate	Environment Description
Type I	0.056	98.50%	1.50%	Extremely Clear (Deep Ocean)
Type IA	0.078	96.20%	3.80%	Clear Ocean
Type IB	0.096	92.40%	7.60%	Moderate Ocean
Type II	0.16	78.50%	21.50%	Coastal Waters (Clear)
Type III	0.39	45.20%	54.80%	Coastal Waters (Turbid)
Coastal (C1)	0.55	12.60%	87.40%	Harbor (High Turbidity)

Clear Water (Type I-IB): In deep ocean environments ( $c (λ) < 0.1 m^{- 1}$ ), a success rate exceeding 92% is maintained, indicating the effectiveness of the beam steering mechanism under standard operating conditions.

Transition Zone (Type II): As turbidity increases ( $c (λ) \approx 0.16 m^{- 1}$ ), a decrease in the success rate to 78.5% is observed, attributed to the limited effective range caused by scattering.

Turbid Water (Type III-Coastal): In coastal environments ( $c (λ) \geq 0.39 m^{- 1}$ ), the optical repair success rate declines to 45.2% or lower due to signal attenuation and bio-occlusion effects.

3. Effectiveness of the Hybrid Channel Switching Mechanism:

As indicated in Table 7, moving from deep ocean (Type I) to turbid coastal water (Coastal C1) raises the optical attenuation coefficient and reduces the Level 2 optical repair success rate from 98.5% to 12.6%. Network disconnection is nevertheless avoided: as the optical link failure rate increases, the Acoustic Fallback Rate rises by the same amount. These results indicate that the Hybrid Channel Switching Mechanism adaptively toggles between the high-bandwidth optical mode and the robust acoustic mode according to environmental turbidity, thereby preserving network connectivity under varying turbidity conditions.
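The complementary relationship between the two rates can be checked directly from Table 7, and the switching rule itself can be pictured as a threshold test on the optical SNR. The sketch below assumes the simplest possible rule (optical if the SNR exceeds $Γ_{t h}$, acoustic otherwise); the function name and the sample SNR values are hypothetical, and the actual mechanism may involve additional conditions.

```python
def select_channel(optical_snr_db, gamma_th_db):
    """Assumed rule: keep the optical link only while SNR exceeds the threshold."""
    return "optical" if optical_snr_db > gamma_th_db else "acoustic"

# (water type, optical repair success %, acoustic fallback %) from Table 7.
TABLE_7 = [
    ("Type I", 98.5, 1.5),
    ("Type IA", 96.2, 3.8),
    ("Type IB", 92.4, 7.6),
    ("Type II", 78.5, 21.5),
    ("Type III", 45.2, 54.8),
    ("Coastal (C1)", 12.6, 87.4),
]

# Every failed optical repair is absorbed by the acoustic fallback.
for water_type, optical_pct, acoustic_pct in TABLE_7:
    total = optical_pct + acoustic_pct
    print(f"{water_type:13s} optical={optical_pct:5.1f}%  "
          f"acoustic={acoustic_pct:5.1f}%  total={total:.1f}%")

print(select_channel(optical_snr_db=14.0, gamma_th_db=10.0))
print(select_channel(optical_snr_db=6.0, gamma_th_db=10.0))
```

Each row sums to 100%, which is the numerical form of the statement that no repair attempt ends in disconnection in this experiment.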

5.3. Discussion

5.3.1. Advantages of the Proposed Method

The HAO-AVP protocol proposed in this paper combines Entropy-Gini reinforcement learning decision-making with a collaborative prediction-identification-repair mechanism. It jointly addresses energy balance in routing, robustness in void handling, and adaptability to high dynamics, which makes the protocol suitable for complex underwater hybrid acoustic-optical network scenarios.

Advantages in Void Identification and Repair: Markov trend prediction and on-demand hop discovery jointly locate void boundaries, and the four-level progressive repair strategy adaptively selects among optical adjustment, acoustic hopping, and AUV assistance. As a result, concealed voids, dead-end regions, and large-scale fractures are detected more accurately, and link recovery is more stable. Under extreme node sparsity (50 nodes) and drastic topological changes (6 m/s), missed detections and false alarms are effectively suppressed, and identification and recovery rates remain above 94% and 90%, respectively. Across scenarios with different densities and speeds, HAO-AVP is significantly more stable than single-mechanism protocols such as PHVP and ERR-UWSN.
Advantages in Energy Efficiency Balance and Latency: Information Entropy and the Gini Coefficient replace the simple greedy residual-energy strategy in the routing decision layer, which reduces the premature death of “hotspot” nodes. The transmission layer prioritizes the high bandwidth and low latency of optical communication, so end-to-end latency is significantly reduced while connectivity is preserved. The experiments indicate that HAO-AVP achieves the lowest Gini coefficient of energy consumption, which helps prevent energy holes and suits both long-duration monitoring and latency-sensitive tasks.
Advantages in Dynamic Adaptability and Robustness: The protocol applies a full-process strategy, from proactive avoidance at the source, through collaborative prediction mid-route, to graded repair at the end. It tracks topological evolution closely, accelerates link reconstruction, and reduces routing oscillation, which helps maintain continuous communication under ocean current interference and node drift. By combining active risk warning with multi-modal switching, it keeps communication stable, with lower packet loss rates and link-break risks when moving from low-speed static waters to high-speed dynamic current environments.

5.3.2. Limitations and Future Work

Although stable benefits have been achieved by HAO-AVP in variable simulation environments, algorithm uncertainty may still be amplified by turbidity, complex noise, and micro-node computing constraints in real underwater environments. In particular, a further trade-off and optimization between communication reliability under extremely low SNR and resource consumption for end-side deployment are still required.

Challenges in Extreme Underwater Channels: Optical communication distances are severely weakened by strong turbulence, high turbidity, and background optical noise. Link stability is degraded under strict alignment requirements, and the quality of acoustic-optical switching remains to be improved. It is suggested that real water optical characteristic models and adaptive beam divergence control be introduced. Samples containing turbidity and multipath interference should be added during the training phase, and dynamic channel quality thresholds should be adopted during the inference phase to stabilize output.
Strong Dependence on Node Computing Power: Heavy calculation loads are incurred by Reinforcement Learning and AUV path planning (PSO). Calculation latency is amplified by long-tail state spaces and frequent Q-value updates, while the storage and energy consumption of micro-sensor nodes are limited. It is suggested that “cloud-edge-end” collaborative computing be implemented. Complex training tasks should be transferred to sink nodes or surface buoys, and lightweight RL models and operator pruning should be utilized to alleviate the computational burden on single nodes.
Optimization of AUV Scheduling Costs: Although a high repair rate is yielded by AUVs, significant navigation energy consumption and scheduling time costs are incurred. All breakpoints are difficult to cover by a single AUV in ultra-large-scale networks. It is suggested that research on multi-AUV collaboration and task allocation algorithms be conducted. Energy harvesting technologies (such as thermal energy and wave energy) should be combined to reduce the comprehensive cost of physical repair without sacrificing the success rate of recovery.

To address the aforementioned issues, lightweight neural networks and transfer learning will be introduced in subsequent work to reduce reliance on computing power. The realism of channel models will be enhanced by combining tank experiments with lake trial data. Distributed multi-agent collaboration will be conducted to reduce latency while maintaining accuracy, and anti-interference coding for underwater acoustic communication will be explored to reinforce connectivity rates under weak signal conditions.

5.3.3. Scenario Expansion

The multi-modal collaborative and adaptive repair mechanisms of HAO-AVP transfer well and can be reused in underwater infrastructure scenarios characterized by “high dynamics, limited energy, and harsh communication environments.” With parameter fine-tuning and minor strategy adaptations, stable transmission can be maintained under different operating conditions, providing reliable link support for closed-loop Marine Internet of Things applications.

Smart Marine Ranching Monitoring: Frequent blocking of optical paths is caused by high breeding density and fish shoal movement. High-definition video and environmental parameters can be stably transmitted back by adopting rapid switching with optical priority and acoustic backup. Combined with energy efficiency balance strategies, node failure and maintenance frequency under continuous monitoring are synchronously compressed, making it suitable for multi-source networking of fixed pile foundations and mobile inspection torpedoes.
Underwater Military Defense and Tactical Reconnaissance: High concealment is required in highly antagonistic battlefield environments. Load concealment is reinforced by the Entropy-Gini reward function, and the probability of detection by sonar is reduced by the silent characteristic of optical communication. Combined with active void avoidance, long survival cycles are balanced with high-reliability intelligence transmission. When combined with underwater vehicle formations, a closed-loop command and control system of anomaly detection-silent transmission-collaborative strike can be formed.
Deep-sea Oil/Gas Pipeline and Cable Inspection: The target area is narrow and long with a large depth span. Physical voids caused by cable breaks can be precisely handled by AUV assistance in the four-level repair strategy. Combined with path backtracking and multi-hop relaying, long-distance inspection data relay can be realized in deep-sea areas without infrastructure. Furthermore, routing interruption risks can be directly mapped to maintenance work orders and emergency repair priority sequencing through early warning mechanisms.

6. Conclusions

This paper proposes the HAO-AVP protocol to address routing voids and energy constraints in UWSNs. The Gini Coefficient and Information Entropy are integrated into the Reinforcement Learning reward function to achieve energy fairness and load balancing. Additionally, a prediction–identification–repair mechanism is designed, employing a four-level strategy (optical adjustment, acoustic-optical switching, backtracking, and AUV assistance) to handle routing voids. Simulation results validate the protocol’s effectiveness in complex underwater environments. In high-density scenarios with 350 nodes, HAO-AVP exhibits high reliability: compared with PHVP, DROR, T-SAPR, ERR-UWSN, and SOVHAR, the void identification rate is improved by approximately 7.6%, 8.4%, 13.8%, 19.5%, and 25.3%, respectively, and the void recovery rate is improved by approximately 4.3%, 9.6%, 12.0%, 18.4%, and 24.2%, respectively. HAO-AVP also outperforms these protocols in network lifetime, average end-to-end delay, and load balancing.

Several directions remain for future work. First, a more refined dynamic acoustic-optical channel model will be constructed, in which physical environmental factors such as water turbulence and suspended-particle turbidity are incorporated into routing decisions. Second, to address the computational challenges of large-scale heterogeneous networks, Deep Reinforcement Learning (DRL) or Distributed Multi-Agent Reinforcement Learning (MARL) algorithms are planned to improve scalability and convergence speed. Finally, given the security requirements of underwater environments, integrating trust evaluation and anti-attack strategies into the HAO-AVP framework, so as to build a routing system that is reliable, robust, and secure, is identified as an important direction for subsequent research.

Author Contributions

Conceptualization, L.H. and C.M.; methodology, L.H.; software, L.H.; validation, L.H., C.M. and J.A.; formal analysis, L.H. and J.A.; investigation, L.H.; resources, C.M.; data curation, L.H.; writing—original draft preparation, L.H.; writing—review and editing, C.M. and J.A.; visualization, L.H.; supervision, C.M.; project administration, C.M.; funding acquisition, C.M. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Funding Statement

This research received no external funding.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

