Table 2.
Related research studies on the use of RL for network management.
| Reference | Year | Contribution | Application Domain | Algorithm/Model |
|---|---|---|---|---|
| Jin et al. [60] | 2019 | The authors of this paper propose RL-based congestion-avoiding routing for underwater acoustic sensor networks to reduce end-to-end delay and energy consumption. | Underwater acoustic sensor networks—RCAR | Q-learning |
| Di Valerio et al. [61] | 2019 | In this paper, the authors propose an RL-based data forwarding scheme for a node based on the number of unsuccessful transmissions. The node adaptively switches between single-path and multi-path routing to optimize energy consumption and the packet delivery ratio. | Underwater WSN—CARMA | Q-learning |
| Safdar Malik et al. [62] | 2023 | This paper presents an RL-based routing approach for cognitive radios (CRs). The study adds channel-selection decision capability to improve the average data rate and throughput. | CRs—RL-IoT | Q-learning |
| Mao et al. [63] | 2019 | In this paper, the authors propose a CNN-based scheme that continuously adapts and optimizes routing decisions based on network conditions. This approach computes the routing path combinations with high accuracy. | SDNs | CNN |
| Safdar et al. [64] | 2015 | The authors of this paper propose RL-based routing in CR ad hoc networks to reduce the protocol overhead and end-to-end delay and improve the packet delivery ratio. | CR ad hoc networks—CRAHN | Q-learning |
| Stampa et al. [65] | 2017 | This paper proposes a DRL approach for optimizing routing in SDNs. The agent in this approach optimizes the routing policy based on traffic conditions to minimize network delays. | SDN | DQL |
| Cicioğlu et al. [66] | 2023 | The authors of this paper propose an ML-assisted centralized link-state routing system for an SDN-based network. This scheme utilizes historical data on parameters such as latency, bandwidth, signal-to-noise ratio, and distance to make routing decisions. | SDN—MLaR | Supervised learning |
| Cheng et al. [67] | 2012 | In this paper, the authors propose a load-balancing scheme for multi-sink WSNs. This approach divides the network into several zones based on the remaining energy of hotspots around the sink node. RL is applied to the mobile anchor, enabling it to adapt to traffic patterns and discover an optimal control policy for its movement. | WSNs—QAZP | Q-learning |
| Wei et al. [68] | 2017 | In this approach, the authors present a task scheduling algorithm for dynamic WSNs that minimizes the exchange of cooperative information and balances resource utilization. | WSNs—QS | Q-learning with shared value function |
| Wei et al. [69] | 2019 | In this paper, the authors introduce a Q-learning algorithm for task scheduling in WSNs based on a support vector machine. Their proposed approach optimizes the application performance and reduces energy consumption. | WSNs—ISVM-Q | Q-learning and support vector machine |
| Ancillotti et al. [70] | 2017 | This paper proposes a link quality monitoring strategy for RPL in IPv6 WSNs using a multi-armed bandit algorithm. The proposed approach minimizes overhead and energy consumption by employing both synchronous and asynchronous monitoring. | WSNs—RL-Probe | Multi-armed bandit |
| Guo et al. [71] | 2020 | The authors of this paper propose a DRL-based QoS-aware secure routing protocol for the SDN-IoT. The primary objective is to design a routing protocol that efficiently routes traffic in a large-scale SDN. | SDN-IoT—DQSP | DQL |
| Künzel et al. [72] | 2020 | This paper introduces a Q-learning approach in which an agent adjusts weight values in an industrial WSN, leading to improved communication reliability and reduced network latency. | Industrial WSN—QLRR-WA | Q-learning |
| Jung et al. [73] | 2017 | In this paper, the authors introduce Q-learning-based geographic routing to enhance the performance of unmanned robotic networks and address the challenge of network overhead in high-mobility scenarios. | Unmanned robotic networks—QGeo | Q-learning |
| Sharma et al. [74] | 2017 | The authors of this paper introduce a tailored Q-learning algorithm for routing in WSNs with a focus on minimizing energy consumption, addressing the challenge of reliance on non-renewable energy sources. | WSNs | Tailored Q-learning |
| Su et al. [75] | 2022 | This paper presents an approach to enhance energy efficiency and prolong network lifetime using Q-learning-based routing for WSNs. It allows nodes to select neighboring nodes for transmission by considering various energy consumption factors, resulting in reduced and balanced energy usage. | WSNs | Q-learning |
| Akbari et al. [76] | 2020 | This paper addresses the need for efficient routing structures in sensor networks to optimize their lifetime and reduce energy consumption. The paper combines fuzzy logic and RL, utilizing factors such as remaining node energy, available bandwidth, and distance to the sink for routing decisions. | WSNs | RL with fuzzy logic |
| Liu et al. [77] | 2019 | The authors of this paper address the importance of connectivity solutions for wide-area applications in IoT networks. The proposed technique uses a distributed and energy-efficient RL-based routing algorithm for wide-area scenarios. | Wireless mesh IoT networks | Temporal difference |
| Sharma et al. [78] | 2020 | In this paper, the authors propose routing in opportunistic IoT networks using the Policy Iteration algorithm to automate routing and enhance the message delivery probability. | IoT networks—RLProph | Policy Iteration algorithm |
| Chakraborty et al. [79] | 2022 | In this paper, the authors propose a routing algorithm that adjusts its routing policy based on local information, aiming to find an optimal solution that balances network latency and lifetime in wireless mesh IoT networks. | Wireless mesh IoT networks | Q-learning |
| Muthanna et al. [80] | 2022 | This paper presents a system that optimizes transmission policy parameters and implements multi-hop routing for high QoS in LoRa networks. | LoRa IoT networks—MQ-LoRa | Soft actor-critic |
| Kaur et al. [81] | 2021 | The authors of this paper propose an algorithm that divides the network into clusters based on sensor node data loads, preventing premature network failure. This paper addresses issues such as high communication delay, low throughput, and poor network lifetime. | IoT-enabled WSNs | DQL |
| Zhang et al. [82] | 2021 | The authors of this paper use recurrent neural networks and the deep deterministic policy gradient (DDPG) method to predict the network traffic distribution. They employ a double deep Q-network to make routing decisions based on the current network state. | IoT-enabled WSNs | RNN, DDPG, and double DQN |
| Krishnan et al. [83] | 2021 | This paper focuses on addressing the challenge of maximizing the network lifetime in WSNs. Q-learning is employed to facilitate automatic learning to find the shortest routes. | IoT-enabled WSNs | Q-learning |
| Serhani et al. [84] | 2020 | This paper explores the challenges of integrating MANETs with the IoT and focuses on the issue of network node mobility. The authors introduce an adaptive routing protocol that enhances link stability in both static and mobile scenarios. | MANET-IoT systems—AQ-Routing | Q-learning |
| Pandey et al. [85] | 2022 | In this paper, the authors address the challenge of establishing large-scale connectivity among IoT devices. They introduce a multi-hop data routing approach utilizing the Q-learning method. | Low-power wide-area networks for IoT | Q-learning |
| Ren et al. [86] | 2023 | In this paper, the authors address the challenges of energy efficiency and network lifetime using the mean field RL method. Mean field theory simplifies interactions among nodes, and a prioritized sampling, loop-free algorithm prevents routing loops. | IoT-enabled WSNs | Mean field RL |
| Serhani et al. [87] | 2023 | In this paper, the authors introduce an efficient routing mechanism for the Internet of Medical Things. The proposed technique categorizes network traffic into three classes, optimizes paths based on QoS and energy metrics, and employs RL for path computation. | Internet of Medical Things—EQRSRL | Q-learning |
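Many of the Q-learning schemes surveyed above reduce routing to the same tabular pattern: each node's candidate next hops are the actions, the per-hop cost (delay or energy) supplies a negative reward, and reaching the sink terminates an episode. The following minimal sketch illustrates that shared pattern only; it does not reproduce any specific protocol in Table 2, and the topology, delays, terminal bonus, and hyperparameters are all illustrative assumptions.

```python
import random

ALPHA, GAMMA, EPSILON, EPISODES = 0.5, 0.9, 0.1, 2000
SINK = "D"

# Hypothetical topology: neighbors[node] -> {neighbor: per-hop delay}
neighbors = {
    "A": {"B": 2.0, "C": 5.0},
    "B": {"A": 2.0, "C": 1.0, "D": 4.0},
    "C": {"A": 5.0, "B": 1.0, "D": 1.0},
    "D": {},
}

# Q[(node, next_hop)]: estimated return of forwarding via next_hop
Q = {(n, nh): 0.0 for n, nbrs in neighbors.items() for nh in nbrs}

def choose(node):
    """Epsilon-greedy next-hop selection at a forwarding node."""
    nbrs = list(neighbors[node])
    if random.random() < EPSILON:
        return random.choice(nbrs)
    return max(nbrs, key=lambda nh: Q[(node, nh)])

random.seed(0)
for _ in range(EPISODES):
    node = "A"
    for _ in range(20):  # cap hops per episode
        if node == SINK:
            break
        nh = choose(node)
        # Reward: negative per-hop delay, plus a bonus for reaching the sink
        reward = -neighbors[node][nh] + (10.0 if nh == SINK else 0.0)
        future = max((Q[(nh, n2)] for n2 in neighbors[nh]), default=0.0)
        Q[(node, nh)] += ALPHA * (reward + GAMMA * future - Q[(node, nh)])
        node = nh

# Greedy policy after training: the delay-minimizing route A -> B -> C -> D
path, node = ["A"], "A"
while node != SINK:
    node = max(neighbors[node], key=lambda nh: Q[(node, nh)])
    path.append(node)
print(path)
```

The same skeleton underlies most entries in the table; the surveyed protocols differ mainly in how the reward is shaped (energy balance, link quality, QoS classes) and in whether the Q-table is replaced by a function approximator, as in the DQL-based schemes.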