Table 2.
Related research studies on the use of RL for network management.
| Reference | Year | Contribution | Application Domain | Algorithm/Model |
|---|---|---|---|---|
| Jin et al. [60] | 2019 | The authors of this paper propose RL-based congestion-avoiding routing for underwater acoustic sensor networks to reduce end-to-end delay and energy consumption. | Underwater acoustic sensor networks—RCAR | Q-learning |
| Di Valerio et al. [61] | 2019 | In this paper, the authors propose an RL-based data forwarding scheme for a node based on the number of unsuccessful transmissions. The node adaptively switches between single-path and multi-path routing to optimize energy consumption and the packet delivery ratio. | Underwater WSN—CARMA | Q-learning |
| Safdar Malik et al. [62] | 2023 | This paper presents an RL-based routing approach for cognitive radios (CRs). The study adds channel-selection decision capability to improve the average data rate and throughput. | CRs—RL-IoT | Q-learning |
| Mao et al. [63] | 2019 | In this paper, the authors propose a CNN-based scheme that continuously adapts and optimizes routing decisions based on network conditions. This approach computes the routing path combinations with high accuracy. | SDNs | CNN |
| Safdar et al. [64] | 2015 | The authors of this paper propose RL-based routing in CR ad hoc networks to reduce the protocol overhead and end-to-end delay and improve the packet delivery ratio. | CR ad hoc networks—CRAHN | Q-learning |
| Stampa et al. [65] | 2017 | This paper proposes a DRL approach for optimizing routing in SDNs. The agent in this approach optimizes the routing policy based on traffic conditions to minimize network delays. | SDN | DQL |
| Cicioğlu et al. [66] | 2023 | The authors of this paper propose an ML-assisted centralized link-state routing system for an SDN-based network. This scheme utilizes historical data on parameters such as latency, bandwidth, signal-to-noise ratio, and distance to make routing decisions. | SDN—MLaR | Supervised learning |
| Cheng et al. [67] | 2012 | In this paper, the authors propose a load-balancing scheme for multi-sink WSNs. This approach divides the network into several zones based on the remaining energy of hotspots around the sink node. RL is applied to the mobile anchor, enabling it to adapt to traffic patterns and discover an optimal control policy for its movement. | WSNs—QAZP | Q-learning |
| Wei et al. [68] | 2017 | In this approach, the authors present a task scheduling algorithm for dynamic WSNs that minimizes the exchange of cooperative information and balances resource utilization. | WSNs—QS | Q-learning with shared value function |
| Wei et al. [69] | 2019 | In this paper, the authors introduce a Q-learning algorithm for task scheduling in WSNs based on a support vector machine. Their proposed approach optimizes the application performance and reduces energy consumption. | WSNs—ISVM-Q | Q-learning and support vector machine |
| Ancillotti et al. [70] | 2017 | This paper proposes a link quality monitoring strategy for RPL in IPv6 WSNs using a multi-armed bandit algorithm. The proposed approach minimizes overhead and energy consumption by employing both synchronous and asynchronous monitoring. | WSNs—RL-Probe | Multi-armed bandit |
| Guo et al. [71] | 2020 | The authors of this paper propose a DRL-based QoS-aware secure routing protocol for the SDN-IoT. The primary objective is to design a routing protocol that efficiently routes traffic in a large-scale SDN. | SDN-IoT—DQSP | DQL |
| Künzel et al. [72] | 2020 | This paper introduces a Q-learning approach in which an agent adjusts weight values in an industrial WSN, leading to improved communication reliability and reduced network latency. | Industrial WSN—QLRR-WA | Q-learning |
| Jung et al. [73] | 2017 | In this paper, the authors introduce Q-learning-based geographic routing to enhance the performance of unmanned robotic networks and address the challenge of network overhead in high-mobility scenarios. | Unmanned robotic networks—QGeo | Q-learning |
| Sharma et al. [74] | 2017 | The authors of this paper introduce a tailored Q-learning algorithm for routing in WSNs with a focus on minimizing energy consumption, addressing the challenge of reliance on non-renewable energy sources. | WSNs | Tailored Q-learning |
| Su et al. [75] | 2022 | This paper presents an approach to enhance energy efficiency and prolong network lifetime using Q-learning-based routing for WSNs. It allows nodes to select neighboring nodes for transmission by considering various energy consumption factors, resulting in reduced and balanced energy usage. | WSNs | Q-learning |
| Akbari et al. [76] | 2020 | This paper addresses the need for efficient routing structures in sensor networks to optimize their lifetime and reduce energy consumption. The paper combines fuzzy logic and RL, utilizing factors such as remaining node energy, available bandwidth, and distance to the sink for routing decisions. | WSNs | RL with fuzzy logic |
| Liu et al. [77] | 2019 | The authors of this paper address the importance of connectivity solutions for wide-area applications in IoT networks. The proposed technique uses a distributed and energy-efficient RL-based routing algorithm for wide-area scenarios. | Wireless mesh IoT networks | Temporal difference |
| Sharma et al. [78] | 2020 | In this paper, the authors propose routing in opportunistic IoT networks using the Policy Iteration algorithm to automate routing and enhance the message delivery probability. | IoT networks—RLProph | Policy Iteration algorithm |
| Chakraborty et al. [79] | 2022 | In this paper, the authors propose a routing algorithm that adjusts its routing policy based on local information, aiming to find an optimal solution that balances network latency and lifetime in wireless mesh IoT networks. | Wireless mesh IoT networks | Q-learning |
| Muthanna et al. [80] | 2022 | This paper presents a system that optimizes transmission policy parameters and implements multi-hop routing for high QoS in LoRa networks. | LoRa IoT networks—MQ-LoRa | Soft actor-critic |
| Kaur et al. [81] | 2021 | The authors of this paper propose an algorithm that divides the network into clusters based on sensor node data loads, preventing premature network failure. This paper addresses issues such as high communication delay, low throughput, and poor network lifetime. | IoT-enabled WSNs | DQL |
| Zhang et al. [82] | 2021 | The authors of this paper use recurrent neural networks and the deep deterministic policy gradient (DDPG) method to predict the network traffic distribution. They employ a double deep Q-network to make routing decisions based on the current network state. | IoT-enabled WSNs | RNN, DDPG, and double DQN |
| Krishnan et al. [83] | 2021 | This paper focuses on addressing the challenge of maximizing the network lifetime in WSNs. Q-learning is employed to facilitate automatic learning to find the shortest routes. | IoT-enabled WSNs | Q-learning |
| Serhani et al. [84] | 2020 | This paper explores the challenges of integrating MANETs with the IoT and focuses on the issue of network node mobility. The authors introduce an adaptive routing protocol that enhances link stability in both static and mobile scenarios. | MANET-IoT systems—AQ-Routing | Q-learning |
| Pandey et al. [85] | 2022 | In this paper, the authors address the challenge of establishing large-scale connectivity among IoT devices. They introduce a multi-hop data routing approach utilizing the Q-learning method. | Low-power wide-area networks for IoT | Q-learning |
| Ren et al. [86] | 2023 | In this paper, the authors address the challenges of energy efficiency and network lifetime using the mean field RL method. Mean field theory simplifies interactions among nodes, and a prioritized sampling, loop-free algorithm prevents routing loops. | IoT-enabled WSNs | Mean field RL |
| Serhani et al. [87] | 2023 | In this paper, the authors introduce an efficient routing mechanism for the Internet of Medical Things. The proposed technique categorizes network traffic into three classes, optimizes paths based on QoS and energy metrics, and employs RL for path computation. | Internet of Medical Things—EQRSRL | Q-learning |
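Many of the Q-learning schemes surveyed above reduce routing to the same tabular pattern: each node's candidate next hops are the actions, the per-hop cost (delay or energy) supplies a negative reward, and reaching the sink terminates an episode. The following minimal sketch illustrates that shared pattern only; it does not reproduce any specific protocol in Table 2, and the topology, delays, terminal bonus, and hyperparameters are all illustrative assumptions.

```python
import random

ALPHA, GAMMA, EPSILON, EPISODES = 0.5, 0.9, 0.1, 2000
SINK = "D"

# Hypothetical topology: neighbors[node] -> {neighbor: per-hop delay}
neighbors = {
    "A": {"B": 2.0, "C": 5.0},
    "B": {"A": 2.0, "C": 1.0, "D": 4.0},
    "C": {"A": 5.0, "B": 1.0, "D": 1.0},
    "D": {},
}

# Q[(node, next_hop)]: estimated return of forwarding via next_hop
Q = {(n, nh): 0.0 for n, nbrs in neighbors.items() for nh in nbrs}

def choose(node):
    """Epsilon-greedy next-hop selection at a forwarding node."""
    nbrs = list(neighbors[node])
    if random.random() < EPSILON:
        return random.choice(nbrs)
    return max(nbrs, key=lambda nh: Q[(node, nh)])

random.seed(0)
for _ in range(EPISODES):
    node = "A"
    for _ in range(20):  # cap hops per episode
        if node == SINK:
            break
        nh = choose(node)
        # Reward: negative per-hop delay, plus a bonus for reaching the sink
        reward = -neighbors[node][nh] + (10.0 if nh == SINK else 0.0)
        future = max((Q[(nh, n2)] for n2 in neighbors[nh]), default=0.0)
        Q[(node, nh)] += ALPHA * (reward + GAMMA * future - Q[(node, nh)])
        node = nh

# Greedy policy after training: the delay-minimizing route A -> B -> C -> D
path, node = ["A"], "A"
while node != SINK:
    node = max(neighbors[node], key=lambda nh: Q[(node, nh)])
    path.append(node)
print(path)
```

The same skeleton underlies most entries in the table; the surveyed protocols differ mainly in how the reward is shaped (energy balance, link quality, QoS classes) and in whether the Q-table is replaced by a function approximator, as in the DQL-based schemes.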