Sensors. 2021 Jun 30;21(13):4510. doi: 10.3390/s21134510

Table 4.

Machine learning CC algorithms.

| Algorithm | Learning Method | Main Algorithm Characteristics |
| --- | --- | --- |
| Remy [114] | Offline | Based on network and traffic models. Uses pre-specified objectives for CC and a lookup table to maximise the expected value of the objective function. |
| Indigo [115] | Offline | Generates congestion-control oracles that map the algorithm state to the correct action using an emulated network model. The training data are produced by an imitation-learning algorithm that uses the CC oracles. |
| PCC [116] | Online | Learning is based on live experimental evidence and a utility function that describes the objective. Uses multiple micro-experiments and a gradient-ascent-based online learning algorithm to make rate-control decisions. |
| PCC Vivace [117] | Online | A variant of the PCC algorithm that uses a learning-theory-informed framework. In addition to bandwidth and loss rate, the framework includes RTT gradients in the utility-function derivation. |
| Aurora [118] | DRL | Uses a small fully connected neural network with changes in the sending rate as agent actions. Computes statistics vectors from the latency gradient, latency ratio and sending ratio; a fixed-length history of these vectors represents the state. Rewards throughput improvements while penalising latency and packet loss through a linear reward function. Trains the DRL agent with the Proximal Policy Optimisation (PPO) algorithm. |
| Eagle [119] | DRL | Model training is based on a long short-term memory (LSTM) neural network, with the cross-entropy method used to train the DRL agent. Uses a summary of the past four observation states and different reward functions for different cases. The agent's actions regulate discrete changes in the sending rate and the cwnd size. |
| Orca [120] | DRL | Uses DRL for coarse-grain control and classic TCP CC schemes for fine-grain control. Based on a recurrent neural network model and exploits the twin delayed deep deterministic policy gradient (TD3) as the training algorithm. The reward function is computed from the packet delivery rate, delay and loss; averages of these values compose the state space. |
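To make the reward/utility ideas summarised in Table 4 more concrete, the following is a minimal, illustrative sketch of two of them: an Aurora-style linear reward that rewards throughput while penalising latency and loss, and a PCC-style micro-experiment that adjusts the sending rate by gradient ascent on an observed utility. The coefficients, probe size, step size and the toy utility function are assumptions chosen for illustration, not the exact values or formulations used by Aurora or PCC.

```python
# Illustrative sketch only: simplified versions of the reward/utility ideas
# listed in Table 4. All numeric weights and the toy utility are assumptions.

def linear_reward(throughput, latency, loss, a=10.0, b=1.0, c=2.0):
    """Aurora-style linear reward: reward throughput, penalise latency and
    packet loss. The weights a, b, c are hypothetical."""
    return a * throughput - b * latency - c * loss


def gradient_ascent_step(rate, utility_fn, epsilon=0.05, step=0.1):
    """PCC-style micro-experiment: probe slightly higher and lower sending
    rates, estimate the utility gradient, and move the rate in its direction.
    utility_fn maps a sending rate to an observed utility value."""
    u_up = utility_fn(rate * (1 + epsilon))
    u_down = utility_fn(rate * (1 - epsilon))
    gradient = (u_up - u_down) / (2 * epsilon * rate)
    return max(rate + step * gradient, 0.0)


if __name__ == "__main__":
    # Toy utility: throughput saturates at the link capacity, while queuing
    # delay and loss grow once the sending rate exceeds it.
    def toy_utility(rate, capacity=100.0):
        loss = max(0.0, (rate - capacity) / rate) if rate > 0 else 0.0
        latency = 1.0 + max(0.0, rate - capacity) / capacity
        return linear_reward(min(rate, capacity), latency, loss)

    rate = 50.0
    for _ in range(20):
        rate = gradient_ascent_step(rate, toy_utility)
    print(f"sending rate after 20 probing steps: {rate:.1f}")
```

In an actual deployment the utility would be measured from live network feedback (as in PCC) or folded into the reward signal of a trained DRL agent (as in Aurora), rather than evaluated from a closed-form function as in this toy example.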