Deep Reinforcement Learning Assisted Beam Tracking and Data Transmission for 5G V2X Networks

Junliang Ye; Hamid Gharavi

doi:10.1109/tits.2023.3272548

Author manuscript; available in PMC: 2024 Mar 6.

Published in final edited form as: IEEE trans Intell Transp Syst. 2023;24(9):10.1109/tits.2023.3272548. doi: 10.1109/tits.2023.3272548

PMCID: PMC10916650 NIHMSID: NIHMS1924031 PMID: 38449573

Abstract

Beam tracking is a core issue in 5G vehicle-to-everything (V2X) networks. Specifically, higher beamforming gain is required to compensate for the path loss at higher frequencies, e.g., 5G FR2, to realize high data rate vehicle-to-infrastructure (V2I) communications. However, shorter time slots at higher frequencies, the high velocity of vehicles, and unpredictable localization errors make this problem more challenging. Under these circumstances, wider beams can lead to higher beam tracking accuracy. However, wider beams also mean lower beamforming gain, which cannot compensate for the high path loss at high frequencies and therefore reduces the data rate of V2I communications. Thus, there exists a trade-off between tracking accuracy and data rate in V2I communications. Furthermore, this problem needs to be solved within an extremely short time slot because of the high transmission frequency. To solve this problem, we propose a reinforcement learning (RL) assisted, high-resolution codebook-based beam tracking method. By comparing several different RL frameworks, we find that the twin delayed deep deterministic policy gradient (TD3) framework can help the roadside infrastructure (RSI) determine a proper beam pattern within a short duration. Moreover, based on a Hurst exponent analysis, recurrent neural networks (RNNs) are selected to improve the performance of the RL framework. The simulation results show that the proposed method performs well in tracking accuracy, data rate, and temporal efficiency.

Keywords: Vehicular networks, V2I, reinforcement learning, beam tracking

I. Introduction

As a core part of the fifth generation (5G) standard, V2X communications have attracted worldwide attention. As research on technologies such as autonomous driving deepens, industry and academia have also put forward higher requirements for the transmission capability of vehicular networks [1], [2], [3]. However, transmission for vehicles requires both high reliability and low latency, together with a high data rate [4]. To meet these requirements, a series of technologies have been suggested for use in V2X networks, such as massive multiple-input multiple-output (MIMO), millimeter wave communications, hybrid precoding, and so on [5], [6], [7].

To achieve high data rate transmission in vehicular communication scenarios, transmission on a higher frequency is a potential solution [8]. The massive MIMO technology with large antenna arrays is necessary for roadside infrastructures (RSIs) to generate pencil-like beams to compensate for the high path loss on corresponding frequencies [9]. So far, there have been considerable studies focusing on this area [8], [10], [11], [12], [13], [14], [15]. The authors of [10] propose an improved beam sweeping scheme, which aims to guarantee the occurrence of beam alignment and optimizes the latency to achieve beam alignment. By using a deep learning (DL) estimator, the authors of [11] propose an angle of arrival (AoA) and angle of departure (AoD) estimation algorithm, called the two-step angle estimation (TSAE) algorithm, to handle the high mobility vehicular communications. By using a realistic power consumption model, the authors of [8] investigate the performance of the generalized hybrid beamforming array structure in a cellular infrastructure-to-everything (C-I2X) communication scenario using a single-path terahertz (THz) channel model. By analyzing the capacity and deriving closed-form upper bounds on the capacity of the massive MIMO vehicle-to-vehicle (V2V) non-orthogonal multiple access (NOMA) channels, a pair of power allocation optimization schemes are formulated to improve the bit error rate (BER) in [12]. A packet-level channel model in the form of a finite-state Markov chain (FSMC) is proposed in [13] for vehicle-to-infrastructure (V2I) elevation multiplexing uplinks. The proposed channel model provides an enabling tool for analyzing and simulating the V2I elevation multiplexing links in 5G networks. A three-dimensional (3D) geometry-based stochastic model for V2V massive MIMO channels is proposed in [14] to study the influence of vehicle density on channel statistics. 
The authors of [15] derive asymptotic spectral efficiency to explore the performance of large antenna arrays used in V2X networks. A bio-inspired algorithm was used in [16] to improve the latency and throughput performance of V2I communications.

Despite the above studies, fast and accurate beam tracking under high mobility conditions still faces many challenging issues. For instance, under the pencil-like beam patterns requirement, RSIs need to know the vehicle’s precise location information in order to achieve highly accurate beam tracking during the data transmission phase [17]. However, acquiring accurate location information cannot be easily achieved [18].

Furthermore, acquiring accurate location information is not the only problem. Spectral efficiency is also an important key performance metric of V2I communications [19]. To improve the spectral efficiency, the authors of [20] propose a radar-assisted algorithm to perform the beam alignment task in a V2I scenario. By applying the analytical results with the traffic and data information, a distributed multisource scheduling algorithm is proposed in [21] to enhance the data dissemination efficiency in V2I communications. There is a trade-off between tracking accuracy and channel capacity when a vehicle communicates with its associated RSI. The authors of [22] propose a V2I data offloading scheme with QoS provisioning. The performance evaluation shows that their method can offload more high-priority data compared to traditional V2I data offloading schemes. In [23], the authors propose a QoS-aware data offloading scheme for V2I data offloading with QoS provisioning (DOVEQ), demonstrating that DOVEQ outperforms other schemes by offloading more data with less offloading delay and running time. A method to improve the data packet reception rate of optical camera communication based V2I using a deep learning based region-of-interest (ROI) detector is introduced in [24]. In [25], the authors adopt regression models to estimate traffic states. They show that their approach improves the performance in terms of decoding error rate in time-varying traffic. A new data dissemination algorithm, named the offline algorithm for hybrid data dissemination (OFDD), which chooses feasible V2I disseminations based on V2V broadcasting, is proposed in [26]. Reference [27] investigates the minimum RSI height required in multi-lane highways to guarantee all-time 60GHz line of sight (LOS) connectivity for different lanes. Also, the numerical results in [27] analyze how both signal strength and overall system data rate decrease with RSI height.

While the above studies have made solid progress in this area, little attention has been paid to the impact of the beam width on the performance. For instance, an RSI can achieve very high tracking accuracy by using a wide beam, but this would come at the expense of channel capacity due to the low beamforming gain. Moreover, for a high fading channel, a wide beam with low beamforming gain may also cause a disconnection between a vehicle and an RSI due to poor SNR. Under these conditions, a suitable AI-driven method can potentially help overcome this problem. Currently, there is considerable research activity focusing on AI applications in vehicular networks [28], [29], [30], [31], [32], [33], [34]. For example, a DL-based channel prediction method is considered to estimate channel responses for V2I communications [28]. The authors of [29] propose a channel estimation algorithm based on deep learning and show that their algorithm can enhance channel estimation accuracy, hence improving the bit error rate and robustness. Modeling the resource sharing of V2V and V2I communications as a multi-agent reinforcement learning (RL) problem is investigated by the authors in [30]. They offer a solution using a fingerprint-based deep Q-network method that is amenable to a distributed implementation. Furthermore, in [31] a model-free Q-learning technique is proposed to predict the channel adaptively by selecting the best impulse response predictor at the current orthogonal frequency division multiplexing block. Using deep RL, the authors of [32] propose a Fast Reflection (LFR) algorithm, which autonomously learns from the observable traffic pattern to select desirable reflector angles. An AI-based model is also proposed in [33] to perform an efficient blind channel state information (CSI) prediction. 
To accelerate handovers, a method that is based on historical handover data and K-nearest neighbor (KNN) machine learning (ML) algorithms is investigated in [34] to predict handover decisions without involving time-consuming target selection and beam training processes. A novel integrated ML and coordinated beamforming solution is developed to overcome the coverage, reliability, and latency challenges of mmWave applications under high mobility conditions [35]. The authors of [36] propose an ML approach to achieve an efficient and fast analog beam selection for mmWave V2V communications. In [37] the authors study the temporal effects of dynamic blockage in vehicular networks and propose a deep RL framework to overcome dynamic blockage. To improve network capacity whilst suppressing the additional beam search overhead, a partitioned search method is designed with an ML framework in [38]. Despite significant research progress in this area, the trade-off between beamforming gain and tracking accuracy remains unresolved.

Therefore, to investigate this issue, we consider a V2I network for relatively high velocity vehicular communications in a highway or suburban scenario. The RSIs are assumed to be situated along a straight road where the mobility of vehicle units (VUs) is restricted to two lanes. Downlink beam tracking and data transmissions of the V2I network are supported by RL, i.e., an RL agent determines the beam pattern of each RSI. This requires overcoming the following challenges: 1) Vehicles move with high velocity, hence their locations change quickly. Under these conditions, the location information obtained by the RSI via uplink feedback may not be accurate enough for downlink beam tracking and data transmission. This is mainly due to the movement of the vehicle during the time gap between uplink feedback and downlink beamforming. 2) Inaccurate location information provided by vehicles to assist the beam tracking may cause unpredictable errors. 3) There is a trade-off between tracking accuracy and spectral efficiency in V2I communications. The contributions of this paper to address these challenges are summarized as follows:

An RL assisted beam tracking and data transmission method is proposed to improve the performance of V2I communications. We formulate the spectral efficiency and tracking accuracy optimization as an RL problem. The RL-assisted method requires defining the state, action, and reward of the RL framework. Thus, to solve the optimization problem, we map the parameters that influence the performance of V2I communications into state, action, and reward forms.
We first use a regular deep deterministic policy gradient (DDPG) framework to perform the optimization. To improve the performance of the RL framework, we perform a Hurst exponent analysis on the training data and find that there exists a strong temporal dependency. We then change the fully connected neural networks (NNs) used in the DDPG framework to hybrid NNs that contain fully connected NNs and recurrent neural networks (RNNs).
To increase the tracking accuracy and spectral efficiency of V2I communications, we revise the DDPG framework with hybrid NNs to a twin delayed DDPG (TD3) framework. Moreover, through simulations, we find that the TD3 framework with hybrid NNs has the best performance compared with the regular DDPG framework and the DDPG framework with hybrid NNs.

The rest of the article is organized as follows: The system model is described in detail in Section II and Section III presents our optimization approach. Simulations of the proposed algorithms are carried out in Section IV, followed by the conclusion in Section V.

Notation:

Throughout this paper, $J$ is the imaginary unit, $\min (\cdot) / \max (\cdot)$ is the minimum/maximum operator, ${[\cdot]}_{i, j}$ represents the corresponding element of a vector or matrix, $Pr (\cdot)$ is the probability operator, $E (\cdot)$ is the expectation operator, $⌊ \cdot ⌋$ is the floor function, $\det (\cdot)$ is the determinant operator, and $ϕ_{E}$ is the empty set.

II. System Model

We consider a V2I network where RSIs are assumed to be situated along a straight road (e.g., in a highway or suburban scenario). The road consists of two lanes and the mobility of vehicle units (VUs) is restricted to these two lanes. In this paper, the downlink beam tracking and data transmissions of the V2I network are supported by RL technology, i.e., an RL agent determines the beam pattern of each RSI.

A. Network Architecture

At the beginning of each training episode, the RSIs’ locations are drawn from a uniform distribution within the range $[0, l_{R}]$ , where $l_{R}$ is the length of the road. Notice that this configuration is only used in the training process of the AI agent to enhance the diversity of the training data. The reason is that if the locations of the RSIs were fixed across all training episodes, the trained AI agent could only be applied to those same RSI locations. If we instead generate the locations of the RSIs randomly, the trained AI agent can be used in other cases, regardless of the RSI locations. Without loss of generality, we assume the road consists of two lanes, $L_{up}$ and $L_{down}$ . The movement of the vehicles is restricted to these two lanes. The width of a lane is 4 meters and the length of each lane is 500 meters. The set of vehicles is defined by $\{V u_{k} ∣ k \in [1, N_{V}]\}$ , where $N_{V}$ is the number of vehicles in the V2I network. Similarly, the set of RSIs is defined by $\{R s_{k} ∣ k \in [1, N_{R}]\}$ , where $N_{R}$ is the number of RSIs in the network.

In this paper, the network is assumed to operate in a time-slotted manner. Based on the configuration in [1], [39], and [40], we choose OFDM for V2I downlink transmissions. Also, based on [1], [39], and [40], the V2I link frame has a duration of 10 ms, and a frame is divided into ten subframes, each with a duration of 1 ms. Depending on the transmission frequency, the number of slots per subframe and the subcarrier spacing (SCS) for the OFDM waveform can be flexible for NR V2X. To realize high data rate downlink transmission, we choose FR2 (i.e., 24.25 GHz to 52.6 GHz) as the V2I downlink transmission frequency. The number of time slots per subframe is defined by $2^{μ_{s}}$ , and the corresponding duration of a time slot is $2^{- μ_{s}}$ ms. Based on [40], when FR2 is used, the value of $μ_{s}$ can be 1, 2, or 3 in the absence of any gap between each time slot. The velocity of $V u_{n}$ at time slot $t$ is defined as $v_{n} (t)$ . Furthermore, the movement of a typical vehicle, $V u_{n}$ , is governed by three moving patterns:

Regular pattern: If no other vehicle exists within a given distance $D_{p}$ in front of $V u_{n}$ , then $V u_{n}$ will keep moving in the same lane.
Merging pattern: If there exists another vehicle $V u_{o}$ within a given distance, $D_{p}$ , in front of $V u_{n}$ in the same lane and the velocity satisfies the condition $v_{n} (t) - v_{o} (t) \geq v_{thr}$ (where $v_{thr}$ is the pre-defined threshold), then $V u_{n}$ will check its surroundings to see whether there are vehicles on the other lane within a distance, $D_{M}$ , in front of or behind $V u_{n}$ . If not, then $V u_{n}$ will merge into the other lane.
Following pattern: If $V u_{n}$ fails to merge into the other lane (e.g., there are vehicles within a certain range in the other lane), to avoid an accident the $V u_{n}$ will keep moving in the same lane and reduce its velocity to $v_{n}^{*} (t) = v_{n} (t) - Δ v_{n} < v_{o} (t)$ . Also, $V u_{n}$ will keep observing the surroundings. Once the surrounding environment satisfies the condition for the merging pattern, it will follow that pattern to avoid potential traffic congestion caused by low-velocity vehicles. Once it completes the merging pattern, it will change to a regular pattern.
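The three patterns above can be read as a small decision rule. The sketch below is one possible reading and not the authors' simulator: the function name `choose_pattern` and its arguments (`gap_ahead`, `other_lane_clear`, and so on) are hypothetical, and the behavior when a slower vehicle is ahead but the speed difference is below $v_{thr}$ is an assumption, since the text does not specify it.

```python
def choose_pattern(gap_ahead, v_n, v_o, other_lane_clear, d_p, v_thr):
    """Return 'regular', 'merging', or 'following' for vehicle Vu_n.

    gap_ahead: distance to the nearest vehicle ahead in the same lane,
               or None if there is none (hypothetical input).
    v_n, v_o:  velocities of Vu_n and of the vehicle ahead, Vu_o.
    other_lane_clear: True if no vehicle lies within D_M in front of or
               behind Vu_n in the other lane (hypothetical input).
    d_p, v_thr: the distance D_p and the velocity threshold v_thr.
    """
    # Regular pattern: nobody within D_p ahead in the same lane.
    if gap_ahead is None or gap_ahead > d_p:
        return "regular"
    # A vehicle is within D_p; merging is attempted only when Vu_n is
    # faster than Vu_o by at least the threshold.
    if v_n - v_o >= v_thr:
        # Merge if the other lane is clear, otherwise slow down and follow.
        return "merging" if other_lane_clear else "following"
    # Assumption: below the threshold the vehicle keeps its regular pattern.
    return "regular"
```

A vehicle in the following pattern would call the same rule again in later time slots, which matches the description that it keeps observing its surroundings until merging becomes possible.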

While in a regular pattern, the velocity of the vehicle will follow,

v_{n} (t) = τ_{n} v_{n} (t - 1) + (1 - τ_{n}) {\bar{v}}_{n} + \sqrt{1 - τ_{n}^{2}} e_{v},

(1)

where ${\bar{v}}_{n}$ is the average velocity of $V u_{n}$ , and $e_{v}$ is a normally distributed random variable, i.e., $e_{v} \sim N (0, σ_{v})$ . Also, to make the mobility model as realistic as possible, the value of the average velocity ${\bar{v}}_{n}$ is set to range from 20 m/s to 30 m/s. $τ_{n}$ is a parameter that reflects the degree of randomness of the VU’s mobility, e.g., $τ_{n} = 1$ and $τ_{n} = 0$ indicate the lowest and highest degree of randomness, respectively. Notice that the duration of a time slot is very short; thus, we can ignore the changes in the VUs’ velocity and acceleration during a time slot without loss of generality. The network architecture is shown in Fig. 1.
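As a rough illustration of (1), the following sketch performs one velocity update in the regular pattern. It assumes that $σ_{v}$ is a standard deviation and that $e_{v}$ is drawn from $N (0, σ_{v})$; the function name `next_velocity` and the `rng` argument are illustrative and not part of the original model description.

```python
import math
import random


def next_velocity(v_prev, v_bar, tau, sigma_v, rng=random):
    """One step of (1): tau*v(t-1) + (1-tau)*v_bar + sqrt(1-tau^2)*e_v.

    v_prev:  velocity v_n(t-1) in m/s.
    v_bar:   average velocity of the vehicle in m/s.
    tau:     memory parameter tau_n in [0, 1].
    sigma_v: spread of the Gaussian term e_v (assumed standard deviation).
    rng:     any object with a gauss(mu, sigma) method (assumption).
    """
    e_v = rng.gauss(0.0, sigma_v)
    return tau * v_prev + (1.0 - tau) * v_bar + math.sqrt(1.0 - tau ** 2) * e_v
```

With `tau = 1` the update returns the previous velocity unchanged (no randomness), while with `tau = 0` it returns the average velocity plus the full noise term, which is consistent with the two extremes described above.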

B. V2I Beam Tracking and Communication

The beam tracking and communication process is shown in Fig. 2. Recall that the network operates in a time-slotted manner. The duration of each time slot is defined by $t_{Ts} = 2^{- μ_{s}}$ ms. Here we define the nearest RSI of $V u_{n}$ as $R s_{m}$ , and $V u_{n}$ will be associated with $R s_{m}$ until it moves out of the coverage of $R s_{m}$ . As shown in Fig. 2, the V2I communication process can be summarized in two phases:

Pre-access and beam training phase: In this phase, a typical RSI, $R s_{m}$ , will use a sub-6 GHz band to sweep all directions with a wide beam pattern to capture $V u_{n}$ . Once it finishes beam sweeping, feedback will be sent from $V u_{n}$ to $R s_{m}$ through the uplink using the sub-6 GHz band and omnidirectional antennas. Then, $R s_{m}$ will perform beam refining to narrow the beam for downlink transmission. To achieve this, $R s_{m}$ will use the channel state information provided by $V u_{n}$ through the uplink feedback. Once $V u_{n}$ moves out of the coverage of $R s_{m}$ , or $R s_{m}$ fails to track $V u_{n}$ in the following beam tracking phase, the communication between $V u_{n}$ and $R s_{m}$ returns to the pre-access phase.
Beam tracking and data transmission phase: In this phase, $R s_{m}$ will keep adjusting the beam direction and width to maintain the link quality between $R s_{m}$ and $V u_{n}$ . At each time slot, $R s_{m}$ will first determine a proper beam pattern based on the previous feedback information from $V u_{n}$ through uplink transmission. Then, if the link quality is high enough, e.g., the channel capacity is higher than a given threshold, $R s_{m}$ will maintain the beam tracking phase and keep transmitting data to $V u_{n}$ . However, if the link quality drops (mostly because of an unpredicted large movement of $V u_{n}$ ), $R s_{m}$ will start a new pre-access phase to re-capture $V u_{n}$ .
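The alternation between the two phases can be summarized as a two-state machine. The sketch below is a simplified illustration under stated assumptions: beam training is taken to succeed whenever the vehicle is in coverage, link quality is reduced to a single capacity value compared against a threshold, and the names `next_phase`, `in_coverage`, and `threshold` are hypothetical.

```python
def next_phase(phase, in_coverage, capacity, threshold):
    """Return the phase for the next step: 'pre_access' or 'tracking'.

    phase:       current phase, 'pre_access' or 'tracking'.
    in_coverage: True if Vu_n is inside the coverage of Rs_m.
    capacity:    channel capacity reported in the uplink feedback.
    threshold:   minimum capacity for the link to count as good enough.
    """
    if phase == "pre_access":
        # Assumption: sweeping and refining succeed once the vehicle is covered.
        return "tracking" if in_coverage else "pre_access"
    # Tracking phase: keep transmitting only while the vehicle is covered
    # and the capacity stays higher than the threshold.
    if in_coverage and capacity > threshold:
        return "tracking"
    return "pre_access"
```

Every step spent in `pre_access` carries no downlink data, which is why the long-term average capacity depends on how rarely the tracking phase is lost.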

Since there is no data transmitted during the pre-access phase, $R s_{m}$ needs to avoid this case to improve the long-term average channel capacity. However, the movement of $V u_{n}$ cannot be easily predicted since $R s_{m}$ only has the feedback information (including location, velocity, and so on) of the previous time slots. In most cases, however, even such information is not accurate enough, since the location information provided by GPS can have errors at the meter level. In practice, there are also some scenarios where GPS signals cannot be utilized (e.g., driving through tunnels). Under these conditions, the received signal strength indication (RSSI)-based positioning method can be used to obtain the location information of VUs through the pre-access phase [41]. Thus, similar to [42], in this paper we assume the location information provided by $V u_{n}$ always has a small normally distributed error $e_{lo}$ , i.e., $e_{lo} \sim N (0, σ_{lo})$ where the value of $e_{lo}$ ranges from several centimeters to decimeters. Moreover, since the RSI uses massive MIMO technology to enable high data rate downlink transmission, the beam will be pencil-like to compensate for the high channel fading. In this case, even if the value of the error is as small as several centimeters, it may still influence the performance of the link and force the communication to repeat the pre-access phase. Furthermore, we model the failure of uplink transmission/decoding as a stationary stochastic process $Θ_{F} = \{𝓝_{i} = 0.01 ∣ i = 1, 2, \dots, \infty\}$ , where $𝓝_{i}$ is the probability that an uplink transmission/decoding failure occurs at subframe $f_{i}^{sub}$ . When such a failure occurs, no feedback information will be received by the RSI. 
Moreover, since blockage effects caused by other vehicles can be significant at mmWave frequencies [37], [43], [44], we model the blockage failure as a stationary stochastic process $Θ_{B} = \{𝓜_{i} = 0.01 ∣ i = 1, 2, \dots, \infty\}$ , where $𝓜_{i}$ is the probability that there is a blockage failure at subframe $f_{i}^{sub}$ . To solve these problems, we propose an AI-assisted beam tracking method as described next.

C. AI-Assisted V2I Beam Tracking and Data Transmission

1). Codebook-Based Beamforming:

As we mentioned in the last section, in the pre-access phase $R s_{m}$ and $V u_{n}$ will set up an initial link through beam training. However, since $V u_{n}$ is probably moving with a high velocity, the link is not stable. Thus, $R s_{m}$ needs to keep adjusting the beam pattern to maintain an acceptable link quality. In this paper, the link quality is measured by the channel capacity, which is contained in the feedback message that $V u_{n}$ sends to $R s_{m}$ through the uplink. We assume that $R s_{m}$ is equipped with a transmit uniform planar array (UPA) of $M_{x} \times M_{y}$ antenna elements and $V u_{n}$ is equipped with a receive UPA of $N_{x} \times N_{y}$ antenna elements. Since the duration of a time slot is very short, the channel condition during each time slot is assumed to be stable, i.e., the channel matrix between $R s_{m}$ and $V u_{n}$ will not change during each time slot. The channel between the transmit array and receive array at time slot $t$ can be expressed by (2) [45], where $G_{TR}$ and $G_{AR}$ are the transmit and receive antenna gains; $α_{TR}^{m, n} (\cdot)$ and $α_{AR}^{m, n} (\cdot)$ represent the array steering vectors of $R s_{m}$ and $V u_{n}$ , respectively; $α_{LoS}^{m, n} (f_{c}, d_{t}^{m, n})$ and $α_{NL}^{k, m, n} (f_{c}, d_{t}^{k, m, n})$ are the path losses of the LoS path and NLoS path, respectively; $f_{c}$ is the carrier frequency; $d_{t}^{m, n}$ is the length of the LoS path at time slot $t$ ; $d_{t}^{k, m, n}$ is the length of the $k$th NLoS path at time slot $t$ ; $N_{pa}$ represents the number of NLoS paths; $θ_{DP}^{m, n} (t)$ and $θ_{AR}^{m, n} (t)$ are the azimuth AoD and AoA of the LoS path at time slot $t$ , respectively; $θ_{DP}^{k, m, n} (t)$ and $θ_{AR}^{k, m, n} (t)$ are the azimuth AoD and AoA of the $k$th NLoS path at time slot $t$ , respectively. 
Similarly, $φ_{DP}^{m, n} (t) / φ_{AR}^{m, n} (t)$ is the elevation AoD/AoA of the LoS path at time slot $t$ , and $φ_{DP}^{k, m, n} (t) / φ_{AR}^{k, m, n} (t)$ is the elevation AoD/AoA of the $k$th NLoS path at time slot $t$ . More specifically, the number of multipath components, $N_{pa}$ , is a uniformly distributed variable within the range $[1, 5]$ . For the azimuth AoD and AoA of the $k$th NLoS path at time slot $t$ , i.e., $θ_{DP}^{k, m, n} (t)$ and $θ_{AR}^{k, m, n} (t)$ , we have

\left\{ \begin{array}{l} θ_{DP}^{k, m, n} (t) = θ_{DP}^{m, n} (t) + ϑ_{DP}^{k, m, n} (t) \\ θ_{AR}^{k, m, n} (t) = θ_{AR}^{m, n} (t) + ϑ_{AR}^{k, m, n} (t) \end{array} \right.

(3a)

where $ϑ_{DP}^{k, m, n} (t)$ and $ϑ_{AR}^{k, m, n} (t)$ follow two independent uniform distributions on $[- π, π]$ (i.e., $[-180^{\circ}, 180^{\circ}]$ ). Similarly, for the elevation AoD and AoA of the $k$th NLoS path at time slot $t$ , i.e., $φ_{DP}^{k, m, n} (t)$ and $φ_{AR}^{k, m, n} (t)$ , we have

\left\{ \begin{array}{l} φ_{DP}^{k, m, n} (t) = φ_{DP}^{m, n} (t) + ψ_{DP}^{k, m, n} (t) \\ φ_{AR}^{k, m, n} (t) = φ_{AR}^{m, n} (t) + ψ_{AR}^{k, m, n} (t) \end{array} \right.

(3b)

where $ψ_{DP}^{k, m, n} (t)$ and $ψ_{AR}^{k, m, n} (t)$ follow two independent uniform distributions on $[- π / 4, π / 4]$ (i.e., $[-45^{\circ}, 45^{\circ}]$ ). For an $M_{x} \times M_{y}$ -element UPA, the array steering vector can be expressed by

a_{UPA} (θ, φ) = {[1, \dots, e^{J π (m_{x} \sin φ \sin θ + m_{y} \cos θ)}, \dots, e^{J π ((M_{x} - 1) \sin φ \sin θ + (M_{y} - 1) \cos θ)}]}^{T},

(3c)

where $m_{x}$ and $m_{y}$ are the antenna element indices with $0 \leq m_{x} \leq M_{x} - 1$ and $0 \leq m_{y} \leq M_{y} - 1$ , respectively; $r_{A} = λ_{C} / 2$ is the antenna element spacing; $J$ is the imaginary unit; $λ_{C}$ is the wavelength; and $θ$ and $φ$ are the arguments of the function $a_{UPA} (θ, φ)$ . Since the duration of a time slot is short, here we assume that the beam pattern, i.e., precoding vector, is reselected in every subframe instead of every time slot. Denoting $N_{TR} = M_{x} \times M_{y}$ and $N_{RC} = N_{x} \times N_{y}$ , based on (3), the channel capacity of the link at time slot $t$ can be expressed by

C_{t}^{m, n} = \log_{2} \det (I_{N_{RC}} + \frac{P_{TR}}{σ_{s}^{2}} H_{t}^{m, n} F_{i}^{m, n} {(F_{i}^{m, n})}^{H} {(H_{t}^{m, n})}^{H}),

(4)

where $σ_{s}^{2}$ is the power of additive white Gaussian noise (AWGN) of the channel between the RSI and UE; $I_{N_{RC}}$ is an $N_{RC} \times N_{RC}$ identity matrix; $\det (\cdot)$ is the determinant of the given matrix; $F_{i}^{m, n}$ is the precoding vector for subframe $f_{i}^{sub}$ and $P_{TR}$ is the transmit power of the UPA.
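Because $F_{i}^{m, n}$ is a vector, the matrix inside the determinant in (4) is the identity plus a rank-one term, so by the matrix determinant lemma (4) reduces to $\log_{2} (1 + (P_{TR} / σ_{s}^{2}) \| H_{t}^{m, n} F_{i}^{m, n} \|^{2})$ . The sketch below evaluates that reduced form in plain Python. The function name `capacity_rank_one` and the list-of-lists representation of the channel matrix are illustrative assumptions, not part of the original method.

```python
import math


def capacity_rank_one(H, f, p_tr, noise_power):
    """Evaluate (4) for a single precoding vector f.

    H:           channel matrix as N_RC rows of N_TR complex gains.
    f:           precoding vector with N_TR complex weights.
    p_tr:        transmit power P_TR.
    noise_power: AWGN power sigma_s^2.

    Uses det(I + c * u u^H) = 1 + c * ||u||^2 with u = H f.
    """
    # Received vector u = H f, one entry per receive antenna.
    hf = [sum(h * w for h, w in zip(row, f)) for row in H]
    # Squared Euclidean norm of u.
    gain = sum(abs(x) ** 2 for x in hf)
    return math.log2(1.0 + p_tr / noise_power * gain)
```

For example, with a 2 x 2 identity channel, `f = [1, 0]`, `p_tr = 3` and `noise_power = 1`, the function returns `log2(4)`, i.e., 2 bit/s/Hz. Avoiding the explicit determinant keeps the per-codeword cost linear in the number of channel entries, which matters when many codewords are evaluated per subframe.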

A regular mechanism to drive $R s_{m}$ to change the beam pattern is based on defining a threshold for the SNR of the current link. If the SNR of the current link is above the threshold, then $R s_{m}$ will keep the current beam pattern. Otherwise, it will choose another beam pattern to improve the link quality. However, this method is not suitable for V2I communication because the quality of the link may change dramatically when the vehicle’s velocity is very high. To solve this problem, $F_{i}^{m, n}$ is selected from a pre-defined codebook by an AI agent, which controls the actions of $R s_{m}$ . For reference, the channel matrix (2) introduced earlier is given first; the pre-defined codebook is described after it.

H_{t}^{m, n} = α_{LoS}^{m, n} (f_{c}, d_{t}^{m, n}) G_{TR} G_{AR} α_{AR}^{m, n} (θ_{AR}^{m, n} (t), φ_{AR}^{m, n} (t)) α_{TR}^{m, n} {(θ_{DP}^{m, n} (t), φ_{DP}^{m, n} (t))}^{H} + \sum_{k = 1}^{N_{pa}} α_{NL}^{k, m, n} (f_{c}, d_{t}^{k, m, n}) G_{TR} G_{AR} \cdot α_{AR}^{k, m, n} (θ_{AR}^{k, m, n} (t), φ_{AR}^{k, m, n} (t)) α_{TR}^{k, m, n} {(θ_{DP}^{k, m, n} (t), φ_{DP}^{k, m, n} (t))}^{H} .

(2)

The construction of the codebook can be found in [46]. Since the movement of $V u_{n}$ is hard to predict, a codebook with the same beam width is not robust enough to handle all the situations that may occur during beam tracking. This is why we have used a multi-level codebook. For example, a codeword $F_{P}^{{a, b}}$ in the codebook will be selected by the RL agent to do beam tracking and data transmission in each subframe $f_{i}^{sub}$ , i.e., $F_{P}^{{a, b}} \to F_{i}^{m, n}$ in (4). The parameter $a$ denotes the level of the codeword and $b$ is the location of the codeword at level $a$ . Since the vehicles’ dynamics on elevation is relatively small compared with the azimuth cases, we do not consider the variation of elevation AoD and AoA in this paper. Then, the corresponding beam width and beam direction of $F_{P}^{{a, b}}$ can be expressed by

\left\{ \begin{array}{l} B_{W}^{a, b} = \frac{π}{2^{a}} \\ B_{D}^{a, b} = \frac{π}{2} - \frac{b π}{2^{a}} . \end{array} \right.

(5)

To improve the beamforming gain and compensate for excessive path losses at higher frequencies, a codeword in the lower level of the codebook is not considered for beam tracking and data transmission. Thus, in this paper, RSIs only use the codeword from levels 5 to 7 of the codebook to do beam tracking and data transmission. Based on (5), the corresponding beam width is from $π / 32 = {5.625}^{\circ}$ to $π / 128 = {1.40625}^{\circ}$ [47].
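The beam widths quoted above follow directly from (5). As a small check, the sketch below computes the width and direction of a codeword at level $a$ and position $b$ ; the function names `beam_width` and `beam_direction` and the table `WIDTHS_DEG` are illustrative, and the admissible range of $b$ at each level is not stated in this section, so it is left to the caller.

```python
import math


def beam_width(a):
    """Beam width B_W in radians for codebook level a, from (5)."""
    return math.pi / 2 ** a


def beam_direction(a, b):
    """Beam direction B_D in radians for level a and position b, from (5)."""
    return math.pi / 2 - b * math.pi / 2 ** a


# Beam widths in degrees for the levels used for tracking (5 to 7).
WIDTHS_DEG = {a: math.degrees(beam_width(a)) for a in (5, 6, 7)}
```

Each additional level halves the beam width, so moving from level 5 to level 7 narrows the beam by a factor of four while quadrupling the number of directions the agent has to choose among.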

2). RL Framework:

The communication process between $R s_{m}$ and $V u_{n}$ can be summarized as follows:

1) At the beginning of each subframe $f_{i}^{sub}$ (i.e., the first time slot of $f_{i}^{sub}$ ), $V u_{n}$ will send a feedback message (including information on channel capacity, velocity, and location) through the uplink.
2) After $R s_{m}$ successfully receives and decodes the feedback message, the AI agent uses this information and previous experiences to determine a codeword (i.e., beam pattern) for $R s_{m}$ to track $V u_{n}$ . If $R s_{m}$ cannot determine a codeword $F_{i}^{m, n}$ in a given time slot due to the computational complexity, the next time slot in $f_{i}^{sub}$ will be used to select the codeword. If $R s_{m}$ fails to determine a beam pattern within a subframe, then its data rate will be 0. We define this type of failure as overtime failure. If the beam tracking at subframe $f_{i - 1}^{sub}$ succeeded, then the codeword $F_{i - 1}^{m, n}$ for $f_{i - 1}^{sub}$ will be used for tracking and data transmission until $F_{i}^{m, n}$ is selected.
3) Subsequently, $R s_{m}$ will use $F_{i}^{m, n}$ as the codeword for precoding to track $V u_{n}$ . If the tracking process succeeds, then $R s_{m}$ will use $F_{i}^{m, n}$ for data transmission in the following time slots until the next codeword $F_{i + 1}^{m, n}$ is selected in the next subframe $f_{i + 1}^{sub}$ . If the tracking process fails, $V u_{n}$ will also send a feedback message to $R s_{m}$ . Then the communication process will return to the pre-access phase, and no downlink data transmission to $V u_{n}$ will take place before $F_{i + 1}^{m, n}$ is selected. We assume that the pre-access phase can be completed before the next subframe $f_{i + 1}^{sub}$ starts. Once $f_{i + 1}^{sub}$ begins, the communication process will go back to step 1).
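The timing budget behind the overtime failure can be made concrete with the slot structure from Section II-A: a 1 ms subframe holds $2^{μ_{s}}$ slots of $2^{- μ_{s}}$ ms each. The sketch below is a simplified illustration that treats the whole subframe as the decision budget (an assumption, since the first slot also carries the uplink feedback); the helper names and the `decision_latency_ms` parameter are hypothetical.

```python
import math


def slot_duration_ms(mu_s):
    """Duration of one time slot in milliseconds."""
    return 2.0 ** (-mu_s)


def slots_per_subframe(mu_s):
    """Number of time slots in one 1 ms subframe."""
    return 2 ** mu_s


def subframe_index(t, mu_s):
    """Index i of the subframe that contains time slot t (t counted from 0)."""
    return t // slots_per_subframe(mu_s)


def is_overtime(decision_latency_ms, mu_s):
    """True if selecting a codeword needs more slots than one subframe has.

    decision_latency_ms: time the agent needs to output a codeword
                         (hypothetical measured value).
    """
    slots_needed = math.ceil(decision_latency_ms / slot_duration_ms(mu_s))
    return slots_needed > slots_per_subframe(mu_s)
```

With `mu_s = 2` there are four 0.25 ms slots per subframe, so `subframe_index(t, 2)` equals the floor of `t / 4`, and a decision that takes 1.2 ms would count as an overtime failure while one that takes 0.6 ms would not.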

The following figure shows the communication process within a subframe.

Since $R s_{m}$ needs to determine a beam pattern, i.e., codeword, for beam tracking and data transmission after it receives the feedback from $V u_{n}$ at each subframe $f_{i}^{sub}$ in a complicated environment, here we use a deep deterministic policy gradient (DDPG) as a basic RL framework, to assist the beam tracking and data transmission.

Details of the DDPG agent can be found in [48]. Here we define $E x p_{i} = \{s_{i}^{t}, a_{i}^{t}, r_{i}^{t}, s_{i + 1}^{t + Δ t}\}$ as the agent experience gained from the environment at subframe $f_{i}^{sub}$ , time slot $t$ , where $s_{i}^{t}$ is the state of the environment at subframe $f_{i}^{sub}$ , time slot $t$ ; $a_{i}^{t}$ is the action taken by the agent at subframe $f_{i}^{sub}$ , time slot $t$ ; $r_{i}^{t}$ is the current reward that the AI agent has achieved through the action $a_{i}^{t}$ ; and $s_{i + 1}^{t + Δ t}$ is the state of the environment at subframe $f_{i + 1}^{sub}$ , time slot $t + Δ t$ (here $Δ t$ denotes the duration of a subframe). $N_{ba}$ is the size of the minibatch, and $π_{θ} (\cdot)$ is the policy that the DDPG agent uses to determine an action, i.e., codeword. Notice that there exists a relationship between $t$ and $i$ , i.e., $⌊ t / 4 ⌋ = i$ , where $⌊ \cdot ⌋$ is the floor function. We extend $v_{n} (t)$ to $v_{n} (i, t)$ , because the AI agent is updated in each subframe. Moreover, if a failure of uplink transmission/decoding occurs, the feedback information will be an empty set, i.e., $E x p_{i} = ϕ_{E}$ ; if a blockage failure happens, the reward of $f_{i}^{sub}$ will be 0. Let us define $s_{i}^{t}$ by

s_{i}^{t} = {[v_{n} (i, t), x_{n}^{Vu} (i, t), y_{n}^{Vu} (i, t), x_{m}^{Rs} (i, t), y_{m}^{Rs} (i, t)]}^{T} .

(6)

where $v_{n} (i, t)$ is the velocity of $V u_{n}$ at subframe $f_{i}^{sub}$ , time slot $t$ ; $x_{n}^{Vu} (i, t)$ , $y_{n}^{Vu} (i, t)$ , $x_{m}^{Rs} (i, t)$ , $y_{m}^{Rs} (i, t)$ are the coordinates of $V u_{n}$ and the associated RSI at subframe $f_{i}^{sub}$ , time slot $t$ , respectively. Normally, $x_{n}^{Vu} (i, t)$ and $y_{n}^{Vu} (i, t)$ are obtained by GPS or other localization algorithms, such as the RSSI-based method, which may have an estimation error. Thus, we add a normally distributed random variable $e_{lo}$ to the exact location to make the training data more realistic and the AI agent more robust. $r_{i}^{t}$ is the corresponding channel capacity when the selected codeword is used for data transmission at subframe $f_{i}^{sub}$ , time slot $t$ . $a_{i}^{t}$ is the codeword selected by the RSI at subframe $f_{i}^{sub}$ , time slot $t$ . $a_{i}^{t}$ can be expressed by,

a_{i}^{t} = [F_{i, t}^{R s_{1}^{V}, 1}, F_{i, t}^{R s_{2}^{V}, 2}, \dots, F_{i, t}^{R s_{n}^{V}, n}, \dots, F_{i, t}^{R s_{N_{V}}^{V}, N_{V}}] .

(7)

where $R s_{n}^{V}$ is the associated RSI of $V u_{n}$ during subframe $f_{i}^{sub}$ . Notice that, since the network is operating at a high frequency (i.e., 24.25 GHz and 52.6 GHz), the duration of a time slot is very short. Thus, the AI agent is configured to update in each subframe instead of each time slot. The flowchart of the beam tracking process is shown in Fig. 4.
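To make the state in (6) and the experience tuple more concrete, the following is a minimal Python sketch, not the authors' implementation. The names `Experience`, `subframe_index`, and `observed_state` are hypothetical, and the localization error is parameterized by a standard deviation `std_lo`, which is an assumption about how $e_{lo}$ is drawn.

```python
import random
from collections import namedtuple

# One agent experience per subframe: (state, action, reward, next state).
Experience = namedtuple("Experience", ["state", "action", "reward", "next_state"])

SLOTS_PER_SUBFRAME = 4  # from the relation floor(t / 4) = i in the text


def subframe_index(t):
    """Map a time-slot index t to its subframe index i = floor(t / 4)."""
    return t // SLOTS_PER_SUBFRAME


def observed_state(v, vu_xy, rs_xy, std_lo=0.1, rng=random):
    """Build the state of (6) for one VU.

    A zero-mean Gaussian error e_lo is added to the VU coordinates only.
    std_lo is an assumed standard deviation in meters.
    """
    x_vu = vu_xy[0] + rng.gauss(0.0, std_lo)
    y_vu = vu_xy[1] + rng.gauss(0.0, std_lo)
    return [v, x_vu, y_vu, rs_xy[0], rs_xy[1]]
```

For example, `observed_state(25.0, (120.0, 2.0), (100.0, 10.0))` would return a five-element list whose second and third entries are perturbed by the localization error.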

III. Problem Formulation

In this paper, the system is operated in a time-slotted manner. Based on the discussion in Section-B, for a given subframe $f_{i}^{sub}$ , time slot $t$ , the communication status between a VU $V u_{n}$ and the associated RSI, $R s_{m}$ , may be either in the pre-access phase or in the beam tracking phase, depending on the result of the beam tracking at the corresponding subframe. We define the beam width and direction that $R s_{m}$ uses to track $V u_{n}$ at subframe $f_{i}^{sub}$ , time slot $t$ as $B w_{i, t}^{m, n}$ and $B d_{i, t}^{m, n}$ , respectively. Then, only in the following two cases does the beam tracking phase switch to the pre-access phase at subframe $f_{i}^{sub}$ , time slot $t$ :

1) The SNR of the link between $R s_{m}$ and $V u_{n}$ at subframe $f_{i}^{sub}$ , time slot $t$ , is lower than a given threshold $γ_{th}$ , i.e.,
$ρ_{i, t}^{m, n} \leq γ_{th}, ρ_{i, t}^{m, n} = \det (\frac{P_{TR}}{σ_{s}^{2}} H_{i, t}^{m, n} F_{i, t}^{m, n} {(F_{i, t}^{m, n})}^{H} {(H_{i, t}^{m, n})}^{H}) ,$ (8)
where $F_{i, t}^{m, n}$ is the codeword (i.e., beam pattern) used by $R s_{m}$ and $V u_{n}$ at subframe $f_{i}^{sub}$ , time slot $t$ (i.e., $m = R s_{n}^{V}$ ); $ρ_{i, t}^{m, n}$ and $H_{i, t}^{m, n}$ are the SNR and channel matrix of the link between $R s_{m}$ and $V u_{n}$ at subframe $f_{i}^{sub}$ , time slot $t$ , respectively.
2) The location of $V u_{n}$ at subframe $f_{i}^{sub}$ , time slot $t$ is out of the coverage of $F_{i, t}^{m, n}$ , i.e.,
$κ_{i, t}^{m, n} > \frac{B w_{i, t}^{m, n}}{2}, κ_{i, t}^{m, n} = \arctan (\frac{y_{m}^{Rs} (i, t) - y_{n}^{Vu} (i, t)}{x_{m}^{Rs} (i, t) - x_{n}^{Vu} (i, t)}) + \frac{π}{2} - B d_{i, t}^{m, n},$ (9)
where $y_{m}^{Rs} (i, t)$ , $x_{m}^{Rs} (i, t)$ , $y_{n}^{Vu} (i, t)$ and $x_{n}^{Vu} (i, t)$ are the coordinates of $R s_{m}$ and $V u_{n}$ at subframe $f_{i}^{sub}$ , time slot $t$ , respectively; $κ_{i, t}^{m, n}$ is the deviation angle from $V u_{n}$ ’s location to $B d_{i, t}^{m, n}$ .
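The two fallback cases can be sketched in Python as follows. This is an illustrative reading, not the authors' code: the function names are hypothetical, the two cases are treated as alternatives (which matches the product of indicator functions in (11)), and the handling of a zero horizontal offset, where (9) is undefined, is an assumption.

```python
import math


def deviation_angle(rs_xy, vu_xy, beam_direction):
    """kappa in (9): arctan of the RSI-to-VU slope, plus pi/2, minus Bd."""
    dx = rs_xy[0] - vu_xy[0]
    dy = rs_xy[1] - vu_xy[1]
    if dx == 0:
        # Assumption: use the limiting value of arctan, since (9) divides by dx.
        slope_angle = math.copysign(math.pi / 2, dy)
    else:
        slope_angle = math.atan(dy / dx)
    return slope_angle + math.pi / 2 - beam_direction


def falls_back_to_pre_access(snr, kappa, beam_width, snr_threshold):
    """True if the link leaves the beam tracking phase.

    snr and snr_threshold must use the same scale (both linear or both dB).
    """
    low_snr = snr <= snr_threshold        # case (8)
    out_of_beam = kappa > beam_width / 2  # case (9)
    return low_snr or out_of_beam
```

With a beam width of $π / 64$ , a deviation angle of 0.1 rad already exceeds half the beam width, so `falls_back_to_pre_access(20.0, 0.1, math.pi / 64, 10.0)` evaluates to `True` even though the SNR is above the threshold.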

If either of the above conditions is satisfied, the communication between $R s_{m}$ and $V u_{n}$ switches to the pre-access phase, and no data except control signals are sent in this phase. Otherwise, the communication between $R s_{m}$ and $V u_{n}$ remains in the beam tracking and data transmission phase. In this case, the AI agent $Θ_{A}$ picks a codeword, $F_{P}^{{a, b}}$ (i.e., $F_{i, t}^{m, n} = F_{P}^{{a, b}}$ ), from a pre-defined codebook CB to track $V u_{n}$ and transmit data. If neither condition (8) nor condition (9) holds, then $V u_{n}$ is successfully tracked by $R s_{m}$ . Consequently, the corresponding channel capacity can be expressed as,

C_{i, t}^{m, n} = \log_{2} \det (I_{N_{RC}} + \frac{P_{TR}}{σ_{s}^{2}} H_{i, t}^{m, n} F_{i, t}^{m, n} {(F_{i, t}^{m, n})}^{H} {(H_{i, t}^{m, n})}^{H}) .

(10)
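As a sanity check on (10), consider the special case, assumed here only for illustration, in which the codeword is a single beamforming vector $f$ (i.e., $F_{i, t}^{m, n} = f$ ). Writing $g$ for the effective channel seen at the receiver, the matrix determinant lemma, $\det (I + u u^{H}) = 1 + u^{H} u$ , reduces the determinant to a scalar expression:

```latex
\begin{aligned}
g &= H_{i, t}^{m, n} f \in \mathbb{C}^{N_{RC}}, \\
C_{i, t}^{m, n} &= \log_{2} \det \left( I_{N_{RC}} + \frac{P_{TR}}{\sigma_{s}^{2}} g g^{H} \right)
 = \log_{2} \left( 1 + \frac{P_{TR}}{\sigma_{s}^{2}} \lVert g \rVert^{2} \right).
\end{aligned}
```

In this special case, the capacity depends on the codeword only through the received beamforming gain $\lVert H_{i, t}^{m, n} f \rVert^{2}$ , which is the quantity that a narrower, well-aligned beam increases.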

We also assume that each VU can only be associated with the nearest RSI, so the average channel capacity at subframe $f_{i}^{sub}$ , time slot $t$ , can be expressed by,

\bar{C_{i, t}} = \frac{1}{N_{V}} \sum_{m = 1}^{N_{V}} C_{i, t}^{m, n} 1 (ρ_{i, t}^{m, n} > γ_{th}) 1 (κ_{i, t}^{m, n} \leq \frac{B w_{i, t}^{m, n}}{2}) .

(11)
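The averaging in (11) can be illustrated with a short Python sketch. The function name and the list-based interface are hypothetical; each list holds one entry per VU link.

```python
def average_capacity(capacities, snrs, kappas, beam_widths, snr_threshold):
    """Average per-link capacity as in (11).

    A link contributes its capacity only if its SNR is above the threshold
    and its deviation angle is within half the beam width; otherwise it
    contributes zero.
    """
    total = 0.0
    for c, rho, kappa, bw in zip(capacities, snrs, kappas, beam_widths):
        tracked = rho > snr_threshold and kappa <= bw / 2
        total += c if tracked else 0.0
    return total / len(capacities)
```

Because failed links still count in the denominator, a tracking failure lowers the average directly, which is what ties the reward to tracking accuracy as well as to beamforming gain.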

Obviously, the value of $\bar{C_{i, t}}$ can be achieved by the uplink feedback of each VU at subframe $f_{i}^{sub}$ , time slot $t$ . The problem can be formulated by,

\begin{aligned}
\text{maximize} \quad & C_{AV} = \frac{1}{t} \sum_{k = 1}^{t} \bar{C_{i, t}} \\
\text{subject to} \quad & \bar{C_{i, t}} = \frac{1}{N_{V}} \sum_{m = 1}^{N_{V}} C_{i, t}^{m, n} 1 (ρ_{i, t}^{m, n} > γ_{th}) 1 (κ_{i, t}^{m, n} \leq \frac{B w_{i, t}^{m, n}}{2}), \\
& C_{i, t}^{m, n} = \log_{2} \det (I_{N_{RC}} + \frac{P_{TR}}{σ_{s}^{2}} H_{i, t}^{m, n} F_{i, t}^{m, n} {(F_{i, t}^{m, n})}^{H} {(H_{i, t}^{m, n})}^{H}), \\
& F_{i, t}^{m, n} = F_{P}^{a, b} \in CB .
\end{aligned}

(12)

Now, we can extend the action defined in (7) to

a_{i}^{t} = {a_{i, t}^{1}, a_{i, t}^{2}, \dots, a_{i, t}^{N_{V}}, b_{i, t}^{1}, b_{i, t}^{2}, \dots, b_{i, t}^{N_{V}}},

(14)

where $a_{i, t}^{m}$ and $b_{i, t}^{m}$ are equal to the $a$ and $b$ in $F_{i, t}^{m, n} = F_{P}^{{a, b}}$ , respectively.

The state of the environment at subframe $f_{i}^{sub}$ , time slot $t$ can be expressed by (13), where $x_{n}^{Vu} (i, t)$ , $y_{n}^{Vu} (i, t)$ , $x_{m}^{Rs} (i, t)$ , $y_{m}^{Rs} (i, t)$ are the coordinates of $V u_{n}$ and the associated RSI at subframe $f_{i}^{sub}$ , time slot $t$ , respectively. Also, by defining $r_{i}^{t} = \bar{C_{i, t}}$ , (12) can be solved as an RL problem.

The (target) actor-network and (target) critic-network used in the DDPG framework each consist of three hidden layers with 300 neurons per layer. The activation functions used in the NNs are ReLU functions. The capacity of the replay memory is 10000, and the batch size is 256. The learning rate is set to 0.0001, the updating rate of the target networks is 0.005, the variance of the normally distributed exploration noise is 0.05, and the discount factor is set to 0.99. The optimizer used in the DDPG framework is the Adam optimizer. The number of iterations in each episode is 2000. In this paper, we adopt an offline training phase. Thus, when applying the trained model to the actual network, the RL agent only needs to do forward propagation at each subframe. After 200 training episodes, the performance of the RL framework is shown in Fig. 6, where the transmission frequency is denoted by $F_{T}$ . For the sake of comparison, we consider the well-known Extended Kalman Filter (EKF) based method [49]. More specifically, the EKF is used to track the VUs with a $π / 64 = {2.8125}^{\circ}$ beam width. Once a VU has been tracked, the associated RSI will use a beam pattern with the same beam width for data transmissions.
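For reference, the hyperparameters listed above can be collected in one place, as in the Python sketch below. The dictionary keys and the `soft_update` helper are hypothetical names. The update rule shown is the standard DDPG soft (Polyak) target update, which is assumed here to be what the "updating rate of the target networks" refers to.

```python
# Hyperparameters as stated in the text; key names are illustrative.
DDPG_CONFIG = {
    "hidden_layers": 3,
    "neurons_per_layer": 300,
    "activation": "relu",
    "replay_capacity": 10000,
    "batch_size": 256,
    "learning_rate": 1e-4,
    "tau": 0.005,  # updating rate of the target networks
    "exploration_noise_variance": 0.05,
    "discount_factor": 0.99,
    "optimizer": "adam",
    "iterations_per_episode": 2000,
    "training_episodes": 200,
}


def soft_update(target_params, online_params, tau=DDPG_CONFIG["tau"]):
    """Standard soft target update: theta' <- tau * theta + (1 - tau) * theta'.

    Parameters are plain lists of floats here to keep the sketch
    framework-independent.
    """
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]
```

With `tau = 0.005`, each update moves the target parameters only 0.5% of the way toward the online parameters, which keeps the bootstrapped targets slowly varying.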

Fig. 6. — Average spectral efficiency and tracking accuracy of DDPG framework.

As we can see, the corresponding tracking accuracy is only around 10% and the spectral efficiency is as low as 2 bits/s/Hz, while the tracking accuracy and spectral efficiency of the EKF-based method are as high as 60% and 11 bits/s/Hz. Obviously, the performance of the DDPG framework is not acceptable for V2I beam tracking and data transmissions. This may be caused by the following possibilities: 1) the framework is unable to accurately predict VUs’ locations; 2) the NNs used in the DDPG framework fail to capture the characteristics of VUs’ mobility; 3) the task is beyond the capabilities of the DDPG framework.

s_{i}^{t} = {[\begin{matrix} v_{1} (i, t), x_{1}^{Vu} (i, t), y_{1}^{Vu} (i, t), x_{1}^{Rs} (i, t), y_{1}^{Rs} (i, t) \\ v_{2} (i, t), x_{2}^{Vu} (i, t), y_{2}^{Vu} (i, t), x_{2}^{Rs} (i, t), y_{2}^{Rs} (i, t) \\ ⋮ \\ v_{N_{V}} (i, t), x_{N_{V}}^{Vu} (i, t), y_{N_{V}}^{Vu} (i, t), x_{N_{V}}^{Rs} (i, t), y_{N_{V}}^{Rs} (i, t) \end{matrix}]}^{T},

(13)

A. Dependency Analysis

For such a challenging task, the performance of the regular DDPG framework is not acceptable. The problem may be caused by the RL framework itself or by the NN used in the RL framework: 1) the relationship between state $s_{i}^{t}$ , action $a_{i}^{t}$ , reward $r_{i}^{t}$ , and next state $s_{i + 1}^{t + Δ t}$ is too complicated for a DDPG framework with a simple structure to handle; 2) the structure of the training data is too complex for a regular fully connected NN to handle. To improve the beam tracking accuracy and data transmission capacity, we analyze the structure of the training data. Since the action at each subframe is selected based on the current state $s_{i}^{t}$ and previous experiences (which also include previous states), we use the state data for the analysis. If the data has a strong temporal dependence, then a recurrent neural network (RNN) may be better than a fully connected NN for solving the problem in (12).

The Hurst exponent is commonly used to analyze the dependency of a given data set. It is a measure of long-term memory of time series. Studies involving the Hurst exponent were originally developed in hydrology for practical matters of determining optimum dam sizing for the Nile river’s volatile rain and drought conditions that had been observed over a long period of time [50].

Based on [51] and [52], the Hurst exponent can be estimated using three typical methods: 1) the Periodogram Method; 2) the Variance-Time Analysis Method; and 3) the Rescaled Adjusted Range Statistic (R/S) Method. Here we choose the R/S method to evaluate the Hurst exponent of the training data. For a given series $𝓧_{i}$ , we define the partial sum of $𝓧_{i}$ as,

𝓨 (k) = \sum_{i = 1}^{k} 𝓧_{i},

(15)

and the sample variance is given by,

𝓢^{2} (k) = \frac{\sum_{i = 1}^{k} 𝓧_{i}^{2} - \frac{𝓨 {(k)}^{2}}{k}}{k}, k \geq 1.

(16)

Furthermore, the R/S statistic is defined as,

\frac{𝓡 (k)}{𝓢 (k)} = \frac{\max_{0 \leq h \leq k} (0, 𝓨 (h) - \frac{h}{k} 𝓨 (k))}{𝓢 (k)} - \frac{\min_{0 \leq h \leq k} (0, 𝓨 (h) - \frac{h}{k} 𝓨 (k))}{𝓢 (k)}, k \geq 1.

(17)

A log-log plot of the R/S statistic versus the number of points of the aggregated series should be a straight line whose slope is an estimate of the Hurst exponent $H$ . A value of $H$ in the range (0.5, 1.0) indicates that $𝓧_{i}$ has long-term positive autocorrelation, meaning that a high value in the series is likely to be followed by another high value. A value in the range (0, 0.5) indicates that $𝓧_{i}$ switches between high and low values in adjacent pairs over the long term: a single high value is likely to be followed by a low value, and this tendency to switch persists over a long period of time.
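The R/S procedure of (15)-(17) can be sketched in pure Python as follows. This is a simplified illustration, not the estimator used by the authors: the block sizes, the averaging of R/S over non-overlapping blocks, and the names `rescaled_range` and `hurst_rs` are all assumptions.

```python
import math


def rescaled_range(block):
    """R/S statistic of one block, following (15)-(17)."""
    k = len(block)
    total = sum(block)  # Y(k)
    mean = total / k
    # Mean-centered form of the sample variance; algebraically equal to
    # (sum of squares - Y(k)^2 / k) / k, but numerically more stable.
    variance = sum((x - mean) ** 2 for x in block) / k
    if variance <= 0.0:
        return None  # constant block: R/S is undefined
    partial, high, low = 0.0, 0.0, 0.0  # max and min both include 0
    for h, x in enumerate(block, start=1):
        partial += x  # Y(h)
        deviation = partial - h / k * total
        high = max(high, deviation)
        low = min(low, deviation)
    return (high - low) / math.sqrt(variance)


def hurst_rs(series, block_sizes=(8, 16, 32, 64, 128)):
    """Estimate H as the slope of log(R/S) versus log(block size)."""
    points = []
    for k in block_sizes:
        values = []
        for start in range(0, len(series) - k + 1, k):
            rs = rescaled_range(series[start:start + k])
            if rs is not None and rs > 0.0:
                values.append(rs)
        if values:
            points.append((math.log(k), math.log(sum(values) / len(values))))
    # Ordinary least-squares slope through the (log k, log R/S) points.
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den
```

Applied to a monotonically drifting series such as cumulative positions, this estimator should give a slope near 1, while an uncorrelated series should give a noticeably smaller value. Small-sample R/S estimates are known to be biased, so the result is best read qualitatively.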

Using this method, we evaluate the Hurst exponent of the position and velocity data of $V u_{n}$ . The Hurst exponents of the velocity data and position data are 0.14 and 0.92, respectively. The Hurst exponent for velocity is close to 0 because when a vehicle’s velocity is very high, it is more likely to reduce its velocity, and vice versa. The Hurst exponent for position is close to 1 because the vehicle keeps moving in the same direction in our simulation model (as in most application scenarios). This result indicates that the training data series has a long-term dependency, which justifies the use of an RNN to make the prediction. Nonetheless, the main drawback is that it takes much longer to train an RNN, such as an LSTM, than a fully connected NN.

B. LSTM/GRU Network

Since the mobility of vehicles has a temporal dependency, we use an RNN, instead of a regular fully connected NN to enhance the performance of the AI agent. We choose LSTM and GRU to assist the RL framework. For instance, by adopting gate functions into the cell structure, LSTM can handle the problem of long-term dependencies [53].

Gated recurrent units (GRUs) are a gating mechanism in RNNs and were introduced in [54]. Compared with LSTM, GRU has fewer parameters as it lacks an output gate. Although GRU has a simpler structure, its performance was found to be similar to that of LSTM [55]. Furthermore, GRUs have been shown to exhibit better performance on smaller datasets [55]. The structures of the NNs used in the actor network, target actor network, critic network, and target critic network are evaluated next.

The performance of the configured DDPG framework is shown in Fig. 8, where the LSTM/GRU used in the DDPG framework has input and output sizes of 128 and a step size of 32. We use a hyperbolic tangent (tanh) function as the activation function for the LSTM/GRU. Other hyperparameters are the same as in the regular DDPG framework. As we can observe from Figs. 6 and 8, the configured DDPG framework performs better than the regular DDPG framework. However, the improved performance is still not sufficient to handle V2I beam tracking and data transmissions.

C. TD3 Framework With LSTM/GRU

DDPG can sometimes achieve good performance, but it is frequently brittle with respect to hyperparameters and other kinds of tuning. A common issue for DDPG is that the learned Q-function begins to dramatically overestimate Q-values, which can then break the policy, because the policy exploits the errors in the Q-function. The Twin Delayed DDPG (TD3) algorithm addresses this issue by using the following three important configurations:

1) Double critics: TD3 learns two Q-functions (estimated by critic networks) instead of one and uses the smaller of the corresponding two Q-values to evaluate the targets in the Bellman error loss functions.
2) Delayed policy updates: TD3 updates the policy (and target networks) less frequently than the Q-function. Paper [56] recommends one policy update for every two Q-function updates.
3) Target policy noise: TD3 adds noise to the target action (also called policy noise). This smooths out the Q-values, which makes it harder for the policy to exploit Q-function errors.
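The three configurations above can be combined into a target computation, sketched below in Python. This is a generic TD3-style sketch, not the authors' code. The callables `target_actor`, `target_critic_1`, and `target_critic_2` are hypothetical stand-ins for the target networks, and the noise clipping range and action bounds are assumptions not given in the text.

```python
import random


def td3_target(reward, next_state, target_actor, target_critic_1,
               target_critic_2, gamma=0.99, noise_std=0.1 ** 0.5,
               noise_clip=0.5, action_low=-1.0, action_high=1.0, rng=random):
    """Bellman target using target policy noise and double critics."""
    # Target policy noise: perturb the target action with clipped Gaussian
    # noise. noise_clip and the action bounds are assumed values.
    smoothed = []
    for a in target_actor(next_state):
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        smoothed.append(max(action_low, min(action_high, a + eps)))
    # Double critics: use the smaller of the two target Q-values.
    q = min(target_critic_1(next_state, smoothed),
            target_critic_2(next_state, smoothed))
    return reward + gamma * q


def should_update_policy(critic_step, policy_delay=2):
    """Delayed policy updates: one policy update per policy_delay critic updates."""
    return critic_step % policy_delay == 0
```

Taking the minimum of the two critics biases the target downward, which counteracts the Q-value overestimation described above.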

Together, these three tricks result in substantially improved performance over baseline DDPG, which only contains a single critic, updates the policy and the Q-function simultaneously, and has no policy noise [48]. Although TD3 has a higher learning capability, it also has a more complex structure, which requires more training time. Thus, as mentioned before, we adopt an offline training phase. Notice that for the regular reinforcement learning framework, a common approach to constructing a minibatch is to randomly pick $N_{ba}$ experiences from the replay buffer. However, this process will not work when an RNN is used, mainly because the RNN requires the training data to be temporally related, and $N_{ba}$ randomly picked experiences do not meet this requirement. Thus, we change the regular construction process of the minibatch as follows: 1) randomly pick an experience $E x p_{k}$ from the replay buffer; 2) pick the following $\frac{N_{ba}}{N_{sba}} - 1$ experiences from the replay buffer to construct a sub-batch, i.e., $\{E x p_{k}, E x p_{k + 1}, \dots, E x p_{k + \frac{N_{ba}}{N_{sba}} - 1}\}$ ; 3) repeat steps 1) and 2) $N_{sba}$ times to construct a minibatch with $N_{ba}$ experiences. With this process, the TD3 framework can make full use of the GRU network and further improve the performance of the RL agent.
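The sequential minibatch construction described above can be sketched in Python as follows. The function name and interface are hypothetical. The sketch assumes the replay buffer is a plain list in chronological order, and it ignores episode boundaries and circular-buffer wrap-around, which a practical implementation would need to handle.

```python
import random


def sample_sequential_minibatch(replay_buffer, batch_size, num_sub_batches,
                                rng=random):
    """Build a minibatch from runs of consecutive experiences.

    Returns a list of num_sub_batches sub-batches, each holding
    batch_size // num_sub_batches temporally adjacent experiences.
    """
    run_length = batch_size // num_sub_batches
    if run_length < 1 or len(replay_buffer) < run_length:
        raise ValueError("replay buffer too small for the requested run length")
    minibatch = []
    for _ in range(num_sub_batches):
        # Step 1: pick a random starting experience, leaving room for the run.
        k = rng.randrange(len(replay_buffer) - run_length + 1)
        # Step 2: take that experience plus the following run_length - 1 ones.
        minibatch.append(replay_buffer[k:k + run_length])
    return minibatch
```

Keeping each sub-batch as its own list preserves the temporal order inside a run, so it can be fed to a recurrent layer as one sequence, while the random starting points still decorrelate the runs from each other.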

In this paper, normally distributed noise with a mean of 0 and a variance of 0.1 is added to the target action. In addition, neither dropout nor batch normalization is used, because neither is suitable for an RL framework with an RNN. Also, based on [56], the policy is updated once for every two Q-function updates. Other hyperparameters are the same as in the regular DDPG framework with GRUs.

IV. Simulation Results

We carried out our simulations according to the parameters defined in Table-I.

TABLE I.

Values Of Symbols Used in Simulation

Symbol	Definition/explanation	Value

$l_{R}$	Length of the road	500 m
$W_{la}$	Width of the lane	4 m
$N_{V}$	Number of VUs	10
$N_{S}$	Number of RSIs	10
$μ_{s}$	SCS exponent	2
$D_{p}$	Protecting distance of the regular pattern	30 m
$D_{M}$	Protecting distance of the merging pattern	30 m
$v_{thr}$	Threshold of velocity gap	5 m/s
$τ_{n}$	Degree of randomness	0.9
$σ_{v}$	Variance of $e_{v}$	3 m/s
$σ_{lo}$	Variance of $e_{lo}$	0.1 m
$P_{TR}$	Transmit power of $R s_{m}$	0.1 W
$N_{TR}$	Number of transmit antennas	64 × 16
$N_{RC}$	Number of receive antennas	4 × 4
$γ_{th}$	SNR threshold	10 dB
$N_{ba}$	Capacity of minibatch	256
$N_{sba}$	Capacity of sub-minibatch	32

Based on [40], the maximum carrier bandwidth of FR2 when $μ_{s} = 2$ is 200 MHz. Thus, during the training process, the spectrum of the network is randomly chosen with a bandwidth of 200 $N_{V}$ MHz (i.e., 24.25 GHz or 52.6 GHz, with 200 MHz bandwidth per VU). Also, in the simulation, we add normally distributed noise to each coordinate to represent estimation errors in the location information. The average velocity of each VU ranges from 20 m/s to 30 m/s (72 km/h to 108 km/h, or 45 mile/h to 67.5 mile/h), and the instantaneous velocity, $v_{n} (i, t)$ , changes over time and is governed by the mobility model. Furthermore, we choose $μ_{s} = 2$ so that a time slot will not be too short for the RL agent to complete beam selection.
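The timing implied by $μ_{s} = 2$ can be worked out with the standard 5G NR numerology relations (subcarrier spacing of $15 \cdot 2^{μ}$ kHz and $2^{μ}$ slots per 1 ms subframe). The Python sketch below assumes that the subframe in this paper corresponds to the 1 ms NR subframe, which is consistent with the relation $⌊ t / 4 ⌋ = i$ ; the function name is hypothetical.

```python
def nr_numerology(mu):
    """Return (subcarrier spacing in kHz, slots per subframe, slot duration in ms)."""
    scs_khz = 15 * 2 ** mu
    slots_per_subframe = 2 ** mu
    slot_duration_ms = 1.0 / slots_per_subframe  # an NR subframe lasts 1 ms
    return scs_khz, slots_per_subframe, slot_duration_ms


# For mu = 2: 60 kHz spacing, 4 slots per subframe, 0.25 ms per slot.
scs_khz, slots, slot_ms = nr_numerology(2)

# Distance covered by a VU at 30 m/s during one 1 ms subframe, in meters.
distance_per_subframe_m = 30.0 * 1e-3
```

Under these assumptions, the agent has one 1 ms subframe per decision, during which a VU at 30 m/s moves about 3 cm.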

The average spectral efficiency of configured/regular TD3 assisted V2I communications is shown in Fig. 9. Compared with Figs. 6 and 8, the spectral efficiency of data transmissions is significantly improved. Also, TD3 with GRU/LSTM performs better than the regular TD3 framework at the same transmission frequency and number of training episodes. The performance of all RL methods increases with the number of training episodes. For the same RL framework and a fixed number of training episodes, the spectral efficiency at the 52.6 GHz transmission frequency is lower than that at 24.25 GHz. This is because the path loss increases at higher carrier frequencies (i.e., 52.6 GHz). Also, as we can see, TD3 with GRU performs better than TD3 with LSTM. This is because LSTM is more complicated than GRU and therefore requires more training time. Notice that the EKF-based method still performs better than our proposed method in this figure. This is because the TD3 framework adds noise to both actions and target actions, so the performance during training is much lower than the performance during testing. The corresponding testing performance is displayed in Fig. 11.

Similarly, the tracking accuracy is significantly improved by changing the RL framework from regular DDPG to TD3 with GRU/LSTM. Furthermore, TD3 with GRU is superior to the regular TD3 framework and the LSTM-assisted TD3 framework at the same transmission frequency and number of training episodes. An interesting observation is that the tracking accuracy of TD3 with GRU is almost the same for the two transmission frequencies, i.e., 24.25 GHz and 52.6 GHz. This is because the GRUs help the TD3 framework to better predict the mobility of VUs, which lets the RSI select a better codeword, i.e., beam width and direction, to improve the tracking accuracy while accounting for the higher path loss at higher frequencies. Also, the corresponding testing performance is depicted in Fig. 11.

The average spectral efficiency and tracking accuracy of TD3 with GRU/LSTM while testing are shown in Fig. 11. As we can see, the testing performance of TD3 with GRU/LSTM is much higher than the training performance. As mentioned before, this is due to the characteristics of the TD3 framework. As shown in Fig. 11, the average tracking accuracy is high enough, i.e., over 90%, to handle V2I communication even under a 52.6 GHz transmission frequency with an average velocity of 30 m/s. Also, the average spectral efficiency of TD3 with GRU/LSTM is much higher than that of the regular DDPG framework. Furthermore, all the RL-based methods outperform the EKF-based method in the testing phase, which is the opposite of what Figs. 9 and 10 show. This is because Figs. 9 and 10 show the training performance, while Fig. 11 depicts the testing performance. RL-based methods, such as DDPG and TD3, are configured to add noise to the action $a_{i}^{t}$ during the training phase to encourage the RL agent to explore the action space sufficiently (i.e., because of the noise, the action $a_{i}^{t}$ taken during training may not be the one that achieves the best performance). When the fully trained RL agent is used in testing, it no longer needs to explore the action space; hence, the action $a_{i}^{t}$ taken during the testing phase is the one most suitable for achieving the best performance. On the other hand, the EKF-based method does not use exploration noise to make decisions. Thus, the EKF-based method performs better than the RL-based methods during the training phase, but worse than the RL-based methods while testing. These observations indicate that the TD3 framework with GRU/LSTM can effectively capture the mobility of VUs and help RSIs to make a better codeword (i.e., beam pattern) selection. Moreover, TD3 with GRU performs better than TD3 with LSTM because an LSTM network is more demanding to train.
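The difference between training-time and testing-time action selection discussed above can be sketched in Python as follows. The `policy` callable, the function name, and the action bounds are hypothetical. The noise variance of 0.05 is the exploration-noise value quoted earlier for the DDPG setup, and how the continuous action is mapped to the codeword indices is not covered here.

```python
import random


def select_action(policy, state, training, noise_variance=0.05,
                  low=-1.0, high=1.0, rng=random):
    """Return the policy output, with Gaussian exploration noise in training only."""
    action = list(policy(state))
    if not training:
        return action  # testing: deterministic policy output
    std = noise_variance ** 0.5
    # Training: perturb each component, then clip to the assumed action bounds.
    return [max(low, min(high, a + rng.gauss(0.0, std))) for a in action]
```

Since the noisy action generally differs from the deterministic policy output, the reward measured during training tends to understate what the same policy achieves at test time, which is the effect described in the paragraph above.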

Fig. 10. — Average tracking accuracy of configured/regular TD3 framework.

The selection probability of different beam widths is shown in Fig. 12. As we can observe, for the lower carrier frequency (24.25 GHz), the AI agent supported by TD3 with GRU is more likely to select a beam pattern with a wider beam width. When the transmission frequency is 52.6 GHz, the AI agent supported by TD3 with GRU starts to choose a narrower beam width, e.g., $π / 128 = {1.40625}^{\circ}$ , to achieve a higher beamforming gain that compensates for the higher path loss. For the AI agent supported by TD3 with LSTM, the selection probability varies significantly with the transmission frequency and beam width. This is because the LSTM requires more training time than the GRU, which makes it less effective in handling the trade-off between spectral efficiency and tracking accuracy during the training phase. This observation indicates that the TD3 framework with GRU can successfully help the RSI to determine a more robust beam pattern while considering VU mobility and channel conditions.

The tracking latency of the proposed TD3 with GRU/LSTM method and the baseline method is shown in Fig. 13. The baseline method uses a pruned codebook to support fast beam tracking in V2I communications [57]. Based on [57], this approach can reduce tracking latency by two orders of magnitude. As we can observe from Fig. 13, our method outperforms the baseline method, even when the size of the codebook is as small as 256. Furthermore, the tracking latency increases dramatically with the size of the codebook when the baseline method is used but remains stable with our method. This is because our method uses the RL framework to determine the codeword directly without searching the codebook. Therefore, our method can use a larger codebook to assist beamforming and obtain higher beamforming gain and beam tracking accuracy. Moreover, the tracking latency for TD3 with LSTM is slightly higher than that of TD3 with GRU. This is because LSTM has a more complicated structure than GRU.

Fig. 14 shows the average spectral efficiency and tracking accuracy of the proposed method when tested with the Intelligent Driver Model (IDM) [58]. IDM is a widely used model of vehicle group movement that simulates the car-following behavior of moving VUs well. Here we choose TD3 with GRU as an example. As shown in Fig. 14, the test performance of our method is still good despite changing the mobility model to IDM. Compared with Fig. 11, the average spectral efficiency and tracking accuracy are both higher. This is because the mobility model in this paper considers more frequent changes in the velocity and direction of moving VUs. Therefore, when IDM is used to model VUs’ mobility, the RL agent can achieve a better prediction of the trajectories of moving VUs.

The standard deviation of spectral efficiency and tracking accuracy of TD3 with GRU while training is shown in Fig. 15. For TD3 with GRU, the standard deviation of spectral efficiency lies in [0.41, 0.97] with $F_{T} = 24.25$ GHz and in [0.36, 0.78] with $F_{T} = 52.6$ GHz. The standard deviation of tracking accuracy lies in [0.023, 0.048] with $F_{T} = 24.25$ GHz and in [0.024, 0.047] with $F_{T} = 52.6$ GHz. Also, as we can see, the standard deviation of spectral efficiency/tracking accuracy increases with the number of training episodes. This is because the average spectral efficiency/tracking accuracy increases gradually with the number of training episodes.

V. Conclusion

This paper mainly focuses on downlink beam tracking and data transmission for V2I communications. Due to the short duration of each time slot, the high velocity of vehicles, and the estimation errors of vehicles’ locations, vehicles’ mobility is difficult to predict. Moreover, the higher transmission frequency raises further issues, such as higher path loss and the resulting trade-off between spectral efficiency and tracking accuracy. Thus, an RL-assisted method is proposed to address these issues while considering vehicles’ mobility and transmission frequency together. Furthermore, we analyze the mobility of vehicles and find a high temporal dependency. Based on the analysis, we modify the regular DDPG framework to the TD3 framework with LSTM/GRU. The simulation results verify that the TD3 framework with GRU outperforms the regular DDPG framework, the DDPG with GRU, the regular TD3 framework, and the TD3 framework with LSTM. Furthermore, the proposed TD3 framework with GRU achieves high spectral efficiency whilst maintaining high tracking accuracy.

Fig. 4. — DDPG-assisted beam tracking process.

Fig. 7. — Structure of NNs used in the AI agent.

Biographies

Junliang Ye (Member, IEEE) received the B.Sc. degree in communication engineering from the China University of Geosciences, Wuhan, China, in 2011, and the Ph.D. degree from the Huazhong University of Science and Technology, Wuhan, in 2018. He is currently a Guest Researcher with the National Institute of Standards and Technology (NIST), U.S. Department of Commerce, Gaithersburg, MD, USA. His research interests include heterogeneous networks, stochastic geometry, mobility-based access models of cellular networks, millimeter-wave communications, and next-generation wireless communication.

Hamid Gharavi (Life Fellow, IEEE) received the Ph.D. degree from Loughborough University, U.K., in 1980. He joined the Visual Communication Research Department, AT&T Bell Laboratories, Holmdel, NJ, USA, in 1982. He was then transferred to Bell Communications Research (Bellcore) after the AT&T-Bell divestiture, where he became a Consultant on video technology and a Distinguished Member of Research Staff. In 1993, he joined Loughborough University, as a Professor and the Chair of communication engineering. Since September 1998, he has been with the National Institute of Standards and Technology (NIST), U.S. Department of Commerce, Gaithersburg, MD, USA. His research interests include smart grids, wireless multimedia, mobile communications, wireless systems, mobile ad hoc networks, and visual communications. He was a Core Member of the Study Group XV (Specialist Group on Coding for Visual Telephony), International Communications Standardization Body CCITT (ITU-T) and a member of the IEEE 2030 standard working group. He received the Charles Babbage Premium Award from the Institute of Electronics and Radio Engineering in 1986 and the IEEE Circuits and Systems Society Darlington Best Paper Award in 1989. He was a recipient of the Washington Academy of Science Distinguished Career in Science Award in 2017. He was a TPC Co-Chair of IEEE SmartGridComm in 2010 and 2012. He served as a member of the Editorial Board of the Proceedings of the IEEE from January 2003 to December 2008. He has been a Guest Editor of a number of special issues of the Proceedings of the IEEE, including Smart Grid, Sensor Networks and Applications, Wireless Multimedia Communications, Advanced Automobile Technologies, and Grid Resilience. He was the Editor-in-Chief of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY and IEEE WIRELESS COMMUNICATIONS. He served as a Distinguished Lecturer of the IEEE Communication Society.

References

[1].Garcia MHC et al. , “A tutorial on 5G NR V2X communications,” IEEE Commun. Surveys Tuts, vol. 23, no. 3, pp. 1972–2026, 3rd Quart., 2021. [Google Scholar]
[2].Gyawali S, Xu S, Qian Y, and Hu RQ, “Challenges and solutions for cellular based V2X communications,” IEEE Commun. Surveys Tuts, vol. 23, no. 1, pp. 222–255, 1st Quart., 2020. [Google Scholar]
[3].Ahmed E. and Gharavi H, “Cooperative vehicular networking: A survey,” IEEE Trans. Intell. Transp. Syst, vol. 19, no. 3, pp. 996–1014, Mar. 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
[4].Zhou H, Xu W, Chen J, and Wang W, “Evolutionary V2X technologies toward the Internet of Vehicles: Challenges and opportunities,” Proc. IEEE, vol. 108, no. 2, pp. 308–323, Feb. 2020. [Google Scholar]
[5].Borcoci E, Dragulinescu A-M, Li FY, Vochin M-C, and Kjellstadli K, “An overview of 5G slicing operational business models for Internet of Vehicles, maritime IoT applications and connectivity solutions,” IEEE Access, vol. 9, pp. 156624–156646, 2021. [Google Scholar]
[6] Maraqa O, Rajasekaran AS, Al-Ahmadi S, Yanikomeroglu H, and Sait SM, “A survey of rate-optimal power domain NOMA with enabling technologies of future wireless networks,” IEEE Commun. Surveys Tuts, vol. 22, no. 4, pp. 2192–2235, 4th Quart., 2020.
[7] Kim H, Chae C-B, and Kim KS, “Hybrid precoding based on monopulse ratio for millimeter wave systems with limited feedback,” IEEE Access, vol. 8, pp. 175329–175346, 2020.
[8] Busari SA et al., “Generalized hybrid beamforming for vehicular connectivity using THz massive MIMO,” IEEE Trans. Veh. Technol, vol. 68, no. 9, pp. 8372–8383, Sep. 2019.
[9] Brambilla M, Combi L, Matera A, Tagliaferri D, Nicoli M, and Spagnolini U, “Sensor-aided V2X beam tracking for connected automated driving: Distributed architecture and processing algorithms,” Sensors, vol. 20, no. 12, p. 3573, Jun. 2020.
[10] Lien S, Kuo Y-C, Deng D-J, Tsai H-L, Vinel A, and Benslimane A, “Latency-optimal mmWave radio access for V2X supporting next generation driving use cases,” IEEE Access, vol. 7, pp. 6782–6795, 2018.
[11] Huang S, Zhang M, Gao Y, and Feng Z, “MIMO radar aided mmWave time-varying channel estimation in MU-MIMO V2X communications,” IEEE Trans. Wireless Commun, vol. 20, no. 11, pp. 7581–7594, Nov. 2021.
[12] Chen Y, Wang L, Ai Y, Jiao B, and Hanzo L, “Performance analysis of NOMA-SM in vehicle-to-vehicle massive MIMO channels,” IEEE J. Sel. Areas Commun, vol. 35, no. 12, pp. 2653–2666, Dec. 2017.
[13] Zhang R, Zhong Z, Zhao J, Li B, and Wang K, “Channel measurement and packet-level modeling for V2I spatial multiplexing uplinks using massive MIMO,” IEEE Trans. Veh. Technol, vol. 65, no. 10, pp. 7831–7843, Mar. 2016.
[14] Huang Z and Cheng X, “A 3-D non-stationary model for beyond 5G and 6G vehicle-to-vehicle mmWave massive MIMO channels,” IEEE Trans. Intell. Transp. Syst, vol. 23, no. 7, pp. 8260–8276, Jul. 2022.
[15] Zhang Y, Xiang Z, Lu L, Han S, and Meng W, “Asymptotic performance analysis for mmWave V2X cellular networks,” in Proc. IEEE 94th Veh. Technol. Conf. (VTC-Fall), Sep. 2021, pp. 1–6.
[16] Ye J and Gharavi H, “Attractor selection based limited feedback hybrid precoding for uplink V2I communications,” IEEE Trans. Veh. Technol, vol. 69, no. 4, pp. 3943–3953, Apr. 2020.
[17] Li W, Yang W, Yang L, Xiong H, and Hui Y, “Bidirectional positioning assisted hybrid beamforming for massive MIMO systems,” IEEE Trans. Commun, vol. 69, no. 5, pp. 3367–3378, May 2021.
[18] Huang Y, Wu Q, Lu R, Peng X, and Zhang R, “Massive MIMO for cellular-connected UAV: Challenges and promising solutions,” IEEE Commun. Mag, vol. 59, no. 2, pp. 84–90, Feb. 2021.
[19] Ali A, González-Prelcic N, and Ghosh A, “Millimeter wave V2I beam-training using base-station mounted radar,” in Proc. IEEE Radar Conf. (RadarConf), Apr. 2019, pp. 1–5.
[20] González-Prelcic N, Méndez-Rial R, and Heath RW Jr., “Radar aided beam alignment in mmWave V2I communications supporting antenna diversity,” in Proc. Inf. Theory Appl. Workshop (ITA), 2016, pp. 1–7.
[21] Ni Y, He J, Cai L, and Bo Y, “Data uploading in hybrid V2V/V2I vehicular networks: Modeling and cooperative strategy,” IEEE Trans. Veh. Technol, vol. 67, no. 5, pp. 4602–4614, May 2018.
[22] Saleem Y, Mitton N, and Loscri V, “A vehicle-to-infrastructure data offloading scheme for vehicular networks with QoS provisioning,” in Proc. Int. Wireless Commun. Mobile Comput. (IWCMC), Jun. 2021, pp. 1442–1447.
[23] Saleem Y, Mitton N, and Loscri V, “A QoS-aware hybrid V2I and V2V data offloading for vehicular networks,” in Proc. IEEE 94th Veh. Technol. Conf. (VTC-Fall), Sep. 2021, pp. 1–5.
[24] Choi DN, Jin SY, Lee J, and Kim BW, “Deep learning technique for improving data reception in optical camera communication-based V2I,” in Proc. 28th Int. Conf. Comput. Commun. Netw. (ICCCN), Jul. 2019, pp. 1–2.
[25] Kwon J and Park H, “Reliable data dissemination strategy based on systematic network coding in V2I networks,” in Proc. Int. Conf. Inf. Commun. Technol. Converg. (ICTC), Oct. 2019, pp. 744–746.
[26] Yang L, Zhang L, He Z, Cao J, and Wu W, “Efficient hybrid data dissemination for edge-assisted automated driving,” IEEE Internet Things J, vol. 7, no. 1, pp. 148–159, Jan. 2020.
[27] Chattopadhyay A, Chandra A, and Bose C, “Impact of RSU height on 60 GHz mmWave V2I LOS communication in multi-lane highways,” in Proc. IEEE 93rd Veh. Technol. Conf. (VTC-Spring), Apr. 2021, pp. 1–5.
[28] Li T-H, Khandaker MRA, Tariq F, Wong K-K, and Khan RT, “Learning the wireless V2I channels using deep neural networks,” in Proc. IEEE 90th Veh. Technol. Conf. (VTC-Fall), Sep. 2019, pp. 1–5.
[29] Liao Y, Cai Z, Sun G, Tian X, Hua Y, and Tan X, “Deep learning channel estimation based on edge intelligence for NR-V2I,” IEEE Trans. Intell. Transp. Syst, vol. 23, no. 8, pp. 13306–13315, Aug. 2022.
[30] Liang L, Ye H, and Li GY, “Spectrum sharing in vehicular networks based on multi-agent reinforcement learning,” IEEE J. Sel. Areas Commun, vol. 37, no. 10, pp. 2282–2292, Oct. 2019.
[31] Bogale TE, Wang X, and Le LB, “Adaptive channel prediction, beamforming and scheduling design for 5G V2I network: Analytical and machine learning approaches,” IEEE Trans. Veh. Technol, vol. 69, no. 5, pp. 5055–5067, May 2020.
[32] Zhang L, Chen X, Fang Y, Huang X, and Fang X, “Learning-based mmWave V2I environment augmentation through tunable reflectors,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), Dec. 2019, pp. 1–6.
[33] Yang J, Li L, and Zhao M-J, “A blind CSI prediction method based on deep learning for V2I millimeter-wave channel,” in Proc. IEEE 28th Int. Conf. Netw. Protocols (ICNP), Oct. 2020, pp. 1–6.
[34] Yan L et al., “Machine learning-based handovers for sub-6 GHz and mmWave integrated vehicular networks,” IEEE Trans. Wireless Commun, vol. 18, no. 10, pp. 4873–4885, Oct. 2019.
[35] Alkhateeb A, Alex S, Varkey P, Li Y, Qu Q, and Tujkovic D, “Deep learning coordinated beamforming for highly-mobile millimeter wave systems,” IEEE Access, vol. 6, pp. 37328–37348, 2018.
[36] Yang Y, Gao Z, Ma Y, Cao B, and He D, “Machine learning enabling analog beam selection for concurrent transmissions in millimeter-wave V2V communications,” IEEE Trans. Veh. Technol, vol. 69, no. 8, pp. 9185–9189, Aug. 2020.
[37] Chen S, Vu K, Zhou S, Niu Z, Bennis M, and Latva-Aho M, “A deep reinforcement learning framework to combat dynamic blockage in mmWave V2X networks,” in Proc. 2nd 6G Wireless Summit (6G SUMMIT), Mar. 2020, pp. 1–5.
[38] Gui J, Liu Y, Deng X, and Liu B, “Network capacity optimization for cellular-assisted vehicular systems by online learning-based mmWave beam selection,” Wireless Commun. Mobile Comput, vol. 2021, pp. 1–26, Mar. 2021.
[39] Architecture Enhancements for V2X Services (V16.2.0, Release 16), document TS 23.285, 3GPP, Dec. 2019.
[40] Physical Channels and Modulation (V16.0.0, Release 16), document TS 38.211 NR, 3GPP, Mar. 2020.
[41] Wang Y, Yin X, Cai G, Wang G, Guo S, and Liang K, “Tunnel vehicle RSSI positioning algorithm based on LMLF model,” in Proc. 4th Int. Conf. Cloud Comput. Internet Things (CCIOT), Dec. 2019, pp. 29–32.
[42] Fan J and Ma G, “Characteristics of GPS positioning error with nonuniform pseudorange error,” GPS Solutions, vol. 18, no. 4, pp. 615–623, Oct. 2014.
[43] Anjinappa CK and Guvenc I, “Millimeter-wave V2X channels: Propagation statistics, beamforming, and blockage,” in Proc. IEEE 88th Veh. Technol. Conf. (VTC-Fall), Aug. 2018, pp. 1–6.
[44] Wang Y, Narasimha M, and Heath RW Jr., “MmWave beam prediction with situational awareness: A machine learning approach,” in Proc. IEEE 19th Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), Jun. 2018, pp. 1–5.
[45] Yuan H, Yang N, Yang K, Han C, and An J, “Hybrid beamforming for terahertz multi-carrier systems over frequency selective fading,” IEEE Trans. Commun, vol. 68, no. 10, pp. 6186–6199, Oct. 2020.
[46] Wu W, Liu D, Li Z, Hou X, and Liu M, “Two-stage 3D codebook design and beam training for millimeter-wave massive MIMO systems,” in Proc. IEEE 85th Veh. Technol. Conf. (VTC Spring), Jun. 2017, pp. 1–7.
[47] Lee J, Kim M-D, Park J-J, and Chong YJ, “Field-measurement-based received power analysis for directional beamforming millimeter-wave systems: Effects of beamwidth and beam misalignment,” ETRI J, vol. 40, no. 1, pp. 26–38, Feb. 2018.
[48] Ye J and Gharavi H, “Deep reinforcement learning-assisted energy harvesting wireless networks,” IEEE Trans. Green Commun. Netw, vol. 5, no. 2, pp. 990–1002, Jun. 2021.
[49] Va V, Vikalo H, and Heath RW Jr., “Beam tracking for mobile millimeter wave communication systems,” in Proc. IEEE Global Conf. Signal Inf. Process. (GlobalSIP), Dec. 2016, pp. 743–747.
[50] Hurst HE, “Long-term storage capacity of reservoirs,” Trans. Amer. Soc. Civil Eng, vol. 116, pp. 770–799, Jan. 1951.
[51] Ge X et al., “Wireless fractal cellular networks,” IEEE Wireless Commun. Mag, vol. 23, no. 5, pp. 110–119, Oct. 2016.
[52] Chen J, Ge X, and Ni Q, “Coverage and handoff analysis of 5G fractal small cell networks,” IEEE Trans. Wireless Commun, vol. 18, no. 2, pp. 1263–1276, Feb. 2019.
[53] Yu Y, Si X, Hu C, and Zhang J, “A review of recurrent neural networks: LSTM cells and network architectures,” Neural Comput, vol. 31, no. 7, pp. 1235–1270, Jul. 2019.
[54] Cho K et al., “Learning phrase representations using RNN encoder–decoder for statistical machine translation,” 2014, arXiv:1406.1078.
[55] Yang S, Yu X, and Zhou Y, “LSTM and GRU neural network performance comparison study: Taking yelp review dataset as an example,” in Proc. Int. Workshop Electron. Commun. Artif. Intell. (IWECAI), Jun. 2020, pp. 98–101.
[56] Fujimoto S, Hoof H, and Meger D, “Addressing function approximation error in actor-critic methods,” in Proc. Int. Conf. Mach. Learn, 2018, pp. 1587–1596.
[57] Wang S, Huang J, and Zhang X, “Demystifying millimeter-wave V2X: Towards robust and efficient directional connectivity under high mobility,” in Proc. 26th Annu. Int. Conf. Mobile Comput. Netw., Sep. 2020, pp. 1–14.
[58] Zhou M, Qu X, and Jin S, “On the impact of cooperative autonomous vehicles in improving freeway merging: A modified intelligent driver model-based approach,” IEEE Trans. Intell. Transp. Syst, vol. 18, no. 6, pp. 1422–1428, Jun. 2017.

