Abstract
In this paper, we present a novel AI-based framework for dynamic resource management in integrated sensing and communication (ISAC) systems for 6G networks. The framework uses Deep Reinforcement Learning (DRL) to learn and optimize resource control tasks such as beamforming, interference control, and power assignment according to the instantaneous network state and environment. Across all scenarios, the sum rate and beam pattern gain of the AI-based approach are up to 45% and 50% higher, respectively, than those of static beamforming. In particular, at pmax = 30 dBm and L = 64 antennas, the AI model yields a sum rate of
bps/Hz in rural area scenarios,
bps/Hz in dense smart city, and
bps/Hz in high-mobility urban scenario, substantially better than convex optimization (achieving
bps/Hz) and static beamforming (reaching at most
bps/Hz). Moreover, the AI model achieves a beam pattern gain of 32 dB in the rural, 28 dB in the dense smart city, and 30 dB in the high-mobility urban scenario, which improves sensing accuracy by concentrating the transmitted energy toward the expected sensing directions. In terms of energy efficiency, the AI-driven model reduces power consumption by 40% compared to conventional methods for the same sum rate. It also improves the interference suppression level by up to 50%, thereby improving total system performance. These findings demonstrate the potential of the AI-based model for joint communication and sensing design in 6G ISAC systems and provide a generalized framework for intelligent 6G wireless networks. Its capability for dynamic resource allocation, enhanced spectral efficiency, and accurate sensing in dynamic networks makes it a technology enabler for 6G deployment.
Keywords: Integrated Sensing and Communication (ISAC), 6G Networks, Deep Reinforcement Learning (DRL), Beamforming, Power Allocation, Interference Management, AI-Driven Optimization
Subject terms: Engineering, Mathematics and computing
Introduction
The arrival of 6G wireless systems requires a radical rethinking of solutions for high-throughput communication, low-latency communication, and ultra-reliable sensing. In recent years, Integrated Sensing and Communication (ISAC) systems, where communication and sensing functionalities are integrated in the same infrastructure, have attracted considerable interest. Such systems sense the environment and communicate simultaneously, which increases spectrum efficiency, lowers hardware costs, and improves network performance.
This integration of sensing and communication in a common architecture has been identified as an important technology for 6G networks. ISAC systems support concurrent operation of communication and sensing functions on shared hardware and spectrum, which brings substantial spectrum efficiency improvements, reduced system complexity, and new solutions for applications including autonomous driving, smart cities, and IoT1–3. With the development of 6G networks, a new generation of ISAC systems is emerging, driven by the need for real-time, high-throughput communication integrated with high-precision sensing4,5. For example, in Vehicle-to-Everything (V2X) communications, ISAC allows vehicles to exchange safety-critical information and perceive their surroundings, thus enhancing safety, traffic management, and situational awareness6,7. In smart homes and industrial automation, ISAC can provide both connectivity and environmental monitoring, making it a key technology for next-generation wireless networks8,9. A significant challenge for ISAC systems in 6G is the dynamic and efficient allocation of resources so that seamless communication and precise sensing are maintained under variable network conditions. The static optimization and waveform design used in conventional resource management approaches cannot meet the demands of diverse and sensitive services, high mobility, interference, and heterogeneous requirements in future networks1,10. Consequently, there is increasing interest in AI-driven solutions, especially Deep Reinforcement Learning (DRL), to solve these problems. As demonstrated in11, DRL offers an effective methodology for real-time beamforming, power allocation, and interference management by learning policies from interaction with the environment and adapting resources dynamically12.
The recent literature suggests that AI and ML techniques may transform ISAC systems by enhancing the performance of beamforming and resource allocation, for instance in high-mobility environments such as vehicular networks7,13. Generative AI and machine learning are also being investigated in ISAC systems to improve data-driven decision making and make systems more agile and effective14. Furthermore, cognitive radio and edge computing methods have been introduced to further improve resource allocation and reduce delay in mission-critical scenarios15. New transmission techniques such as Reconfigurable Intelligent Surfaces (RIS) and Orthogonal Time Frequency Space (OTFS) modulation have also been applied in ISAC systems to reduce interference and support wireless signal propagation under complicated dynamics1. Integrating AI-based resource management with these technologies is a promising approach to performance and energy efficiency optimization in 6G networks. The practical deployment of ISAC is nevertheless challenging, especially in dynamic networks where conditions fluctuate constantly. Existing resource allocation schemes for ISAC systems mainly rely on classical optimization algorithms, which cope poorly with dynamic changes such as user mobility, interference, and traffic congestion in real deployments. Without the capacity to reconfigure resource allocation at runtime according to these factors, performance in terms of communication throughput, sensing accuracy, and energy consumption becomes suboptimal.
In this paper, to address these challenges, we present an AI-driven dynamic resource allocation model for integrated sensing and communication (ISAC) systems that adopts DRL to optimize the assignment of resources, i.e., beamforming vectors, power distribution, and frequency management. The main novelties of this work lie in the use of AI to control beamforming efficiently, to suppress interference between the communication and sensing layers, and to allocate power under real-time network conditions. With our framework, ISAC systems can accommodate different network and environmental situations and make the best use of limited resources for both communication and sensing tasks16. The basic idea is to use AI to let ISAC systems adjust automatically to network changes, so that spectral efficiency (SE), sensing accuracy, and energy efficiency can be further improved in 6G networks.
The rest of the paper is organized as follows: Section II reviews the related works and recent advancements in ISAC systems, with a particular focus on AI-based approaches. Section III outlines the system model and problem formulation, followed by the presentation of the proposed DRL-based resource allocation model in Section IV. Section V provides simulation results, validating the effectiveness of the proposed model. Finally, Section VI concludes the paper and discusses future research directions.
Literature review
ISAC and resource allocation
ISAC is a vital technology for future wireless systems, especially 6G networks. Many resource allocation strategies for ISAC systems have been studied in the literature, aiming at joint optimization of communication and sensing. Conventional resource allocation methods, including convex optimization and sequential beamforming, have been considered for dual-functional radar-communication systems17,18. These approaches aim to achieve optimal beamforming for communication while preserving radar sensing performance. However, these optimization strategies are usually static and cannot adapt to dynamic network conditions such as high-mobility environments and interference-dominated scenarios.
The work in19 traces the history of ISAC and its resource allocation methods. It divides resource allocation in ISAC systems into four stages: the resource-separated stage, the resource-orthogonal stage, the resource-converged stage, and the resource-collaborative stage. These stages reconcile communication and sensing through the judicious allocation of time, frequency, space, and power resources. Wang et al.20 apply deep reinforcement learning (DRL) to fair resource allocation in UAV-enabled ISAC systems. The paper addresses joint communication and sensing performance maximization by means of power allocation that takes into account fairness among different holographic users and targets. This AI-based technique effectively handles network dynamics and improves ISAC performance. Khalili et al.21 tackle joint resource allocation and trajectory design for UAV-enabled ISAC systems with limited backhaul capacity. They jointly optimize the UAV trajectory, beamforming, power control, and hovering time so as to minimize power consumption subject to the targeted communication QoS and sensing requirements. This holistic approach takes into account UAV mobility and energy constraints. The notion of sensing QoS in 6G networks is presented by the authors in22, taking into account sensing performance measures such as probability of detection and parameter estimation accuracy. Their joint ISAC resource allocation framework distributes limited power and bandwidth based on both sensing and communication QoS, showing the trade-off between these objectives. In23, the authors introduce the SPYDER system for multi-user ISAC in vehicular networks. Their approach schedules radio resources as time-frequency slots shared by sensing and communication. The system employs sparse interleaving in the OFDM grid to reduce interference and improve resource efficiency, exploiting the available degrees of freedom in beyond-5G high-mobility, high-density environments.
Osorio et al.24 examine the role of resource allocation in networked ISAC systems, where communication and sensing functions are combined in a networked infrastructure that shares resources. Their work investigates the trade-off between sensing and communication performance and introduces joint optimization strategies for improving network robustness and efficiency in 6G ISAC networks. Aldirmaz-Colak et al.1 investigate the integration of AI in in-band and out-band ISAC systems for communication and sensing in 6G networks. The paper highlights the role of machine learning and deep learning in optimizing resource allocation, mainly for tasks such as spectrum management and interference mitigation. The authors in10 elaborate on how AI techniques facilitate real-time adaptation by responding dynamically to varying network conditions and fairly distributing power and bandwidth according to the high-throughput and low-latency requirements of ISAC in 6G. An AI-based dynamic resource allocation scheme for ISAC systems that routes traffic and allocates wireless resources in real time is presented in11. To handle the system complexity caused by interference and spectrum scarcity, the authors use AI-driven cognitive routing with Software Defined Networking (SDN). Their work demonstrates that AI can enhance resource allocation and system performance in dynamic, multi-user settings, helping to ensure dependable and efficient data transmission and sensing.
The authors in25 survey the progress on AI-empowered ISAC and explain how AI methods such as deep neural networks (DNN), convolutional neural networks (CNN), and deep reinforcement learning (DRL) are changing resource allocation in ISAC setups. They discuss how AI helps optimize sensing strategies, improve communication quality, and adapt flexibly to the different needs of sensing and communication tasks. The survey also discusses open issues, such as high computational overhead and real-time processing, and proposes that AI can address them through intelligent resource management that enhances system utilization and responsiveness.
AI for resource allocation
Recent progress in AI techniques is enabling new resource allocation methodologies for ISAC systems, specifically in 6G networks. Machine learning (ML) and deep reinforcement learning (DRL) have recently shown promise for complex optimization problems such as resource allocation in wireless communication systems. To further enhance the performance of communication systems, AI-based techniques have been developed for beamforming, power control, and spectrum management26,27. The integration of AI with ISAC systems for dynamic resource allocation, however, is less explored. Several works have proposed DRL methods for beamforming optimization in communication systems, but the joint communication and sensing requirements of ISAC systems have not been addressed.
Li et al.22 use deep reinforcement learning as the resource allocation algorithm in ISAC for vehicular edge computing. They study the joint optimization of communication and sensing resources for vehicular networks under the challenges of high mobility and dynamic network state. The authors emphasize that the flexibility of DRL can significantly improve performance by adaptively rebalancing the resources allocated to the heterogeneous demands of communication and sensing in real time. Lu et al.28 further investigate a CDRL setting for adaptive time allocation between tracking and communication in RCSs. They show that CDRL allocates resources effectively when hard timing requirements must be met, enhancing both sensing and communication efficiency in a dynamic environment. The authors in29 consider secure sensing in ISAC systems and devise a generative AI model that secures sensing by confounding local variations in sensing signals caused by rogue devices. Their ADM model explicitly learns the best sensing patterns and prevents unauthorized devices from accessing sensitive data, which contributes directly to both the communication and sensing functions of ISAC systems. The study in30 introduces a hybrid deep reinforcement learning (HDRL) algorithm to improve the communication and localization efficiency of ISAC systems in cooperative reconfigurable intelligent surface (RIS)-supported environments. The HDRL scheme adapts to environmental variations and significantly outperforms conventional resource management in highly dynamic, interference-prone environments such as urban vehicular networks. The work in31 presents a framework for AI-enabled ISAC, where sensing, communication, and AI are integrated in a comprehensive architecture for 6G networks. The authors advocate using AI for real-time resource optimization to improve spectrum efficiency and communication quality, with small cells scheduling resources for users and making decisions dynamically based on network status and user requirements. Their framework provides an interesting direction for future work on the convergence of sensing, communication, and AI in 6G.
These studies demonstrate the crucial role of AI in improving resource allocation and overall ISAC performance, especially within 6G networks. AI methods enable dynamic resource allocation and increase the efficiency and reliability of communication and sensing mechanisms.
Beamforming and interference management
Beamforming optimization is a key research challenge in ISAC systems because both communication and sensing depend on the ability to focus signals toward target users or sensing points. Conventional beamforming optimization relies on convex optimization techniques32, which are not well suited to the dynamic environment of ISAC systems, in particular when the coexistence of communication and sensing links introduces destructive interference. Machine learning has been applied to beamforming optimization in recent papers such as33,34, but efficient interference management remains an open problem.
Zargari et al.35 develop a new beamforming formulation for cell-free ISAC (CF-ISAC) systems based on complex oblique manifold optimization to maximize the communication sum rate with robust sensing. The paper highlights the challenging interference in dense networks such as CF-ISAC systems that use distributed antennas. Their approach improves both communication and sensing performance compared to traditional beamforming while reducing computational complexity. The authors in36 study the application of HAPS for communication and sensing in 6G systems using sophisticated MIMO beamforming. They formulate a non-convex optimization problem that optimizes the minimum beam pattern gain for sensing subject to communication SINR guarantees. The proposed approach greatly enhances network performance in terms of signal quality, sensing precision, and interference elimination, indicating the practicality of employing MIMO and HAPS in the forthcoming 6G ISAC era. The research in37 studies interference management in full-duplex ISAC (FD-ISAC) systems, in which the radio supports communication and sensing simultaneously in the same frequency band. The authors consider self-interference (SI) and interference between the communication and sensing tasks. The paper investigates strategies for controlling such interference in FD-ISAC systems, including beamforming optimization, mode selection, and resource allocation. The study in38 considers joint optimization of transmit beamforming and receive filter design in MIMO-ISAC systems. The authors optimize the signal-to-clutter-plus-noise ratio (SCNR) at the receiver subject to quality-of-service constraints. The method effectively mitigates MUI, clutter, and SI, making it a reliable scheme for combating interference in ISAC systems.
Niu et al.39 present a detailed survey on interference management in ISAC systems covering various interference sources, including SI, MI, and CLI. The survey covers a variety of interference suppression, avoidance, and exploitation techniques, with a particular focus on MIMO-ISAC systems. It underscores the need to engineer interference carefully so that sensing and communication capabilities can be effectively unified. The work in40 shows that networked ISAC can be implemented for the low-altitude economy (LAE), where ground base stations (GBSs) jointly transmit communication and sensing signals to UAVs. It proposes a system-level design for coordinated transmit beamforming and UAV trajectory design that accounts for interference between communication and sensing. The numerical results demonstrate that the proposed strategy achieves a better balance between communication performance and sensing quality. Chen et al.41 consider robust beamforming design for near-field ISAC systems under channel uncertainty and eavesdropping attacks. They aim to maximize the minimum sensing beam pattern gain and secure communication by suppressing the interference caused by eavesdroppers. This paper shows the need for resilient beamforming methods in secure and dependable ISAC, particularly in the near field. The above review discusses several beamforming and interference management methods in ISAC systems and emphasizes the use of advanced beamforming techniques and robust interference mitigation methods for effective communication and sensing performance in 6G networks. Table 1 shows the comparison of related works and the proposed approach.
Table 1. Comparison of related works and the proposed approach.
| Work | Opt. scope | Sensing metric | Interference model | AI | Pros | Cons |
|---|---|---|---|---|---|---|
| Conventional ISAC Beamforming | Beamforming only | Beampattern gain | Limited (noise-dominated) | No | Simple design; low complexity | Lacks adaptability; ignores power/frequency coupling |
| Joint Beamforming & Power Allocation (Non-AI) | Beamforming + power | CRLB/beampattern | Partial (MI only) | No | Improved performance over static methods | High complexity; not adaptive to dynamics |
| Convex Optimization-Based ISAC | Beamforming + power | CRLB | MI, partial SI | No | Theoretically grounded; optimal under assumptions | Requires perfect CSI; high computational cost |
| RL-Based Communication-Only Systems | Power/scheduling | N/A | MI only | Yes | Adaptive to dynamics | No sensing support; not ISAC |
| Existing AI-Based ISAC Methods | Partial resource allocation | Beampattern/CRLB | MI or CLI | Yes | Improved adaptability | Limited optimization scope; incomplete interference modeling |
| Proposed Method | Beamforming + power + frequency | CRLB | SI + MI + CLI | Yes | Jointly optimizes communication and sensing; adaptive | Higher training complexity |
As summarized in Table 1, the proposed framework differs from existing works by jointly optimizing multiple ISAC resources using an AI-driven approach while explicitly modeling multi-dimensional interference and sensing accuracy via CRLB. The main contributions of this paper are summarized as follows:
Integrated ISAC System Modeling: We develop a comprehensive system model for integrated sensing and communication (ISAC) in 6G networks, jointly capturing communication performance, sensing accuracy (via CRLB), energy efficiency, and interference interactions (including self-interference, mutual interference, and cross-link interference).
AI-Driven Dynamic Resource Allocation: We propose a deep reinforcement learning (DRL)-based framework that dynamically optimizes beamforming, power allocation, and frequency resources in highly dynamic ISAC environments, enabling adaptive operation under diverse network conditions.
Interference-Aware Optimization Formulation: The ISAC resource allocation problem is formulated to explicitly account for multi-dimensional interference (SI, MI, and CLI), spectral efficiency maximization, CRLB minimization, and energy efficiency improvement, providing a realistic and unified optimization perspective.
Multi-Agent Learning for ISAC Coordination: A multi-agent learning mechanism is designed to coordinate sensing and communication decisions across multiple ISAC nodes, ensuring scalable and efficient interaction between communication and sensing functionalities.
Extensive Simulation-Based Evaluation: The proposed framework is evaluated under three representative deployment scenarios–high-mobility urban, dense smart city, and rural environments–demonstrating consistent improvements in spectral efficiency, sensing accuracy, energy efficiency, and interference mitigation compared to conventional optimization and static beamforming baselines.
System model and problem formulation
Consider a multi-user ISAC system where a base station (BS) is equipped with a massive antenna array capable of supporting multiple users for communication and performing radar-based sensing simultaneously. The base station is tasked with optimizing both communication and sensing performance while ensuring that interference between the two functions is minimized. We model the system as follows:
Communication Layer: The BS allocates beamforming vectors to maximize data throughput for communication users, ensuring high spectral efficiency (SE) while minimizing interference.
Sensing Layer: The BS performs radar-based sensing, focusing on target detection and localization accuracy with high sensing resolution.
AI-Driven Resource Allocation: The system dynamically adjusts beamforming, power allocation, and interference management strategies using Deep Reinforcement Learning (DRL), based on real-time network conditions, environmental factors (e.g., user mobility, interference levels, and network congestion), and sensing accuracy.
Network modeling and mathematical formulation
We consider a downlink integrated sensing and communication (ISAC) system for a 6G cellular network, where a multi-antenna base station (BS) simultaneously serves K communication users and performs target sensing within the same time–frequency resources. The BS is equipped with Nt transmit antennas and employs digital beamforming.

A. Transmitted Signal Model: The transmitted signal from the BS is expressed as

$$\mathbf{x} = \sum_{k=1}^{K} \mathbf{w}_k s_k, \tag{1}$$

where $\mathbf{w}_k \in \mathbb{C}^{N_t \times 1}$ denotes the beamforming vector for user k, and $s_k$ represents the data symbol for user k with $\mathbb{E}[|s_k|^2] = 1$. The total transmit power is constrained as

$$\sum_{k=1}^{K} \|\mathbf{w}_k\|^2 \le P_{\max}. \tag{2}$$

B. Communication Layer Modeling: The received signal at user k is given by

$$y_k = \mathbf{h}_k^H \mathbf{x} + n_k, \tag{3}$$

where $\mathbf{h}_k \in \mathbb{C}^{N_t \times 1}$ denotes the downlink channel vector between the BS and user k, and $n_k \sim \mathcal{CN}(0, \sigma^2)$ represents additive white Gaussian noise. The corresponding signal-to-interference-plus-noise ratio (SINR) for user k is

$$\mathrm{SINR}_k = \frac{|\mathbf{h}_k^H \mathbf{w}_k|^2}{I_{\mathrm{tot}} + \sigma^2}. \tag{4}$$

The interference term in the SINR denominator is modeled as $I_{\mathrm{tot}} = I_{\mathrm{SI}} + I_{\mathrm{MI}} + I_{\mathrm{CLI}}$, the sum of the residual self-interference, mutual (multi-user) interference, and cross-link interference components. Accordingly, the achievable communication rate is given by

$$R_k = \log_2\left(1 + \mathrm{SINR}_k\right). \tag{5}$$

C. Sensing Layer Modeling: For the sensing functionality, the reflected signal from the target depends on the transmitted waveform covariance

$$\mathbf{R}_x = \mathbb{E}\left[\mathbf{x}\mathbf{x}^H\right] = \sum_{k=1}^{K} \mathbf{w}_k \mathbf{w}_k^H. \tag{6}$$

The sensing performance is quantified using the Cramér–Rao Lower Bound (CRLB), which provides a lower bound on the estimation error of target parameters such as angle or range. The CRLB is the inverse of the Fisher Information Matrix (FIM), which depends on $\mathbf{R}_x$. Thus, the sensing objective can be expressed as

$$\min_{\mathbf{R}_x} \; \mathrm{CRLB}(\theta) = \mathrm{tr}\left(\mathbf{J}^{-1}(\mathbf{R}_x)\right), \tag{7}$$

where $\mathbf{J}(\mathbf{R}_x)$ denotes the Fisher Information Matrix. Minimizing the CRLB improves sensing accuracy but may conflict with communication-oriented beamforming objectives.

D. Joint Optimization Problem: The ISAC resource allocation problem can be formulated as a multi-objective optimization:

$$\max_{\{\mathbf{w}_k\}} \; \sum_{k=1}^{K} R_k - \lambda\, \mathrm{CRLB}(\theta) \quad \text{s.t.} \quad \sum_{k=1}^{K} \|\mathbf{w}_k\|^2 \le P_{\max}, \tag{8}$$

where $\lambda \ge 0$ controls the trade-off between communication throughput and sensing accuracy. This problem is highly non-convex due to coupled beamforming variables and conflicting objectives, making conventional optimization methods computationally prohibitive in dynamic 6G environments.

E. AI-Driven Resource Allocation: To address this challenge, the joint optimization is mapped into a reinforcement learning framework, where the system state captures channel conditions, interference levels, and sensing requirements; the action corresponds to beamforming and power allocation decisions; and the reward function reflects the weighted communication–sensing performance. A DDPG-based agent is employed to learn near-optimal policies in real time.
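The communication-layer quantities described above (per-user SINR, achievable rate, and the total transmit power that the budget constrains) can be illustrated with a minimal Python sketch. It assumes that the mutual interference seen by user k is the power leaked from the other users' beams, and it lumps any self-interference and cross-link interference into a single `extra_interference` input. The function names and the toy channel values are illustrative assumptions, not taken from the paper.

```python
import math

def inner(h, w):
    """Return the inner product h^H w of two complex vectors given as lists."""
    return sum(complex(a).conjugate() * b for a, b in zip(h, w))

def sinr_and_rate(H, W, noise_power, extra_interference=0.0):
    """Per-user SINR and rate (bps/Hz) for channels H[k] and beamformers W[k].

    Mutual interference is assumed to be the sum of |h_k^H w_j|^2 over j != k;
    extra_interference is a hypothetical lump term for SI and CLI.
    """
    num_users = len(H)
    sinrs, rates = [], []
    for k in range(num_users):
        signal = abs(inner(H[k], W[k])) ** 2
        mutual = sum(abs(inner(H[k], W[j])) ** 2
                     for j in range(num_users) if j != k)
        sinr = signal / (mutual + extra_interference + noise_power)
        sinrs.append(sinr)
        rates.append(math.log2(1.0 + sinr))
    return sinrs, rates

def total_transmit_power(W):
    """Sum of squared beamformer norms, to be compared against the power budget."""
    return sum(abs(x) ** 2 for w in W for x in w)

# Toy example: 2 users, 2 antennas, one unit-norm beam per user.
H = [[1 + 0j, 0.5j], [0.2 + 0j, 1 + 0j]]
W = [[1 + 0j, 0j], [0j, 1 + 0j]]
sinrs, rates = sinr_and_rate(H, W, noise_power=0.1)
sum_rate = sum(rates)
```

In this toy setup, user 0 sees more leakage from the other beam than user 1 does, so its SINR and rate are lower; steering the beams changes both the signal and the leakage terms at once, which is the coupling that makes the joint problem non-convex.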
Problem formulation
The resource allocation problem in this ISAC system is formulated as a multi-objective optimization problem. The objectives are:

- Maximize Communication Throughput (Spectral Efficiency): Achieve the highest data rate (bps/Hz) for communication users, considering interference and beamforming strategies. The spectral efficiency is given in Equation (9).

$$\mathrm{SE} = \sum_{k=1}^{K} \log_2\left(1 + \mathrm{SINR}_k\right) \tag{9}$$

- Minimize Sensing Error (Cramér–Rao Lower Bound (CRLB)): Reduce sensing errors, quantified through metrics like the CRLB, to enhance localization accuracy and detection probability. The CRLB is given in Equation (10).

$$\mathrm{CRLB}(\theta) \propto \frac{1}{\mathrm{SINR}_s} \tag{10}$$

Here, $\mathrm{SINR}_s$ denotes the effective sensing signal-to-interference-plus-noise ratio, which captures the impact of residual multi-user interference and noise on the sensing returns in the ISAC system. To justify the inverse-SINR dependence, consider a standard sensing observation model for estimating a target parameter $\theta$ (e.g., delay or angle):

$$\mathbf{y} = \alpha\, \mathbf{s}(\theta) + \mathbf{v},$$

where $\alpha$ is the complex reflection coefficient, $\mathbf{s}(\theta)$ is the known transmitted sensing signal (or its steering-dependent response), and $\mathbf{v}$ denotes the aggregate disturbance consisting of residual interference and noise. When residual interference is treated as approximately Gaussian, we model

$$\mathbf{v} \sim \mathcal{CN}\left(\mathbf{0}, (\sigma^2 + I_{\mathrm{res}})\, \mathbf{I}_N\right),$$

where $I_{\mathrm{res}}$ is the effective residual interference power impacting the sensing measurement and $\mathbf{I}_N$ is the identity matrix. For a complex Gaussian model with mean $\alpha\, \mathbf{s}(\theta)$ and covariance $(\sigma^2 + I_{\mathrm{res}})\, \mathbf{I}_N$, the Fisher Information for $\theta$ is

$$J(\theta) = \frac{2 |\alpha|^2 \left\| \partial \mathbf{s}(\theta) / \partial \theta \right\|^2}{\sigma^2 + I_{\mathrm{res}}}. \tag{11}$$

Hence, the CRLB is

$$\mathrm{CRLB}(\theta) = J^{-1}(\theta) = \frac{\sigma^2 + I_{\mathrm{res}}}{2 |\alpha|^2 \left\| \partial \mathbf{s}(\theta) / \partial \theta \right\|^2}.$$

Defining an effective sensing SINR as

$$\mathrm{SINR}_s = \frac{|\alpha|^2 \left\| \mathbf{s}(\theta) \right\|^2}{\sigma^2 + I_{\mathrm{res}}},$$

we obtain

$$\mathrm{CRLB}(\theta) = \frac{\left\| \mathbf{s}(\theta) \right\|^2}{2 \left\| \partial \mathbf{s}(\theta) / \partial \theta \right\|^2} \cdot \frac{1}{\mathrm{SINR}_s}. \tag{12}$$

Therefore, for a fixed waveform/parameterization, the CRLB scales inversely with the effective sensing SINR, which motivates using $1/\mathrm{SINR}_s$ (up to a constant) as a tractable sensing-error surrogate in the optimization.

- Maximize Energy Efficiency (EE): Energy efficiency is defined as the number of successfully delivered information bits per unit energy consumption. Accordingly, the system EE is computed as the ratio of the aggregate achievable rate to the total power consumption:

$$\mathrm{EE} = \frac{\sum_{k=1}^{K} R_k}{P_{\mathrm{total}}}. \tag{13}$$

The total power consumption includes the radiated transmit power scaled by the power amplifier efficiency and the circuit power consumption:

$$P_{\mathrm{total}} = \frac{1}{\eta} \sum_{k=1}^{K} \|\mathbf{w}_k\|^2 + P_c, \tag{14}$$

where $\eta \in (0, 1]$ denotes the power amplifier efficiency and Pc accounts for circuit-related power consumption (e.g., baseband processing and RF chain power). This formulation provides a more realistic EE metric while remaining consistent with the beamforming and power allocation variables optimized in our framework.

- Minimize Interference: Reduce self-interference (SI), mutual interference (MI), and cross-link interference (CLI) between communication and sensing functions to improve SINR and ensure both operations are optimally performed.

$$I_{\mathrm{tot}} = I_{\mathrm{SI}} + I_{\mathrm{MI}} + I_{\mathrm{CLI}}, \tag{15}$$

where:
$I_{\mathrm{SI}}$ denotes residual self-interference (SI) resulting from leakage between sensing and communication signal components under shared-spectrum ISAC operation (after suppression/cancellation),
$I_{\mathrm{MI}}$ denotes mutual interference (MI) caused by concurrent multi-user transmissions and imperfect beam isolation,
$I_{\mathrm{CLI}}$ denotes cross-link interference (CLI) arising from coupling between sensing-related transmissions/returns and communication links (and vice versa), particularly in dense deployments and dynamic mobility.

Accordingly, the SINR used in the spectral-efficiency expression is interference-aware and can be written as in Equation (4). We aim to maximize SE while ensuring sensing accuracy and minimizing energy consumption and interference between the communication and sensing layers.
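The inverse relation between the CRLB and the effective sensing SINR, and the energy-efficiency ratio, can be checked numerically with a short sketch. It assumes a half-wavelength uniform linear array steering vector as the sensing response s(θ) and a known reflection coefficient; both are illustrative choices rather than the paper's stated setup, and all function names are hypothetical.

```python
import cmath
import math

def steering(theta, n):
    """Half-wavelength ULA steering vector (an assumed choice for s(theta))."""
    return [cmath.exp(1j * math.pi * i * math.sin(theta)) for i in range(n)]

def steering_derivative(theta, n):
    """Analytic derivative of steering() with respect to theta."""
    a = steering(theta, n)
    return [1j * math.pi * i * math.cos(theta) * a[i] for i in range(n)]

def norm_sq(v):
    return sum(abs(x) ** 2 for x in v)

def crlb(theta, n, alpha, noise_power, residual_interference):
    """CRLB = 1 / J with J = 2|alpha|^2 ||ds/dtheta||^2 / (sigma^2 + I_res)."""
    disturbance = noise_power + residual_interference
    fisher = 2.0 * abs(alpha) ** 2 * norm_sq(steering_derivative(theta, n)) / disturbance
    return 1.0 / fisher

def sensing_sinr(theta, n, alpha, noise_power, residual_interference):
    """Effective sensing SINR = |alpha|^2 ||s||^2 / (sigma^2 + I_res)."""
    disturbance = noise_power + residual_interference
    return abs(alpha) ** 2 * norm_sq(steering(theta, n)) / disturbance

def energy_efficiency(rates, W, pa_efficiency, circuit_power):
    """Sum rate divided by (radiated power / PA efficiency + circuit power)."""
    radiated = sum(abs(x) ** 2 for w in W for x in w)
    return sum(rates) / (radiated / pa_efficiency + circuit_power)

# CRLB * SINR_s stays constant as the residual interference changes,
# i.e. the CRLB scales as 1 / SINR_s for a fixed waveform.
args = (0.3, 8, 0.5, 0.1)
c_low = crlb(*args, 0.0) * sensing_sinr(*args, 0.0)
c_high = crlb(*args, 0.4) * sensing_sinr(*args, 0.4)

ee = energy_efficiency([2.0, 3.0], [[1 + 0j], [1 + 0j]],
                       pa_efficiency=0.5, circuit_power=1.0)
```

Because the product of the CRLB and the sensing SINR depends only on the waveform geometry, an agent that raises the sensing SINR lowers the CRLB by the same factor, which is why the inverse SINR can serve as a cheap stand-in for the bound.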
AI-driven dynamic resource allocation model
Deep reinforcement learning framework
To solve the dynamic resource allocation problem, we adopt Deep Reinforcement Learning (DRL), an AI technique capable of learning effective strategies in dynamic and complex environments. The DRL agent interacts with the network environment and learns how to manage resources according to the current state.
State Space (S): Includes environmental factors such as CSI, user mobility, interference levels, network congestion, and sensing accuracy.
Action Space (A): The actions correspond to beamforming vector adjustments, power allocation decisions, and frequency management.
Reward Function (R): The reward function is designed to maximize system performance based on the multi-objective optimization discussed earlier.
We apply Deep Deterministic Policy Gradient (DDPG) because it handles continuous action spaces and performs well in environments with multiple continuous actions (e.g., beamforming and power allocation). To make the optimization problem tractable and the DRL agent well designed, we model the ISAC system mathematically. The framework consists of the state space, action space, reward function, and optimization objective described above. We outline the mathematical model of the DRL-based resource allocation model and its integration into our ISAC system.
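For readers less familiar with DDPG, three of its standard building blocks are sketched below in plain Python: an experience replay buffer, the critic's bootstrapped regression target, and the soft (Polyak) update of the target networks. The networks themselves are omitted and parameters are represented as flat lists of floats; the class and function names, and the values of the discount factor and update rate, are illustrative assumptions rather than the settings used in this work.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Uniformly sample a mini-batch without replacement."""
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def td_target(reward, next_q, gamma, done):
    """Critic target y = r + gamma * Q'(s', mu'(s')), with no bootstrap at episode end."""
    return reward + (0.0 if done else gamma * next_q)

def soft_update(target_params, online_params, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]

# Toy usage with scalar states and actions.
buf = ReplayBuffer(capacity=2)
for step in range(3):
    buf.add(state=step, action=0.1 * step, reward=1.0, next_state=step + 1, done=False)
batch = buf.sample(2)
y = td_target(reward=1.0, next_q=2.0, gamma=0.9, done=False)
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
```

The slowly moving target parameters keep the critic's regression target from chasing its own updates, which matters when the environment (channels, interference, mobility) is itself changing from step to step.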
Mathematical modeling for DRL in ISAC systems
State space (S)
The state space S comprises all configurations of the system at a given time. It contains the current status of the network and environment, which the DRL agent uses to select actions. It is given in Eq. (16):
$$S(t) = \{C(t),\, I(t),\, M(t),\, P(t),\, T(t)\} \tag{16}$$
where C(t) is the channel state information (CSI) at time t, I(t) is the current interference level (self-interference and mutual interference), M(t) is the user mobility and positioning information, P(t) is the power consumption at time t, and T(t) is the traffic load and network congestion at time t.
Action space (A)
The action space, denoted by A, contains the system parameters that the DRL agent can adjust, such as beamforming, power allocation, and frequency management. At each time step, the agent selects an action from this space to improve the performance metrics (e.g., spectral efficiency, sensing accuracy, energy efficiency, and interference). The action space is given in Equation (17):
$$A(t) = \{b(t),\, p(t),\, f(t)\} \tag{17}$$
where b(t) is the beamforming vector adjustment at time t, p(t) is the power allocation decision at time t, and f(t) is the frequency management decision at time t.
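As an illustration, the state and action tuples defined above can be held in simple containers and flattened into the numeric vector a neural network would consume. This is a minimal sketch under stated assumptions, not the authors' implementation: the class names `IsacState` and `IsacAction`, the field names, and the `to_vector` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IsacState:
    """Hypothetical container for S(t) = {C(t), I(t), M(t), P(t), T(t)}."""
    csi: List[complex]          # C(t): per-user channel coefficients
    interference: List[float]   # I(t): interference levels (e.g., SI, MI)
    mobility: List[float]       # M(t): user position/speed features
    power: float                # P(t): power consumption
    traffic: float              # T(t): traffic load

    def to_vector(self) -> List[float]:
        """Flatten the state into real numbers (complex CSI -> real, imag)."""
        vec: List[float] = []
        for h in self.csi:
            vec.extend([h.real, h.imag])
        vec.extend(self.interference)
        vec.extend(self.mobility)
        vec.extend([self.power, self.traffic])
        return vec


@dataclass
class IsacAction:
    """Hypothetical container for A(t) = {b(t), p(t), f(t)}."""
    beam: List[float]    # b(t): beamforming adjustment
    power: List[float]   # p(t): power allocation decision
    freq: int            # f(t): index of the selected frequency resource


state = IsacState(csi=[1 + 2j], interference=[0.1, 0.2],
                  mobility=[3.0], power=0.5, traffic=0.7)
observation = state.to_vector()
```

Splitting complex CSI into real and imaginary parts is one common choice; magnitude and phase would serve equally well.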
Reward function (R)
The reward function R gives the agent feedback on the action it has taken. The objective is to maximize spectral efficiency, sensing accuracy, and energy efficiency while minimizing interference. The reward is a weighted sum of these objectives:
$$R(t) = w_1\, SE(t) - w_2\, \mathrm{CRLB}(t) + w_3\, EE(t) - w_4\, \mathrm{Interference}(t) \tag{18}$$
where $w_1, \dots, w_4$ are weighting factors that balance the importance of each metric. The positive terms in (18) correspond to performance metrics to be maximized (SE and EE), whereas CRLB and interference are incorporated as penalty terms.
- SE(t): Spectral efficiency at time t, computed from the achieved SINR under the selected beamforming and power allocation policy.

19
where the quantities in Eq. (19) are the signal power, the noise power, and the interference power.
In this work, spectral efficiency (SE) is defined in the conventional sense as the achievable information rate per unit bandwidth (bits/s/Hz), computed as in equation 9. The beamforming and power allocation decisions affect SINRk and thereby influence the achieved SE, without altering its definition. The corresponding communication throughput can be obtained by multiplying SE by the system bandwidth.
- CRLB(t): Cramér-Rao Lower Bound at time t, representing the sensing accuracy and localization error (a lower CRLB means better sensing accuracy).
- EE(t): Energy efficiency at time t, calculated as the ratio of data throughput to power consumption, as given in Equation (20).

20
- Interference: The total interference at time t due to self-interference (SI), mutual interference (MI), and cross-link interference (CLI), which enters the SINR (signal-to-interference-plus-noise ratio). It is derived from the received baseband signal at communication user k, which contains, besides the desired signal and noise, the MI term and the residual SI and CLI terms (after any suppression); these are modeled at the power level in the problem formulation. Taking expectations and assuming the interference components are uncorrelated (a standard modeling step), the total interference power is the sum of the three contributions:
$$I_{\mathrm{tot}}(t) = I_{\mathrm{SI}}(t) + I_{\mathrm{MI}}(t) + I_{\mathrm{CLI}}(t) \tag{21}$$
Note that SI, MI, and CLI contribute to the interference power $I_{\mathrm{tot}}(t)$, which enters the SINR denominator; therefore, interference is modeled as a power term rather than being expressed via SNR.
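The reward components described above can be combined as in the following sketch. It assumes the weighted-sum structure stated in the text (SE and EE rewarded, CRLB and interference penalized) and an SE of the form log2(1 + SINR); the function names, the weight tuple `w`, and all numeric values are hypothetical. In practice the four terms would need to be normalized to comparable scales before weighting.

```python
import math


def spectral_efficiency(signal_w: float, noise_w: float, interference_w: float) -> float:
    """SE in bits/s/Hz from an interference-aware SINR."""
    sinr = signal_w / (noise_w + interference_w)
    return math.log2(1.0 + sinr)


def energy_efficiency(se: float, bandwidth_hz: float, power_w: float) -> float:
    """EE as throughput (SE x bandwidth) divided by consumed power, in bits/J."""
    return se * bandwidth_hz / power_w


def total_interference(si_w: float, mi_w: float, cli_w: float) -> float:
    """Aggregate interference power, assuming uncorrelated SI, MI, and CLI."""
    return si_w + mi_w + cli_w


def reward(se: float, crlb: float, ee: float, interference: float,
           w=(1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum: SE and EE are rewarded, CRLB and interference penalized."""
    return w[0] * se - w[1] * crlb + w[2] * ee - w[3] * interference


# Example with made-up, already normalized numbers.
i_tot = total_interference(0.125, 0.25, 0.125)
se = spectral_efficiency(signal_w=3.0, noise_w=0.5, interference_w=i_tot)
r = reward(se=se, crlb=0.5, ee=1.0, interference=i_tot)
```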
Policy and Action Selection (Policy $\pi$)
The DRL agent learns a policy $\pi$ that maps states to actions. The objective of the DRL algorithm is to maximize the expected cumulative discounted reward, expressed as the value function in Equation (22):
$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} R(t) \,\middle|\, s_0 = s\right] \tag{22}$$
where
- $\gamma$: discount factor that determines the weight of future rewards.
- R(t): reward at time t.
- $s_0$: the initial state at time t = 0.
The policy $\pi$ is updated based on the Q-function, which measures the expected return for taking an action a in a given state s and following policy $\pi$ thereafter, as given in Equation (23):
$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} R(t) \,\middle|\, s_0 = s,\ a_0 = a\right] \tag{23}$$
In this case, the agent learns the optimal action-value function using Deep Q-Learning or Deep Deterministic Policy Gradient (DDPG) for continuous action spaces.
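For a finite episode, the discounted cumulative reward inside the value function can be evaluated directly. The sketch below is illustrative only; the function name `discounted_return` and the sample rewards are hypothetical, and the discount factor is written as `gamma` by the usual convention.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * rewards[t], accumulated backwards in time."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Three unit rewards with gamma = 0.5: 1 + 0.5 + 0.25
episode_return = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

Averaging this quantity over many episodes that start in the same state gives a Monte Carlo estimate of the value of that state under the current policy.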
DRL optimization
The agent aims to maximize the expected cumulative reward by learning the best action $a^{*}$ for each state $s$. This can be done using the Bellman equation given in Equation (24):
$$Q^{*}(s, a) = \mathbb{E}\left[R(t) + \gamma \max_{a'} Q^{*}(s', a')\right] \tag{24}$$
where $s'$ is the next state and $a'$ is a candidate next action. This is solved iteratively using a gradient-descent approach, in which the policy and Q-values are learned by interacting with the network environment.
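The iterative solution of the Bellman relation can be illustrated with a scalar example: a one-step bootstrapped target is formed, and the Q estimate is moved toward it by gradient descent on the squared error. This is a simplified sketch, not the training code of the paper; the function names and the learning rate are hypothetical. In DDPG, `next_q` would be produced by the target critic evaluated at the target actor's action.

```python
def td_target(reward: float, gamma: float, next_q: float, terminal: bool = False) -> float:
    """One-step Bellman target: r + gamma * Q(s', a'), or r at episode end."""
    return reward if terminal else reward + gamma * next_q


def gradient_step(q_estimate: float, target: float, learning_rate: float) -> float:
    """One gradient-descent step on 0.5 * (q - target)**2 with respect to q."""
    return q_estimate - learning_rate * (q_estimate - target)


def fit_q(reward: float, gamma: float, next_q: float,
          learning_rate: float = 0.5, steps: int = 20) -> float:
    """Repeat the gradient step so the estimate converges to the target."""
    q = 0.0
    y = td_target(reward, gamma, next_q)
    for _ in range(steps):
        q = gradient_step(q, y, learning_rate)
    return q


q_hat = fit_q(reward=1.0, gamma=0.5, next_q=2.0)  # approaches 2.0
```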
The joint ISAC optimization problem formulated in Section 3 is mapped into a reinforcement learning framework as follows. The state vector captures the current channel conditions, interference levels, sensing requirements, and power constraints. The action space corresponds to beamforming selection, transmit power allocation, and frequency assignment decisions. The reward function is designed to maximize spectral and energy efficiency while minimizing CRLB-based sensing error and aggregate interference, as defined in Eqs. (17), (25), and (26). This formulation enables the agent to learn adaptive resource allocation policies through interaction with the ISAC environment.
Model training and optimization
The DRL agent is trained in simulation environments where network conditions, such as user mobility, interference, and traffic load, are dynamically generated. The model adapts to these changes, learning optimal strategies through interactions with the environment.
Training is the most crucial stage of the AI-based dynamic resource allocation model for ISAC systems, since it is where the agent learns to manage resources by interacting with the environment. This environment is a simulation of a 6G ISAC system in which user mobility, network interference, and traffic load are generated dynamically. The following subsections explain step by step how the training works and how the DRL agent learns to adapt to the varying environment in real time.
Simulation environment setup
We simulate an environment that mimics realistic 6G ISAC systems, in which the agent interacts with different entities such as communication users (CUs), sensing targets, and the network infrastructure while allocating resources. Key components of the environment are:
User Mobility: Users (or vehicles, for V2X applications) are mobile within the network; their locations, velocities, and trajectories vary over time. This mobility not only changes communication quality (because the channel conditions change) but also degrades sensing quality (because the target may move).
Interference: Users and sensing devices operating in the same spectrum cause self-interference (SI), mutual interference (MI), and cross-link interference (CLI). The environment sets the level of interference dynamically according to the network density, the mobility, and the resource allocation decisions of the agent.
Traffic Load: Different network services, user accesses, traffic patterns, and bandwidth requirements lead to different levels of network congestion, which the environment emulates. Traffic demand changes over time, and the agent has to adapt its power and bandwidth allocation accordingly.
Training process: DRL agent’s interaction with the environment
The DRL agent learns about the environment by iteratively interacting with it and adapting its actions according to the observed states as well as the received rewards of these actions. This is how the agent is trained:
- State Space (S): The state space encompasses network conditions such as CSI, user mobility, interference level, traffic load, and sensing accuracy. At each training step, the agent receives an observation of the environment, i.e., the current state $s_t$, which provides information such as:
- CSI: The channel quality between the base station and users.
- User mobility: Position and speed of users, which affect beamforming and resource allocation decisions.
- Interference levels: Current levels of interference (both self-interference and mutual interference) impacting the communication and sensing functions.
- Traffic load: The current demand for communication resources, considering user activity and data requirements.
- Sensing accuracy: Metrics related to target detection, localization accuracy, and environmental sensing quality.
- Action Space (A): The action space represents the decisions the DRL agent can make to allocate resources. The actions correspond to:
- Beamforming vectors: Adjustments to the antenna configurations to direct the signal towards the intended communication or sensing targets.
- Power allocation: The distribution of power among different users or sensing tasks to ensure optimal communication and sensing performance.
- Frequency management: Adjusting the frequency allocation across communication and sensing functions to minimize interference and improve spectrum efficiency.
- Reward Function (R): The reward function, mentioned above in Equation (18) is designed to encourage the agent to maximize spectral efficiency (SE), minimize sensing error (e.g., CRLB), enhance energy efficiency (EE), and manage interference. The agent receives feedback in the form of a reward after every action it takes, based on the improvement or degradation in these metrics:
- Maximize Spectral Efficiency (SE): High throughput is encouraged by rewarding the agent when communication rates increase.
- Minimize Sensing Error: The agent is penalized when sensing accuracy, such as target localization error, increases.
- Maximize Energy Efficiency (EE): Efficient power usage is rewarded to encourage the agent to minimize energy consumption while maintaining communication and sensing performance.
- Minimize Interference: The agent is penalized for introducing high interference, ensuring that both communication and sensing functions are performed without degrading each other’s performance.
- Training Algorithm (DDPG): The Deep Deterministic Policy Gradient (DDPG) algorithm is used for continuous action spaces (beamforming, power allocation, etc.). DDPG uses an actor-critic architecture, where:
- Actor: The policy network generates actions based on the state space (beamforming vectors, power allocation decisions, and frequency management).
- Critic: The value network evaluates the actions taken by the actor by calculating the Q-value, which estimates the expected return (cumulative reward) of a given action.
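The interaction loop described above relies on a few standard DDPG ingredients that the text does not spell out: a replay buffer, a bounded deterministic actor output, exploration noise, and slowly updated target networks. The following sketch shows these pieces in isolation; it is an assumption-laden illustration rather than the authors' implementation, and all names (`ReplayBuffer`, `actor`, `explore`, `soft_update`) and hyperparameters are hypothetical. Gaussian exploration noise is used here for simplicity.

```python
import math
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity: int):
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, n: int, rng: random.Random):
        return rng.sample(list(self.buf), min(n, len(self.buf)))


def actor(state, weights):
    """Single linear layer followed by tanh: each action lies in [-1, 1]."""
    return [math.tanh(sum(w * x for w, x in zip(row, state))) for row in weights]


def explore(action, sigma: float, rng: random.Random):
    """Add Gaussian exploration noise and clip back to the action range."""
    return [max(-1.0, min(1.0, a + rng.gauss(0.0, sigma))) for a in action]


def soft_update(target, online, tau: float):
    """Polyak averaging of target weights toward the online weights."""
    return [[tau * o + (1.0 - tau) * t for t, o in zip(t_row, o_row)]
            for t_row, o_row in zip(target, online)]


rng = random.Random(0)
buffer = ReplayBuffer(capacity=2)
weights = [[0.5, -0.5]]                      # one action dimension, two state features
for step in range(3):
    s = [float(step), 1.0]
    a = explore(actor(s, weights), sigma=0.1, rng=rng)
    buffer.push(s, a, 0.0, s)                # the reward would come from the environment
batch = buffer.sample(2, rng)
target_weights = soft_update([[0.0, 0.0]], weights, tau=0.5)
```

The actor output in [-1, 1] would then be rescaled to physical quantities such as transmit power or beam angle.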
Adaptation to changing conditions
The DRL agent adapts to changing network conditions through continuous learning and feedback from the environment. The training process is designed to account for dynamic network conditions such as user mobility, traffic load, and interference. Here’s how the agent adapts:
User Mobility: The CSI, beamforming vectors, and level of interference vary as the user moves. The agent is trained to determine beamforming strategies that maintain high spectral efficiency and energy efficiency as user locations and speeds change. For instance, when a user leaves the coverage area of a beam pattern, the agent changes the beamforming angles to maintain signal strength.
Interference: The agent trains under various interference conditions (self-interference, mutual interference, etc.). It learns to mitigate interference by dynamically adjusting beamforming, power, and frequency in accordance with instantaneous feedback. The agent selects resources that avoid interference between the communication and sensing functions in order to increase the SINR in the system and achieve coexistence of both services.
Traffic Load: The agent continuously learns how to allocate resources based on network congestion and traffic demands. When the network load changes (for example, when the number of users or the traffic demand increases), the agent must make power allocation and frequency allocation decisions simultaneously so that both communication and sensing requirements are met without overusing resources.
Through this iterative process, the DRL-based agent is trained until it learns flexible resource allocation strategies that adapt to different network conditions and optimize the performance metrics (spectral efficiency, sensing error, energy efficiency, and interference). Eventually, the agent learns to make near-optimal online decisions that optimize the performance of the entire system in changing environments.
Interference management and optimization
Interference coordination is one of the main issues in ISAC systems, particularly where communication and sensing are active in the same spectrum and in the same frame. The DRL agent in the proposed framework is responsible for the joint optimization of interference and resource scheduling, so that both communication and sensing requirements are fulfilled without the two functions degrading each other. To address these challenges, the DRL agent coordinates beamforming decisions, utilizes multi-agent learning techniques, and optimizes frequency and power allocation.
Beamforming decisions across multiple ISAC nodes
To avoid interference, the DRL agent schedules the beamforming operations of multiple ISAC nodes (base station, sensors, and users) in coordination. This scheduling ensures that the signals transmitted by the various ISAC nodes are steered toward their intended users or targets while the interference between them is minimized.
- Beamforming Coordination: The agent determines the beamforming vectors for every ISAC node. By dynamically tuning these vectors, it can focus communication beams on users and sensing beams on the environment, which prevents the communication and sensing signals from interfering with each other and improves both spectral efficiency and detection accuracy. At each time t, the beamforming vector $b_i(t)$ of each ISAC node i is selected by the DRL agent according to the system state $s(t)$, which contains the interference levels, user locations, and sensing targets:
$$b_i^{*}(t) = \arg\max_{b_i(t)} \left[ SE(t) - \lambda\, I_i(t) \right] \tag{25}$$
where SE(t) represents the spectral efficiency, $I_i(t)$ is the aggregate interference power for ISAC node i defined in Eq. (15), considering both communication and sensing signals, and $\lambda$ is a weight factor that balances spectral efficiency and interference reduction. Since $I_i(t)$ is to be minimized (Eq. (15)), it is incorporated as a penalty term with $\lambda \ge 0$ in the beamforming decision criterion. To ensure comparable scales, $I_i(t)$ is normalized (e.g., by a reference interference level) and $\lambda$ controls the SE–interference trade-off.
- Adaptive Beamforming: The agent dynamically optimizes beamforming vectors as user dynamics, interference, and network traffic vary. For example, if a user strays from the optimal coverage region of a beam, the agent modifies the beamforming vector in real time to guarantee sufficient strength for communication signals while reducing interference to other users or sensing activities.
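To make the SE-versus-interference criterion concrete, the sketch below scores a small set of candidate beams by spectral efficiency minus a weighted, normalized interference term and keeps the best one. It is a simplified stand-in for the learned policy, not the authors' method: it assumes a half-wavelength uniform linear array, a single user direction and a single direction to be protected, and an exhaustive search over a hypothetical codebook. The names `steering`, `beam_gain`, `select_beam`, the penalty weight `lam`, and the reference level `i_ref` are all illustrative.

```python
import cmath
import math


def steering(n_ant: int, theta_rad: float):
    """Array response of a half-wavelength uniform linear array."""
    return [cmath.exp(1j * math.pi * n * math.sin(theta_rad)) for n in range(n_ant)]


def beam_gain(beam, theta_rad: float) -> float:
    """Power gain |w^H a(theta)|^2 of beam w toward direction theta."""
    a = steering(len(beam), theta_rad)
    response = sum(w.conjugate() * x for w, x in zip(beam, a))
    return abs(response) ** 2


def select_beam(codebook_angles, n_ant: int, user_angle: float,
                protected_angle: float, lam: float,
                noise: float = 1.0, i_ref: float = 1.0) -> float:
    """Pick the codebook angle maximizing SE - lam * (interference / i_ref)."""
    best_score, best_angle = None, None
    for angle in codebook_angles:
        beam = [x / math.sqrt(n_ant) for x in steering(n_ant, angle)]  # unit-norm beam
        se = math.log2(1.0 + beam_gain(beam, user_angle) / noise)
        leakage = beam_gain(beam, protected_angle)   # power radiated toward the protected direction
        score = se - lam * leakage / i_ref
        if best_score is None or score > best_score:
            best_score, best_angle = score, angle
    return best_angle


chosen = select_beam([-0.5, 0.0, 0.5], n_ant=8,
                     user_angle=0.5, protected_angle=-0.5, lam=1.0)
```

A DRL agent replaces the exhaustive loop with a learned mapping from state to beam, which matters when the candidate set is continuous or too large to enumerate.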
Multi-agent learning for optimizing communication and sensing interaction
In ISAC, communication and sensing are usually conducted over the same physical resources, which causes them to interfere with each other. The DRL framework leverages multi-agent learning to improve the interoperation of these two functions. In this approach, several agents learn cooperatively, each being assigned a role such as transmitter-side or receiver-side resource allocation.
- Multi-Agent Coordination: The agents interact in terms of interference levels, beamforming vectors, and resource allocations. For instance, one agent may focus on communication network optimization while another focuses on sensing. They exchange information and decisions to avoid conflicting allocations that would result in high interference. The optimization process can be formulated as a joint resource allocation problem, where both agents aim to minimize interference while optimizing their respective functions. The objective is to find a Pareto-optimal solution, in which neither function can be improved without degrading the other, as given in Equation (26):
$$\min_{b(t),\, p(t)} \left[ \mathrm{Interference}(t) - w_c\, SE(t) - w_s\, \mathrm{Sensing\,Accuracy}(t) \right] \tag{26}$$
where b(t) is the beamforming vector, p(t) is the power allocation, Interference(t) represents the total interference between communication and sensing signals, SE(t) and Sensing Accuracy(t) are the metrics that the agents optimize collaboratively, and $w_c$ and $w_s$ are weights that determine the relative importance of communication and sensing.
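The notion of Pareto optimality used above can be checked numerically on a finite set of candidate allocations. In the sketch below, each candidate is summarized by a hypothetical triple (SE, sensing accuracy, interference); a candidate is kept if no other candidate is at least as good on every metric and strictly better on one. The function names, the weights `w_c` and `w_s`, and the numbers are illustrative assumptions.

```python
from typing import List, Tuple

Candidate = Tuple[float, float, float]  # (SE, sensing accuracy, interference)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on all metrics and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] >= b[1] and a[2] <= b[2]
    better = a[0] > b[0] or a[1] > b[1] or a[2] < b[2]
    return no_worse and better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Candidates that are not dominated by any other candidate."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def scalarized_cost(c: Candidate, w_c: float, w_s: float) -> float:
    """Weighted cost: interference minus weighted SE and sensing accuracy."""
    return c[2] - w_c * c[0] - w_s * c[1]


candidates = [(3.0, 0.9, 0.2), (2.0, 0.8, 0.3), (4.0, 0.5, 0.4)]
front = pareto_front(candidates)
best = min(front, key=lambda c: scalarized_cost(c, w_c=1.0, w_s=1.0))
```

Minimizing a weighted cost with positive weights always returns a point on the Pareto front, so the weights select which trade-off between communication and sensing is preferred.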
Optimal resource allocation for frequency and power
A critical aspect of interference management is ensuring that both communication and sensing functions receive the required resources (e.g., frequency and power) without compromising the performance of the other. The DRL agent learns how to optimize frequency and power allocation for both functions, ensuring that these resources are shared efficiently between communication and sensing tasks.
- Frequency Allocation: The agent allocates frequency bands dynamically to communication and sensing functions. Since ISAC systems operate on shared spectrum, the agent's goal is to minimize frequency overlap between communication and sensing signals, thereby reducing interference. The frequency allocation is learned through interactions with the environment, considering factors such as network congestion, user demand, and sensing requirements. The mathematical formulation of frequency allocation at each time step t can be expressed as in Equation (27):
$$f^{*}(t) = \arg\min_{f \in F} \left[ I_{\mathrm{tot}}(t) + \mathrm{CRLB}(t) - SE(t) \right] \tag{27}$$
where $f^{*}(t)$ denotes the selected coordination/action (e.g., joint policy or fusion decision) at time t, F represents the set of available frequencies, and $I_{\mathrm{tot}}(t)$ denotes the aggregate interference power (SI+MI+CLI). Sensing performance is quantified using CRLB; therefore, $I_{\mathrm{tot}}(t)$ and CRLB are minimized, while SE is maximized via the negative sign.
- Power Allocation: The DRL agent also optimizes the power distribution between communication and sensing functions. The agent allocates power to ensure that communication users experience sufficient signal quality (measured in SINR), while sensing tasks (such as target localization and detection) also receive enough power for high-precision results. The power allocation strategy is learned as part of the action space of the DRL agent, with the following objective function:
$$p^{*}(t) = \arg\min_{p(t)} \left[ I_{\mathrm{tot}}(t) + \mathrm{CRLB}(t) - \mathrm{SINR}(t) \right] \tag{28}$$
where p(t) represents the power allocation vector and $I_{\mathrm{tot}}(t)$ is the aggregate interference power. The agent adjusts the power distribution dynamically to meet both communication QoS and sensing accuracy requirements. The negative SINR term ensures that power allocation decisions explicitly maximize communication signal quality, while interference and CRLB are minimized to maintain balanced ISAC performance.
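Both allocation criteria above share the same shape: interference and CRLB are added as costs and the communication quality term (SE for frequency, SINR for power) is subtracted. A minimal sketch of that selection rule over a finite candidate set follows; it assumes the three metrics have already been measured or predicted for each candidate and normalized to comparable scales. The names `isac_cost` and `select_action`, the band labels, and the numbers are hypothetical.

```python
from typing import Dict, Hashable, Tuple

Metrics = Tuple[float, float, float]  # (interference, CRLB, quality term)


def isac_cost(interference: float, crlb: float, quality: float) -> float:
    """Cost to minimize: interference + CRLB - quality (SE or SINR)."""
    return interference + crlb - quality


def select_action(metrics: Dict[Hashable, Metrics]) -> Hashable:
    """Return the candidate (band or power level) with the lowest cost."""
    return min(metrics, key=lambda k: isac_cost(*metrics[k]))


# Hypothetical per-band metrics: (interference, CRLB, SE)
band_metrics = {
    "band_a": (0.5, 0.25, 2.0),
    "band_b": (0.125, 0.25, 2.0),
    "band_c": (0.125, 0.5, 1.0),
}
chosen_band = select_action(band_metrics)

# The same rule applied to candidate power levels: (interference, CRLB, SINR)
power_metrics = {0.25: (0.125, 1.0, 1.0), 0.5: (0.25, 0.5, 2.0), 1.0: (1.0, 0.25, 2.5)}
chosen_power = select_action(power_metrics)
```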
Adaptation to interference conditions
The DRL agent dynamically adjusts to varying interference environments by learning to change the beamforming and resource allocation policies considering the current state of the network. The agent is trained to acquire the best interference compensation strategies for the following scenarios:
High Mobility: Interference varies as users or sensing targets move, so the agent must adapt its beamforming and power allocation to maintain performance.
Dense Network Traffic: In dense, congested networks, the agent optimizes frequency allocation and power resources to mitigate interference and achieve high spectrum utilization.
Through such iterative interactions, the DRL agent learns to mitigate interference and allocate resources so that communication and sensing are jointly optimized without sacrificing the performance of one for the other. The learning model parameters are summarized in Table 2.
Table 2.
Summary of DRL framework parameters.
| Parameter | Description |
|---|---|
| Learning framework | Deep Reinforcement Learning (DDPG) |
| State variables | Channel state, interference level, sensing requirement, power budget |
| Action variables | Beamforming, power allocation, frequency selection |
| Reward components | SE, EE, CRLB, interference |
| Learning type | Online interaction-based learning |
| Agent structure | Actor–critic architecture |
Computational complexity and training overhead analysis
The proposed AI-driven resource allocation framework employs a deep reinforcement learning (DRL) agent to jointly optimize communication and sensing performance in ISAC-enabled 6G networks. From an implementation perspective, it is important to distinguish between the offline training phase and the online inference phase. During offline training, the DRL agent learns an optimal policy through interactions with the simulated ISAC environment. This phase involves iterative forward and backward passes through the actor–critic neural networks and therefore incurs higher computational overhead. However, this process is performed offline and does not affect real-time system operation. Moreover, training can be executed on centralized or cloud-based computing platforms equipped with sufficient computational resources, which is consistent with current 6G network intelligence paradigms. In contrast, the online deployment phase only requires a forward pass through the trained neural network to infer resource allocation decisions based on the observed system state. The computational complexity of this inference step scales linearly with the number of network parameters and is significantly lower than that of conventional iterative optimization-based methods. As a result, real-time decision-making can be achieved within the stringent latency constraints of practical ISAC systems. Furthermore, since the trained model can be periodically updated rather than retrained continuously, the associated training overhead does not hinder real-time operation. These characteristics demonstrate that the proposed DRL-based resource allocation framework is computationally feasible and well-suited for real-time implementation in practical 6G ISAC scenarios.
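The claim that inference cost scales linearly with the number of network parameters can be illustrated by counting the multiply-accumulate operations of a fully connected network. The layer sizes below are hypothetical and are not taken from the paper; the helper names `mlp_macs` and `mlp_params` are likewise illustrative.

```python
from typing import Sequence


def mlp_macs(layer_sizes: Sequence[int]) -> int:
    """Multiply-accumulate operations for one forward pass of a dense MLP."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


def mlp_params(layer_sizes: Sequence[int]) -> int:
    """Number of weights plus biases of the same MLP."""
    return sum(a * b + b for a, b in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical actor: 10 state features, two hidden layers of 64 units, 4 actions.
sizes = [10, 64, 64, 4]
macs = mlp_macs(sizes)
params = mlp_params(sizes)
```

For a dense network, each weight is used exactly once per forward pass, so the operation count equals the weight count and grows in proportion to the model size.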
Robustness and generalization considerations
The proposed DRL-based ISAC resource allocation framework is designed to operate under dynamic wireless environments characterized by time-varying channels, interference fluctuations, and changing network states. During training, the agent interacts with a stochastic environment in which channel conditions and interference levels vary across episodes. As a result, the learned policy captures the underlying statistical behavior of the wireless channel rather than overfitting to a specific realization, which enhances robustness against channel uncertainty. Mobility-induced variations are implicitly reflected in the observed state dynamics, as changes in channel quality and interference patterns are continuously fed into the DRL agent. By learning a state–action mapping over a wide range of network conditions, the proposed approach is capable of adapting its resource allocation decisions to varying mobility and traffic scenarios. Moreover, since the agent is trained over a diverse set of network states, the learned policy can generalize to previously unseen conditions within the same operating regime. While extreme mobility patterns or highly non-stationary environments may require additional retraining or online adaptation mechanisms, the current framework provides a robust and flexible foundation for ISAC resource optimization in practical 6G networks.
Energy efficiency considerations
Energy efficiency is a critical design objective in future 6G ISAC networks, where communication and sensing functionalities must be supported under stringent power constraints. In the proposed framework, energy efficiency is implicitly enhanced through intelligent power allocation decisions learned by the DRL agent. By jointly optimizing communication quality and sensing performance while accounting for interference effects, the agent avoids excessive transmit power usage that would otherwise yield diminishing performance gains. Unlike conventional schemes that rely on fixed power allocation strategies, the proposed learning-based approach dynamically adapts transmission power according to the instantaneous network state. This adaptability enables more efficient utilization of available energy resources, particularly in interference-limited ISAC scenarios. The framework therefore aligns with energy-aware ISAC design principles reported in prior studies, such as ref. 42, while extending them to a joint communication–sensing optimization setting using DRL. Although energy efficiency is not explicitly formulated as a standalone optimization objective in this work, the proposed framework provides a flexible foundation upon which explicit energy efficiency metrics can be incorporated into the reward function in future extensions.
Simulation, results and discussion
In this section, we present the simulation results of the proposed AI-based dynamic resource allocation model for ISAC systems in 6G networks. The model is evaluated under three scenarios corresponding to different networking conditions (e.g., user mobility, interference, traffic load). The simulation results confirm the effectiveness of the DRL-based method in jointly optimizing beamforming, power allocation, frequency allocation, sensing accuracy, and spectral and energy efficiency.
Simulation setup and scenarios
The simulation scenario is intended to emulate realistic 6G ISAC networks, in which the communication and sensing functionalities are co-located in a common infrastructure. Such an environment is dynamically changing due to factors such as mobility of users, levels of interference, and load on the network. The simulation settings are given as follows:
High-Mobility Urban Scenario: This scenario models a mobile network in which a large number of nodes move at high speeds (e.g., vehicles in a city). The high mobility makes the CSI change quickly, which requires dynamic spatial control and power adaptation to maintain communication throughput and sensing accuracy.
Dense Smart City Scenario: A smart city scenario in which a large number of users are present with different traffic loads and interference levels. The dense user distribution leads to high interference. The agent must choose beamforming and frequency assignments that systematically mitigate mutual interference between the communication and sensing signals.
Rural Area Scenario: A low-power, sparsely distributed network with minimal interference. In this case, the priorities are high sensing accuracy and detection performance under low mobility. The agent optimizes power allocation and beamforming by balancing sensing and communication performance.
We first summarize the key simulation parameters used throughout the evaluation in Table 3 to ensure clarity and reproducibility.
Table 3.
Simulation parameters and evaluation settings.
| Parameter | Value |
|---|---|
| Scenarios | High-mobility urban; dense smart city; rural area |
| Number of users (K) | Swept: 1–100 |
| Number of antennas (L) | Swept: 1–64 |
| AP maximum transmit power ($p_{\max}$) | Swept: 0–30 dBm |
| Desired sensing angle range | … to … |
| Methods compared | AI-driven; convex optimization; static beamforming |
| Channel model | Rayleigh fading |
| Path-loss model | Distance-dependent path loss |
| Interference model | SI + MI + CLI (aggregate interference) |
| Sensing performance metric | CRLB |
| Communication performance metric | Spectral efficiency (bits/s/Hz) |
| DRL algorithm | DDPG |
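As a hedged illustration of the channel and path-loss entries in Table 3, the sketch below draws Rayleigh-fading coefficients and applies a distance-dependent path loss to a transmit power given in dBm. The path-loss exponent, the reference distance, and the function names are assumptions made for this example only; the paper does not state them in this section.

```python
import math
import random


def rayleigh_coeff(rng: random.Random) -> complex:
    """Small-scale fading coefficient h ~ CN(0, 1), so E[|h|^2] = 1."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def path_loss_linear(distance_m: float, exponent: float = 3.0,
                     ref_distance_m: float = 1.0) -> float:
    """Distance-dependent path loss (d0 / d)^exponent, capped at 1 inside d0."""
    return (ref_distance_m / max(distance_m, ref_distance_m)) ** exponent


def dbm_to_watt(p_dbm: float) -> float:
    """Convert dBm to watts (30 dBm = 1 W)."""
    return 10.0 ** ((p_dbm - 30.0) / 10.0)


def received_power_w(p_dbm: float, distance_m: float, rng: random.Random) -> float:
    """Received power for one fading realization."""
    h = rayleigh_coeff(rng)
    return dbm_to_watt(p_dbm) * path_loss_linear(distance_m) * abs(h) ** 2


rng = random.Random(0)
samples = [received_power_w(30.0, 100.0, rng) for _ in range(1000)]
mean_rx_w = sum(samples) / len(samples)   # fluctuates around 1e-6 W for these settings
```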
Performance metrics
The proposed model is compared against traditional optimization techniques, namely convex optimization and static beamforming, using the following performance metrics:
Spectral Efficiency (SE): Achieved throughput for communication users, measured in bps/Hz.
Sensing Accuracy (CRLB): Error in sensing estimation, such as target localization.
Energy Efficiency (EE): Power consumed relative to data throughput, measured in energy per bit or power consumption per successful transmission.
Interference: Level of interference between communication and sensing signals, quantified using SINR (Signal-to-Interference-plus-Noise Ratio) or INR (Interference-to-Noise Ratio).
Table 4 shows the characteristics of different deployment scenarios involved in the simulations.
Table 4.
Characteristics of different deployment scenarios.
| Parameter | Scenario description |
|---|---|
| User mobility | High-mobility urban: High (60 km/h); Dense smart city: Moderate (walking, slow cars); Rural area: Low (stationary/slow) |
| Network density | High-mobility urban: High (100–200 users/km²); Dense smart city: High (50–100 users/km²); Rural area: Low (less than 10 users/km²) |
| Interference level | High-mobility urban: High; Dense smart city: Medium to high; Rural area: Low |
| Traffic load | High-mobility urban: High (fluctuating); Dense smart city: Medium to high; Rural area: Low |
| Channel conditions | High-mobility urban: Fast fading; Dense smart city: Moderate fading; Rural area: Stable |
| Sensing accuracy requirement | High-mobility urban: High; Dense smart city: Moderate; Rural area: High |
Results and discussion
The simulation results show that the AI-based model outperforms the classical methods in several crucial respects:
Spectral efficiency
The proposed AI-driven approach achieves up to 20% higher spectral efficiency than the conventional methods, as shown in Fig. 1.
Fig. 1.
Learning-based ISAC resource allocation framework.
AI-driven Dynamic Resource Allocation: The AI model consistently surpasses both convex optimization and static beamforming in all three scenarios.
High-Mobility Urban: The AI model attains a 20% relative spectral-efficiency improvement, compared with 15% for convex optimization and 10% for static beamforming. This enhancement arises because the DRL agent makes beamforming and power allocation decisions in real time and adapts to rapid variations in user mobility and interference.
Dense Smart City: In the Dense Smart City scenario, the AI model achieves an 18% relative spectral-efficiency improvement, outperforming convex optimization (14%) and static beamforming (12%). The higher density of users and interference in this scenario requires more sophisticated resource allocation, which the AI model handles effectively by minimizing mutual interference and optimizing frequency usage (Fig. 2).
Rural Area: Although the Rural Area scenario has relatively low interference and user density, the AI model still provides superior performance, with a 15% relative spectral-efficiency improvement compared to 13% (convex optimization) and 11% (static beamforming). The model allocates power and frequency efficiently even in low-interference environments, making good use of the available spectrum.
Fig. 2.

Relative spectral efficiency improvement comparison of the proposed AI-driven ISAC framework with convex optimization and static beamforming under High-Mobility Urban, Dense Smart City, and Rural Area scenarios.
Sum rate is closely related to spectral efficiency because it measures the aggregate capacity of the communication system (i.e., how much data can be transmitted per unit of bandwidth). It is therefore often used as an indicator of how efficiently the spectrum is utilized.
For comparison, we evaluate the sum rate as a function of the number of antennas (Figure 3), the AP transmit power (Figure 4), and the number of users K (Figure 5).
Fig. 3.
Sum rate vs number of antennas (L).
Fig. 4.
Sum rate vs AP transmit power.
Fig. 5.
Sum rate vs number of users (K).
- Sum Rate vs Number of Antennas (L)
Figure 3 shows the sum rate against the number of antennas. The simulation results showed:
- High-Mobility Urban: The AI-driven model shows the highest sum rate, reaching
bps/Hz at L = 64 antennas, benefiting from dynamic beamforming. Convex optimization achieves
bps/Hz at L = 64, while static beamforming provides
bps/Hz at L = 64 antennas. The AI-driven model outperforms convex optimization by 40% and static beamforming by 50%.
- Dense Smart City: In this scenario, the AI-driven model achieves
bps/Hz at L = 64 antennas, with moderate improvement in sum rate due to user density and interference. Convex optimization reaches
bps/Hz, and static beamforming provides
bps/Hz. The AI model delivers a 30% better sum rate than convex optimization and 50% better than static beamforming.
- Rural Area: The AI-driven model achieves the highest sum rate of
bps/Hz at L = 64 antennas, taking advantage of the low interference. Convex optimization shows a sum rate of
bps/Hz, and static beamforming provides
bps/Hz. The AI-driven model outperforms convex optimization by 45% and static beamforming by 50%.
- Sum Rate vs AP Transmit Power
Figure 4 shows the sum rate against AP transmit power. The simulation results showed:
- High-Mobility Urban: The AI-based model achieves a much higher sum rate, reaching
bps/Hz at pmax = 30 dBm. Convex optimization offers a limited increase, up to 20 bps/Hz at pmax = 30 dBm. The sum rate of static beamforming improves slowly, reaching
bps/Hz at pmax = 30 dBm. As pmax increases, the AI-driven model shows an almost 40% increase in sum rate compared to static beamforming.
- Dense Smart City: The AI model achieves a sum rate of
bps/Hz at pmax = 30 dBm, slightly lower than in the High-Mobility Urban scenario because of stronger interference. Convex optimization achieves a sum rate of 18 bps/Hz, and static beamforming achieves
bps/Hz at pmax = 30 dBm. In this scenario, the AI model still demonstrates an improvement of around 30% over convex optimization and 50% over static beamforming.
- Rural Area: In the Rural Area setting, the AI model has the best sum rate performance, reaching approximately 32 bps/Hz at pmax = 30 dBm. The convex optimization sum rate is 22 bps/Hz and the static beamforming sum rate is 16 bps/Hz at the maximum power pmax = 30 dBm. The AI model performs up to 45% better than convex optimization and 50% better than static beamforming in this low-interference environment.
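A percentage comparison between two sum rates depends on which value is taken as the reference. As a small worked check, the sketch below evaluates two common conventions on the Rural Area values quoted above (32, 22, and 16 bps/Hz at pmax = 30 dBm). The function names are illustrative assumptions, and the text does not state which convention the reported percentages follow.

```python
def gain_over_baseline(proposed, baseline):
    """Percentage increase of `proposed`, measured relative to `baseline`."""
    return 100.0 * (proposed - baseline) / baseline


def shortfall_below_proposed(proposed, baseline):
    """Percentage by which `baseline` falls short, relative to `proposed`."""
    return 100.0 * (proposed - baseline) / proposed


# Rural Area sum rates at pmax = 30 dBm as quoted in the text (bps/Hz).
ai, convex, static = 32.0, 22.0, 16.0

for name, value in (("convex optimization", convex), ("static beamforming", static)):
    up = gain_over_baseline(ai, value)
    down = shortfall_below_proposed(ai, value)
    print(f"{name}: +{up:.1f}% over baseline, baseline {down:.1f}% below AI")
```

With these inputs, the gain over convex optimization is about 45% when measured relative to the baseline. For static beamforming, the two conventions give 100% (relative to the baseline) and 50% (relative to the AI value).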
- Sum Rate vs Number of Users
Figure 5 shows the sum rate against the number of users. The simulation results showed:
- High-Mobility Urban: The AI-driven model achieves the highest sum rate, reaching
bps/Hz at K = 100 users. Convex optimization shows a moderate increase in sum rate, peaking at
bps/Hz at K = 100 users, while static beamforming remains the least efficient, with a sum rate of
bps/Hz at K = 100 users. The AI-driven model outperforms convex optimization by 40% and static beamforming by 50%.
- Dense Smart City: The AI-driven model achieves a sum rate of
bps/Hz at K = 100 users, reflecting the higher interference and user density. Convex optimization reaches
bps/Hz, and static beamforming shows a sum rate of
bps/Hz at K = 100 users. The AI model provides 30% better performance than convex optimization and 50% better than static beamforming.
- Rural Area: The AI-driven model demonstrates the best performance, with a sum rate of
bps/Hz at K = 100 users, benefiting from the low interference. Convex optimization reaches
bps/Hz, and static beamforming provides
bps/Hz. The AI-driven model outperforms convex optimization by 45% and static beamforming by 50%.
Sensing accuracy
By optimizing beamforming and power allocation, the AI model reduces CRLB, improving target detection accuracy and localization precision. Figure 6 shows the sensing accuracy comparison. The simulation results showed:
AI-driven Dynamic Resource Allocation: The AI model shows significant improvements in sensing accuracy, as evidenced by the reduction in Cramér-Rao Lower Bound (CRLB). In the High-Mobility Urban scenario, the AI model achieves a 25% improvement in sensing accuracy, compared to 15% for convex optimization and 10% for static beamforming. This can be attributed to the real-time optimization of beamforming and power allocation that enhances target detection and localization despite high mobility and dynamic channel conditions.
Dense Smart City: The AI model improves sensing accuracy by 22%, while convex optimization and static beamforming achieve 14% and 12% improvements, respectively. The dense urban environment poses a challenge in terms of interference, but the AI model manages to maintain high accuracy by effectively mitigating interference and adjusting the resources dynamically for sensing tasks.
Rural Area: In the Rural Area scenario, the AI model achieves an 18% improvement, outperforming convex optimization (12%) and static beamforming (11%). The low interference environment allows the AI-driven model to focus more on maximizing sensing accuracy, especially in remote sensing applications where precise target localization is crucial.
Fig. 6.

Sensing accuracy comparison.
Beam pattern gain is a key factor in the sensing precision of ISAC systems. It reflects the system's ability to concentrate energy toward the desired sensing angles and to increase the signal power at those angles. Higher beam pattern gain results in better target localization and detection, because interference from directions that are irrelevant for sensing is reduced while the sensing resources are focused on the targets. This dynamic adaptation of the beamforming process directly contributes to improved sensing accuracy and a lower Cramér-Rao Lower Bound (CRLB), as shown in the simulation results. Figure 7 compares the beam pattern gain versus angle for the AI-driven model, convex optimization, and static beamforming.
Fig. 7.
Directional gain profiles for various algorithms over a
angular spread with K = 20 users, N = 5 targets, and L = 32 antennas.
High-Mobility Urban Scenario: In the High-Mobility Urban scenario, the AI-driven model achieves a peak gain of
dB at the central angle (0°), demonstrating its ability to dynamically adjust beamforming for optimal performance in highly dynamic environments. The convex optimization method provides a peak gain of
dB, showing moderate improvement over the static beamforming approach, which has a peak gain of
dB. The AI-driven model's dynamic beamforming ensures more focused energy towards the desired sensing angles, improving both sensing accuracy and interference suppression.
Dense Smart City Scenario: In the Dense Smart City scenario, the AI-driven model still leads with a peak gain of
dB, slightly lower than the High-Mobility Urban scenario due to increased interference from higher user density. However, the AI model maintains superior adaptability to network congestion. The convex optimization method achieves a peak gain of
dB, and static beamforming continues to show the least effective beamforming with a peak gain of
dB. The lower gains in this scenario reflect the challenges posed by interference in dense urban environments, but the AI model still outperforms the others.
Rural Area Scenario: In the Rural Area scenario, the AI-driven model performs at its best, achieving a peak gain of
dB, reflecting the low interference and low user density in rural areas. The convex optimization method provides a peak gain of
dB, benefiting from the stable channel conditions but still lacking the flexibility of the AI-driven model. Static beamforming shows the least effective performance, with a peak gain of
dB. In this scenario, the AI-driven model demonstrates its ability to efficiently focus on the desired sensing angles, making it highly effective in low-interference environments.
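To illustrate how a beam pattern gain versus angle curve of the kind discussed above is obtained, the following minimal sketch evaluates |w^H a(θ)|^2 for a uniform linear array. The half-wavelength element spacing, the unit-norm matched-filter weights, and all function names are assumptions made for illustration; they do not reproduce the DRL, convex, or static beamformers compared in Fig. 7.

```python
import cmath
import math


def steering_vector(theta_deg, num_antennas, spacing_wavelengths=0.5):
    """Steering vector of a uniform linear array toward angle theta (degrees)."""
    phase = 2.0 * math.pi * spacing_wavelengths * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * n) for n in range(num_antennas)]


def beam_gain_db(weights, theta_deg, spacing_wavelengths=0.5):
    """Beam pattern gain |w^H a(theta)|^2 in dB for the given weights."""
    a = steering_vector(theta_deg, len(weights), spacing_wavelengths)
    response = sum(w.conjugate() * x for w, x in zip(weights, a))
    # Floor the power so that deep nulls do not produce log10(0).
    return 10.0 * math.log10(max(abs(response) ** 2, 1e-12))


L = 32  # number of antennas, as in the Fig. 7 setup
target_deg = 0.0
# Unit-norm matched-filter weights steered at the target angle (an assumption).
weights = [x / math.sqrt(L) for x in steering_vector(target_deg, L)]

for angle in (-30.0, -10.0, 0.0, 10.0, 30.0):
    print(f"{angle:+6.1f} deg: {beam_gain_db(weights, angle):7.2f} dB")
```

With unit-norm weights the peak of this idealized pattern equals 10·log10(L), about 15 dB for L = 32. The absolute level depends on the weight normalization and transmit power, so these numbers are not directly comparable to the peak gains reported for the three methods.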
Energy efficiency
The proposed model optimizes power consumption, improving energy efficiency by reducing the energy per bit. Figure 8 shows energy efficiency comparison. The simulation results showed:
AI-driven Dynamic Resource Allocation: The AI model shows significant improvements in energy efficiency across all scenarios. In the High-Mobility Urban scenario, the AI model achieves a 30% improvement in energy efficiency, compared with 20% for convex optimization and 10% for static beamforming. This improvement is largely due to dynamic power allocation and optimized beamforming, which reduce unnecessary power usage while maintaining communication quality and sensing accuracy.
Dense Smart City: In this scenario, the AI model achieves a 28% improvement in energy efficiency, while convex optimization and static beamforming improve by 18% and 12%, respectively. The AI model adapts to network congestion and interference, allocating power efficiently to maximize spectral efficiency and sensing accuracy without consuming excessive energy.
Rural Area: In the Rural Area scenario, the AI model improves energy efficiency by 25%, compared to 17% for convex optimization and 11% for static beamforming. The agent’s ability to adjust resources for low interference environments ensures that power consumption is minimized while maintaining communication and sensing performance.
Fig. 8.

Energy efficiency comparison.
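The relation between transmit power, sum rate, and energy per bit referred to in this subsection can be sketched as follows. The sketch assumes the common definition of energy efficiency as sum rate divided by total consumed power; the circuit power term, the function names, and all numerical values are hypothetical and are not taken from the simulations.

```python
def dbm_to_watts(power_dbm):
    """Convert a power level from dBm to watts."""
    return 10.0 ** ((power_dbm - 30.0) / 10.0)


def energy_efficiency(sum_rate_bps_hz, tx_power_w, circuit_power_w=0.0):
    """Energy efficiency in bits per joule per hertz."""
    return sum_rate_bps_hz / (tx_power_w + circuit_power_w)


def energy_per_bit(sum_rate_bps_hz, tx_power_w, circuit_power_w=0.0):
    """Energy per bit (joule-hertz per bit), the inverse of energy efficiency."""
    return 1.0 / energy_efficiency(sum_rate_bps_hz, tx_power_w, circuit_power_w)


# Hypothetical operating points: the same sum rate reached at two power levels.
rate = 20.0                      # bps/Hz (illustrative)
full_power = dbm_to_watts(30.0)  # 30 dBm corresponds to 1 W
reduced_power = 0.6 * full_power

print(energy_efficiency(rate, full_power), energy_efficiency(rate, reduced_power))
print(energy_per_bit(rate, full_power), energy_per_bit(rate, reduced_power))
```

Holding the sum rate fixed while lowering the consumed power raises the energy efficiency and lowers the energy per bit in the same proportion. A nonzero circuit power term would reduce the benefit obtained from transmit power savings alone.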
Interference management
The AI-driven approach effectively minimizes interference between communication and sensing tasks, enhancing SINR and ensuring that both functions are achieved without compromising each other. Figure 9 shows the interference management comparison. The simulation results showed:
AI-driven Dynamic Resource Allocation: The AI model excels in interference management, significantly reducing mutual interference between communication and sensing functions. In the High-Mobility Urban scenario, the AI model achieves a 40% reduction in interference, compared to 25% for convex optimization and 10% for static beamforming. The AI-driven model dynamically adapts beamforming and power allocation strategies to minimize self-interference (SI) and mutual interference (MI).
Dense Smart City: In this scenario, the AI model achieves a 35% reduction in interference, compared with 20% for convex optimization and 12% for static beamforming. The AI model exploits dynamic resource allocation and power management, which are essential in a dense urban environment for allocating frequency and power among queues of users under real-time minimum rate constraints.
Rural Area: Even with low interference, the AI model reduces interference by a further 30%, surpassing convex optimization (15%) and static beamforming (11%). This shows that the model can optimize beamforming and power allocation and reduce interference even in lightly loaded environments.
Fig. 9.

Interference management comparison.
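The effect of interference reduction on SINR, which this subsection refers to, can be illustrated with a short calculation. The sketch below assumes the usual definition SINR = S / (I + N); the power values are hypothetical, and the 40% reduction is applied only as an example of the scale of reduction reported above.

```python
import math


def sinr_db(signal_w, interference_w, noise_w):
    """SINR in dB from signal, interference, and noise powers in watts."""
    return 10.0 * math.log10(signal_w / (interference_w + noise_w))


# Hypothetical received powers in watts (illustrative values only).
signal, interference, noise = 1.0, 0.2, 0.01

before = sinr_db(signal, interference, noise)
after = sinr_db(signal, 0.6 * interference, noise)  # 40% less interference
print(f"SINR before: {before:.2f} dB, after: {after:.2f} dB")
```

The SINR gain from a given percentage reduction in interference is largest when interference dominates noise; in noise-limited conditions the same reduction changes the SINR only slightly.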
These findings illustrate the strong capability of AI-based dynamic resource allocation to enhance the performance of ISAC systems in terms of spectral efficiency, sensing accuracy, energy efficiency, and interference management. The percentage improvements are summarized in Table 5:
Table 5.
Performance gains of the proposed AI-driven method across different scenarios.
| Parameter (AI-driven) | High-mobility urban | Dense smart city | Rural area |
|---|---|---|---|
| Spectral efficiency (SE) increase | 20% | 18% | 15% |
| Sensing accuracy improvement | 25% | 22% | 18% |
| Energy efficiency (EE) improvement | 30% | 28% | 25% |
| Interference reduction | 40% | 35% | 30% |
Discussion on Learning-based benchmark comparisons
Recent studies have explored learning-based resource allocation for ISAC systems using different system models and optimization objectives. For example, the works in refs. 43 and 44 employ reinforcement learning frameworks tailored to specific ISAC scenarios, such as particular sensing metrics, network topologies, or channel assumptions. While these approaches demonstrate the effectiveness of learning-based optimization, their problem formulations, state representations, and reward designs differ substantially from the joint communication–sensing optimization considered in this work. Because of these fundamental differences in system assumptions and performance objectives, a direct numerical comparison would require full reimplementation and careful parameter alignment to ensure fairness, which is beyond the scope of the current study. Nevertheless, compared to existing learning-based ISAC solutions, the proposed framework explicitly accounts for multi-dimensional interference, joint communication and sensing quality metrics, and dynamic resource allocation under a unified DRL formulation, which distinguishes it from prior approaches. A comprehensive quantitative comparison with other learning-based ISAC methods under a unified simulation environment is an interesting direction for future work.
Conclusion
An AI-based dynamic resource allocation (DRA) model is proposed for ISAC systems in 6G networks, addressing challenges in spectral efficiency, sensing accuracy, energy efficiency, and interference management. Using DRL, the model continuously learns from environmental dynamics and situational awareness and dynamically adjusts its communication and sensing capabilities. The simulation results demonstrate that, as the system dimensionality increases, the AI-based model surpasses conventional solutions such as convex optimization and static beamforming on the considered performance indicators. In particular, the AI approach provides gains in spectral efficiency (up to 20%), sensing accuracy (up to a 25% decrease in CRLB), and energy efficiency (up to 30%). In addition, the model reduces interference by up to 40%, enabling the effective coexistence of sensing and communication.
The sum rate results show that the AI-based model exploits the system capacity more effectively. In all three scenarios, the AI-based model significantly outperforms convex optimization and static beamforming, with sum rate improvements of up to 45% and 50%, respectively. This indicates the flexibility of the model in dynamically allocating resources to maximize spectral efficiency under dynamic and dense scenarios. As for beam pattern gain, the AI-based model performs well in focusing system energy toward the desired sensing angles. It achieves maximum gains of 30 dB, 28 dB, and 32 dB in the high-mobility urban, dense smart city, and rural area scenarios, respectively. With such dynamic optimization of beamforming, the sensing accuracy is markedly enhanced, because the system is able to concentrate on the angles of interest while mitigating interference. The peak gains obtained with convex optimization and static beamforming are generally lower, underlining the importance of real-time adaptation for high sensing accuracy. These results also demonstrate the promise of AI-based resource allocation for 6G ISAC systems in evolving environments.
Future works
In future work, we plan to improve the model performance by considering multi-agent reinforcement learning for stronger interference suppression, and to study scalability in large-scale networks. Moreover, network-level optimization and edge computing will be considered to improve the real-time adaptability of the proposed system, which would promote its use in real ISAC applications of 6G.
Author contributions
Ghani, Madeeha and Zubair conceived the presented idea. Ali and Alfakeeh implemented the approach and carried out the experiments. Madeeha and Omar were involved in supervising the project and helped design the experiments. All authors discussed the results and contributed to the final manuscript.
Funding
Not Applicable
Data availability
All data generated or analyzed during this study are included in this published article
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Ghani Ur Rehman, Email: Ghani.rehman@kkkuk.edu.pk.
Muhammad Zubair, Email: muhammad.zubair_1@hw.ac.uk.
References
- 1. Aldirmaz-Colak, S., Namdar, M., Basgumus, A., Özyurt, S., Kulac, S., Calik, N., Yazici, M. A., Serbes, A. & Durak-Ata, L. A comprehensive review on isac for 6g: Enabling technologies, security, and ai/ml perspectives, IEEE Access.
- 2. Singh, S. & Samal, U. Integrated sensing and communication in next-generation wireless networks: Insights and trends. Int. J. Commun. Syst. 38(5), e70014 (2025).
- 3. Rehman, G. U. et al. Misbehavior of nodes in iot based vehicular delay tolerant networks vdtns. Multimedia Tools Appl. 82(5), 7841–7859 (2023).
- 4. Rehman, G. u., Ghani, A., Zubair, M., Ghayyure, S. A. & Muhammad, S. Honesty based democratic scheme to improve community cooperation for internet of things based vehicular delay tolerant networks, Transactions on Emerging Telecommunications Technologies 32 (1) e4191 (2021).
- 5. Temiz, M., Zhang, Y., Fu, Y., Zhang, C., Meng, C., Kaplan, O., Masouros, C. Deep learning-based techniques for integrated sensing and communication systems: State-of-the-art, challenges, and opportunities, IEEE Open Journal of the Communications Society.
- 6. Du, Z. et al. Toward isac-empowered vehicular networks: Framework, advances, and opportunities. IEEE Wireless Commun. 32(2), 222–229 (2025).
- 7. Bukhari, A. et al. Renewable energy driven on-road wireless charging infrastructure for electric vehicles in smart cities: A prototype design and analysis. Energy Rep. 12, 5145–5154 (2024).
- 8. Kaushik, A. et al. Integrated sensing and communications for iot: Synergies with key 6g technology enablers. IEEE Internet Things Magaz. 7(5), 136–143 (2024).
- 9. Alharbey, R. et al. Digital twin technology for enhanced smart grid performance: Integrating sustainability, security, and efficiency. Front. Energy Res. 12, 1397748 (2024).
- 10. Subramaniyan, M., Venkatasamy, T. K., Mathiyalagan, N. Y. & Hossen, M. J. Adaptive resource allocation and routing for integrated sensing and communications for wireless technologies. EURASIP J. Wireless Commun. Netw. 2025(1), 33 (2025).
- 11. Alhussien, N. & Gulliver, T. A. Toward ai-enabled green 6g networks: A resource management perspective, IEEE Access.
- 12. Janjua, J. I. et al. Enhancing smart grid electricity prediction with the fusion of intelligent modeling and xai integration. Int. J. Adv. Appl. Sci. 11(5), 230–248 (2024).
- 13. DAWOOD, H. Towards deep learning prospects: Insights for social media analytics.
- 14. Sheraz, M., Chuah, T. C., Tareen, W. U.K., Al-Habashna, A., Saeed, S. I., Ahmed, M., Lee, I. E. & Guizani, M. A comprehensive survey on genai-enabled 6g: Technologies, challenges, and future research avenues, IEEE Open Journal of the Communications Society.
- 15. Rajalakshmi, P. et al. Towards 6g v2x sidelink: Survey of resource allocation - Mathematical formulations, challenges, and proposed solutions. IEEE Open J. Vehic. Technol. 5, 344–383 (2024).
- 16. Ahmed, N. U. R. et al. Visual deepfake detection: Review of techniques, tools, limitations, and future prospects. IEEE Access 13, 1923–1961 (2024).
- 17. Liu, F. et al. Toward dual-functional radar-communication systems: Optimal waveform design. IEEE Trans. Signal Process. 66(16), 4264–4279 (2018).
- 18. Wang, X., Hassanien, A. & Amin, M. G. Dual-function mimo radar communications system design via sparse array optimization. IEEE Trans. Aerospace Electron. Syst. 55(3), 1213–1226 (2018).
- 19. Du, J., Tang, Y., Wei, X., Xiong, J., Zhu, J., Yin, H., Zhang, C. & Chen, H. An overview of resource allocation in integrated sensing and communication, in: 2023 IEEE/CIC International Conference on Communications in China (ICCC Workshops), IEEE, pp. 1–6 (2023).
- 20. Wang, Z., Jia, W., Zhao, J., Jin, W. & Yu, Y. Fair resource allocation with noise ddpg for uav enabled isac systems. Electron. Lett. 61(1), e70277 (2025).
- 21. Khalili, A., Rezaei, A., Xu, D., Dressler, F. & Schober, R. Efficient uav hovering, resource allocation, and trajectory design for isac with limited backhaul capacity, IEEE Transactions on Wireless Communications.
- 22. Dong, F. et al. Sensing as a service in 6g perceptive networks: A unified framework for isac resource allocation. IEEE Trans. Wireless Commun. 22(5), 3522–3536 (2022).
- 23. Shah, S. N. H., Khan, A. U., Schneider, C. & Robert, J. Spyder: Qos-aware radio resource allocation in multiuser isac-capable c-v2x networks, IEEE Open Journal of the Communications Society.
- 24. Osorio, D. P., Barua, B., Besser, K.-L., Blue, H., Dass, P. & Porambage, P. The rise of networked isac: Emerging aspects and challenges, IEEE Open Journal of the Communications Society.
- 25. Wu, N. et al. Ai-enhanced integrated sensing and communications: Advancements, challenges, and prospects. IEEE Commun. Magaz. 62(9), 144–150 (2024).
- 26. Al-Hatim, Y. M. A. Othman Al Janaby, Artificial-intelligence-enhanced beamforming for power-efficient user targeting in 5g networks using reinforcement learning. Int. J. Comput. Digital Syst. 16(1), 1083–1095 (2024).
- 27. Sun, C. et al. Ai model selection and monitoring for beam management in 5g-advanced. IEEE Open J. Commun. Society 5, 38–50 (2023).
- 28. Lu, Z., Gursoy, M. C., Mohan, C. K. & Varshney, P. K. Learning-based resource management in integrated sensing and communication systems, in: IEEE INFOCOM 2024-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), IEEE, pp. 1–6 (2024).
- 29. Wang, J., Du, H., Liu, Y., Sun, G., Niyato, D., Mao, S., Kim, D. I. & Shen, X. Generative ai based secure wireless sensing for isac networks, IEEE Transactions on Information Forensics and Security.
- 30. Saikia, P., Singh, K., Huang, W.-J. & Duong, T. Q. Hybrid deep reinforcement learning for enhancing localization and communication efficiency in ris-aided cooperative isac systems. IEEE Internet Things J. 11(18), 29494–29510 (2024).
- 31. Liu, Z., Chen, X., Wu, H., Wang, Z., Chen, X., Niyato, D., & Huang, K. Integrated sensing and edge ai: Realizing intelligent perception in 6g, IEEE Communications Surveys & Tutorials.
- 32. Elbir, V. Mishra, Heath, Twenty-five years of advances in beamforming: From convex and nonconvex optimization to learning techniques, IEEE Signal Processing Magazine 118–131 (2023).
- 33. Kassir, H. A. et al. A review of the state of the art and future challenges of deep learning-based beamforming. IEEE Access 10, 80869–80882 (2022).
- 34. Ahmad, I., Narmeen, R., Becvar, Z. & Guvenc, I. Machine learning-based beamforming for unmanned aerial vehicles equipped with reconfigurable intelligent surfaces. IEEE Wireless Commun. 29(4), 32–38 (2022).
- 35. Zargari, S., Galappaththige, D., Tellambura, C. & Li, G. Y. Downlink beamforming for cell-free isac: A fast complex oblique manifold approach, IEEE Transactions on Wireless Communications.
- 36. Kanani, P., Omidi, M. J., Modarres-Hashemi, M. & Yanikomeroglu, H. Haps-isac: Enhancing sensing and communication in 6g networks with advanced mimo beamforming, IEEE Open Journal of the Communications Society.
- 37. Tang, A., Wang, X. & Zhang, J. A. Interference management for full-duplex isac in b5g/6g networks: Architectures, challenges, and solutions. IEEE Commun. Magaz. 62(9), 20–26 (2024).
- 38. Niu, Y., Wei, Z., Ma, D., Yang, X., Wu, H., Feng, Z. & Yuan, J. Interference management in mimo-isac systems: A transceiver design approach, IEEE Transactions on Cognitive Communications and Networking.
- 39. Niu, Y., Wei, Z., Wang, L., Wu, H. & Feng, Z. Interference management for integrated sensing and communication systems: A survey, IEEE Internet of Things Journal.
- 40. Cheng, G., Song, X., Lyu, Z. & Xu, J. Networked isac for low-altitude economy: Coordinated transmit beamforming and uav trajectory design, IEEE Transactions on Communications.
- 41. Z. Chen, F. Wang, G. Han, X. & Wang, V. K. Lau, Robust beamforming design for secure near-field isac systems, IEEE Wireless Communications Letters.
- 42. Jee, A. & Prakriya, S. Performance of energy and spectrally efficient af relay-aided incremental cdrt noma-based iot network with imperfect sic for smart cities. IEEE Internet Things J. 10(21), 18766–18781. 10.1109/JIOT.2022.3229102 (2023).
- 43. Saikia, P. et al. Ris-aided integrated sensing and communication systems: STAR-RIS versus passive RIS. IEEE Open J. Commun. Society 5, 7954–7973. 10.1109/OJCOMS.2024.3515933 (2024).
- 44. Saikia, P., Jee, A., Singh, K., Huang, W.-J., Boulogeorgos, A.-A. A. & Tsiftsis, T. A. Hybrid-ris empowered uav-assisted ISAC systems: Transfer learning-based DRL, IEEE Transactions on Communications. 10.1109/TCOMM.2025.3548797.