Sensors (Basel, Switzerland)
. 2025 Nov 12;25(22):6892. doi: 10.3390/s25226892

Stochastic Geometric-Based Modeling for Partial Offloading Task Computing in Edge-AI Systems

Hamid Saeedi 1,*, Ali Nouruzi 1
PMCID: PMC12656120  PMID: 41305100

Abstract

This paper proposes a cooperative framework for resource allocation in multi-access edge computing (MEC) under a partial task offloading setting, addressing the joint challenges of learning performance and system efficiency in heterogeneous edge environments. In the proposed architecture, selected users act as edge servers (SEs) that collaboratively assist others alongside a central server (CS). A joint optimization problem is formulated to integrate model training with resource allocation while accounting for data freshness and spatial correlation among user tasks. The correlation-aware formulation penalizes outdated and redundant data, leading to improved robustness against non-i.i.d. distributions. To solve the NP-hard problem efficiently, a projected gradient descent (PGD) method is developed. The simulation results demonstrate that the proposed cooperative approach achieves a balanced delay of 0.042 s, close to edge-only computing (0.033 s) and 30% lower than the CS-only mode, while improving clustering accuracy to 99.2% (up to 15% higher than the baseline). Moreover, it reduces the central server load by nearly half, ensuring scalability and latency compliance within 3GPP limits. These findings confirm that cooperation between SEs and the CS substantially enhances reliability and performance in distributed Edge-AI systems.

Keywords: resource allocation, multi-access edge computing, edge AI, partial offloading

1. Introduction

1.1. State-of-the-Art and Motivations

Recent advances in artificial intelligence (AI) have sparked innovation in various technological domains. Most conventional AI solutions rely on centralized computing architectures that depend on large-scale data collection. While these centralized models offer powerful learning, perception, and decision-making capabilities, they have their own limitations in dynamic and uncertain environments. Transferring data to a central server and accumulating it for computing is time-consuming and costly. Additionally, providing adequate computing resources on a central server is challenging, especially under high demand or time-sensitive conditions. To address these challenges, researchers have proposed more distributed and adaptive architectures. One prominent approach is multi-access edge computing (MEC), which brings computing and storage resources closer to end devices at the edge of the network. This configuration improves task offloading efficiency and enables real-time system responsiveness. Edge AI enables the execution of AI models directly at the network edge, allowing real-time decision-making with reduced latency and lower reliance on centralized servers. This approach is particularly valuable in time-sensitive and resource-constrained environments [1,2]. Due to the significant advantages of Edge AI, such as enhanced operational efficiency, improved data privacy, and ultra-low latency, global interest and investment in this field have surged in recent years. According to market analysis, the global Edge AI market is projected to exceed USD 66 billion by 2030 [3,4,5].

Despite its benefits, Edge AI and MEC are faced with several significant challenges. One of the most critical issues is the propagation of uncertainty in distributed environments. In this regard, stochastic geometry models can provide a powerful analytical framework for characterizing spatial randomness and evaluating network performance [6,7,8,9,10]. Furthermore, due to asynchronous updates and decentralized model execution, local errors can accumulate and spread across the network, ultimately degrading the overall model accuracy and reliability. Another major challenge lies in the non-independent and identically distributed (non-i.i.d) nature of user data. While various solutions have been proposed in the context of federated learning (FL) to mitigate this issue, concerns remain regarding the generalization and reliability of global models trained on highly heterogeneous local datasets [11]. Such data distribution disparities can significantly disrupt model convergence and performance. While most existing works on non-i.i.d data in FL and machine learning (ML) focus on non-uniformity of local data distributions (e.g., label skew or quantity skew), fewer studies address the non-independence aspect, where user data may be statistically correlated or dependent across clients [12,13,14].

These challenges highlight the need for robust and correlation-aware learning frameworks that maintain accuracy, adaptability, and convergence under decentralized and statistically dependent conditions. In such environments, traditional centralized or federated approaches often fall short, particularly when dealing with heterogeneous and temporally dynamic data [15,16,17].

To address these limitations, we propose a distributed learning framework tailored for partial task offloading in MEC environments. Unlike conventional paradigms, our approach enables user devices to selectively offload task-related data to edge or central servers (CS) via data-level cooperation. This setup leverages local computation, respects delay constraints, and accommodates statistical heterogeneity across users. A key novelty of our model lies in its integration of spatial and directional task correlations. This aspect is often overlooked in existing works, i.e., the data is treated as i.i.d., which is not an accurate assumption. We introduce a correlation-aware loss function that explicitly incorporates data freshness and penalizes contributions from distant or weakly correlated users. This design enhances the relevance of updates, reduces the impact of outdated or biased data, and improves overall learning robustness in Edge AI systems. Using stochastic geometry analysis, we can analytically guarantee that users can be served by at least one local server at any time. In summary, the main question addressed in this paper is as follows: How can user tasks in a network be served efficiently under a partial offloading MEC scenario, with priority for edge servers, while taking into account the characteristics of an AI-driven edge model, including communication latency and the challenges posed by non-i.i.d data?

1.2. Contributions

Different from previous works [18,19,20,21,22,23], this paper introduces a novel cooperative computation framework for partial offloading in MEC environments. The main contributions are as follows:

  • We propose a cooperative partial offloading model in MEC environments, where tasks can be processed either locally by neighboring users or centrally by a server, taking spatial and directional correlations into account where a closed-form upper bound for spatial correlation is derived to constrain offloading decisions based on user proximity and sensing overlap.

  • To minimize learning loss under delay and resource constraints, a novel optimization problem is formulated that incorporates freshness-aware weighting, correlation modeling, and allocation decisions. Moreover, to improve robustness in non-i.i.d. settings, we integrate earth mover’s distance (EMD) into the loss function to capture distributional dissimilarity among users’ data.

  • To ensure scalability, we develop a coordination-free solution method suitable for practical deployment in distributed MEC systems.

  • Leveraging stochastic geometry, we provide tractable analytical characterizations of coverage probability and delay distribution, which not only enable probabilistic guarantees on task offloading but also ensure the generalizability of the proposed framework to large-scale and heterogeneous MEC networks.

  • We show that, using the proposed framework, the computation load on the central server can be significantly reduced compared to baseline schemes and, for a given delay threshold, a considerably higher number of users can be served.

2. Related Works

In this section, we first review related works on MEC systems and their resource management strategies, followed by a discussion of studies focusing on Edge AI. Regardless of the method used to solve the optimization problem in MEC-based systems, task offloading is typically classified as either full or partial. In full offloading, the entire task is computed at the edge server, while in partial offloading, the task is split between the local device and the edge server to balance latency and resource usage [24,25,26]. In [27], the authors investigate parallel task offloading in fog-enabled IoT networks, where computational tasks are divided into multiple sub-tasks and executed concurrently across heterogeneous fog nodes. They formulate the resource allocation problem with the objective of minimizing overall task latency while ensuring simultaneous completion of all sub-tasks, which is essential for efficient utilization of distributed resources. To address the inherent instability caused by interdependent task assignments, the authors design a matching-based allocation framework that maintains stable associations between task-originating devices and helper nodes. Through extensive simulations, the proposed method is shown to reduce average latency by up to 52% under high workload conditions, outperforming several state-of-the-art offloading strategies and demonstrating its suitability for large-scale, delay-sensitive IoT systems. In [20], the authors propose a computation offloading scheme for IoT applications involving dependent tasks. This scheme consists of two main components: a multi-queue priority algorithm that schedules dependent sub-tasks, and a deep reinforcement learning method based on Actor–Critic for making dynamic offloading decisions. The framework is designed to minimize task completion time and energy consumption in edge environments by handling task dependencies and fluctuating network conditions efficiently.
By leveraging the Lyapunov method, a stochastic optimization problem is stated in [22] with the aim of providing a joint task offloading and resource allocation framework for edge-assisted machine learning inference. This framework focuses on minimizing end-to-end latency while preserving inference accuracy and queue stability. The model considers local and edge inference options, whereby each device can adapt the quality of uploaded data and dynamically decide on offloading and computing strategies. The problem is decomposed into three sub-problems: offloading and channel allocation; data quality adjustment; and computational resource assignment. These sub-problems are solved using convex optimization and low-complexity heuristic methods. The simulation results demonstrate substantial improvements in latency reduction and stability under dynamic edge environments.

In [28], the authors proposed a method called the Decentralized Distributed Sequential Neural Network (DDSNN), tailored for low-power devices in wireless sensor networks, where conventional deep models face strict memory and energy constraints. By sequentially partitioning a LeNet model across multiple nodes, DDSNN enables fully decentralized inference without the need for compression or centralized coordination. In a predictive maintenance case study involving industrial pump vibration data, the framework preserved full precision, achieved 99% accuracy, and reduced inference latency by nearly 50% compared to the baseline. Although the accuracy gain over the non-distributed model was marginal, the authors emphasize that in highly resource-constrained settings, even slight improvements are significant, making DDSNN a practical and scalable solution.

The approach proposed in [21] addresses FL in heterogeneous edge environments by allowing each edge device to perform a different number of local updates, adapting to their computational capacity. This idea lays a foundation for handling heterogeneity in edge systems and can be extended further to include other real-world factors. For instance, while the original model focuses on computational diversity, it does not account for communication delays or queuing effects that are common in distributed systems. Additionally, the influence of data heterogeneity is not explicitly modeled.

Authors in [18] introduce a dynamic client selection strategy for FL, aiming to improve training efficiency under resource constraints in edge environments. The approach incorporates client-side characteristics such as computation power and data volume into a selection metric to adaptively involve suitable participants in each communication round. This method improves convergence speed and reduces communication cost. However, the model assumes relatively consistent client availability and does not explicitly address delay variability or the impact of severe data heterogeneity, which are common in real-world edge networks. Moreover, the paper focuses primarily on client selection policies rather than modeling deeper aspects such as uncertainty propagation or the effects of delayed or imbalanced updates on global performance. These aspects open opportunities for extending this methodology toward more delay-aware and distribution-sensitive learning frameworks.

To improve personalization in FL, the authors in  [29] propose an adversarial training framework combined with data-free knowledge distillation. Their method leverages earth mover’s distance (EMD) to align local and global data distributions, effectively addressing the challenge of non-i.i.d data across clients while preserving privacy. Although this framework improves global model adaptation, it does not consider delay sensitivity, dynamic task partitioning, or computing constraints common in edge environments. Nevertheless, the idea presents a promising direction that can be extended to such realistic, delay-aware, and resource-constrained edge AI settings.

With the aim of addressing label distribution skew in FL, the authors in [30] propose a novel learning approach that leverages knowledge distillation and a label-invariant teacher-student framework. Their method focuses on mitigating performance degradation due to non-i.i.d label distributions without directly sharing model parameters. This work sheds light on the importance of decoupling label heterogeneity from the model optimization process. While the approach effectively addresses label skew, it can be extended to incorporate delay sensitivity and partial offloading in edge computing environments, dimensions that must be considered in practical deployments.

Furthermore, authors in [31] have explored pruning-aware collaborative inference of large AI models at the edge, where model partitioning and resource optimization are jointly considered to balance accuracy, latency, and energy consumption. While this provides important insights into enabling efficient edge inference, it does not address aspects such as training under heterogeneous data distributions and stochastic task offloading, which remain central to advancing edge intelligence.

To facilitate a more effective comparison between this work and prior studies, we present Table 1.

Table 1.

Comparison of related works with our proposed framework.

| Ref. | Method | Main Contribution | Comparison with Our Work |
|---|---|---|---|
| [27] | Matching-based allocation in fog IoT | Stable parallel sub-task execution, latency reduction | Focuses on fog parallelism; our work integrates spatial correlation and non-i.i.d. modeling in MEC |
| [20] | Multi-queue scheduling with Actor–Critic DRL | Dependent task completion, energy efficiency | Addresses task dependencies; our work emphasizes correlation-aware offloading and robustness to heterogeneous data |
| [22] | Lyapunov-based stochastic optimization | Joint offloading and resource allocation with latency/queue guarantees | Provides stability analysis; our framework couples delay guarantees with PGD-based optimization |
| [28] | Decentralized Distributed Sequential Neural Network (DDSNN) | Lightweight inference across low-power devices | Focuses on TinyML inference; our work targets MEC task offloading with stochastic geometry and learning integration |
| [21] | Adaptive local updates in heterogeneous FL | Handles device heterogeneity by adjusting update counts | Considers computational diversity; our approach also incorporates delay constraints and correlation-aware offloading |
| [18] | Client selection strategy for FL | Faster convergence, reduced communication overhead | Optimizes participant choice; our framework integrates delay guarantees and data heterogeneity in MEC |
| [29] | Adversarial FL with earth mover's distance | Improves global adaptation under non-i.i.d. data | Focuses on privacy-preserving FL; our work extends EMD to MEC with offloading and latency constraints |
| [30] | Label-invariant knowledge distillation in FL | Mitigates label skew via teacher–student framework | Addresses label heterogeneity; our framework also accounts for spatial correlation, partial offloading, and delay guarantees |
| Our Work | PGD-based joint optimization in MEC | Cooperative partial offloading, EMD-based robustness to non-i.i.d., stochastic geometry analysis, reduced CS load | Provides a unified framework coupling task offloading, correlation modeling, and delay-aware learning optimization |

Symbol Notation: $f_X(x)$ and $F_X(x)$ denote the probability density function (PDF) and cumulative distribution function (CDF) of the random variable $X$. $\Pr(x)$ denotes the probability of event $x$. $|\cdot|$ is the absolute value, and $\lceil\cdot\rceil$ is the ceiling function.

3. System Model and Parameters

This article focuses on a task offloading scenario in which data-generating equipment, referred to as requesting entities (REs), can delegate their computational tasks either to nearby local edge devices, referred to as serving entities (SEs), or to a centralized server (CS). The goal is to enable efficient and delay-sensitive learning by leveraging Edge AI frameworks, where training occurs across distributed nodes with limited resources and potentially heterogeneous data distributions. In this regard, as can be seen in Figure 1, we consider a central server for task computing.

Figure 1.


A hybrid edge architecture in which some entities request task execution (REs) and others provide computing resources (SEs). A CS is also available for additional offloading. Typical use cases include smart campuses or factories, where IoT devices and robots can be considered as REs, and where attendees' smartphones can be considered as SEs performing partial task offloading.

The set of equipment whose tasks need to be addressed is represented by $\mathcal{RE}=\{r\}_{r=1}^{R}$. In addition, the set of equipment that can provide computing resources for local task computing is denoted by $\mathcal{SE}=\{s\}_{s=1}^{S}$. We assume that the devices are randomly distributed over an area of size $|A|$ (in square meters), following a Poisson point process (PPP) [32,33,34]. Let $\lambda_{SE}$ and $\lambda_{RE}$ denote the spatial intensities of SEs and REs, respectively, measured in devices per square meter. According to the properties of the PPP, the probability of observing exactly $k$ devices of a given type (either SEs or REs) within a region of area $|A|$ is given by $f(k)=\frac{e^{-\lambda |A|}(\lambda |A|)^{k}}{k!}$, where $\lambda\in\{\lambda_{SE},\lambda_{RE}\}$ denotes the spatial density (intensity) of SEs or REs, respectively. Furthermore, we assume that the spatial distributions of SEs and REs are independent of each other, consistent with the properties of independent homogeneous PPPs [33,35,36]. (In our model, REs represent IoT devices or sensors that are typically deployed in large numbers and remain stationary within a given environment, e.g., a campus or factory. In contrast, SEs correspond to mobile devices, such as smartphones or laptops, which are carried by human users. Since the placement of REs is governed by specific deployment requirements, while the mobility of SEs is driven by human movement patterns, it is reasonable to assume that the spatial distributions of REs and SEs are mutually independent. Moreover, the locations of individual REs and SEs are also assumed to be independent of one another. This independence assumption is widely adopted in stochastic geometry models, as it enhances both realism and analytical tractability.)
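To make the PPP deployment model concrete, the following minimal sketch samples SE and RE locations over a square region. The region size and both densities are illustrative assumptions, not the paper's simulation setup; large Poisson means are sampled in chunks to avoid floating-point underflow.

```python
import math
import random

def poisson_sample(mean: float) -> int:
    """Sample from Poisson(mean) with Knuth's product method, splitting large
    means into chunks (Poisson counts are additive) to avoid exp(-mean) underflow."""
    count = 0
    remaining = mean
    while remaining > 0:
        chunk = min(remaining, 50.0)
        remaining -= chunk
        limit = math.exp(-chunk)
        prod = random.random()
        while prod > limit:
            count += 1
            prod *= random.random()
    return count

def sample_ppp(lam: float, side: float):
    """Homogeneous PPP of intensity lam (devices/m^2) on a side x side square:
    Poisson-distributed count, then i.i.d. uniform locations."""
    n = poisson_sample(lam * side * side)
    return [(random.uniform(0, side), random.uniform(0, side)) for _ in range(n)]

random.seed(0)
side = 1000.0                      # assumed 1 km x 1 km region
ses = sample_ppp(2e-4, side)       # assumed lambda_SE = 2e-4 SEs/m^2 -> ~200 SEs
res = sample_ppp(1e-3, side)       # assumed lambda_RE = 1e-3 REs/m^2 -> ~1000 REs
print(len(ses), len(res))
```

Sampling the two processes with independent draws mirrors the independence assumption between SE and RE locations stated above.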

The size of the task for RE $r$ is denoted by $D_r$ (in bytes), while the computing capacity of SE $s$ is represented by $C_s$ (in CPU cycles/s). Similarly, the computing capacity of the CS is denoted by $\hat{C}$. In this paper, we assume that the task data consists of image-based content. (For instance, one can envision a scenario involving sensors and mobile robots deployed across environments such as a university campus, smart factory, or smart city. These devices are responsible for capturing images of their surroundings to support perception and decision-making tasks.) Accordingly, we adopt a widely used benchmark dataset to model the image data in our experiments. In line with this, we define a dynamic field of view (FoV) for each RE $r$, represented by an angular direction $\phi_r^t$ at time slot $t$. This direction evolves over time according to $\phi_r^t=\phi_r^0+t\varphi$, where $\phi_r^0\in[0,2\pi]$ is the initial viewing angle and $\varphi$ is a constant angular increment per time slot. In addition, each RE is assumed to have a symmetric viewing range with a total width of $\tilde{\varphi}$, meaning that at time $t$, RE $r$ can observe all the objects located within the angular sector $\left[\phi_r^t-\frac{\tilde{\varphi}}{2},\,\phi_r^t+\frac{\tilde{\varphi}}{2}\right]$. If the angular displacement $\varphi$ between two consecutive time slots satisfies $\varphi\geq\tilde{\varphi}$, then the task based on the data collected at time $t$ is assumed to be independent of the task based on the data collected at time $t-1$. Otherwise, a dependency exists, which will be addressed in the subsequent sections.

Furthermore, the maximum visual sensing range for all REs is assumed to be equal and is represented by $L$. In addition, the time at which RE $r$ collects the data for its task is denoted by $\tau_r$, with $t-1<\tau_r\leq t$. Furthermore, the maximum coverage range of each edge server is also denoted by $L$. In this paper, we assume that each RE can be covered by at least $S$ servers but can only be assigned to one, the details of which are provided in Section 5. A data sample from the dataset of RE $r$ is denoted by $x_r\in\mathcal{D}_r$, where $\mathcal{D}_r$ represents the dataset associated with RE $r$. The size of each RE's dataset is modeled as a uniform random variable:

$|\mathcal{D}_r|\sim\mathrm{Uniform}[n_{\min},n_{\max}],\quad\forall r,$ (1)

where $n_{\min}$ and $n_{\max}$ denote the minimum and maximum number of samples, respectively. Here, $n_{\max}=\varrho N$, where $N$ is the total number of samples in the global dataset, and $\varrho\in[0,1]$ quantifies the degree of quantity skew across the REs. We assume that for the transmitted sample $x_r$, the received version at SE $s$ is represented by $x_r^s$, and at the CS by $\hat{x}_r$, where the effects of wireless channel transmission, such as additive noise and fading, have been incorporated into them.

We use the notation $\delta_{r,s}^t\in\{0,1\}$ to represent the assignment variable, where $\delta_{r,s}^t=1$ if RE $r$ is assigned to SE $s$, and $\delta_{r,s}^t=0$ otherwise. Let $\alpha_r^t\in[0,1]$ represent the fraction of the data task from RE $r$ that is offloaded to the CS. Consequently, $(1-\alpha_r^t)$ denotes the fraction of the task offloaded to the assigned SE $s$. Accordingly, the number of data samples offloaded to the CS by RE $r$ is $\alpha_r^t D_r$, and we denote this subset as $\hat{\mathcal{D}}_r=\{\hat{x}_r\}$. Similarly, the number of samples offloaded to SE $s$ is $(1-\alpha_r^t)D_r$, and we denote this subset as $\mathcal{D}_r^s=\{x_r^s\}$. To increase the readability of the paper, the list of symbols is provided in Table 2.
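The quantity-skew model of Eq. (1) and the partial split between CS and SE can be sketched as follows. The value of $n_{\min}$, the global dataset size, and the uniform offloading fraction are illustrative assumptions.

```python
import random

def make_datasets(num_res, n_total, rho, alpha):
    """Draw quantity-skewed dataset sizes |D_r| ~ Uniform[n_min, n_max] with
    n_max = rho * N (Eq. (1)), then split each dataset: a fraction alpha[r]
    of the samples goes to the CS and the remainder to the assigned SE."""
    n_min = 10                       # assumed minimum dataset size
    n_max = int(rho * n_total)
    sizes = [random.randint(n_min, n_max) for _ in range(num_res)]
    cs_share = [round(alpha[r] * sizes[r]) for r in range(num_res)]
    se_share = [sizes[r] - cs_share[r] for r in range(num_res)]
    return sizes, cs_share, se_share

random.seed(1)
alpha = [0.3] * 5                    # uniform offloading fraction, for illustration
sizes, cs, se = make_datasets(5, n_total=10_000, rho=0.05, alpha=alpha)
print(sizes, cs, se)
```

Note that the two shares always sum to the original dataset size, matching the partition of $\mathcal{D}_r$ into $\hat{\mathcal{D}}_r$ and $\mathcal{D}_r^s$.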

Table 2.

List of system model parameters.

| Symbol | Description |
|---|---|
| $\mathcal{RE}$ | Set of requesting entities (task generators) |
| $\mathcal{SE}$ | Set of serving entities (edge servers) |
| $\lambda_{RE}$ | Spatial density of REs (devices/m²) |
| $\lambda_{SE}$ | Spatial density of SEs (devices/m²) |
| $\lvert A\rvert$ | Total coverage area |
| $D_r$ | Task size of RE $r$ (bytes) |
| $C_s$ | Computing capacity of SE $s$ (CPU cycles/s) |
| $\hat{C}$ | Computing capacity of central server (CS) |
| $L$ | Maximum sensing/coverage range of REs (m) |
| $\phi_r^t$ | Viewing angle of RE $r$ at time slot $t$ |
| $\tilde{\varphi}$ | Angular width of the field of view (FoV) |
| $\varphi$ | Angular displacement between time slots |
| $\kappa$ | Correlation threshold among REs' FoVs |
| $f(k)$ | PPP probability of observing $k$ nodes in area $\lvert A\rvert$ |

4. The Proposed Problem

4.1. Constraints

Each RE must be assigned to one SE, as a result,

$\sum_{s}\delta_{r,s}^{t}=1,\quad\forall r,t.$ (2)

Furthermore, the following constraints ensure that the total computational load assigned to each SE and the CS does not exceed their respective computing capacities:

$\sum_{r}\gamma\,\delta_{r,s}^{t}\left(1-\alpha_{r}^{t}\right)D_{r}\leq C_{s},\quad\forall s,t,$ (3)
$\sum_{r}\gamma\,\alpha_{r}^{t}D_{r}\leq\hat{C},\quad\forall t,$ (4)

where γ denotes the number of CPU cycles required per byte [37]. In addition, we consider computing delay, transmission delay, and queuing delay. The computing delay is calculated by:

$T_{r,\mathrm{Comp}}=\gamma\left(\frac{\alpha_{r}^{t}D_{r}}{\hat{C}}+\frac{\left(1-\alpha_{r}^{t}\right)D_{r}}{C_{s}}\right),\quad\forall r.$ (5)

The transmission delay also is obtained by the following:

$T_{r,\mathrm{Trans}}=\frac{\alpha_{r}^{t}D_{r}}{\hat{V}}+\frac{\left(1-\alpha_{r}^{t}\right)D_{r}}{V},\quad\forall r,$ (6)

where $V$ and $\hat{V}$ are the transmission data rates from the REs to the SEs and to the CS, respectively, in bytes per second. By considering the M/G/1 queuing model for both the CS and the SEs, the total queuing delay experienced by RE $r$ under partial offloading is modeled by the Pollaczek–Khinchin formula. We denote by $\lambda_{task}$ the task generation rate of each RE (in tasks per second). The net arrival rates to the CS and to SE $s$ are then

$\Lambda_{CS}=\lambda_{task}\sum_{r}\alpha_{r}\approx\lambda_{task}|A|\lambda_{RE}\bar{\alpha},$ (7)
$\Lambda_{s}=\lambda_{task}\sum_{r}\tilde{\delta}_{r,s}\left(1-\alpha_{r}\right),$ (8)

where $\bar{\alpha}$ denotes the average offloading fraction across REs (for large systems one may use $\bar{\alpha}=\mathbb{E}[\alpha_r]$). For the CS, the service time of a task originating from RE $r$ (in seconds) is

$S_{CS,r}=\frac{\gamma\alpha_{r}D_{r}}{\hat{C}},\quad D_{r}\ \text{in [bytes]},\ \hat{C}\ \text{in [CPU cycles/s]}.$ (9)

Consequently, the first and second moments of the service time at the CS are

$\mathbb{E}[S_{CS}]=\frac{\mathbb{E}[\gamma\alpha D]}{\hat{C}},\qquad\mathbb{E}[S_{CS}^{2}]=\frac{\mathbb{E}[(\gamma\alpha D)^{2}]}{\hat{C}^{2}}.$ (10)

By the Pollaczek–Khinchin formula for an M/G/1 queue, the mean waiting time in queue (excluding service) at the CS is

$W_{q,CS}=\frac{\Lambda_{CS}\,\mathbb{E}[S_{CS}^{2}]}{2\left(1-\rho_{CS}\right)},\qquad\rho_{CS}=\Lambda_{CS}\,\mathbb{E}[S_{CS}].$ (11)

Analogously, for SE $s$ with service time $S_{s,r}=\frac{\gamma\left(1-\alpha_{r}\right)D_{r}}{C_{s}}$ we have

$\mathbb{E}[S_{s}]=\frac{\mathbb{E}[\gamma\left(1-\alpha\right)D]}{C_{s}},\qquad\mathbb{E}[S_{s}^{2}]=\frac{\mathbb{E}[\gamma^{2}\left(1-\alpha\right)^{2}D^{2}]}{C_{s}^{2}},$ (12)
$W_{q,s}=\frac{\Lambda_{s}\,\mathbb{E}[S_{s}^{2}]}{2\left(1-\rho_{s}\right)},\qquad\rho_{s}=\Lambda_{s}\,\mathbb{E}[S_{s}].$ (13)

Finally, the expected queuing delay experienced by RE $r$ (in seconds) under partial offloading and soft assignments $\tilde{\delta}_{r,s}$ is given by the mixture

$T_{r,\mathrm{Que}}=\alpha_{r}W_{q,CS}+\left(1-\alpha_{r}\right)\sum_{s}\tilde{\delta}_{r,s}W_{q,s}.$ (14)

Finally, the following constraint guarantees that the total delay $T_{r,\mathrm{Total}}$ does not exceed the maximum delay threshold $T_{\mathrm{Max}}$:

$T_{r,\mathrm{Comp}}+T_{r,\mathrm{Trans}}+T_{r,\mathrm{Que}}=T_{r,\mathrm{Total}}\leq T_{\mathrm{Max}},\quad\forall r.$ (15)
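A minimal numerical sketch of the delay model in Eqs. (5), (6), (11), (13), and (14), for a single RE with deterministic service times and a soft assignment of 1 to one SE. All system parameters (task size, capacities, data rates, arrival rates) are illustrative assumptions, not the paper's simulation values.

```python
def pk_wait(arrival_rate, es, es2):
    """Pollaczek-Khinchin mean waiting time for an M/G/1 queue, given the
    arrival rate and the first/second moments of the service time."""
    rho = arrival_rate * es
    assert rho < 1.0, "queue must be stable"
    return arrival_rate * es2 / (2.0 * (1.0 - rho))

# Assumed parameters for illustration.
gamma, D = 100.0, 2e5            # cycles/byte, task size in bytes
C_hat, C_s = 5e10, 5e9           # CS and SE capacities (CPU cycles/s)
V_hat, V = 1e7, 2e7              # data rates to CS and SE (bytes/s)
alpha = 0.3                      # fraction offloaded to the CS

t_comp = gamma * (alpha * D / C_hat + (1 - alpha) * D / C_s)       # Eq. (5)
t_trans = alpha * D / V_hat + (1 - alpha) * D / V                  # Eq. (6)

# Deterministic service times => E[S^2] = E[S]^2 in this single-RE sketch.
s_cs = gamma * alpha * D / C_hat
s_se = gamma * (1 - alpha) * D / C_s
lam_cs, lam_se = 2.0, 5.0        # assumed net arrival rates (tasks/s)
t_que = alpha * pk_wait(lam_cs, s_cs, s_cs**2) \
        + (1 - alpha) * pk_wait(lam_se, s_se, s_se**2)             # Eq. (14)

t_total = t_comp + t_trans + t_que                                 # Eq. (15)
print(f"{t_total:.4f} s")
```

Checking `t_total` against a delay budget corresponds to constraint (15).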

4.2. Data Freshness and Distribution-Aware Loss Modeling

Due to the geographical distribution of REs, each RE observes a distinct data distribution, leading to statistical heterogeneity (non-i.i.d) across the system. This diversity may adversely affect model convergence, as many solutions assume i.i.d data and are developed based on this assumption. We refer to this scenario as non-i.i.d.-blind and we try to develop a non-i.i.d.-aware model. To do so, we formulate a delay-aware and distribution-sensitive loss model that incorporates statistical dissimilarity, temporal dynamics, and communication delay penalties. The global loss function at the CS is given by the following:

$\mathcal{L}(\boldsymbol{\alpha},\theta)=\sum_{r}\sum_{\hat{x}_{r}\in\hat{\mathcal{D}}_{r}}\frac{\alpha_{r}^{t}D_{r}}{\sum_{r'}\alpha_{r'}^{t}D_{r'}}\,l(\hat{x}_{r};\theta)\,\nu(\tau_{r}),\quad\forall t,$ (16)

where l(x^r;θ) is the local loss. In parallel, the loss function at each SE s is computed as follows:

$\mathcal{L}_{s}(\boldsymbol{\alpha},\boldsymbol{\delta},\theta_{s})=\sum_{r}\sum_{x_{r}^{s}\in\mathcal{D}_{r}^{s}}\frac{\delta_{r,s}^{t}\left(1-\alpha_{r}^{t}\right)D_{r}}{\sum_{r'}\delta_{r',s}^{t}\left(1-\alpha_{r'}^{t}\right)D_{r'}}\,l(x_{r}^{s};\theta_{s})\,g_{r}(t)\,\nu(\tau_{r}),\quad\forall s,$ (17)

where the delay penalty term is defined as $\nu(\tau_{r})=e^{-\left(t-\tau_{r}\right)}$, and $\tau_{r}$ denotes the time when the sample from RE $r$ was generated. This function prioritizes fresher data by assigning smaller penalties to more recent samples. The dissimilarity coefficient $g_{r}(t)$ quantifies the statistical distance between RE $r$'s local distribution and the global distribution using EMD:

$g_{r}(t)=\exp\left(-\sum_{k=1}^{K}\left|\hat{P}_{r,k}(t)-P_{k}(t)\right|w_{k}\right),\quad\forall r.$ (18)

To model temporal dynamics, assuming that at time slot $t-1$ only $K'$ of the $K$ classes have been observed (i.e., $P_{r,k}(t-1)>0$ for those classes), while the remaining $K-K'$ classes are unseen, the estimated class probabilities at time $t$ are as follows:

$\hat{P}_{r,k}(t)=\begin{cases}P_{r,k}(t-1)\cdot\dfrac{e^{-\kappa_{time}}}{\max\left(1,K-K'\right)},&\text{if }P_{r,k}(t-1)>0,\\\epsilon,&\text{if }P_{r,k}(t-1)=0,\end{cases}$ (19)

where $\kappa_{time}$ controls the decay of outdated distributions and $\epsilon$ ensures normalization (as shown in Appendix A). Furthermore, if all classes have been seen at time slot $t-1$ (i.e., $P_{r,k}(t-1)>0,\ \forall k$), we assume that $\hat{P}_{r,k}(t)\approx P_{r,k}(t-1),\ \forall r,k$. Although this work focuses on quantity skew due to non-uniform sensing rates and task sizes, feature skew is partly reflected through the spatial randomness of RE datasets and wireless noise distortions. Label skew is not directly relevant here, since the framework operates in an unsupervised setting. Future extensions could explicitly incorporate feature-level or domain-shift variations to capture broader heterogeneity conditions in Edge AI.
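The freshness penalty $\nu(\tau_r)$ and the EMD-style dissimilarity weight $g_r(t)$ of Eq. (18) can be sketched as follows. The class-probability vectors and the per-class ground-distance weights $w_k$ are illustrative assumptions.

```python
import math

def freshness(t, tau_r):
    """nu(tau_r) = exp(-(t - tau_r)): fresher samples receive a weight
    closer to 1, older samples are penalized exponentially."""
    return math.exp(-(t - tau_r))

def dissimilarity(p_local, p_global, w):
    """g_r(t) = exp(-sum_k |P_hat_{r,k} - P_k| * w_k), a simple instance of the
    EMD-style distance in Eq. (18): identical distributions give g_r = 1,
    larger distances shrink the weight toward 0."""
    d = sum(abs(pl - pg) * wk for pl, pg, wk in zip(p_local, p_global, w))
    return math.exp(-d)

p_global = [0.25, 0.25, 0.25, 0.25]     # assumed global class distribution
p_skewed = [0.70, 0.10, 0.10, 0.10]     # assumed skewed local distribution
w = [1.0] * 4                           # assumed unit ground-distance weights
print(dissimilarity(p_global, p_global, w))   # 1.0: identical distributions
print(dissimilarity(p_skewed, p_global, w))   # < 1: skewed RE is down-weighted
print(freshness(t=10, tau_r=9), freshness(t=10, tau_r=5))
```

In the SE loss (17), these two factors multiply the per-sample loss, so stale or statistically distant contributions are attenuated.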

In addition, each RE is assumed to have a symmetric FoV with total angular width $\tilde{\varphi}$, centered around its viewing angle $\phi_{r}$. That is, at time $t$, RE $r$ can observe all objects located within an angular sector of width $\tilde{\varphi}$ centered at $\phi_{r}$.

We consider REs uniformly distributed within a circular region of radius L:

$l_{r}\sim\mathrm{Uniform}[0,L],\quad\forall r,$ (20)

We incorporate both spatial and directional correlations between the tasks of REs $r$ and $r'$ using a physically meaningful and dimensionally consistent formulation. The correlation depends on their positions $l_{r}$, $l_{r'}$ and viewing angles $\phi_{r}$, $\phi_{r'}$, with angular difference $\Delta\phi=\left|\phi_{r}-\phi_{r'}\right|$. As illustrated in Figure 2, the task correlation is modeled as follows:

$C_{r,r'}=\exp\left(-\frac{\left\|l_{r}-l_{r'}\right\|^{2}}{2s^{2}}\right)\exp\left(-\frac{1-\cos\left(\Delta\phi\right)}{2\sigma_{\phi}^{2}}\right),$ (21)

where $s$ is the spatial correlation length and $\sigma_{\phi}$ is the angular correlation parameter, ensuring dimensional consistency. To ensure the correlation remains below a threshold $\kappa\in(0,1]$, we require

$C_{r,r'}\leq\kappa,\quad\forall r,r',$ (22)

Figure 2.


An illustrative example of how the FoV of REs overlap and how their spatial correlation influences their observations. The yellow star represents a typical point that is visible to both users.

This leads to the following geometric condition:

$\left\|l_{r}-l_{r'}\right\|^{2}+\frac{s^{2}}{\sigma_{\phi}^{2}}\left(1-\cos\left(\Delta\phi\right)\right)\geq 2s^{2}\ln\frac{1}{\kappa}.$ (23)

For system design, we consider the expected spatial configuration. Under the uniform distribution, the expected squared distance is $\mathbb{E}\left[\left\|l_{r}-l_{r'}\right\|^{2}\right]=L^{2}/3$. This yields the following simplified angular correlation:

$C_{\max}\left(\Delta\phi\right)=\kappa_{s}\cdot\exp\left(-\frac{1-\cos\left(\Delta\phi\right)}{2\sigma_{\phi}^{2}}\right),$ (24)

where $\kappa_{s}=\exp\left(-L^{2}/6s^{2}\right)$ represents the spatial correlation baseline. The fundamental design constraint becomes the following:

$\tilde{\varphi}\geq\arccos\left(1-2\sigma_{\phi}^{2}\ln\frac{\exp\left(-L^{2}/6s^{2}\right)}{\kappa}\right),$ (25)

provided the argument lies in $[-1,1]$. This closed-form expression provides a practical design guideline, clearly showing the trade-offs between spatial coverage ($L$), correlation parameters ($s$, $\sigma_{\phi}$), and the correlation threshold ($\kappa$) (more details are provided in Appendix B).
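The correlation model of Eq. (21) and the design bound of Eq. (25) can be checked numerically. The parameter values below ($L$, $s$, $\sigma_\phi$, $\kappa$) are assumptions chosen only so that the arccos argument stays in $[-1,1]$.

```python
import math

def task_correlation(l_r, l_rp, phi_r, phi_rp, s, sigma_phi):
    """Spatial x directional correlation C_{r,r'} of Eq. (21)."""
    d2 = (l_r[0] - l_rp[0])**2 + (l_r[1] - l_rp[1])**2
    dphi = abs(phi_r - phi_rp)
    return (math.exp(-d2 / (2 * s**2))
            * math.exp(-(1 - math.cos(dphi)) / (2 * sigma_phi**2)))

def min_fov_rotation(L, s, sigma_phi, kappa):
    """Bound of Eq. (25): smallest angular displacement keeping the expected
    correlation below kappa; returns None when the argument leaves [-1, 1]
    (i.e., the bound is vacuous or unattainable for these parameters)."""
    kappa_s = math.exp(-L**2 / (6 * s**2))
    arg = 1 - 2 * sigma_phi**2 * math.log(kappa_s / kappa)
    if not -1 <= arg <= 1:
        return None
    return math.acos(arg)

# Assumed parameters: 100 m range, 80 m correlation length, kappa = 0.3.
print(min_fov_rotation(L=100.0, s=80.0, sigma_phi=0.5, kappa=0.3))
```

With these values the minimum rotation comes out near 1 rad, illustrating how a tighter threshold $\kappa$ forces REs to rotate their FoV further between slots.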

4.3. Problem Formulation

With the aim of minimizing the total loss of the system, i.e., the loss of the CS and SEs over variables α, δ, and θ, subject to the constraints discussed above, we state the following optimization problem:

$\min_{\boldsymbol{\alpha},\boldsymbol{\delta},\theta}\ \mathcal{L}_{CS}(\boldsymbol{\alpha},\theta)+\sum_{s}\mathcal{L}_{s}(\boldsymbol{\alpha},\boldsymbol{\delta},\theta),$ (26a)
$\text{s.t.:}\ \sum_{s}\delta_{r,s}^{t}=1,\quad\forall r,t,$ (26b)
$T_{r,\mathrm{Total}}\leq T_{\mathrm{Max}},\quad\forall r,$ (26c)
$\sum_{r}\gamma\,\delta_{r,s}^{t}\left(1-\alpha_{r}^{t}\right)D_{r}\leq C_{s},\quad\forall s,$ (26d)
$\sum_{r}\gamma\,\alpha_{r}^{t}D_{r}\leq\hat{C},$ (26e)

where constraint (26b) guarantees that each RE is assigned to exactly one SE, while (26c) ensures that the total delay does not exceed the maximum allowable threshold $T_{\mathrm{Max}}$. In addition, constraints (26d) and (26e) restrict the sizes of tasks offloaded to the local SEs and the CS so that they remain within their respective computing capacities.
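The abstract states that problem (26) is solved with projected gradient descent (PGD). As a hedged sketch of one such step on a relaxed version of the problem, the snippet below clips $\alpha$ to $[0,1]$ and projects each (relaxed) assignment row onto the probability simplex so that (26b) holds in the relaxed sense. The gradients, step size, and the continuous relaxation of $\delta$ are illustrative assumptions, not the paper's exact algorithm.

```python
def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {x : x >= 0, sum(x) = 1}, using the standard sort-based algorithm."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(vi - theta, 0.0) for vi in v]

def pgd_step(alpha, delta, grad_alpha, grad_delta, lr=0.1):
    """One projected-gradient step on a relaxed version of problem (26):
    alpha is clipped to [0, 1]; each row of the relaxed assignment delta is
    projected onto the simplex so Eq. (26b) holds. Gradients are assumed to
    come from differentiating the losses (16) and (17)."""
    alpha = [min(max(a - lr * g, 0.0), 1.0) for a, g in zip(alpha, grad_alpha)]
    delta = [project_simplex([d - lr * g for d, g in zip(row, grow)])
             for row, grow in zip(delta, grad_delta)]
    return alpha, delta

# Toy step with hypothetical gradients: 2 REs, 1 RE x 2 SE assignment row.
a, d = pgd_step([0.5, 0.9], [[0.5, 0.5]], [2.0, -3.0], [[1.0, -1.0]])
print(a, d)
```

In practice the capacity constraints (26d) and (26e) would also be enforced, e.g., via penalty terms or an additional projection, and the relaxed $\delta$ rounded back to binary assignments.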

5. Feasibility Analysis

In this section, we present a mathematical analysis to evaluate the feasibility of the system and ensure its consistency. Based on the properties of a homogeneous PPP, the PDF of the distance D between an RE and its nearest serving entity SE is given by

$f_{D}(d)=2\pi\lambda_{SE}\,d\,e^{-\lambda_{SE}\pi d^{2}},\quad d\geq 0,$ (27)

where λSE denotes the density of SEs. The expected distance can be derived as

$\mathbb{E}[D]=\frac{1}{2\sqrt{\lambda_{SE}}}.$ (28)

To guarantee that REs are within a maximum distance threshold dmax from at least one SE with high probability, we impose

$\Pr\left(D\leq d_{\max}\right)=1-e^{-\lambda_{SE}\pi d_{\max}^{2}}\geq 1-\varepsilon,$ (29)

where ε is the outage tolerance (i.e., the probability that an RE is not covered within dmax). Rearranging the above condition yields the following requirement on the SE density:

$\lambda_{SE}\geq\frac{-\ln\left(\varepsilon\right)}{\pi d_{\max}^{2}}.$ (30)
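The density bound of Eq. (30) is straightforward to evaluate; the values of $d_{\max}$ and $\varepsilon$ below are illustrative assumptions.

```python
import math

def min_density(d_max: float, eps: float) -> float:
    """Eq. (30): smallest SE density such that a typical RE finds at least one
    SE within d_max with probability >= 1 - eps, under a homogeneous PPP."""
    return -math.log(eps) / (math.pi * d_max**2)

lam = min_density(d_max=100.0, eps=0.05)    # assumed 100 m threshold, 5% outage
p_cov = 1 - math.exp(-lam * math.pi * 100.0**2)
print(lam, p_cov)   # at the bound, the coverage probability sits at ~0.95
```

Plugging the resulting density back into Eq. (29) confirms that the coverage probability meets the target exactly at the bound.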

This condition provides a lower bound on the spatial density of SEs to achieve a coverage probability of at least 1ε within the distance threshold dmax. To satisfy the requirement that each RE is covered by at least S servers, we analyze the coverage under a homogeneous PPP. Let S˜ denote the number of SEs within distance L of a typical RE. Then, S˜Poisson(λSEπL2). To ensure P(NS)1ε˜, we must have the following:

$e^{-\lambda_{SE}\pi L^{2}}\sum_{k=0}^{S-1}\frac{(\lambda_{SE}\pi L^{2})^{k}}{k!}\le \tilde{\varepsilon},$ (31)

and finally we have the following:

$P(N\ge S)=1-e^{-\lambda_{SE}\pi L^{2}}\sum_{k=0}^{S-1}\frac{(\lambda_{SE}\pi L^{2})^{k}}{k!},$ (32)

which implicitly provides a lower bound on $\lambda_{SE}$ as a function of $L$ and the target reliability $1-\tilde{\varepsilon}$. To illustrate, consider $S=3$ required SEs per RE. When the reliability target is $1-\tilde{\varepsilon}=0.95$, the corresponding coverage intensity $\lambda_{SE}\pi L^{2}$ is approximately 6.30. Hence, the minimum server density should satisfy the following:

$\lambda_{SE}\ge \frac{6.30}{\pi L^{2}}.$ (33)

For $L=100$ m, this yields the following:

$\lambda_{SE}\ge 2.01\times 10^{-4}\ \text{servers/m}^{2},$ (34)

which is equivalent to one SE per $70\times 70$ m² area on average. This corresponds to approximately 200 SEs per square kilometer, ensuring each RE has at least three SEs in range with 95% reliability.

If the reliability requirement is increased to $1-\tilde{\varepsilon}=0.99$, the coverage intensity increases to about 8.45, resulting in the following:

$\lambda_{SE}\ge \frac{8.45}{\pi L^{2}}\approx 2.69\times 10^{-4}\ \text{servers/m}^{2}.$ (35)

For $L=150$ m, this reduces to $\lambda_{SE}\ge 1.20\times 10^{-4}$ servers/m², which still guarantees that each RE is covered by three SEs with probability at least 0.99.
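The density bounds in (29)–(35) can be checked numerically. The sketch below uses only the standard library and the paper's worked examples; it recovers the minimum coverage intensity $\mu=\lambda_{SE}\pi L^{2}$ by bisection on the Poisson tail, giving values close to the quoted 6.30 and 8.45:

```python
import math

def poisson_cdf(k, mu):
    """P(N <= k) for N ~ Poisson(mu)."""
    return math.exp(-mu) * sum(mu**i / math.factorial(i) for i in range(k + 1))

def min_coverage_intensity(S, eps, lo=0.0, hi=50.0):
    """Smallest mu = lambda_SE * pi * L^2 such that P(N >= S) >= 1 - eps,
    i.e., P(N <= S-1) <= eps; found by bisection (the CDF decreases in mu)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(S - 1, mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi

def min_se_density(S, eps, L):
    """Multi-server lower bound on lambda_SE [servers/m^2], cf. (33), (35)."""
    return min_coverage_intensity(S, eps) / (math.pi * L**2)

def min_density_single(eps, d_max):
    """Single-coverage bound (30): lambda_SE >= -ln(eps) / (pi d_max^2)."""
    return -math.log(eps) / (math.pi * d_max**2)

mu95 = min_coverage_intensity(S=3, eps=0.05)  # close to the quoted 6.30
mu99 = min_coverage_intensity(S=3, eps=0.01)  # close to the quoted 8.45
print(mu95, mu99)
print(min_se_density(3, 0.05, L=100))  # about 2.0e-4 servers/m^2
print(min_se_density(3, 0.01, L=150))  # about 1.2e-4 servers/m^2
```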

These results numerically confirm that moderate SE densities, on the order of $10^{-4}$ servers/m², are sufficient to maintain reliable multi-server coverage within typical urban cell sizes. Hereupon, the set of servers to which RE $r$ can be assigned based on the physical distance is denoted by $\mathcal{S}_r$. Assuming identical task sizes $D_r=\bar{D}$ and SE capacities $C_s=\bar{C}$, the total computing resources required in the network is $RE\,\bar{D}\gamma$, where $RE$ denotes the number of REs. If all REs adopt a uniform offloading strategy $\alpha_r^t=\bar{\alpha}$, then the computing loads are split as $\bar{D}\gamma\bar{\alpha}\,RE$ for the CS and $\bar{D}\gamma(1-\bar{\alpha})\,RE$ for the SEs. Assuming the total CS capacity is $\hat{C}$ and the total SE capacity is $\lambda_{SE}|A|\bar{C}$, the maximum number of REs the network can support under this strategy is $RE_{\mathrm{Max}}(\bar{\alpha})=\min\left\{\frac{\hat{C}}{\bar{D}\gamma\bar{\alpha}},\ \frac{\lambda_{SE}|A|\bar{C}}{\bar{D}\gamma(1-\bar{\alpha})}\right\}$.

While this analysis provides an upper bound, it does not incorporate the binary task assignment variables $\delta_{r,s}$, which govern the actual RE-to-SE allocation decisions. As a result, the derived expression represents an idealized scenario. The practical feasibility of this result depends on whether the local constraints at each SE can be satisfied under the discrete assignment structure. Analytically, the balanced ratio that equalizes the two limits is $\bar{\alpha}=\frac{\hat{C}}{\lambda_{SE}|A|\bar{C}+\hat{C}}$. Figure 3 illustrates this trade-off across values of $\bar{\alpha}$.
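The capacity split and the balanced ratio $\bar{\alpha}$ reduce to a few lines of code. The sketch below uses illustrative numbers (not taken from the paper) and verifies that the balanced ratio equalizes, and thereby maximizes, the two capacity limits:

```python
def re_max(alpha_bar, C_hat, lam, A, C_bar, D_bar, gamma):
    """Maximum number of REs supported under a uniform offloading ratio:
    the minimum of the CS-limited and SE-limited terms."""
    cs_limit = C_hat / (D_bar * gamma * alpha_bar)
    se_limit = lam * A * C_bar / (D_bar * gamma * (1.0 - alpha_bar))
    return min(cs_limit, se_limit)

def balanced_alpha(C_hat, lam, A, C_bar):
    """Ratio equalizing both limits: C_hat / (lam*|A|*C_bar + C_hat)."""
    return C_hat / (lam * A * C_bar + C_hat)

# Illustrative values: CS capacity 4000, and lam*A = 20 SEs of capacity 200
params = dict(C_hat=4000.0, lam=2.0e-4, A=1.0e5, C_bar=200.0)
a_star = balanced_alpha(**params)
loads = dict(params, D_bar=1.5, gamma=1.0)
print(a_star, re_max(a_star, **loads))
```

With equal aggregate CS and SE capacities, the balanced ratio comes out to 0.5, and any deviation from it lowers $RE_{\mathrm{Max}}$, matching the trade-off shown in Figure 3.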

Figure 3.


Effect of α¯ on the maximum number of REs supported by the network.

6. Solution Method

6.1. Dataset Description

To evaluate the performance of the proposed model, we use the MNIST dataset, which is widely adopted in image classification and ML-based systems [38,39,40]. The MNIST dataset consists of 70,000 gray-scale images of handwritten digits (0–9), each of size 28×28 pixels, with 60,000 samples for training and 10,000 for testing [41,42]. To simulate a realistic ML-based setting, the data are partitioned in a non-i.i.d. fashion across multiple edge nodes. Each node is assigned a unique subset of the data to reflect user-specific distributions, capturing the impact of data heterogeneity and decentralized learning on model performance.
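The paper does not spell out the partitioning scheme, so the sketch below illustrates one common non-i.i.d. construction, a Dirichlet label-skew split; the `beta` value and the synthetic stand-in labels are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def dirichlet_partition(labels, n_nodes, beta=0.5, seed=0):
    """Label-skew non-i.i.d. split: each class's sample indices are divided
    across nodes with Dirichlet(beta) proportions; a smaller beta yields
    stronger heterogeneity across nodes."""
    rng = np.random.default_rng(seed)
    parts = [[] for _ in range(n_nodes)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        props = rng.dirichlet(beta * np.ones(n_nodes))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for node, chunk in enumerate(np.split(idx, cuts)):
            parts[node].extend(chunk.tolist())
    return parts

# Synthetic stand-in for the 60,000 MNIST training labels (10 classes).
labels = np.random.default_rng(1).integers(0, 10, 60000)
parts = dirichlet_partition(labels, n_nodes=30, beta=0.3)
print([len(p) for p in parts[:5]])  # uneven shares reflect quantity skew
```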

6.2. Proposed Solution

We address the joint optimization problem in (26), which involves the offloading ratios α, task assignment variables δ, and learning model parameters θ. The combinatorial nature of the binary assignment variables makes direct optimization computationally prohibitive. To overcome this challenge, we employ a continuous relaxation framework, in which discrete decision variables are parameterized through smooth mappings and optimized via projected gradient descent (PGD).

6.2.1. Joint and Disjoint Optimization Protocols

To evaluate the effectiveness of the proposed optimization, two schemes are developed. In the joint optimization approach, i.e., the proposed method, all the parameters (α,δ,θ) are updated simultaneously using PGD, which allows end-to-end coordination between learning dynamics and resource allocation decisions. In contrast, the disjoint optimization approach, used as a baseline, adopts a sequential structure: one variable (e.g., α) is fixed while the remaining parameters (δ,θ) are optimized, and the process repeats by substituting the latest updates in subsequent optimization rounds until convergence. Both optimization protocols operate under identical dataset partitions, computing capacities, and latency constraints, ensuring a fair and consistent comparison of their convergence behavior and performance.

6.2.2. Assignment Relaxation

Let dr,s denote the distance between RE r and SE s. The normalized distance is defined as

$\bar{d}_{r,s}\triangleq \frac{d_{r,s}}{d_{\max}},\qquad d_{\max}=\max_{r,s} d_{r,s}.$ (36)

The normalized load of SE s at time t is

$\tilde{C}_{s}^{t}=\frac{\sum_{r}\tilde{\delta}_{r,s}^{t}\,(1-\alpha_{r}^{t})\,D_{r}}{C_{s}},\qquad h_{s}^{t}\triangleq \left[1-\tilde{C}_{s}^{t}\right]_{+},$ (37)

where hst represents the fraction of available capacity at SE s, and [x]+=max{0,x}. Based on these features, we define an affinity score between RE r and SE s:

$q_{r,s}^{t}=\exp\!\left(-\lambda_{d}\,\bar{d}_{r,s}\right)\exp\!\left(-\lambda\,\tilde{C}_{s}^{t}\right),\qquad \lambda_{d},\lambda>0,$ (38)

which encourages assignments toward nearby and less-loaded servers. The matrix $b^{t}\triangleq\{b_{r,s}^{t}\}_{r,s}$ is then expressed as a linear parametric function:

$b_{r,s}^{t}=\beta_{0}+\beta_{d}\,(1-\bar{d}_{r,s})+\beta_{h}\,h_{s}^{t}+\beta_{q}\log q_{r,s}^{t}.$ (39)

The relaxed assignment is obtained via a softmax mapping:

$\tilde{\delta}_{r,s}^{t}=\frac{\exp(b_{r,s}^{t})}{\sum_{s'}\exp(b_{r,s'}^{t})},$ (40)

which ensures δ˜rt lies on the probability simplex.
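Equations (36)–(40) can be traced in a few lines of NumPy. In the sketch below, all sizes, distances, capacities, and the $\beta$/$\lambda$ coefficients are illustrative placeholders, and the load in (37) is evaluated at the soft assignments of the previous iterate:

```python
import numpy as np

rng = np.random.default_rng(0)
R, S = 5, 3                             # toy numbers of REs and SEs
d = rng.uniform(10.0, 100.0, (R, S))    # distances d_{r,s}
d_bar = d / d.max()                     # (36) normalized distances
alpha = rng.uniform(0.2, 0.8, R)        # current offloading ratios
D = rng.uniform(0.5, 1.5, R)            # task sizes
C = np.array([5.0, 4.0, 6.0])           # SE capacities
delta_prev = np.full((R, S), 1.0 / S)   # soft assignments, previous iterate

# (37): normalized load per SE and remaining-capacity fraction
C_tilde = (delta_prev * ((1 - alpha) * D)[:, None]).sum(axis=0) / C
h = np.maximum(0.0, 1.0 - C_tilde)

# (38): affinity favoring nearby, lightly loaded servers
lam_d, lam_c = 1.0, 1.0
q = np.exp(-lam_d * d_bar) * np.exp(-lam_c * C_tilde)[None, :]

# (39): linear logits; the beta coefficients are free parameters
b0, bd, bh, bq = 0.0, 1.0, 1.0, 1.0
b = b0 + bd * (1.0 - d_bar) + bh * h[None, :] + bq * np.log(q)

# (40): row-wise softmax places each row of delta on the simplex
delta = np.exp(b - b.max(axis=1, keepdims=True))
delta /= delta.sum(axis=1, keepdims=True)
print(delta.sum(axis=1))  # each row sums to 1
```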

6.2.3. Offloading Ratio Relaxation

For each RE r, we define an aggregate edge suitability score:

$G_{r}^{t}\triangleq \sum_{s}\tilde{\delta}_{r,s}^{t}\,h_{s}^{t},$ (41)

which increases when nearby SEs have higher available capacity. The central-server logit, collected into the vector $a^{t}\triangleq\{a_{r}^{t}\}_{r}$, is parameterized as

$a_{r}^{t}=\tau_{0}-\tau_{1}G_{r}^{t},\qquad \tau_{1}>0,$ (42)

leading to the relaxed offloading ratio

$\alpha_{r}^{t}=\sigma(a_{r}^{t}),$ (43)

where σ(·) is the sigmoid function. Thus, higher edge suitability Grt reduces αrt, prioritizing task processing at the edge.
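The mapping (41)–(43) is a one-liner per equation. The sketch below uses illustrative $\tau_0$, $\tau_1$ values (assumptions, not from the paper) and demonstrates the intended monotonicity: idle edge servers pull $\alpha_r^t$ down:

```python
import numpy as np

def offloading_ratios(delta_tilde, h, tau0=0.0, tau1=2.0):
    """(41)-(43): edge-suitability G_r = sum_s delta_{r,s} h_s, logit
    a_r = tau0 - tau1 * G_r, relaxed ratio alpha_r = sigmoid(a_r).
    tau0 and tau1 (> 0) are free parameters of the relaxation."""
    G = delta_tilde @ h                  # (41)
    a = tau0 - tau1 * G                  # (42)
    return 1.0 / (1.0 + np.exp(-a))     # (43)

# Two REs, two SEs, uniform soft assignments
delta_tilde = np.array([[0.5, 0.5], [0.5, 0.5]])
alpha_busy = offloading_ratios(delta_tilde, np.array([0.0, 0.0]))  # loaded SEs
alpha_idle = offloading_ratios(delta_tilde, np.array([1.0, 1.0]))  # idle SEs
print(alpha_busy)  # -> [0.5 0.5]: no edge capacity, lean on the CS
print(alpha_idle)  # smaller values: keep tasks at the edge
```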

6.2.4. Joint Optimization

The relaxed decision variables (a,b) and the learning model parameters θ are optimized on the augmented objective:

$\mathcal{L}_{\mathrm{total}}=\mathcal{L}_{\mathrm{Task}}+\lambda_{1}P_{\mathrm{capacity}}+\lambda_{2}P_{\mathrm{delay}}.$ (44)

Here, LTask denotes the unsupervised learning loss, composed of a reconstruction term and a distribution-alignment term to address non-i.i.d. data across servers:

$\mathcal{L}_{\mathrm{Task}}=\mathbb{E}\left\|x-\hat{x}(z)\right\|^{2}+\mu_{D}\,D_{\mathrm{Div}}\big(p(z\mid \mathrm{edge}),\,p(z\mid \mathrm{CS})\big),$ (45)

where x^(z) denotes the reconstructed task from latent representation z, and DDiv is a statistical divergence measure. The penalty terms ensure feasibility with respect to capacity and delay constraints:

$P_{\mathrm{capacity}}=\mathbb{E}\big[\max\{0,\,U(\theta)-C_{\max}\}\big],$ (46)
$P_{\mathrm{delay}}=\mathbb{E}\big[\max\{0,\,T(\theta)-T_{\max}\}\big].$ (47)

6.2.5. PGD Mathematical Details

The optimization problem in (26) is of the constrained form,

$\min_{x\in\mathcal{C}} f(x),\qquad x\triangleq(\alpha,\delta,\theta),$ (48)

where f(x) is the total system loss and C is the feasible set induced by (26b–e). To solve this, PGD alternates between a gradient step

$y^{(k+1)}=x^{(k)}-\eta\,\nabla f(x^{(k)}),$ (49)

and a projection step

$x^{(k+1)}=\Pi_{\mathcal{C}}\big(y^{(k+1)}\big),$ (50)

where η>0 is the learning rate and ΠC(·) denotes Euclidean projection onto C:

$\Pi_{\mathcal{C}}(y)=\arg\min_{z\in\mathcal{C}}\|z-y\|^{2}.$ (51)

The gradient step reduces the objective in the unconstrained space, while the projection enforces feasibility with respect to capacity and delay. Under mild assumptions (e.g., Lipschitz continuity of $\nabla f$), PGD converges to a first-order stationary point. At each iteration $k$, PGD updates the relaxed logits $(a,b)$ and the model parameters $\theta$ via

$a^{(k+1)}=a^{(k)}-\eta_{a}\,\nabla_{a}\mathcal{L}_{\mathrm{total}}^{(k)},$ (52)
$b^{(k+1)}=b^{(k)}-\eta_{b}\,\nabla_{b}\mathcal{L}_{\mathrm{total}}^{(k)},$ (53)
$\theta^{(k+1)}=\theta^{(k)}-\eta_{\theta}\,\nabla_{\theta}\mathcal{L}_{\mathrm{total}}^{(k)}.$ (54)

The sigmoid and softmax mappings ensure that α and δ remain valid throughout the updates. Finally, discrete assignments are obtained as

$\delta_{r,s}^{t}=\mathbb{1}\big\{s=\arg\max_{s'}\tilde{\delta}_{r,s'}^{t}\big\}.$ (55)

The overall procedure is summarized in Algorithm 1.

Algorithm 1 Joint PGD-based offloading and assignment optimization

1: Hyper-parameters: server capacities $\{C_s\}$, CS capacity $\hat{C}$, maximum delay $T_{\max}$, learning rates $\eta_a,\eta_b,\eta_\theta$, penalty weights $\lambda$, initial logits $a^{0},b^{0}$, initial model parameters $\theta^{0}$
2: for each time slot $t$ do
3:   for iteration $k$ do
4:     Compute offloading ratios: $\alpha_{r}^{t,k}=\sigma(a_{r}^{(k)}),\ \forall r$
5:     Compute soft assignments: $\tilde{\delta}_{r,s}^{(k)}=\frac{\exp(b_{r,s}^{(k)})}{\sum_{s'}\exp(b_{r,s'}^{(k)})},\ \forall r,s$
6:     Evaluate loss: $\mathcal{L}^{(k)}=\mathcal{L}(\alpha^{(k)},\tilde{\delta}^{(k)},\theta^{(k)})$
7:     Compute gradients: $g_{a}=\nabla_{a}\mathcal{L}^{(k)}$, $g_{b}=\nabla_{b}\mathcal{L}^{(k)}$, $g_{\theta}=\nabla_{\theta}\mathcal{L}^{(k)}$
8:     Update parameters: $a^{(k+1)}=a^{(k)}-\eta_{a}g_{a}$; $b^{(k+1)}=b^{(k)}-\eta_{b}g_{b}$; $\theta^{(k+1)}=\theta^{(k)}-\eta_{\theta}g_{\theta}$
9:   end for
10: end for
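The structure of Algorithm 1 can be mimicked end to end on a toy problem. The sketch below substitutes a simple quadratic surrogate for the true loss and central-difference gradients for autograd (both assumptions for illustration only); the inner loop, the sigmoid/softmax reparameterization, and the final hard assignment of (55) follow the algorithm:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax_rows(b):
    e = np.exp(b - b.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss(a, b):
    """Toy quadratic surrogate for L(alpha, delta, theta): drive every
    offloading ratio toward 0.3 and every assignment row toward uniform."""
    alpha, delta = sigmoid(a), softmax_rows(b)
    return ((alpha - 0.3) ** 2).sum() + ((delta - 1.0 / delta.shape[1]) ** 2).sum()

def num_grad(f, x, eps=1e-6):
    """Central-difference gradient, standing in for autograd."""
    g = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"])
    while not it.finished:
        i = it.multi_index
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        g[i] = (f(xp) - f(xm)) / (2.0 * eps)
        it.iternext()
    return g

rng = np.random.default_rng(0)
a = rng.normal(size=4)               # CS logits, one per RE
b = rng.normal(size=(4, 3))          # assignment logits, RE x SE
eta_a, eta_b = 0.5, 0.5
for k in range(300):                 # inner loop of Algorithm 1 (lines 3-9)
    a = a - eta_a * num_grad(lambda x: loss(x, b), a)
    b = b - eta_b * num_grad(lambda x: loss(a, x), b)

delta_hard = softmax_rows(b).argmax(axis=1)   # hard assignments via (55)
print(float(loss(a, b)), sigmoid(a))
```

Because the logits are unconstrained while $\alpha$ and $\delta$ pass through sigmoid/softmax, no explicit projection is needed in this sketch, mirroring the remark after (55).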

6.2.6. Computational Complexity and Scalability Discussion

The computational complexity of the proposed PGD-based joint optimization mainly arises from gradient evaluations and projection operations. Each iteration involves O(RS+Pθ) operations, where R and S denote the numbers of REs and SEs, respectively, and Pθ is the number of learnable model parameters. Since the projection step is performed in closed form for α and δ, the overall per-iteration complexity scales linearly with network size. This makes the framework scalable to large MEC deployments. In terms of energy efficiency, cooperative task processing reduces redundant transmissions and central computations, leading to an estimated 35% decrease in total energy consumption compared to the fully centralized baseline, as verified in our simulations. Hence, the PGD formulation achieves a balanced trade-off between convergence speed, scalability, and energy efficiency for real-time Edge AI.

6.2.7. Implementation

The entire framework is implemented in PyTorch. Both the unsupervised reconstruction/alignment objective and the penalty terms are differentiable tensors. Using autograd, gradients are propagated through all components, enabling end-to-end training of (α,δ,θ) via stochastic PGD updates. The learning component is implemented as a lightweight convolutional neural network (CNN) to ensure compatibility with edge devices. The model consists of two convolutional layers with ReLU activations, followed by two fully connected layers. All the trainable parameters (θ,a,b) are optimized jointly using the Adam optimizer with learning rates $\eta_{\theta}=10^{-3}$, $\eta_{a}=5\times 10^{-4}$, and $\eta_{b}=5\times 10^{-4}$. The overall optimization minimizes the total loss $\mathcal{L}_{\mathrm{total}}$, which combines the unsupervised reconstruction and distribution-alignment terms with the capacity and delay penalties associated with the constraints in (26). This design enables a stable, end-to-end training process while maintaining low computational overhead suitable for resource-constrained edge environments.

6.2.8. Convergence and Penalty Analysis

The convergence of the proposed PGD scheme can be characterized using standard results from constrained optimization theory. Let the total loss function f(x)=Ltotal be continuously differentiable with an L-Lipschitz gradient, i.e.,

$\|\nabla f(x_{1})-\nabla f(x_{2})\|_{2}\le L\,\|x_{1}-x_{2}\|_{2},\qquad \forall x_{1},x_{2}\in\mathcal{C}.$ (56)

Then, for a fixed learning rate $0<\eta<\frac{2}{L}$, the PGD iteration

$x^{(k+1)}=\Pi_{\mathcal{C}}\big(x^{(k)}-\eta\,\nabla f(x^{(k)})\big),$ (57)

is guaranteed to converge to a first-order stationary point satisfying $\langle \nabla f(x^{\star}),\,z-x^{\star}\rangle\ge 0,\ \forall z\in\mathcal{C}$. In our setting, the feasible set $\mathcal{C}$ arises from the capacity and delay constraints, while the sigmoid and softmax relaxations ensure that $(\alpha,\delta)$ remain differentiable and bounded during optimization.

Furthermore, the penalty terms in (26) provide a smooth relaxation of the original hard constraints:

$\mathcal{L}_{\mathrm{total}}=\mathcal{L}_{\mathrm{Task}}+\lambda_{1}P_{\mathrm{capacity}}+\lambda_{2}P_{\mathrm{delay}},$ (58)

where λ1 and λ2 act as trade-off coefficients. By jointly incorporating both penalty components, the optimizer effectively balances constraint satisfaction and model performance within a unified objective surface, leading to improved numerical stability and faster convergence compared to treating each constraint separately.

7. Performance Evaluation

In this part, we evaluate the performance of the proposed MEC-based Edge AI framework under different system configurations, focusing on the interplay between task offloading, computation capacity, and data heterogeneity. The source code and data are available in [43].

7.1. Task Offloading and Delay

In Figure 4, we plot the loss values for the CS and SEs. As the number of steps increases, the loss in both cases decreases rapidly. Since the CS has more data samples and higher computing capacity, its loss function falls faster.

Figure 4.


The loss values $L_{CS}$ and $L_{s}$ for the CS and SE servers.

In Figure 5, we observe that increasing the number of REs reduces the loss, owing to the larger data volume. This contributes to better accuracy, as shown in Figure 6. Note that since our framework follows an unsupervised learning paradigm, the term "accuracy" in all plots refers to clustering accuracy (CA), computed as the optimal label-alignment accuracy between the predicted clusters and ground-truth classes [44].

Figure 5.


Loss function for different numbers of REs.

Figure 6.


Accuracy of model vs. the number of REs.

In Figure 6, we illustrate the effect of the number of REs on overall accuracy; as can be seen, more REs result in improved accuracy. On the other hand, more REs mean larger delay. As such, a threshold on the delay imposes an upper bound on the number of REs and, simultaneously, on the accuracy. For example, 30 REs can provide 99% accuracy. On the other hand, this number of REs causes a delay as large as 0.05 s, as can be seen in Figure 7, which is within the limit allowed by 3GPP [45,46]. In these experiments, we set $\hat{C}=4000$ CPU cycles/s for the CS, $C_{s}=200$ CPU cycles/s, $\forall s$, and $D_{r}=1.5$ MB, $\forall r$.

Figure 7.


Effect of the number of REs on the total delay in seconds for the joint scenario (proposed) and the disjoint scenario (baseline).

7.2. Comparison with the Baseline Scenario

Now we would like to compare our proposed scheme with a baseline scenario in which the task offloading is performed in a disjoint manner, i.e., each RE is restricted to either the CS or a single SE without coordination. Such a rigid allocation increases the reliance on the CS and leads to inefficient utilization of edge resources. In contrast, our proposed joint framework allows flexible task distribution across CS and SEs, which not only balances the load but also maximizes edge-side computing.

As can be seen in Figure 7, the proposed method incurs less delay than the baseline for different numbers of REs. For example, if the target delay threshold is set to 0.045 s, the proposed scheme can accommodate up to 30 REs, while the baseline can only afford about half of this.

To further demonstrate the practical advantage of the proposed cooperative optimization, we compare three deployment modes: edge-only computing (only SEs), CS-only offloading, and the proposed joint scheme with cooperative computing. The results in Figure 8 and Figure 9 show that the cooperative configuration achieves an excellent trade-off: while maintaining an average delay close to the edge-only configuration (0.042 s vs. 0.033 s), it improves the average clustering accuracy to 99.2%, far above the edge-only mode and close to the CS-only mode. These results confirm that the adaptive coordination between SEs and the CS enhances system efficiency, reduces task congestion, and provides stable model performance under heterogeneous network conditions.

Figure 8.


Average total delay for the three modes: only SEs, only CS, and the proposed cooperative scheme. The cooperative mode achieves a balanced delay of 0.042 s, lower than the only CS case (0.060 s) and close to only SEs (0.033 s), in the case of 30 REs.

Figure 9.


Accuracy comparison for only SEs, only CS, and cooperative modes. The proposed method achieves 99.2% accuracy, compared with only SEs (85%) and only CS (99.7%), in the case of 30 REs.

In Figure 10, we compare the proposed and baseline methods in terms of the fraction of tasks offloaded to the CS. As can be seen, for different numbers of REs, the load on the CS is cut in half when using the proposed framework, which is important from a practical point of view.

Figure 10.


Effect of number of REs on the mean of α for the joint scenario (proposed) and disjoint (baseline).

7.3. Comparing Non-i.i.d.-Blind and Non-i.i.d.-Aware Scenarios

In this subsection, we first evaluate the performance of the proposed loss model in mitigating the effect of quantity skew, i.e., non-i.i.d.-aware, as formulated in (18) and (19), against the non-i.i.d.-blind scenario that does not account for it. Quantity skew arises when clients (REs) have highly imbalanced numbers of local samples. Unlike the balanced case, where each client contributes equally, the aggregation here is biased toward clients with larger datasets.

In Figure 11, we evaluate the effect of ϱ on accuracy for both the non-i.i.d.-blind and non-i.i.d.-aware cases. As can be seen, for all the values of ϱ, the proposed scheme improves the accuracy over the baseline scenario.

Figure 11.


Effect of quantity skew coefficient ϱ on the total accuracy of model for non-i.i.d.-blind (baseline) and -aware models (proposed), for 30 REs.

Moreover, for very low values of ϱ, corresponding to extremely small sample sizes at the clients, accuracy is lower due to insufficient data. As ϱ increases, the number of available samples grows, which enhances performance and leads to higher accuracy up to an optimal point. Beyond this point, however, the variance in the distribution of data across clients becomes significant, introducing instability and larger errors, which ultimately causes accuracy to decrease. Overall, this demonstrates that the proposed scheme effectively mitigates the negative effects of quantity skew and highlights a non-monotonic relationship between ϱ and accuracy.

Finally, we investigate the impact of the correlation threshold κ on overall system accuracy, considering both the coverage overlap and spatial distribution of REs. Our analysis demonstrates that increasing the maximum allowable correlation between mutually visible REs considerably affects the system's clustering accuracy, as shown in Figure 12. Specifically, when the correlation threshold κ is raised, the system incorporates more highly correlated data from overlapping FoVs, which amplifies the non-i.i.d. nature of the collected datasets. This increased correlation leads to model overfitting and reduced generalization capability, ultimately degrading clustering performance.

Figure 12.


Effect of correlation on the system accuracy for non-i.i.d.-blind and -aware models.

As far as the non-i.i.d.-blind and -aware scenarios are concerned, we can see that the accuracy of the non-i.i.d.-aware case has considerably improved compared to the non-i.i.d.-blind scenario that does not enforce the FoV constraint derived in (22). The difference, in some cases, is above 10%, which is very significant in clustering and demonstrates the critical importance of properly regulating the angular separation between REs through the mathematical formulation of the correlation threshold.

Comparing the last two figures, we can see that, in contrast to Figure 12, the improvement in accuracy in Figure 11 is not significant. However, it is important to note that for many applications, even a slight improvement in accuracy is critical. For example, in the context of smart manufacturing, this has the potential to reduce mis-clustering in defect detection. Similarly, in the field of healthcare, marginal gains have been shown to enhance diagnostic reliability [47,48].

8. Conclusions

This work presented a cooperative framework for partial task offloading in Edge-AI systems, leveraging stochastic geometry to capture spatial randomness and to guide correlation-aware resource allocation. A key methodological contribution was the derivation of a closed-form upper bound on spatial correlation, which enabled constraints ensuring that only relevant and timely contributions are included in the global model. We further formulated and solved a joint optimization problem over task assignments, offloading ratios, and learning parameters, and proposed a practical PGD method. Through feasibility analysis and simulations, we demonstrated that the framework effectively addresses the challenges of latency guarantees, accuracy, and scalability in heterogeneous MEC environments. In particular, the results confirm that explicitly handling non-i.i.d. data distributions and spatial correlations leads to superior performance compared to baseline models where learning and resource allocation are decoupled.

Appendix A

We assume that the class distribution for each RE r at time t satisfies the normalization condition:

$\sum_{k}\hat{P}_{r,k}(t)=\sum_{k}P_{r,k}(t)=1,\qquad \forall t,r.$ (A1)

Assume that at time slot $t-1$, only $K'<K$ classes have been observed (i.e., $P_{r,k}(t-1)>0$), while the remaining $K-K'$ classes are unseen. To model the evolution of the class distribution over time while maintaining the normalization constraint, we define the estimated distribution at time $t$ as follows:

$\sum_{k}\hat{P}_{r,k}(t)=e^{-\kappa_{\mathrm{time}}}\sum_{k=1}^{K'}P_{r,k}(t-1)+(K-K')\epsilon=e^{-\kappa_{\mathrm{time}}}+(K-K')\epsilon=1,\qquad \forall t,r,$ (A2)

where $\kappa_{\mathrm{time}}$ is a decay parameter controlling the memory of previous distributions, and $\epsilon$ is obtained by $\epsilon=\frac{1-e^{-\kappa_{\mathrm{time}}}}{K-K'}$.

Appendix B

To justify the validity and dimensional consistency of (22), this appendix provides the derivation of the spatial–directional correlation function used in the paper. We consider a homogeneous field of REs distributed according to a Poisson process within a circular region of radius $L$. Each RE observes its environment within a FoV characterized by an angle $\phi_{r}$ and a total width $\tilde{\varphi}$. Let two REs, located at $l_{r}$ and $l_{r'}$, have an angular separation $\Delta\phi=|\phi_{r}-\phi_{r'}|$ and Euclidean distance $d_{r,r'}=\|l_{r}-l_{r'}\|$.

Following isotropic Gaussian field models in spatial statistics [49,50] and stochastic geometry [51], the pairwise correlation between these REs can be expressed as

$C(d_{r,r'},\Delta\phi)=\exp\!\left(-\frac{d_{r,r'}^{2}}{2s^{2}}\right)\exp\!\left(-\frac{1-\cos\Delta\phi}{2\sigma_{\phi}^{2}}\right),$ (A3)

where s denotes the spatial correlation length and σϕ the angular correlation parameter, both ensuring dimensional consistency.

For uniformly distributed REs, the average correlation over spatial and angular randomness can be written as

$\bar{C}=\frac{1}{L^{2}\tilde{\varphi}}\int_{0}^{\tilde{\varphi}}\int_{0}^{L}\int_{0}^{L} C(d_{r,r'},\Delta\phi)\, dl_{1}\, dl_{2}\, d\Delta\phi.$ (A4)

While the above integral does not admit a closed-form solution, a tractable and physically meaningful approximation can be obtained by replacing $d_{r,r'}^{2}$ with its expected value under the uniform distribution, $\mathbb{E}[d_{r,r'}^{2}]=L^{2}/3$. This substitution gives

$C(\Delta\phi)\approx \kappa_{s}\cdot\exp\!\left(-\frac{1-\cos\Delta\phi}{2\sigma_{\phi}^{2}}\right),\qquad \kappa_{s}=\exp\!\left(-\frac{L^{2}}{6s^{2}}\right),$ (A5)

which depends only on the normalized spatial scale L/s and the angular separation Δϕ. This simplification yields a closed-form, dimensionless expression that is well suited for system-level design.

To ensure that the correlation between any two REs remains below a desired threshold $\kappa\in(0,1]$, we impose $C(\Delta\phi)\le\kappa$. Using (A5) and solving for $\Delta\phi$ yields

$\Delta\phi=\arccos\!\left(1-2\sigma_{\phi}^{2}\ln\frac{\kappa_{s}}{\kappa}\right).$ (A6)

Therefore, to suppress excessive task similarity across users, the FoV width should satisfy

$\tilde{\varphi}\le\Delta\phi=\arccos\!\left(1-2\sigma_{\phi}^{2}\ln\frac{\exp(-L^{2}/6s^{2})}{\kappa}\right).$ (A7)

This condition provides a physically interpretable trade-off between sensing range, allowable correlation, and FoV overlap. The approximation has been numerically validated and shown to remain within a small deviation of the full triple integral (A4), confirming its suitability for system-level design and analysis.
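Equation (A7) is straightforward to evaluate in code. The sketch below uses illustrative parameter values (not taken from the paper) and returns None when the arccos argument leaves $[-1,1]$, i.e., when the bound is inactive or unattainable:

```python
import math

def fov_threshold(L, s, sigma_phi, kappa):
    """Delta-phi from (A6): the angular bound keeping the approximate
    pairwise correlation (A5) at or below kappa. Returns None when the
    arccos argument leaves [-1, 1], e.g., when the maximum correlation
    kappa_s is already below kappa and the constraint is inactive."""
    kappa_s = math.exp(-L**2 / (6.0 * s**2))
    arg = 1.0 - 2.0 * sigma_phi**2 * math.log(kappa_s / kappa)
    if not -1.0 <= arg <= 1.0:
        return None
    return math.acos(arg)

# Illustrative values: L = 100 m, s = 100 m, sigma_phi = 0.5 rad, kappa = 0.5
dphi = fov_threshold(100.0, 100.0, 0.5, 0.5)
print(dphi)  # roughly 0.74 rad; per (A7) the FoV width should not exceed this
```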

Author Contributions

Conceptualization, A.N.; Methodology, A.N.; Software, A.N.; Validation, H.S.; Formal analysis, H.S.; Investigation, A.N.; Resources, H.S.; Writing—original draft, A.N.; Writing—review & editing, H.S.; Visualization, H.S. and A.N.; Supervision, H.S.; Project administration, H.S.; Funding acquisition, H.S. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The source code and data of this paper can be found in [43].

Conflicts of Interest

The authors declare no conflicts of interest.

Funding Statement

This work was supported by Qatar Research Development and Innovation Council (QRDI) under grant ARG01-0511-230129.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

  • 1.Ficili I., Giacobbe M., Tricomi G., Puliafito A. From sensors to data intelligence: Leveraging IoT, cloud, and edge computing with AI. Sensors. 2025;25:1763. doi: 10.3390/s25061763. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Bourechak A., Zedadra O., Kouahla M.N., Guerrieri A., Seridi H., Fortino G. At the confluence of artificial intelligence and edge computing in IoT-based applications: A review and new perspectives. Sensors. 2023;23:1639. doi: 10.3390/s23031639. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.IEEE Future Networks. Edge Platforms and Services Evolving into 2030. 2021. [(accessed on 23 July 2025)]. Available online: https://futurenetworks.ieee.org/podcasts/edge-platforms-and-services-evolving-into-2030.
  • 4.Grand View Research. Edge AI Market Size, Share & Growth|Industry Report, 2030. 2024. [(accessed on 23 July 2025)]. Available online: https://www.grandviewresearch.com/industry-analysis/edge-ai-market-report.
  • 5.MarketsandMarkets Edge AI Hardware Industry Worth $58.90 Billion by 2030. 2025. [(accessed on 23 July 2025)]. Available online: https://www.marketsandmarkets.com/PressReleases/edge-ai-hardware.asp.
  • 6.Yang J., Chen Y., Lin Z., Tian D., Chen P. Distributed Computation Offloading in Autonomous Driving Vehicular Networks: A Stochastic Geometry Approach. IEEE Trans. Intell. Veh. 2024;9:2701–2713. doi: 10.1109/TIV.2023.3290369. [DOI] [Google Scholar]
  • 7.Gu Y., Yao Y., Li C., Xia B., Xu D., Zhang C. Modeling and Analysis of Stochastic Mobile-Edge Computing Wireless Networks. IEEE Internet Things J. 2021;8:14051–14065. doi: 10.1109/JIOT.2021.3068382. [DOI] [Google Scholar]
  • 8.Tran D.A., Do T.T., Zhang T. A stochastic geo-partitioning problem for mobile edge computing. IEEE Trans. Emerg. Top. Comput. 2020;9:2189–2200. doi: 10.1109/TETC.2020.2978229. [DOI] [Google Scholar]
  • 9.Hmamouche Y., Benjillali M., Saoudi S., Yanikomeroglu H., Renzo M.D. New Trends in Stochastic Geometry for Wireless Networks: A Tutorial and Survey. Proc. IEEE. 2021;109:1200–1252. doi: 10.1109/JPROC.2021.3061778. [DOI] [Google Scholar]
  • 10.Zhang Y., Chen G., Du H., Yuan X., Kadoch M., Cheriet M. Real-time remote health monitoring system driven by 5G MEC-IoT. Electronics. 2020;9:1753. doi: 10.3390/electronics9111753. [DOI] [Google Scholar]
  • 11.Li T., Sahu A.K., Talwalkar A., Smith V. Federated learning: Challenges, methods, and future directions. IEEE Signal Process. Mag. 2020;37:50–60. doi: 10.1109/MSP.2020.2975749. [DOI] [Google Scholar]
  • 12.Solans D., Heikkila M., Vitaletti A., Kourtellis N., Anagnostopoulos A., Chatzigiannakis I. Non-i.i.d data in federated learning: A survey with taxonomy, metrics, methods, frameworks and future directions. arXiv. 2024 arXiv:2411.12377 [Google Scholar]
  • 13.Lu Z., Pan H., Dai Y., Si X., Zhang Y. Federated learning with non-iid data: A survey. IEEE Internet Things J. 2024;11:19188–19209. doi: 10.1109/JIOT.2024.3376548. [DOI] [Google Scholar]
  • 14.Jimenez-Gutierrez D.M., Hassanzadeh M., Anagnostopoulos A., Chatzigiannakis I., Vitaletti A. A thorough assessment of the non-iid data impact in federated learning. arXiv. 2025 doi: 10.48550/arXiv.2503.17070. arXiv:2503.17070 [DOI] [Google Scholar]
  • 15.Su W., Li L., Liu F., He M., Liang X. AI on the edge: A comprehensive review. Artif. Intell. Rev. 2022;55:6125–6183. doi: 10.1007/s10462-022-10141-4. [DOI] [Google Scholar]
  • 16.Shi Y., Yang K., Jiang T., Zhang J., Letaief K.B. Communication-efficient edge AI: Algorithms and systems. IEEE Commun. Surv. Tutor. 2020;22:2167–2191. doi: 10.1109/COMST.2020.3007787. [DOI] [Google Scholar]
  • 17.Letaief K.B., Shi Y., Lu J., Lu J. Edge artificial intelligence for 6G: Vision, enabling technologies, and applications. IEEE J. Sel. Areas Commun. 2021;40:5–36. doi: 10.1109/JSAC.2021.3126076. [DOI] [Google Scholar]
  • 18.Lin F.P.C., Hosseinalipour S., Michelusi N., Brinton C.G. Delay-aware hierarchical federated learning. IEEE Trans. Cogn. Commun. Netw. 2023;10:674–688. doi: 10.1109/TCCN.2023.3329024. [DOI] [Google Scholar]
  • 19.Wang S., Tuor T., Salonidis T., Leung K.K., Makaya C., He T., Chan K. Adaptive federated learning in resource constrained edge computing systems. IEEE J. Sel. Areas Commun. 2019;37:1205–1221. doi: 10.1109/JSAC.2019.2904348. [DOI] [Google Scholar]
  • 20.Xiao H., Xu C., Ma Y., Yang S., Zhong L., Muntean G.M. Edge intelligence: A computational task offloading scheme for dependent IoT application. IEEE Trans. Wirel. Commun. 2022;21:7222–7237. doi: 10.1109/TWC.2022.3156905. [DOI] [Google Scholar]
  • 21.Qiao D., Guo S., Zhao J., Le J., Zhou P., Li M., Chen X. ASMAFL: Adaptive staleness-aware momentum asynchronous federated learning in edge computing. IEEE Trans. Mob. Comput. 2024;24:3390–3406. doi: 10.1109/TMC.2024.3510135. [DOI] [Google Scholar]
  • 22.Fan W., Chen Z., Hao Z., Wu F., Liu Y. Joint Task Offloading and Resource Allocation for Quality-Aware Edge-Assisted Machine Learning Task Inference. IEEE Trans. Veh. Technol. 2023;72:6739–6752. doi: 10.1109/TVT.2023.3235520. [DOI] [Google Scholar]
  • 23.Fan W., Li S., Liu J., Su Y., Wu F., Liu Y. Joint Task Offloading and Resource Allocation for Accuracy-Aware Machine-Learning-Based IIoT Applications. IEEE Internet Things J. 2023;10:3305–3321. doi: 10.1109/JIOT.2022.3181990. [DOI] [Google Scholar]
  • 24.Khalili A., Zarandi S., Rasti M. Joint Resource Allocation and Offloading Decision in Mobile Edge Computing. IEEE Commun. Lett. 2019;23:684–687. doi: 10.1109/LCOMM.2019.2897008. [DOI] [Google Scholar]
  • 25.Kuang Z., Li L., Gao J., Zhao L., Liu A. Partial Offloading Scheduling and Power Allocation for Mobile Edge Computing Systems. IEEE Internet Things J. 2019;6:6774–6785. doi: 10.1109/JIOT.2019.2911455. [DOI] [Google Scholar]
  • 26.Zhang S., Gu H., Chi K., Huang L., Yu K., Mumtaz S. DRL-Based Partial Offloading for Maximizing Sum Computation Rate of Wireless Powered Mobile Edge Computing Network. IEEE Trans. Wirel. Commun. 2022;21:10934–10948. doi: 10.1109/TWC.2022.3188302. [DOI] [Google Scholar]
  • 27.Malik U.M., Javed M.A., Frnda J., Rozhon J., Khan W.U. Efficient matching-based parallel task offloading in iot networks. Sensors. 2022;22:6906. doi: 10.3390/s22186906. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Bolat Y., Murray I., Ren Y., Ferdosian N. Decentralized Distributed Sequential Neural Networks Inference on Low-Power Microcontrollers in Wireless Sensor Networks: A Predictive Maintenance Case Study. Sensors. 2025;25:4595. doi: 10.3390/s25154595. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Zhao Y., Li M., Lai L., Suda N., Civin D., Chandra V. Federated learning with non-i.i.d data. arXiv. 2018 arXiv:1806.00582 [Google Scholar]
  • 30.Lai P., He Q., Xia X., Chen F., Abdelrazek M., Grundy J., Hosking J., Yang Y. Dynamic User Allocation in Stochastic Mobile Edge Computing Systems. IEEE Trans. Serv. Comput. 2022;15:2699–2712. doi: 10.1109/TSC.2021.3063148. [DOI] [Google Scholar]
  • 31.Lyu Z., Xiao M., Xu J., Skoglund M., Di Renzo M. The larger the merrier? Efficient large AI model inference in wireless edge networks. arXiv. 2025 doi: 10.48550/arXiv.2505.09214. arXiv:2505.09214 [DOI] [Google Scholar]
  • 32.Wu Y., Zheng J. Modeling and Analysis of the Uplink Local Delay in MEC-Based VANETs. IEEE Trans. Veh. Technol. 2020;69:3538–3549. doi: 10.1109/TVT.2020.2970551. [DOI] [Google Scholar]
  • 33.Wu Y., Zheng J. Modeling and Analysis of the Local Delay in an MEC-Based VANET for a Suburban Area. IEEE Internet Things J. 2022;9:7065–7079. doi: 10.1109/JIOT.2021.3116195. [DOI] [Google Scholar]
  • 34.Cheng Q., Cai G., He J., Kaddoum G. Design and Performance Analysis of MEC-Aided LoRa Networks with Power Control. IEEE Trans. Veh. Technol. 2025;74:1597–1609. doi: 10.1109/TVT.2024.3459046. [DOI] [Google Scholar]
  • 35.Dhillon H.S., Ganti R.K., Baccelli F., Andrews J.G. Modeling and Analysis of K-Tier Downlink Heterogeneous Cellular Networks. IEEE J. Sel. Areas Commun. 2012;30:550–560. doi: 10.1109/JSAC.2012.120405. [DOI] [Google Scholar]
  • 36.Andrews J.G., Baccelli F., Ganti R.K. A Tractable Approach to Coverage and Rate in Cellular Networks. IEEE Trans. Commun. 2011;59:3122–3134. doi: 10.1109/TCOMM.2011.100411.100541. [DOI] [Google Scholar]
  • 37.Savi M., Tornatore M., Verticale G. Impact of processing-resource sharing on the placement of chained virtual network functions. IEEE Trans. Cloud Comput. 2019;9:1479–1492. doi: 10.1109/TCC.2019.2914387. [DOI] [Google Scholar]
  • 38.Mu Y., Garg N., Ratnarajah T. Federated learning in massive MIMO 6G networks: Convergence analysis and communication-efficient design. IEEE Trans. Netw. Sci. Eng. 2022;9:4220–4234. doi: 10.1109/TNSE.2022.3196463. [DOI] [Google Scholar]
  • 39.Wang H., Kaplan Z., Niu D., Li B. Optimizing federated learning on non-i.i.d data with reinforcement learning; Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications; Toronto, ON, Canada. 6–9 July 2020; pp. 1698–1707. [Google Scholar]
  • 40.He C., Annavaram M., Avestimehr S. Group knowledge transfer: Federated learning of large cnns at the edge. Adv. Neural Inf. Process. Syst. 2020;33:14068–14080. [Google Scholar]
  • 41.Deng L. The MNIST database of handwritten digit images for machine learning research [best of the web] IEEE Signal Process. Mag. 2012;29:141–142. doi: 10.1109/MSP.2012.2211477. [DOI] [Google Scholar]
  • 42. MNIST Dataset on Kaggle. [(accessed on 23 July 2025)]. Available online: https://www.kaggle.com/datasets/hojjatk/mnist-dataset.
  • 43. Saeedi H., Nouruzi A. Stochastic-Geometric-based Modeling for Partial Offloading Task Computing in Edge AI Systems. GitHub Repository. [(accessed on 21 September 2025)]. Available online: https://github.com/alinouruzi/Stochastic-Geometric-based-Modeling-for-Partial-Offloading-Task-Computing-in-Edge-AI-Systems.
  • 44. Xie J., Girshick R., Farhadi A. Unsupervised deep embedding for clustering analysis; Proceedings of the International Conference on Machine Learning, PMLR; New York, NY, USA. 16–24 June 2016; pp. 478–487.
  • 45. 3GPP. 5G; Service Requirements for the 5G System (3GPP TS 22.261 Version 16.14.0 Release 16). Technical Specification ETSI TS 122 261 V16.14.0, ETSI. 2021. [(accessed on 29 July 2025)]. Available online: https://www.etsi.org/deliver/etsi_ts/122200_122299/122261/16.14.00_60/ts_122261v161400p.pdf.
  • 46. 5G Hub. 5G Ultra Reliable Low Latency Communication (URLLC). 2023. [(accessed on 29 July 2025)]. Available online: https://5ghub.us/5g-ultra-reliable-low-latency-communication-urllc/
  • 47. Wang Q., Haga Y. Research on Structure Optimization and Accuracy Improvement of Key Components of Medical Device Robot; Proceedings of the 2024 International Conference on Telecommunications and Power Electronics (TELEPE); Frankfurt, Germany. 29–31 May 2024; pp. 809–813.
  • 48. Zrubka Z., Holgyesi A., Neshat M., Nezhad H.M., Mirjalili S., Kovács L., Péntek M., Gulácsi L. Towards a single goodness metric of clinically relevant, accurate, fair and unbiased machine learning predictions of health-related quality of life; Proceedings of the 2023 IEEE 27th International Conference on Intelligent Engineering Systems (INES); Nairobi, Kenya. 26–28 July 2023; pp. 000285–000290.
  • 49. Cressie N. Statistics for Spatial Data. John Wiley & Sons; Hoboken, NJ, USA: 2015.
  • 50. Dai R., Akyildiz I.F. A spatial correlation model for visual information in wireless multimedia sensor networks. IEEE Trans. Multimed. 2009;11:1148–1159. doi: 10.1109/TMM.2009.2026100.
  • 51. Haenggi M. Stochastic Geometry for Wireless Networks. Cambridge University Press; Cambridge, UK: 2013.

Associated Data


Data Availability Statement

The source code and data of this paper can be found in [43].


Articles from Sensors (Basel, Switzerland) are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)
