Sensors (Basel, Switzerland)
. 2025 Nov 12;25(22):6892. doi: 10.3390/s25226892

Stochastic Geometric-Based Modeling for Partial Offloading Task Computing in Edge-AI Systems

Hamid Saeedi 1,*, Ali Nouruzi 1
PMCID: PMC12656120  PMID: 41305100

Abstract

This paper proposes a cooperative framework for resource allocation in multi-access edge computing (MEC) under a partial task offloading setting, addressing the joint challenges of learning performance and system efficiency in heterogeneous edge environments. In the proposed architecture, selected users act as edge servers (SEs) that collaboratively assist others alongside a central server (CS). A joint optimization problem is formulated to integrate model training with resource allocation while accounting for data freshness and spatial correlation among user tasks. The correlation-aware formulation penalizes outdated and redundant data, leading to improved robustness against non-i.i.d. distributions. To solve the NP-hard problem efficiently, a projected gradient descent (PGD) method is developed. The simulation results demonstrate that the proposed cooperative approach achieves a balanced delay of 0.042 s, close to edge-only computing (0.033 s) and 30% lower than the CS-only mode, while improving clustering accuracy to 99.2% (up to 15% higher than the baseline). Moreover, it reduces the central server load by nearly half, ensuring scalability and latency compliance within 3GPP limits. These findings confirm that cooperation between SEs and the CS substantially enhances reliability and performance in distributed Edge-AI systems.

Keywords: resource allocation, multi-access edge computing, edge AI, partial offloading

1. Introduction

1.1. State-of-the-Art and Motivations

Recent advances in artificial intelligence (AI) have sparked innovation in various technological domains. Most conventional AI solutions rely on centralized computing architectures that depend on large-scale data collection. While these centralized models offer powerful learning, perception, and decision-making capabilities, they have their own limitations in dynamic and uncertain environments. Transferring data to a central server and accumulating it for computing is time-consuming and costly. Additionally, providing adequate computing resources on a central server is challenging, especially under high demand or time-sensitive conditions. To address these challenges, researchers have proposed more distributed and adaptive architectures. One prominent approach is multi-access edge computing (MEC), which brings computing and storage resources closer to end devices at the edge of the network. This configuration improves task offloading efficiency and enables real-time system responsiveness. Edge AI enables the execution of AI models directly at the network edge, allowing real-time decision-making with reduced latency and lower reliance on centralized servers. This approach is particularly valuable in time-sensitive and resource-constrained environments [1,2]. Due to the significant advantages of Edge AI, such as enhanced operational efficiency, improved data privacy, and ultra-low latency, global interest and investment in this field have surged in recent years. According to market analysis, the global Edge AI market is projected to exceed USD 66 billion by 2030 [3,4,5].

Despite its benefits, Edge AI and MEC are faced with several significant challenges. One of the most critical issues is the propagation of uncertainty in distributed environments. In this regard, stochastic geometry models can provide a powerful analytical framework for characterizing spatial randomness and evaluating network performance [6,7,8,9,10]. Furthermore, due to asynchronous updates and decentralized model execution, local errors can accumulate and spread across the network, ultimately degrading the overall model accuracy and reliability. Another major challenge lies in the non-independent and identically distributed (non-i.i.d) nature of user data. While various solutions have been proposed in the context of federated learning (FL) to mitigate this issue, concerns remain regarding the generalization and reliability of global models trained on highly heterogeneous local datasets [11]. Such data distribution disparities can significantly disrupt model convergence and performance. While most existing works on non-i.i.d data in FL and machine learning (ML) focus on non-uniformity of local data distributions (e.g., label skew or quantity skew), fewer studies address the non-independence aspect, where user data may be statistically correlated or dependent across clients [12,13,14].

These challenges highlight the need for robust and correlation-aware learning frameworks that maintain accuracy, adaptability, and convergence under decentralized and statistically dependent conditions. In such environments, traditional centralized or federated approaches often fall short, particularly when dealing with heterogeneous and temporally dynamic data [15,16,17].

To address these limitations, we propose a distributed learning framework tailored for partial task offloading in MEC environments. Unlike conventional paradigms, our approach enables user devices to selectively offload task-related data to edge or central servers (CS) via data-level cooperation. This setup leverages local computation, respects delay constraints, and accommodates statistical heterogeneity across users. A key novelty of our model lies in its integration of spatial and directional task correlations. This aspect is often overlooked in existing works, i.e., the data is treated as i.i.d., which is not an accurate assumption. We introduce a correlation-aware loss function that explicitly incorporates data freshness and penalizes contributions from distant or weakly correlated users. This design enhances the relevance of updates, reduces the impact of outdated or biased data, and improves overall learning robustness in Edge AI systems. Using stochastic geometry analysis, we can analytically guarantee that users can be served by at least one local server at any time. In summary, the main question addressed in this paper is as follows: How can user tasks in a network be served efficiently under a partial offloading MEC scenario, with priority for edge servers, while taking into account the characteristics of an AI-driven edge model, including communication latency and the challenges posed by non-i.i.d data?

1.2. Contributions

Different from previous works [18,19,20,21,22,23], this paper introduces a novel cooperative computation framework for partial offloading in MEC environments. The main contributions are as follows:

  • We propose a cooperative partial offloading model in MEC environments, where tasks can be processed either locally by neighboring users or centrally by a server, taking spatial and directional correlations into account where a closed-form upper bound for spatial correlation is derived to constrain offloading decisions based on user proximity and sensing overlap.

  • To minimize learning loss under delay and resource constraints, a novel optimization problem is formulated that incorporates freshness-aware weighting, correlation modeling, and allocation decisions. Moreover, to improve robustness in non-i.i.d. settings, we integrate earth mover’s distance (EMD) into the loss function to capture distributional dissimilarity among users’ data.

  • To ensure scalability, we develop a coordination-free solution method suitable for practical deployment in distributed MEC systems.

  • Leveraging stochastic geometry, we provide tractable analytical characterizations of coverage probability and delay distribution, which not only enable probabilistic guarantees on task offloading but also ensure the generalizability of the proposed framework to large-scale and heterogeneous MEC networks.

  • We show that, using the proposed framework, the computation load on the central server can be significantly reduced compared to baseline schemes and, for a given delay threshold, a considerably higher number of users can be served.

2. Related Works

In this section, we first review related works on MEC systems and their resource management strategies, followed by a discussion of studies focusing on Edge AI. Regardless of the method used to solve the optimization problem in MEC-based systems, task offloading is typically classified as either full or partial. In full offloading, the entire task is computed at the edge server, while in partial offloading, the task is split between the local device and the edge server to balance latency and resource usage [24,25,26]. In [27], the authors investigate parallel task offloading in fog-enabled IoT networks, where computational tasks are divided into multiple sub-tasks and executed concurrently across heterogeneous fog nodes. They formulate the resource allocation problem with the objective of minimizing overall task latency while ensuring simultaneous completion of all sub-tasks, which is essential for efficient utilization of distributed resources. To address the inherent instability caused by interdependent task assignments, the authors design a matching-based allocation framework that maintains stable associations between task-originating devices and helper nodes. Through extensive simulations, the proposed method is shown to reduce average latency by up to 52% under high workload conditions, outperforming several state-of-the-art offloading strategies and demonstrating its suitability for large-scale, delay-sensitive IoT systems. In [20], the authors propose a computation offloading scheme for IoT applications involving dependent tasks. This scheme consists of two main components: a multi-queue priority algorithm that schedules dependent sub-tasks, and a deep reinforcement learning method based on Actor–Critic for making dynamic offloading decisions. The framework is designed to minimize task completion time and energy consumption in edge environments by handling task dependencies and fluctuating network conditions efficiently.
By leveraging the Lyapunov method, a stochastic optimization problem is stated in [22] with the aim of providing a joint task offloading and resource allocation framework for edge-assisted machine learning inference. This framework focuses on minimizing end-to-end latency while preserving inference accuracy and queue stability. The model considers local and edge inference options, whereby each device can adapt the quality of uploaded data and dynamically decide on offloading and computing strategies. The problem is decomposed into three sub-problems: offloading and channel allocation; data quality adjustment; and computational resource assignment. These sub-problems are solved using convex optimization and low-complexity heuristic methods. The simulation results demonstrate substantial improvements in latency reduction and stability under dynamic edge environments.

In [28], the authors proposed a method called the Decentralized Distributed Sequential Neural Network (DDSNN), tailored for low-power devices in wireless sensor networks, where conventional deep models face strict memory and energy constraints. By sequentially partitioning a LeNet model across multiple nodes, DDSNN enables fully decentralized inference without the need for compression or centralized coordination. In a predictive maintenance case study involving industrial pump vibration data, the framework preserved full precision, achieved 99% accuracy, and reduced inference latency by nearly 50% compared to the baseline. Although the accuracy gain over the non-distributed model was marginal, the authors emphasize that in highly resource-constrained settings, even slight improvements are significant, making DDSNN a practical and scalable solution.

The approach proposed in [21] addresses FL in heterogeneous edge environments by allowing each edge device to perform a different number of local updates, adapting to their computational capacity. This idea lays a foundation for handling heterogeneity in edge systems and can be extended further to include other real-world factors. For instance, while the original model focuses on computational diversity, it does not account for communication delays or queuing effects that are common in distributed systems. Additionally, the influence of data heterogeneity is not explicitly modeled.

Authors in [18] introduce a dynamic client selection strategy for FL, aiming to improve training efficiency under resource constraints in edge environments. The approach incorporates client-side characteristics such as computation power and data volume into a selection metric to adaptively involve suitable participants in each communication round. This method improves convergence speed and reduces communication cost. However, the model assumes relatively consistent client availability and does not explicitly address delay variability or the impact of severe data heterogeneity, which are common in real-world edge networks. Moreover, the paper focuses primarily on client selection policies rather than modeling deeper aspects such as uncertainty propagation or the effects of delayed or imbalanced updates on global performance. These aspects open opportunities for extending this methodology toward more delay-aware and distribution-sensitive learning frameworks.

To improve personalization in FL, the authors in  [29] propose an adversarial training framework combined with data-free knowledge distillation. Their method leverages earth mover’s distance (EMD) to align local and global data distributions, effectively addressing the challenge of non-i.i.d data across clients while preserving privacy. Although this framework improves global model adaptation, it does not consider delay sensitivity, dynamic task partitioning, or computing constraints common in edge environments. Nevertheless, the idea presents a promising direction that can be extended to such realistic, delay-aware, and resource-constrained edge AI settings.

With the aim of addressing label distribution skew in FL, the authors in [30] propose a novel learning approach that leverages knowledge distillation and a label-invariant teacher-student framework. Their method focuses on mitigating performance degradation due to non-i.i.d label distributions without directly sharing model parameters. This work sheds light on the importance of decoupling label heterogeneity from the model optimization process. While the approach effectively addresses label skew, it can be extended to incorporate delay sensitivity and partial offloading in edge computing environments, dimensions that must be considered in practical deployments.

Furthermore, authors in [31] have explored pruning-aware collaborative inference of large AI models at the edge, where model partitioning and resource optimization are jointly considered to balance accuracy, latency, and energy consumption. While this provides important insights into enabling efficient edge inference, it does not address aspects such as training under heterogeneous data distributions and stochastic task offloading, which remain central to advancing edge intelligence.

To facilitate a more effective comparison between this work and prior studies, we present Table 1.

Table 1.

Comparison of related works with our proposed framework.

| Ref. | Method | Main Contribution | Comparison with Our Work |
|---|---|---|---|
| [27] | Matching-based allocation in fog IoT | Stable parallel sub-task execution, latency reduction | Focuses on fog parallelism; our work integrates spatial correlation and non-i.i.d. modeling in MEC |
| [20] | Multi-queue scheduling with Actor–Critic DRL | Dependent task completion, energy efficiency | Addresses task dependencies; our work emphasizes correlation-aware offloading and robustness to heterogeneous data |
| [22] | Lyapunov-based stochastic optimization | Joint offloading and resource allocation with latency/queue guarantees | Provides stability analysis; our framework couples delay guarantees with PGD-based optimization |
| [28] | Decentralized Distributed Sequential Neural Network (DDSNN) | Lightweight inference across low-power devices | Focuses on TinyML inference; our work targets MEC task offloading with stochastic geometry and learning integration |
| [21] | Adaptive local updates in heterogeneous FL | Handles device heterogeneity by adjusting update counts | Considers computational diversity; our approach also incorporates delay constraints and correlation-aware offloading |
| [18] | Client selection strategy for FL | Faster convergence, reduced communication overhead | Optimizes participant choice; our framework integrates delay guarantees and data heterogeneity in MEC |
| [29] | Adversarial FL with earth mover's distance | Improves global adaptation under non-i.i.d. data | Focuses on privacy-preserving FL; our work extends EMD to MEC with offloading and latency constraints |
| [30] | Label-invariant knowledge distillation in FL | Mitigates label skew via teacher–student framework | Addresses label heterogeneity; our framework also accounts for spatial correlation, partial offloading, and delay guarantees |
| Our Work | PGD-based joint optimization in MEC | Cooperative partial offloading, EMD-based robustness to non-i.i.d., stochastic geometry analysis, reduced CS load | Provides a unified framework coupling task offloading, correlation modeling, and delay-aware learning optimization |

Symbol Notation: $f_X(x)$ and $F_X(x)$ denote the probability density function (PDF) and cumulative distribution function (CDF) of the random variable $X$. $\Pr(x)$ denotes the probability of event $x$. $|\cdot|$ is the absolute value, and $\lceil\cdot\rceil$ is the ceiling function.

3. System Model and Parameters

This article focuses on a task offloading scenario in which data-generating equipment, referred to as requesting entities (REs), can delegate their computational tasks either to nearby local edge devices, referred to as serving entities (SEs), or to a centralized server (CS). The goal is to enable efficient and delay-sensitive learning by leveraging Edge AI frameworks, where training occurs across distributed nodes with limited resources and potentially heterogeneous data distributions. In this regard, as can be seen in Figure 1, we consider a central server for task computing.

Figure 1.


A hybrid edge architecture in which some entities request task execution (REs) and others provide computing resources (SEs). A CS is also available for additional offloading. Typical use cases include smart campuses or factories, where IoT devices and robots can be considered as REs, and where attendees' smartphones can be considered as SEs performing partial task offloading.

The set of equipment whose tasks need to be addressed is represented by $\mathcal{RE}=\{r\}_{r=1}^{R}$. In addition, the set of equipment that can provide computing resources for local task computing is denoted by $\mathcal{SE}=\{s\}_{s=1}^{S}$. We assume that the devices are randomly distributed over an area of size $|A|$ (in square meters), following a Poisson point process (PPP) [32,33,34]. Let $\lambda_{SE}$ and $\lambda_{RE}$ denote the spatial intensities of SEs and REs, respectively, measured in devices per square meter. According to the properties of the PPP, the probability of observing exactly $k$ devices of a given type (either SEs or REs) within a region of area $|A|$ is given by $f(k)=\frac{e^{-\lambda |A|}(\lambda |A|)^{k}}{k!}$, where $\lambda\in\{\lambda_{SE},\lambda_{RE}\}$ denotes the spatial density (intensity) of SEs or REs, respectively. Furthermore, we assume that the spatial distributions of SEs and REs are independent of each other, consistent with the properties of independent homogeneous PPPs [33,35,36]. (In our model, REs represent IoT devices or sensors that are typically deployed in large numbers and remain stationary within a given environment, e.g., a campus or factory. In contrast, SEs correspond to mobile devices, such as smartphones or laptops, which are carried by human users. Since the placement of REs is governed by specific deployment requirements, while the mobility of SEs is driven by human movement patterns, it is reasonable to assume that the spatial distributions of REs and SEs are mutually independent. Moreover, the locations of individual REs and SEs are also assumed to be independent of one another. This independence assumption is widely adopted in stochastic geometry models, as it enhances both realism and analytical tractability.)
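To make the PPP deployment model concrete, the following minimal sketch samples SE and RE locations over a square region. The region size and both densities are illustrative assumptions, not the paper's simulation setup; large Poisson means are sampled in chunks to avoid floating-point underflow.

```python
import math
import random

def poisson_sample(mean: float) -> int:
    """Sample from Poisson(mean) with Knuth's product method, splitting large
    means into chunks (Poisson counts are additive) to avoid exp(-mean) underflow."""
    count = 0
    remaining = mean
    while remaining > 0:
        chunk = min(remaining, 50.0)
        remaining -= chunk
        limit = math.exp(-chunk)
        prod = random.random()
        while prod > limit:
            count += 1
            prod *= random.random()
    return count

def sample_ppp(lam: float, side: float):
    """Homogeneous PPP of intensity lam (devices/m^2) on a side x side square:
    Poisson-distributed count, then i.i.d. uniform locations."""
    n = poisson_sample(lam * side * side)
    return [(random.uniform(0, side), random.uniform(0, side)) for _ in range(n)]

random.seed(0)
side = 1000.0                      # assumed 1 km x 1 km region
ses = sample_ppp(2e-4, side)       # assumed lambda_SE = 2e-4 SEs/m^2 -> ~200 SEs
res = sample_ppp(1e-3, side)       # assumed lambda_RE = 1e-3 REs/m^2 -> ~1000 REs
print(len(ses), len(res))
```

Sampling the two processes with independent draws mirrors the independence assumption between SE and RE locations stated above.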

The size of the task for RE $r$ is denoted by $D_r$ (in bytes), while the computing capacity of SE $s$ is represented by $C_s$ (in CPU cycles/s). Similarly, the computing capacity of the CS is denoted by $\hat{C}$. In this paper, we assume that the task data consists of image-based content. (For instance, one can envision a scenario involving sensors and mobile robots deployed across environments such as a university campus, smart factory, or smart city. These devices are responsible for capturing images of their surroundings to support perception and decision-making tasks.) Accordingly, we adopt a widely used benchmark dataset to model the image data in our experiments. In line with this, we define a dynamic field of view (FoV) for each RE $r$, represented by an angular direction $\phi_r^t$ at time slot $t$. This direction evolves over time according to $\phi_r^t=\phi_r^0+t\varphi$, where $\phi_r^0\in[0,2\pi]$ is the initial viewing angle and $\varphi$ is a constant angular increment per time slot. In addition, each RE is assumed to have a symmetric viewing range with a total width of $\tilde{\varphi}$, meaning that at time $t$, RE $r$ can observe all the objects located within the angular sector $\left[\phi_r^t-\frac{\tilde{\varphi}}{2},\,\phi_r^t+\frac{\tilde{\varphi}}{2}\right]$. If the angular displacement $\varphi$ between two consecutive time slots satisfies $\varphi\geq\tilde{\varphi}$, then the task based on the data collected at time $t$ is assumed to be independent of the task based on the data collected at time $t-1$. Otherwise, a dependency exists, which will be addressed in the subsequent sections.

Furthermore, the maximum visual sensing range for all REs is assumed to be equal and is represented by $L$. In addition, the time at which RE $r$ collects the data for its task is denoted by $\tau_r$, with $t-1<\tau_r\leq t$. Furthermore, the maximum coverage range of each edge server is also denoted by $L$. In this paper, we assume that each RE can be covered by at least $S$ servers but can only be assigned to one, the details of which are provided in Section 5. A data sample from the dataset of RE $r$ is denoted by $x_r\in\mathcal{D}_r$, where $\mathcal{D}_r$ represents the dataset associated with RE $r$. The size of each RE's dataset is modeled as a uniform random variable:

$|\mathcal{D}_r|\sim\mathrm{Uniform}[n_{\min},n_{\max}],\quad\forall r,$ (1)

where $n_{\min}$ and $n_{\max}$ denote the minimum and maximum number of samples, respectively. Here, $n_{\max}=\varrho N$, where $N$ is the total number of samples in the global dataset, and $\varrho\in[0,1]$ quantifies the degree of quantity skew across the REs. We assume that for the transmitted sample $x_r$, the received version at SE $s$ is represented by $x_r^s$, and at the CS by $\hat{x}_r$, where the effects of wireless channel transmission, such as additive noise and fading, have been incorporated into them.

We use the notation $\delta_{r,s}^t\in\{0,1\}$ to represent the assignment variable, where $\delta_{r,s}^t=1$ if RE $r$ is assigned to SE $s$, and $\delta_{r,s}^t=0$ otherwise. Let $\alpha_r^t\in[0,1]$ represent the fraction of the data task from RE $r$ that is offloaded to the CS. Consequently, $(1-\alpha_r^t)$ denotes the fraction of the task offloaded to the assigned SE $s$. Accordingly, the number of data samples offloaded to the CS by RE $r$ is $\alpha_r^t D_r$, and we denote this subset as $\hat{\mathcal{D}}_r=\{\hat{x}_r\}$. Similarly, the number of samples offloaded to SE $s$ is $(1-\alpha_r^t)D_r$, and we denote this subset as $\mathcal{D}_r^s=\{x_r^s\}$. To increase the readability of the paper, the list of symbols is provided in Table 2.
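The quantity-skew model of Eq. (1) and the partial split between CS and SE can be sketched as follows. The value of $n_{\min}$, the global dataset size, and the uniform offloading fraction are illustrative assumptions.

```python
import random

def make_datasets(num_res, n_total, rho, alpha):
    """Draw quantity-skewed dataset sizes |D_r| ~ Uniform[n_min, n_max] with
    n_max = rho * N (Eq. (1)), then split each dataset: a fraction alpha[r]
    of the samples goes to the CS and the remainder to the assigned SE."""
    n_min = 10                       # assumed minimum dataset size
    n_max = int(rho * n_total)
    sizes = [random.randint(n_min, n_max) for _ in range(num_res)]
    cs_share = [round(alpha[r] * sizes[r]) for r in range(num_res)]
    se_share = [sizes[r] - cs_share[r] for r in range(num_res)]
    return sizes, cs_share, se_share

random.seed(1)
alpha = [0.3] * 5                    # uniform offloading fraction, for illustration
sizes, cs, se = make_datasets(5, n_total=10_000, rho=0.05, alpha=alpha)
print(sizes, cs, se)
```

Note that the two shares always sum to the original dataset size, matching the partition of $\mathcal{D}_r$ into $\hat{\mathcal{D}}_r$ and $\mathcal{D}_r^s$.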

Table 2.

List of system model parameters.

| Symbol | Description |
|---|---|
| $\mathcal{RE}$ | Set of requesting entities (task generators) |
| $\mathcal{SE}$ | Set of serving entities (edge servers) |
| $\lambda_{RE}$ | Spatial density of REs (devices/m²) |
| $\lambda_{SE}$ | Spatial density of SEs (devices/m²) |
| $\lvert A\rvert$ | Total coverage area |
| $D_r$ | Task size of RE $r$ (bytes) |
| $C_s$ | Computing capacity of SE $s$ (CPU cycles/s) |
| $\hat{C}$ | Computing capacity of central server (CS) |
| $L$ | Maximum sensing/coverage range of REs (m) |
| $\phi_r^t$ | Viewing angle of RE $r$ at time slot $t$ |
| $\tilde{\varphi}$ | Angular width of the field of view (FoV) |
| $\varphi$ | Angular displacement between time slots |
| $\kappa$ | Correlation threshold among REs' FoVs |
| $f(k)$ | PPP probability of observing $k$ nodes in area $\lvert A\rvert$ |

4. The Proposed Problem

4.1. Constraints

Each RE must be assigned to one SE, as a result,

$\sum_{s}\delta_{r,s}^{t}=1,\quad\forall r,t.$ (2)

Furthermore, the following constraints ensure that the total computational load assigned to each SE and the CS does not exceed their respective computing capacities:

$\sum_{r}\gamma\,\delta_{r,s}^{t}\left(1-\alpha_{r}^{t}\right)D_{r}\leq C_{s},\quad\forall s,t,$ (3)
$\sum_{r}\gamma\,\alpha_{r}^{t}D_{r}\leq\hat{C},\quad\forall t,$ (4)

where γ denotes the number of CPU cycles required per byte [37]. In addition, we consider computing delay, transmission delay, and queuing delay. The computing delay is calculated by:

$T_{r,\mathrm{Comp}}=\gamma\left(\frac{\alpha_{r}^{t}D_{r}}{\hat{C}}+\frac{\left(1-\alpha_{r}^{t}\right)D_{r}}{C_{s}}\right),\quad\forall r.$ (5)

The transmission delay also is obtained by the following:

$T_{r,\mathrm{Trans}}=\frac{\alpha_{r}^{t}D_{r}}{\hat{V}}+\frac{\left(1-\alpha_{r}^{t}\right)D_{r}}{V},\quad\forall r,$ (6)

where $V$ and $\hat{V}$ are the transmission data rates from the REs to the SEs and to the CS, respectively, in bytes per second. By considering the M/G/1 queuing model for both the CS and the SEs, the total queuing delay experienced by RE $r$ under partial offloading is modeled by the Pollaczek–Khinchin formula. We denote by $\lambda_{task}$ the task generation rate of each RE (in tasks per second). The net arrival rates to the CS and to SE $s$ are then

$\Lambda_{CS}=\lambda_{task}\sum_{r}\alpha_{r}\approx\lambda_{task}|A|\lambda_{RE}\bar{\alpha},$ (7)
$\Lambda_{s}=\lambda_{task}\sum_{r}\tilde{\delta}_{r,s}\left(1-\alpha_{r}\right),$ (8)

where $\bar{\alpha}$ denotes the average offloading fraction across REs (for large systems one may use $\bar{\alpha}=\mathbb{E}[\alpha_r]$). For the CS, the service time of a task originating from RE $r$ (in seconds) is

$S_{CS,r}=\frac{\gamma\alpha_{r}D_{r}}{\hat{C}},\quad D_{r}\ \text{in [bytes]},\ \hat{C}\ \text{in [CPU cycles/s]}.$ (9)

Consequently, the first and second moments of the service time at the CS are

$\mathbb{E}[S_{CS}]=\frac{\mathbb{E}[\gamma\alpha D]}{\hat{C}},\qquad\mathbb{E}[S_{CS}^{2}]=\frac{\mathbb{E}[(\gamma\alpha D)^{2}]}{\hat{C}^{2}}.$ (10)

By the Pollaczek–Khinchin formula for an M/G/1 queue, the mean waiting time in queue (excluding service) at the CS is

$W_{q,CS}=\frac{\Lambda_{CS}\,\mathbb{E}[S_{CS}^{2}]}{2\left(1-\rho_{CS}\right)},\qquad\rho_{CS}=\Lambda_{CS}\,\mathbb{E}[S_{CS}].$ (11)

Analogously, for SE $s$ with service time $S_{s,r}=\frac{\gamma\left(1-\alpha_{r}\right)D_{r}}{C_{s}}$ we have

$\mathbb{E}[S_{s}]=\frac{\mathbb{E}[\gamma\left(1-\alpha\right)D]}{C_{s}},\qquad\mathbb{E}[S_{s}^{2}]=\frac{\mathbb{E}[\gamma^{2}\left(1-\alpha\right)^{2}D^{2}]}{C_{s}^{2}},$ (12)
$W_{q,s}=\frac{\Lambda_{s}\,\mathbb{E}[S_{s}^{2}]}{2\left(1-\rho_{s}\right)},\qquad\rho_{s}=\Lambda_{s}\,\mathbb{E}[S_{s}].$ (13)

Finally, the expected queuing delay experienced by RE $r$ (in seconds) under partial offloading and soft assignments $\tilde{\delta}_{r,s}$ is given by the mixture

$T_{r,\mathrm{Que}}=\alpha_{r}W_{q,CS}+\left(1-\alpha_{r}\right)\sum_{s}\tilde{\delta}_{r,s}W_{q,s}.$ (14)

Finally, the following constraint guarantees that the total delay $T_{r,\mathrm{Total}}$ does not exceed the maximum delay threshold $T_{\mathrm{Max}}$:

$T_{r,\mathrm{Comp}}+T_{r,\mathrm{Trans}}+T_{r,\mathrm{Que}}=T_{r,\mathrm{Total}}\leq T_{\mathrm{Max}},\quad\forall r.$ (15)
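A minimal numerical sketch of the delay model in Eqs. (5), (6), (11), (13), and (14), for a single RE with deterministic service times and a soft assignment of 1 to one SE. All system parameters (task size, capacities, data rates, arrival rates) are illustrative assumptions, not the paper's simulation values.

```python
def pk_wait(arrival_rate, es, es2):
    """Pollaczek-Khinchin mean waiting time for an M/G/1 queue, given the
    arrival rate and the first/second moments of the service time."""
    rho = arrival_rate * es
    assert rho < 1.0, "queue must be stable"
    return arrival_rate * es2 / (2.0 * (1.0 - rho))

# Assumed parameters for illustration.
gamma, D = 100.0, 2e5            # cycles/byte, task size in bytes
C_hat, C_s = 5e10, 5e9           # CS and SE capacities (CPU cycles/s)
V_hat, V = 1e7, 2e7              # data rates to CS and SE (bytes/s)
alpha = 0.3                      # fraction offloaded to the CS

t_comp = gamma * (alpha * D / C_hat + (1 - alpha) * D / C_s)       # Eq. (5)
t_trans = alpha * D / V_hat + (1 - alpha) * D / V                  # Eq. (6)

# Deterministic service times => E[S^2] = E[S]^2 in this single-RE sketch.
s_cs = gamma * alpha * D / C_hat
s_se = gamma * (1 - alpha) * D / C_s
lam_cs, lam_se = 2.0, 5.0        # assumed net arrival rates (tasks/s)
t_que = alpha * pk_wait(lam_cs, s_cs, s_cs**2) \
        + (1 - alpha) * pk_wait(lam_se, s_se, s_se**2)             # Eq. (14)

t_total = t_comp + t_trans + t_que                                 # Eq. (15)
print(f"{t_total:.4f} s")
```

Checking `t_total` against a delay budget corresponds to constraint (15).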

4.2. Data Freshness and Distribution-Aware Loss Modeling

Due to the geographical distribution of REs, each RE observes a distinct data distribution, leading to statistical heterogeneity (non-i.i.d) across the system. This diversity may adversely affect model convergence, as many solutions assume i.i.d data and are developed based on this assumption. We refer to this scenario as non-i.i.d.-blind and we try to develop a non-i.i.d.-aware model. To do so, we formulate a delay-aware and distribution-sensitive loss model that incorporates statistical dissimilarity, temporal dynamics, and communication delay penalties. The global loss function at the CS is given by the following:

$\mathcal{L}(\boldsymbol{\alpha},\theta)=\sum_{r}\sum_{\hat{x}_{r}\in\hat{\mathcal{D}}_{r}}\frac{\alpha_{r}^{t}D_{r}}{\sum_{r'}\alpha_{r'}^{t}D_{r'}}\,l(\hat{x}_{r};\theta)\,\nu(\tau_{r}),\quad\forall t,$ (16)

where l(x^r;θ) is the local loss. In parallel, the loss function at each SE s is computed as follows:

$\mathcal{L}_{s}(\boldsymbol{\alpha},\boldsymbol{\delta},\theta_{s})=\sum_{r}\sum_{x_{r}^{s}\in\mathcal{D}_{r}^{s}}\frac{\delta_{r,s}^{t}\left(1-\alpha_{r}^{t}\right)D_{r}}{\sum_{r'}\delta_{r',s}^{t}\left(1-\alpha_{r'}^{t}\right)D_{r'}}\,l(x_{r}^{s};\theta_{s})\,g_{r}(t)\,\nu(\tau_{r}),\quad\forall s,$ (17)

where the delay penalty term is defined as $\nu(\tau_{r})=e^{-\left(t-\tau_{r}\right)}$, and $\tau_{r}$ denotes the time when the sample from RE $r$ was generated. This function prioritizes fresher data by assigning smaller penalties to more recent samples. The dissimilarity coefficient $g_{r}(t)$ quantifies the statistical distance between RE $r$'s local distribution and the global distribution using EMD:

$g_{r}(t)=\exp\left(-\sum_{k=1}^{K}\left|\hat{P}_{r,k}(t)-P_{k}(t)\right|w_{k}\right),\quad\forall r.$ (18)

To model temporal dynamics, assuming that at time slot $t-1$ only $K'$ of the $K$ classes have been observed (i.e., $P_{r,k}(t-1)>0$ for those classes), while the remaining $K-K'$ classes are unseen, the estimated class probabilities at time $t$ are as follows:

$\hat{P}_{r,k}(t)=\begin{cases}P_{r,k}(t-1)\cdot\dfrac{e^{-\kappa_{time}}}{\max\left(1,K-K'\right)},&\text{if }P_{r,k}(t-1)>0,\\\epsilon,&\text{if }P_{r,k}(t-1)=0,\end{cases}$ (19)

where $\kappa_{time}$ controls the decay of outdated distributions and $\epsilon$ ensures normalization (as shown in Appendix A). Furthermore, if all classes have been seen at time slot $t-1$ (i.e., $P_{r,k}(t-1)>0,\ \forall k$), we assume that $\hat{P}_{r,k}(t)\approx P_{r,k}(t-1),\ \forall r,k$. Although this work focuses on quantity skew due to non-uniform sensing rates and task sizes, feature skew is partly reflected through the spatial randomness of RE datasets and wireless noise distortions. Label skew is not directly relevant here, since the framework operates in an unsupervised setting. Future extensions could explicitly incorporate feature-level or domain-shift variations to capture broader heterogeneity conditions in Edge AI.
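The freshness penalty $\nu(\tau_r)$ and the EMD-style dissimilarity weight $g_r(t)$ of Eq. (18) can be sketched as follows. The class-probability vectors and the per-class ground-distance weights $w_k$ are illustrative assumptions.

```python
import math

def freshness(t, tau_r):
    """nu(tau_r) = exp(-(t - tau_r)): fresher samples receive a weight
    closer to 1, older samples are penalized exponentially."""
    return math.exp(-(t - tau_r))

def dissimilarity(p_local, p_global, w):
    """g_r(t) = exp(-sum_k |P_hat_{r,k} - P_k| * w_k), a simple instance of the
    EMD-style distance in Eq. (18): identical distributions give g_r = 1,
    larger distances shrink the weight toward 0."""
    d = sum(abs(pl - pg) * wk for pl, pg, wk in zip(p_local, p_global, w))
    return math.exp(-d)

p_global = [0.25, 0.25, 0.25, 0.25]     # assumed global class distribution
p_skewed = [0.70, 0.10, 0.10, 0.10]     # assumed skewed local distribution
w = [1.0] * 4                           # assumed unit ground-distance weights
print(dissimilarity(p_global, p_global, w))   # 1.0: identical distributions
print(dissimilarity(p_skewed, p_global, w))   # < 1: skewed RE is down-weighted
print(freshness(t=10, tau_r=9), freshness(t=10, tau_r=5))
```

In the SE loss (17), these two factors multiply the per-sample loss, so stale or statistically distant contributions are attenuated.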

In addition, each RE is assumed to have a symmetric FoV with total angular width $\tilde{\varphi}$, centered around its viewing angle $\phi_{r}$. That is, at time $t$, RE $r$ can observe all objects located within an angular sector of width $\tilde{\varphi}$ centered at $\phi_{r}$.

We consider REs uniformly distributed within a circular region of radius L:

$l_{r}\sim\mathrm{Uniform}[0,L],\quad\forall r,$ (20)

We incorporate both spatial and directional correlations between the tasks of REs $r$ and $r'$ using a physically meaningful and dimensionally consistent formulation. The correlation depends on their positions $l_{r}$, $l_{r'}$ and viewing angles $\phi_{r}$, $\phi_{r'}$, with angular difference $\Delta\phi=\left|\phi_{r}-\phi_{r'}\right|$. As illustrated in Figure 2, the task correlation is modeled as follows:

$C_{r,r'}=\exp\left(-\frac{\left\|l_{r}-l_{r'}\right\|^{2}}{2s^{2}}\right)\exp\left(-\frac{1-\cos\left(\Delta\phi\right)}{2\sigma_{\phi}^{2}}\right),$ (21)

where $s$ is the spatial correlation length and $\sigma_{\phi}$ is the angular correlation parameter, ensuring dimensional consistency. To ensure the correlation remains below a threshold $\kappa\in(0,1]$, we require

$C_{r,r'}\leq\kappa,\quad\forall r,r',$ (22)

Figure 2.


An illustrative example of how the FoV of REs overlap and how their spatial correlation influences their observations. The yellow star represents a typical point that is visible to both users.

This leads to the following geometric condition:

$\left\|l_{r}-l_{r'}\right\|^{2}+\frac{s^{2}}{\sigma_{\phi}^{2}}\left(1-\cos\left(\Delta\phi\right)\right)\geq 2s^{2}\ln\frac{1}{\kappa}.$ (23)

For system design, we consider the expected spatial configuration. Under the uniform distribution, the expected squared distance is $\mathbb{E}\left[\left\|l_{r}-l_{r'}\right\|^{2}\right]=L^{2}/3$. This yields the following simplified angular correlation:

$C_{\max}\left(\Delta\phi\right)=\kappa_{s}\cdot\exp\left(-\frac{1-\cos\left(\Delta\phi\right)}{2\sigma_{\phi}^{2}}\right),$ (24)

where $\kappa_{s}=\exp\left(-L^{2}/6s^{2}\right)$ represents the spatial correlation baseline. The fundamental design constraint becomes the following:

$\tilde{\varphi}\geq\arccos\left(1-2\sigma_{\phi}^{2}\ln\frac{\exp\left(-L^{2}/6s^{2}\right)}{\kappa}\right),$ (25)

provided the argument lies in $[-1,1]$. This closed-form expression provides a practical design guideline, clearly showing the trade-offs between spatial coverage ($L$), correlation parameters ($s$, $\sigma_{\phi}$), and the correlation threshold ($\kappa$) (more details are provided in Appendix B).
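The correlation model of Eq. (21) and the design bound of Eq. (25) can be checked numerically. The parameter values below ($L$, $s$, $\sigma_\phi$, $\kappa$) are assumptions chosen only so that the arccos argument stays in $[-1,1]$.

```python
import math

def task_correlation(l_r, l_rp, phi_r, phi_rp, s, sigma_phi):
    """Spatial x directional correlation C_{r,r'} of Eq. (21)."""
    d2 = (l_r[0] - l_rp[0])**2 + (l_r[1] - l_rp[1])**2
    dphi = abs(phi_r - phi_rp)
    return (math.exp(-d2 / (2 * s**2))
            * math.exp(-(1 - math.cos(dphi)) / (2 * sigma_phi**2)))

def min_fov_rotation(L, s, sigma_phi, kappa):
    """Bound of Eq. (25): smallest angular displacement keeping the expected
    correlation below kappa; returns None when the argument leaves [-1, 1]
    (i.e., the bound is vacuous or unattainable for these parameters)."""
    kappa_s = math.exp(-L**2 / (6 * s**2))
    arg = 1 - 2 * sigma_phi**2 * math.log(kappa_s / kappa)
    if not -1 <= arg <= 1:
        return None
    return math.acos(arg)

# Assumed parameters: 100 m range, 80 m correlation length, kappa = 0.3.
print(min_fov_rotation(L=100.0, s=80.0, sigma_phi=0.5, kappa=0.3))
```

With these values the minimum rotation comes out near 1 rad, illustrating how a tighter threshold $\kappa$ forces REs to rotate their FoV further between slots.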

4.3. Problem Formulation

With the aim of minimizing the total loss of the system, i.e., the loss of the CS and SEs over variables α, δ, and θ, subject to the constraints discussed above, we state the following optimization problem:

$\min_{\boldsymbol{\alpha},\boldsymbol{\delta},\theta}\ \mathcal{L}_{CS}(\boldsymbol{\alpha},\theta)+\sum_{s}\mathcal{L}_{s}(\boldsymbol{\alpha},\boldsymbol{\delta},\theta),$ (26a)
$\text{s.t.:}\ \sum_{s}\delta_{r,s}^{t}=1,\quad\forall r,t,$ (26b)
$T_{r,\mathrm{Total}}\leq T_{\mathrm{Max}},\quad\forall r,$ (26c)
$\sum_{r}\gamma\,\delta_{r,s}^{t}\left(1-\alpha_{r}^{t}\right)D_{r}\leq C_{s},\quad\forall s,$ (26d)
$\sum_{r}\gamma\,\alpha_{r}^{t}D_{r}\leq\hat{C},$ (26e)

where constraint (26b) guarantees that each RE is assigned to exactly one SE, while (26c) ensures that the total delay does not exceed the maximum allowable threshold $T_{\mathrm{Max}}$. In addition, constraints (26d) and (26e) restrict the sizes of tasks offloaded to the local SEs and the CS so that they remain within their respective computing capacities.
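The abstract states that problem (26) is solved with projected gradient descent (PGD). As a hedged sketch of one such step on a relaxed version of the problem, the snippet below clips $\alpha$ to $[0,1]$ and projects each (relaxed) assignment row onto the probability simplex so that (26b) holds in the relaxed sense. The gradients, step size, and the continuous relaxation of $\delta$ are illustrative assumptions, not the paper's exact algorithm.

```python
def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {x : x >= 0, sum(x) = 1}, using the standard sort-based algorithm."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(vi - theta, 0.0) for vi in v]

def pgd_step(alpha, delta, grad_alpha, grad_delta, lr=0.1):
    """One projected-gradient step on a relaxed version of problem (26):
    alpha is clipped to [0, 1]; each row of the relaxed assignment delta is
    projected onto the simplex so Eq. (26b) holds. Gradients are assumed to
    come from differentiating the losses (16) and (17)."""
    alpha = [min(max(a - lr * g, 0.0), 1.0) for a, g in zip(alpha, grad_alpha)]
    delta = [project_simplex([d - lr * g for d, g in zip(row, grow)])
             for row, grow in zip(delta, grad_delta)]
    return alpha, delta

# Toy step with hypothetical gradients: 2 REs, 1 RE x 2 SE assignment row.
a, d = pgd_step([0.5, 0.9], [[0.5, 0.5]], [2.0, -3.0], [[1.0, -1.0]])
print(a, d)
```

In practice the capacity constraints (26d) and (26e) would also be enforced, e.g., via penalty terms or an additional projection, and the relaxed $\delta$ rounded back to binary assignments.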

5. Feasibility Analysis

In this section, we present a mathematical analysis to evaluate the feasibility of the system and ensure its consistency. Based on the properties of a homogeneous PPP, the PDF of the distance D between an RE and its nearest serving entity SE is given by

$f_{D}(d)=2\pi\lambda_{SE}\,d\,e^{-\lambda_{SE}\pi d^{2}},\quad d\geq 0,$ (27)

where λSE denotes the density of SEs. The expected distance can be derived as

$\mathbb{E}[D]=\frac{1}{2\sqrt{\lambda_{SE}}}.$ (28)

To guarantee that REs are within a maximum distance threshold dmax from at least one SE with high probability, we impose

$\Pr\left(D\leq d_{\max}\right)=1-e^{-\lambda_{SE}\pi d_{\max}^{2}}\geq 1-\varepsilon,$ (29)

where ε is the outage tolerance (i.e., the probability that an RE is not covered within dmax). Rearranging the above condition yields the following requirement on the SE density:

$\lambda_{SE}\geq\frac{-\ln\left(\varepsilon\right)}{\pi d_{\max}^{2}}.$ (30)
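The density bound of Eq. (30) is straightforward to evaluate; the values of $d_{\max}$ and $\varepsilon$ below are illustrative assumptions.

```python
import math

def min_density(d_max: float, eps: float) -> float:
    """Eq. (30): smallest SE density such that a typical RE finds at least one
    SE within d_max with probability >= 1 - eps, under a homogeneous PPP."""
    return -math.log(eps) / (math.pi * d_max**2)

lam = min_density(d_max=100.0, eps=0.05)    # assumed 100 m threshold, 5% outage
p_cov = 1 - math.exp(-lam * math.pi * 100.0**2)
print(lam, p_cov)   # at the bound, the coverage probability sits at ~0.95
```

Plugging the resulting density back into Eq. (29) confirms that the coverage probability meets the target exactly at the bound.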

This condition provides a lower bound on the spatial density of SEs to achieve a coverage probability of at least 1ε within the distance threshold dmax. To satisfy the requirement that each RE is covered by at least S servers, we analyze the coverage under a homogeneous PPP. Let S˜ denote the number of SEs within distance L of a typical RE. Then, S˜Poisson(λSEπL2). To ensure P(NS)1ε˜, we must have the following:

$e^{-\lambda_{SE}\pi L^{2}}\sum_{k=0}^{S-1}\frac{(\lambda_{SE}\pi L^{2})^{k}}{k!}\le \tilde{\varepsilon},$ (31)

and finally we have the following:

$P(N\ge S)=1-e^{-\lambda_{SE}\pi L^{2}}\sum_{k=0}^{S-1}\frac{(\lambda_{SE}\pi L^{2})^{k}}{k!},$ (32)

which implicitly provides a lower bound on $\lambda_{SE}$ as a function of $L$ and the target reliability $1-\tilde{\varepsilon}$. To illustrate, consider $S=3$ required SEs per RE. When the reliability target is $1-\tilde{\varepsilon}=0.95$, the corresponding coverage intensity $\lambda_{SE}\pi L^{2}$ is approximately 6.30. Hence, the minimum server density should satisfy the following:

$\lambda_{SE}\ge \frac{6.30}{\pi L^{2}}.$ (33)

For $L=100$ m, this yields the following:

$\lambda_{SE}\ge 2.01\times 10^{-4}\ \text{servers/m}^{2},$ (34)

which is equivalent to one SE per $70\times 70$ m² area on average. This corresponds to approximately 200 SEs per square kilometer, ensuring each RE has at least three SEs in range with 95% reliability.

If the reliability requirement is increased to $1-\tilde{\varepsilon}=0.99$, the coverage intensity increases to about 8.45, resulting in the following:

$\lambda_{SE}\ge \frac{8.45}{\pi L^{2}}\approx 2.69\times 10^{-4}\ \text{servers/m}^{2}.$ (35)

For $L=150$ m, this reduces to $\lambda_{SE}\ge 1.20\times 10^{-4}$ servers/m², which still guarantees that each RE is covered by three SEs with probability at least 0.99.
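The density bounds in (29)–(35) can be checked numerically. The sketch below uses only the standard library and the paper's worked examples; it recovers the minimum coverage intensity $\mu=\lambda_{SE}\pi L^{2}$ by bisection on the Poisson tail, giving values close to the quoted 6.30 and 8.45:

```python
import math

def poisson_cdf(k, mu):
    """P(N <= k) for N ~ Poisson(mu)."""
    return math.exp(-mu) * sum(mu**i / math.factorial(i) for i in range(k + 1))

def min_coverage_intensity(S, eps, lo=0.0, hi=50.0):
    """Smallest mu = lambda_SE * pi * L^2 such that P(N >= S) >= 1 - eps,
    i.e., P(N <= S-1) <= eps; found by bisection (the CDF decreases in mu)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(S - 1, mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi

def min_se_density(S, eps, L):
    """Multi-server lower bound on lambda_SE [servers/m^2], cf. (33), (35)."""
    return min_coverage_intensity(S, eps) / (math.pi * L**2)

def min_density_single(eps, d_max):
    """Single-coverage bound (30): lambda_SE >= -ln(eps) / (pi d_max^2)."""
    return -math.log(eps) / (math.pi * d_max**2)

mu95 = min_coverage_intensity(S=3, eps=0.05)  # close to the quoted 6.30
mu99 = min_coverage_intensity(S=3, eps=0.01)  # close to the quoted 8.45
print(mu95, mu99)
print(min_se_density(3, 0.05, L=100))  # about 2.0e-4 servers/m^2
print(min_se_density(3, 0.01, L=150))  # about 1.2e-4 servers/m^2
```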

These results numerically confirm that moderate SE densities, on the order of $10^{-4}$ servers/m², are sufficient to maintain reliable multi-server coverage within typical urban cell sizes. Hereupon, the set of servers to which RE $r$ can be assigned based on the physical distance is denoted by $\mathcal{S}_r$. Assuming identical task sizes $D_r=\bar{D}$ and SE capacities $C_s=\bar{C}$, the total computing resources required in the network is $RE\,\bar{D}\gamma$, where $RE$ denotes the number of REs. If all REs adopt a uniform offloading strategy $\alpha_r^t=\bar{\alpha}$, then the computing loads are split as $\bar{D}\gamma\bar{\alpha}\,RE$ for the CS and $\bar{D}\gamma(1-\bar{\alpha})\,RE$ for the SEs. Assuming the total CS capacity is $\hat{C}$ and the total SE capacity is $\lambda_{SE}|A|\bar{C}$, the maximum number of REs the network can support under this strategy is $RE_{\mathrm{Max}}(\bar{\alpha})=\min\left\{\frac{\hat{C}}{\bar{D}\gamma\bar{\alpha}},\ \frac{\lambda_{SE}|A|\bar{C}}{\bar{D}\gamma(1-\bar{\alpha})}\right\}$.

While this analysis provides an upper bound, it does not incorporate the binary task assignment variables $\delta_{r,s}$, which govern the actual RE-to-SE allocation decisions. As a result, the derived expression represents an idealized scenario. The practical feasibility of this result depends on whether the local constraints at each SE can be satisfied under the discrete assignment structure. Analytically, the balanced ratio that equalizes the two limits is $\bar{\alpha}=\frac{\hat{C}}{\lambda_{SE}|A|\bar{C}+\hat{C}}$. Figure 3 illustrates this trade-off across values of $\bar{\alpha}$.
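The capacity split and the balanced ratio $\bar{\alpha}$ reduce to a few lines of code. The sketch below uses illustrative numbers (not taken from the paper) and verifies that the balanced ratio equalizes, and thereby maximizes, the two capacity limits:

```python
def re_max(alpha_bar, C_hat, lam, A, C_bar, D_bar, gamma):
    """Maximum number of REs supported under a uniform offloading ratio:
    the minimum of the CS-limited and SE-limited terms."""
    cs_limit = C_hat / (D_bar * gamma * alpha_bar)
    se_limit = lam * A * C_bar / (D_bar * gamma * (1.0 - alpha_bar))
    return min(cs_limit, se_limit)

def balanced_alpha(C_hat, lam, A, C_bar):
    """Ratio equalizing both limits: C_hat / (lam*|A|*C_bar + C_hat)."""
    return C_hat / (lam * A * C_bar + C_hat)

# Illustrative values: CS capacity 4000, and lam*A = 20 SEs of capacity 200
params = dict(C_hat=4000.0, lam=2.0e-4, A=1.0e5, C_bar=200.0)
a_star = balanced_alpha(**params)
loads = dict(params, D_bar=1.5, gamma=1.0)
print(a_star, re_max(a_star, **loads))
```

With equal aggregate CS and SE capacities, the balanced ratio comes out to 0.5, and any deviation from it lowers $RE_{\mathrm{Max}}$, matching the trade-off shown in Figure 3.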

Figure 3.


Effect of α¯ on the maximum number of REs supported by the network.

6. Solution Method

6.1. Dataset Description

To evaluate the performance of the proposed model, we use the MNIST dataset, which is widely adopted in image classification and ML-based systems [38,39,40]. The MNIST dataset consists of 70,000 gray-scale images of handwritten digits (0–9), each of size 28×28 pixels, with 60,000 samples for training and 10,000 for testing [41,42]. To simulate a realistic ML-based setting, the data are partitioned in a non-i.i.d. fashion across multiple edge nodes. Each node is assigned a unique subset of the data to reflect user-specific distributions, capturing the impact of data heterogeneity and decentralized learning on model performance.
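The paper does not spell out the partitioning scheme, so the sketch below illustrates one common non-i.i.d. construction, a Dirichlet label-skew split; the `beta` value and the synthetic stand-in labels are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def dirichlet_partition(labels, n_nodes, beta=0.5, seed=0):
    """Label-skew non-i.i.d. split: each class's sample indices are divided
    across nodes with Dirichlet(beta) proportions; a smaller beta yields
    stronger heterogeneity across nodes."""
    rng = np.random.default_rng(seed)
    parts = [[] for _ in range(n_nodes)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        props = rng.dirichlet(beta * np.ones(n_nodes))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for node, chunk in enumerate(np.split(idx, cuts)):
            parts[node].extend(chunk.tolist())
    return parts

# Synthetic stand-in for the 60,000 MNIST training labels (10 classes).
labels = np.random.default_rng(1).integers(0, 10, 60000)
parts = dirichlet_partition(labels, n_nodes=30, beta=0.3)
print([len(p) for p in parts[:5]])  # uneven shares reflect quantity skew
```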

6.2. Proposed Solution

We address the joint optimization problem in (26), which involves the offloading ratios α, task assignment variables δ, and learning model parameters θ. The combinatorial nature of the binary assignment variables makes direct optimization computationally prohibitive. To overcome this challenge, we employ a continuous relaxation framework, in which discrete decision variables are parameterized through smooth mappings and optimized via projected gradient descent (PGD).

6.2.1. Joint and Disjoint Optimization Protocols

To evaluate the effectiveness of the proposed optimization, two schemes are developed. In the joint optimization approach, i.e., the proposed method, all the parameters (α,δ,θ) are updated simultaneously using PGD, which allows end-to-end coordination between learning dynamics and resource allocation decisions. In contrast, the disjoint optimization approach, used as a baseline, adopts a sequential structure: one variable (e.g., α) is fixed while the remaining parameters (δ,θ) are optimized, and the process repeats by substituting the latest updates in subsequent optimization rounds until convergence. Both optimization protocols operate under identical dataset partitions, computing capacities, and latency constraints, ensuring a fair and consistent comparison of their convergence behavior and performance.

6.2.2. Assignment Relaxation

Let dr,s denote the distance between RE r and SE s. The normalized distance is defined as

$\bar{d}_{r,s}\triangleq \frac{d_{r,s}}{d_{\max}},\qquad d_{\max}=\max_{r,s} d_{r,s}.$ (36)

The normalized load of SE s at time t is

$\tilde{C}_{s}^{t}=\frac{\sum_{r}\tilde{\delta}_{r,s}^{t}\,(1-\alpha_{r}^{t})\,D_{r}}{C_{s}},\qquad h_{s}^{t}\triangleq \left[1-\tilde{C}_{s}^{t}\right]_{+},$ (37)

where hst represents the fraction of available capacity at SE s, and [x]+=max{0,x}. Based on these features, we define an affinity score between RE r and SE s:

$q_{r,s}^{t}=\exp\!\left(-\lambda_{d}\,\bar{d}_{r,s}\right)\exp\!\left(-\lambda\,\tilde{C}_{s}^{t}\right),\qquad \lambda_{d},\lambda>0,$ (38)

which encourages assignments toward nearby and less-loaded servers. The matrix $b^{t}\triangleq\{b_{r,s}^{t}\}_{r,s}$ is then expressed as a linear parametric function:

$b_{r,s}^{t}=\beta_{0}+\beta_{d}\,(1-\bar{d}_{r,s})+\beta_{h}\,h_{s}^{t}+\beta_{q}\log q_{r,s}^{t}.$ (39)

The relaxed assignment is obtained via a softmax mapping:

$\tilde{\delta}_{r,s}^{t}=\frac{\exp(b_{r,s}^{t})}{\sum_{s'}\exp(b_{r,s'}^{t})},$ (40)

which ensures δ˜rt lies on the probability simplex.
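Equations (36)–(40) can be traced in a few lines of NumPy. In the sketch below, all sizes, distances, capacities, and the $\beta$/$\lambda$ coefficients are illustrative placeholders, and the load in (37) is evaluated at the soft assignments of the previous iterate:

```python
import numpy as np

rng = np.random.default_rng(0)
R, S = 5, 3                             # toy numbers of REs and SEs
d = rng.uniform(10.0, 100.0, (R, S))    # distances d_{r,s}
d_bar = d / d.max()                     # (36) normalized distances
alpha = rng.uniform(0.2, 0.8, R)        # current offloading ratios
D = rng.uniform(0.5, 1.5, R)            # task sizes
C = np.array([5.0, 4.0, 6.0])           # SE capacities
delta_prev = np.full((R, S), 1.0 / S)   # soft assignments, previous iterate

# (37): normalized load per SE and remaining-capacity fraction
C_tilde = (delta_prev * ((1 - alpha) * D)[:, None]).sum(axis=0) / C
h = np.maximum(0.0, 1.0 - C_tilde)

# (38): affinity favoring nearby, lightly loaded servers
lam_d, lam_c = 1.0, 1.0
q = np.exp(-lam_d * d_bar) * np.exp(-lam_c * C_tilde)[None, :]

# (39): linear logits; the beta coefficients are free parameters
b0, bd, bh, bq = 0.0, 1.0, 1.0, 1.0
b = b0 + bd * (1.0 - d_bar) + bh * h[None, :] + bq * np.log(q)

# (40): row-wise softmax places each row of delta on the simplex
delta = np.exp(b - b.max(axis=1, keepdims=True))
delta /= delta.sum(axis=1, keepdims=True)
print(delta.sum(axis=1))  # each row sums to 1
```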

6.2.3. Offloading Ratio Relaxation

For each RE r, we define an aggregate edge suitability score:

$G_{r}^{t}\triangleq \sum_{s}\tilde{\delta}_{r,s}^{t}\,h_{s}^{t},$ (41)

which increases when nearby SEs have higher available capacity. The central-server logit, collected into the vector $a^{t}\triangleq\{a_{r}^{t}\}_{r}$, is parameterized as

$a_{r}^{t}=\tau_{0}-\tau_{1}G_{r}^{t},\qquad \tau_{1}>0,$ (42)

leading to the relaxed offloading ratio

$\alpha_{r}^{t}=\sigma(a_{r}^{t}),$ (43)

where σ(·) is the sigmoid function. Thus, higher edge suitability Grt reduces αrt, prioritizing task processing at the edge.
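The mapping (41)–(43) is a one-liner per equation. The sketch below uses illustrative $\tau_0$, $\tau_1$ values (assumptions, not from the paper) and demonstrates the intended monotonicity: idle edge servers pull $\alpha_r^t$ down:

```python
import numpy as np

def offloading_ratios(delta_tilde, h, tau0=0.0, tau1=2.0):
    """(41)-(43): edge-suitability G_r = sum_s delta_{r,s} h_s, logit
    a_r = tau0 - tau1 * G_r, relaxed ratio alpha_r = sigmoid(a_r).
    tau0 and tau1 (> 0) are free parameters of the relaxation."""
    G = delta_tilde @ h                  # (41)
    a = tau0 - tau1 * G                  # (42)
    return 1.0 / (1.0 + np.exp(-a))     # (43)

# Two REs, two SEs, uniform soft assignments
delta_tilde = np.array([[0.5, 0.5], [0.5, 0.5]])
alpha_busy = offloading_ratios(delta_tilde, np.array([0.0, 0.0]))  # loaded SEs
alpha_idle = offloading_ratios(delta_tilde, np.array([1.0, 1.0]))  # idle SEs
print(alpha_busy)  # -> [0.5 0.5]: no edge capacity, lean on the CS
print(alpha_idle)  # smaller values: keep tasks at the edge
```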

6.2.4. Joint Optimization

The relaxed decision variables (a,b) and the learning model parameters θ are optimized on the augmented objective:

$\mathcal{L}_{\mathrm{total}}=\mathcal{L}_{\mathrm{Task}}+\lambda_{1}P_{\mathrm{capacity}}+\lambda_{2}P_{\mathrm{delay}}.$ (44)

Here, LTask denotes the unsupervised learning loss, composed of a reconstruction term and a distribution-alignment term to address non-i.i.d. data across servers:

$\mathcal{L}_{\mathrm{Task}}=\mathbb{E}\left\|x-\hat{x}(z)\right\|^{2}+\mu_{D}\,D_{\mathrm{Div}}\big(p(z\mid \mathrm{edge}),\,p(z\mid \mathrm{CS})\big),$ (45)

where x^(z) denotes the reconstructed task from latent representation z, and DDiv is a statistical divergence measure. The penalty terms ensure feasibility with respect to capacity and delay constraints:

$P_{\mathrm{capacity}}=\mathbb{E}\big[\max\{0,\,U(\theta)-C_{\max}\}\big],$ (46)
$P_{\mathrm{delay}}=\mathbb{E}\big[\max\{0,\,T(\theta)-T_{\max}\}\big].$ (47)

6.2.5. PGD Mathematical Details

The optimization problem in (26) is of the constrained form,

$\min_{x\in\mathcal{C}} f(x),\qquad x\triangleq(\alpha,\delta,\theta),$ (48)

where f(x) is the total system loss and C is the feasible set induced by (26b–e). To solve this, PGD alternates between a gradient step

$y^{(k+1)}=x^{(k)}-\eta\,\nabla f(x^{(k)}),$ (49)

and a projection step

$x^{(k+1)}=\Pi_{\mathcal{C}}\big(y^{(k+1)}\big),$ (50)

where η>0 is the learning rate and ΠC(·) denotes Euclidean projection onto C:

$\Pi_{\mathcal{C}}(y)=\arg\min_{z\in\mathcal{C}}\|z-y\|^{2}.$ (51)

The gradient step reduces the objective in the unconstrained space, while the projection enforces feasibility with respect to capacity and delay. Under mild assumptions (e.g., Lipschitz continuity of $\nabla f$), PGD converges to a first-order stationary point. At each iteration $k$, PGD updates the relaxed logits $(a,b)$ and the model parameters $\theta$ via

$a^{(k+1)}=a^{(k)}-\eta_{a}\,\nabla_{a}\mathcal{L}_{\mathrm{total}}^{(k)},$ (52)
$b^{(k+1)}=b^{(k)}-\eta_{b}\,\nabla_{b}\mathcal{L}_{\mathrm{total}}^{(k)},$ (53)
$\theta^{(k+1)}=\theta^{(k)}-\eta_{\theta}\,\nabla_{\theta}\mathcal{L}_{\mathrm{total}}^{(k)}.$ (54)

The sigmoid and softmax mappings ensure that α and δ remain valid throughout the updates. Finally, discrete assignments are obtained as

$\delta_{r,s}^{t}=\mathbb{1}\big\{s=\arg\max_{s'}\tilde{\delta}_{r,s'}^{t}\big\}.$ (55)

The overall procedure is summarized in Algorithm 1.

Algorithm 1 Joint PGD-based offloading and assignment optimization

1: Hyper-parameters: server capacities $\{C_s\}$, CS capacity $\hat{C}$, maximum delay $T_{\max}$, learning rates $\eta_a,\eta_b,\eta_\theta$, penalty weights $\lambda$, initial logits $a^{0},b^{0}$, initial model parameters $\theta^{0}$
2: for each time slot $t$ do
3:   for iteration $k$ do
4:     Compute offloading ratios: $\alpha_{r}^{t,k}=\sigma(a_{r}^{(k)}),\ \forall r$
5:     Compute soft assignments: $\tilde{\delta}_{r,s}^{(k)}=\frac{\exp(b_{r,s}^{(k)})}{\sum_{s'}\exp(b_{r,s'}^{(k)})},\ \forall r,s$
6:     Evaluate loss: $\mathcal{L}^{(k)}=\mathcal{L}(\alpha^{(k)},\tilde{\delta}^{(k)},\theta^{(k)})$
7:     Compute gradients: $g_{a}=\nabla_{a}\mathcal{L}^{(k)}$, $g_{b}=\nabla_{b}\mathcal{L}^{(k)}$, $g_{\theta}=\nabla_{\theta}\mathcal{L}^{(k)}$
8:     Update parameters: $a^{(k+1)}=a^{(k)}-\eta_{a}g_{a}$; $b^{(k+1)}=b^{(k)}-\eta_{b}g_{b}$; $\theta^{(k+1)}=\theta^{(k)}-\eta_{\theta}g_{\theta}$
9:   end for
10: end for
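The structure of Algorithm 1 can be mimicked end to end on a toy problem. The sketch below substitutes a simple quadratic surrogate for the true loss and central-difference gradients for autograd (both assumptions for illustration only); the inner loop, the sigmoid/softmax reparameterization, and the final hard assignment of (55) follow the algorithm:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax_rows(b):
    e = np.exp(b - b.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss(a, b):
    """Toy quadratic surrogate for L(alpha, delta, theta): drive every
    offloading ratio toward 0.3 and every assignment row toward uniform."""
    alpha, delta = sigmoid(a), softmax_rows(b)
    return ((alpha - 0.3) ** 2).sum() + ((delta - 1.0 / delta.shape[1]) ** 2).sum()

def num_grad(f, x, eps=1e-6):
    """Central-difference gradient, standing in for autograd."""
    g = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"])
    while not it.finished:
        i = it.multi_index
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        g[i] = (f(xp) - f(xm)) / (2.0 * eps)
        it.iternext()
    return g

rng = np.random.default_rng(0)
a = rng.normal(size=4)               # CS logits, one per RE
b = rng.normal(size=(4, 3))          # assignment logits, RE x SE
eta_a, eta_b = 0.5, 0.5
for k in range(300):                 # inner loop of Algorithm 1 (lines 3-9)
    a = a - eta_a * num_grad(lambda x: loss(x, b), a)
    b = b - eta_b * num_grad(lambda x: loss(a, x), b)

delta_hard = softmax_rows(b).argmax(axis=1)   # hard assignments via (55)
print(float(loss(a, b)), sigmoid(a))
```

Because the logits are unconstrained while $\alpha$ and $\delta$ pass through sigmoid/softmax, no explicit projection is needed in this sketch, mirroring the remark after (55).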

6.2.6. Computational Complexity and Scalability Discussion

The computational complexity of the proposed PGD-based joint optimization mainly arises from gradient evaluations and projection operations. Each iteration involves O(RS+Pθ) operations, where R and S denote the numbers of REs and SEs, respectively, and Pθ is the number of learnable model parameters. Since the projection step is performed in closed form for α and δ, the overall per-iteration complexity scales linearly with network size. This makes the framework scalable to large MEC deployments. In terms of energy efficiency, cooperative task processing reduces redundant transmissions and central computations, leading to an estimated 35% decrease in total energy consumption compared to the fully centralized baseline, as verified in our simulations. Hence, the PGD formulation achieves a balanced trade-off between convergence speed, scalability, and energy efficiency for real-time Edge AI.

6.2.7. Implementation

The entire framework is implemented in PyTorch. Both the unsupervised reconstruction/alignment objective and the penalty terms are differentiable tensors. Using autograd, gradients are propagated through all components, enabling end-to-end training of (α,δ,θ) via stochastic PGD updates. The learning component is implemented as a lightweight convolutional neural network (CNN) to ensure compatibility with edge devices. The model consists of two convolutional layers with ReLU activations, followed by two fully connected layers. All the trainable parameters (θ,a,b) are optimized jointly using the Adam optimizer with learning rates $\eta_{\theta}=10^{-3}$, $\eta_{a}=5\times 10^{-4}$, and $\eta_{b}=5\times 10^{-4}$. The overall optimization minimizes the total loss $\mathcal{L}_{\mathrm{total}}$, which combines the unsupervised reconstruction and distribution-alignment terms with the capacity and delay penalties associated with the constraints in (26). This design enables a stable, end-to-end training process while maintaining low computational overhead suitable for resource-constrained edge environments.

6.2.8. Convergence and Penalty Analysis

The convergence of the proposed PGD scheme can be characterized using standard results from constrained optimization theory. Let the total loss function f(x)=Ltotal be continuously differentiable with an L-Lipschitz gradient, i.e.,

$\|\nabla f(x_{1})-\nabla f(x_{2})\|_{2}\le L\,\|x_{1}-x_{2}\|_{2},\qquad \forall x_{1},x_{2}\in\mathcal{C}.$ (56)

Then, for a fixed learning rate $0<\eta<\frac{2}{L}$, the PGD iteration

$x^{(k+1)}=\Pi_{\mathcal{C}}\big(x^{(k)}-\eta\,\nabla f(x^{(k)})\big),$ (57)

is guaranteed to converge to a first-order stationary point satisfying $\langle \nabla f(x^{\star}),\,z-x^{\star}\rangle\ge 0,\ \forall z\in\mathcal{C}$. In our setting, the feasible set $\mathcal{C}$ arises from the capacity and delay constraints, while the sigmoid and softmax relaxations ensure that $(\alpha,\delta)$ remain differentiable and bounded during optimization.

Furthermore, the penalty terms in (26) provide a smooth relaxation of the original hard constraints:

$\mathcal{L}_{\mathrm{total}}=\mathcal{L}_{\mathrm{Task}}+\lambda_{1}P_{\mathrm{capacity}}+\lambda_{2}P_{\mathrm{delay}},$ (58)

where λ1 and λ2 act as trade-off coefficients. By jointly incorporating both penalty components, the optimizer effectively balances constraint satisfaction and model performance within a unified objective surface, leading to improved numerical stability and faster convergence compared to treating each constraint separately.

7. Performance Evaluation

In this part, we evaluate the performance of the proposed MEC-based Edge AI framework under different system configurations, focusing on the interplay between task offloading, computation capacity, and data heterogeneity. The source code and data are available in [43].

7.1. Task Offloading and Delay

In Figure 4, we plot the loss values for the CS and SEs. As the number of steps increases, the loss in both cases decreases rapidly. Since the CS has more data samples and higher computing capacity, its loss function falls faster.

Figure 4.


The loss values $L_{CS}$ and $L_{s}$ for the CS and SE servers.

In Figure 5, we observe that increasing the number of REs reduces the loss, owing to the larger data volume. This contributes to better accuracy, as shown in Figure 6. Note that since our framework follows an unsupervised learning paradigm, the term "accuracy" in all plots refers to clustering accuracy (CA), computed as the optimal label-alignment accuracy between the predicted clusters and ground-truth classes [44].

Figure 5.


Loss function for different numbers of REs.

Figure 6.


Accuracy of model vs. the number of REs.

In Figure 6, we illustrate the effect of the number of REs on overall accuracy; as can be seen, more REs result in improved accuracy. On the other hand, more REs mean larger delay. As such, a threshold on the delay imposes an upper bound on the number of REs and, simultaneously, on the accuracy. For example, 30 REs can provide 99% accuracy. On the other hand, this number of REs causes a delay as large as 0.05 s, as can be seen in Figure 7, which is within the limit allowed by 3GPP [45,46]. In these experiments, we set $\hat{C}=4000$ CPU cycles/s for the CS, $C_{s}=200$ CPU cycles/s, $\forall s$, and $D_{r}=1.5$ MB, $\forall r$.

Figure 7.


Effect of the number of REs on the total delay in seconds for the joint scenario (proposed) and the disjoint scenario (baseline).

7.2. Comparison with the Baseline Scenario

Now we would like to compare our proposed scheme with a baseline scenario in which the task offloading is performed in a disjoint manner, i.e., each RE is restricted to either the CS or a single SE without coordination. Such a rigid allocation increases the reliance on the CS and leads to inefficient utilization of edge resources. In contrast, our proposed joint framework allows flexible task distribution across CS and SEs, which not only balances the load but also maximizes edge-side computing.

As can be seen in Figure 7, the proposed method incurs less delay than the baseline for different numbers of REs. For example, if the target delay threshold is set to 0.045 s, the proposed scheme can accommodate up to 30 REs, while the baseline can only afford about half of this.

To further demonstrate the practical advantage of the proposed cooperative optimization, we compare three deployment modes: edge-only computing (only SEs), CS-only offloading, and the proposed joint scheme with cooperative computing. The results in Figure 8 and Figure 9 show that the cooperative configuration achieves an excellent trade-off: while maintaining an average delay close to the edge-only configuration (0.042 s vs. 0.033 s), it improves the average clustering accuracy to 99.2%, far above the edge-only mode and close to the CS-only mode. These results confirm that the adaptive coordination between SEs and the CS enhances system efficiency, reduces task congestion, and provides stable model performance under heterogeneous network conditions.

Figure 8.


Average total delay for the three modes: only SEs, only CS, and the proposed cooperative scheme. The cooperative mode achieves a balanced delay of 0.042 s, lower than the only CS case (0.060 s) and close to only SEs (0.033 s), in the case of 30 REs.

Figure 9.


Accuracy comparison for only SEs, only CS, and cooperative modes. The proposed method achieves 99.2% accuracy, compared with only SEs (85%) and only CS (99.7%), in the case of 30 REs.

In Figure 10, we compare the proposed and baseline methods in terms of the fraction of tasks offloaded to the CS. As can be seen, for different numbers of REs, the load on the CS is cut in half when using the proposed framework, which is important from a practical point of view.

Figure 10.


Effect of number of REs on the mean of α for the joint scenario (proposed) and disjoint (baseline).

7.3. Comparing Non-i.i.d.-Blind and Non-i.i.d.-Aware Scenarios

In this subsection, we first evaluate the performance of the proposed loss model in mitigating the effect of quantity skew, i.e., non-i.i.d.-aware, as formulated in (18) and (19), against the non-i.i.d.-blind scenario that does not account for it. Quantity skew arises when clients (REs) have highly imbalanced numbers of local samples. Unlike the balanced case, where each client contributes equally, the aggregation here is biased toward clients with larger datasets.

In Figure 11, we evaluate the effect of ϱ on accuracy for both the non-i.i.d.-blind and non-i.i.d.-aware cases. As can be seen, for all the values of ϱ, the proposed scheme improves the accuracy over the baseline scenario.

Figure 11.


Effect of quantity skew coefficient ϱ on the total accuracy of model for non-i.i.d.-blind (baseline) and -aware models (proposed), for 30 REs.

Moreover, for very low values of ϱ, corresponding to extremely small sample sizes at the clients, accuracy is lower due to insufficient data. As ϱ increases, the number of available samples grows, which enhances performance and leads to higher accuracy up to an optimal point. Beyond this point, however, the variance in the distribution of data across clients becomes significant, introducing instability and larger errors, which ultimately causes accuracy to decrease. Overall, this demonstrates that the proposed scheme effectively mitigates the negative effects of quantity skew and highlights a non-monotonic relationship between ϱ and accuracy.

Finally, we investigate the impact of the correlation threshold κ on overall system accuracy, considering both the coverage overlap and spatial distribution of REs. Our analysis demonstrates that increasing the maximum allowable correlation between mutually visible REs considerably affects the system's clustering accuracy, as shown in Figure 12. Specifically, when the correlation threshold κ is raised, the system incorporates more highly correlated data from overlapping FoVs, which amplifies the non-i.i.d. nature of the collected datasets. This increased correlation leads to model overfitting and reduced generalization capability, ultimately degrading clustering performance.

Figure 12.


Effect of correlation on the system accuracy for non-i.i.d.-blind and -aware models.

As far as the non-i.i.d.-blind and -aware scenarios are concerned, we can see that the accuracy of the non-i.i.d.-aware case has considerably improved compared to the non-i.i.d.-blind scenario that does not enforce the FoV constraint derived in (22). The difference, in some cases, is above 10%, which is very significant in clustering and demonstrates the critical importance of properly regulating the angular separation between REs through the mathematical formulation of the correlation threshold.

Comparing the last two figures, we can see that, in contrast to Figure 12, the improvement in accuracy in Figure 11 is not significant. However, it is important to note that for many applications, even a slight improvement in accuracy is critical. For example, in the context of smart manufacturing, this has the potential to reduce mis-clustering in defect detection. Similarly, in the field of healthcare, marginal gains have been shown to enhance diagnostic reliability [47,48].

8. Conclusions

This work presented a cooperative framework for partial task offloading in Edge-AI systems, leveraging stochastic geometry to capture spatial randomness and to guide correlation-aware resource allocation. A key methodological contribution was the derivation of a closed-form upper bound on spatial correlation, which enabled constraints ensuring that only relevant and timely contributions are included in the global model. We further formulated and solved a joint optimization problem over task assignments, offloading ratios, and learning parameters, and proposed a practical PGD method. Through feasibility analysis and simulations, we demonstrated that the framework effectively addresses the challenges of latency guarantees, accuracy, and scalability in heterogeneous MEC environments. In particular, the results confirm that explicitly handling non-i.i.d. data distributions and spatial correlations leads to superior performance compared to baseline models where learning and resource allocation are decoupled.

Appendix A

We assume that the class distribution for each RE r at time t satisfies the normalization condition:

$\sum_{k}\hat{P}_{r,k}(t)=\sum_{k}P_{r,k}(t)=1,\qquad \forall t,r.$ (A1)

Assume that at time slot $t-1$, only $K'<K$ classes have been observed (i.e., $P_{r,k}(t-1)>0$), while the remaining $K-K'$ classes are unseen. To model the evolution of the class distribution over time while maintaining the normalization constraint, we define the estimated distribution at time $t$ as follows:

$\sum_{k}\hat{P}_{r,k}(t)=e^{-\kappa_{\mathrm{time}}}\sum_{k=1}^{K'}P_{r,k}(t-1)+(K-K')\epsilon=e^{-\kappa_{\mathrm{time}}}+(K-K')\epsilon=1,\qquad \forall t,r,$ (A2)

where $\kappa_{\mathrm{time}}$ is a decay parameter controlling the memory of previous distributions, and $\epsilon$ is obtained by $\epsilon=\frac{1-e^{-\kappa_{\mathrm{time}}}}{K-K'}$.

Appendix B

To justify the validity and dimensional consistency of (22), this appendix provides the derivation of the spatial–directional correlation function used in the paper. We consider a homogeneous field of REs distributed according to a Poisson process within a circular region of radius $L$. Each RE observes its environment within a FoV characterized by an angle $\phi_{r}$ and a total width $\tilde{\varphi}$. Let two REs, located at $l_{r}$ and $l_{r'}$, have an angular separation $\Delta\phi=|\phi_{r}-\phi_{r'}|$ and Euclidean distance $d_{r,r'}=\|l_{r}-l_{r'}\|$.

Following isotropic Gaussian field models in spatial statistics [49,50] and stochastic geometry [51], the pairwise correlation between these REs can be expressed as

$C(d_{r,r'},\Delta\phi)=\exp\!\left(-\frac{d_{r,r'}^{2}}{2s^{2}}\right)\exp\!\left(-\frac{1-\cos\Delta\phi}{2\sigma_{\phi}^{2}}\right),$ (A3)

where s denotes the spatial correlation length and σϕ the angular correlation parameter, both ensuring dimensional consistency.

For uniformly distributed REs, the average correlation over spatial and angular randomness can be written as

$\bar{C}=\frac{1}{L^{2}\tilde{\varphi}}\int_{0}^{\tilde{\varphi}}\int_{0}^{L}\int_{0}^{L} C(d_{r,r'},\Delta\phi)\, dl_{1}\, dl_{2}\, d\Delta\phi.$ (A4)

While the above integral does not admit a closed-form solution, a tractable and physically meaningful approximation can be obtained by replacing $d_{r,r'}^{2}$ with its expected value under the uniform distribution, $\mathbb{E}[d_{r,r'}^{2}]=L^{2}/3$. This substitution gives

$C(\Delta\phi)\approx \kappa_{s}\cdot\exp\!\left(-\frac{1-\cos\Delta\phi}{2\sigma_{\phi}^{2}}\right),\qquad \kappa_{s}=\exp\!\left(-\frac{L^{2}}{6s^{2}}\right),$ (A5)

which depends only on the normalized spatial scale L/s and the angular separation Δϕ. This simplification yields a closed-form, dimensionless expression that is well suited for system-level design.

To ensure that the correlation between any two REs remains below a desired threshold $\kappa\in(0,1]$, we impose $C(\Delta\phi)\le\kappa$. Using (A5) and solving for $\Delta\phi$ yields

$\Delta\phi=\arccos\!\left(1-2\sigma_{\phi}^{2}\ln\frac{\kappa_{s}}{\kappa}\right).$ (A6)

Therefore, to suppress excessive task similarity across users, the FoV width should satisfy

$\tilde{\varphi}\le\Delta\phi=\arccos\!\left(1-2\sigma_{\phi}^{2}\ln\frac{\exp(-L^{2}/6s^{2})}{\kappa}\right).$ (A7)

This condition provides a physically interpretable trade-off between sensing range, allowable correlation, and FoV overlap. The approximation has been numerically validated and shown to remain within a small deviation of the full triple integral (A4), confirming its suitability for system-level design and analysis.
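Equation (A7) is straightforward to evaluate in code. The sketch below uses illustrative parameter values (not taken from the paper) and returns None when the arccos argument leaves $[-1,1]$, i.e., when the bound is inactive or unattainable:

```python
import math

def fov_threshold(L, s, sigma_phi, kappa):
    """Delta-phi from (A6): the angular bound keeping the approximate
    pairwise correlation (A5) at or below kappa. Returns None when the
    arccos argument leaves [-1, 1], e.g., when the maximum correlation
    kappa_s is already below kappa and the constraint is inactive."""
    kappa_s = math.exp(-L**2 / (6.0 * s**2))
    arg = 1.0 - 2.0 * sigma_phi**2 * math.log(kappa_s / kappa)
    if not -1.0 <= arg <= 1.0:
        return None
    return math.acos(arg)

# Illustrative values: L = 100 m, s = 100 m, sigma_phi = 0.5 rad, kappa = 0.5
dphi = fov_threshold(100.0, 100.0, 0.5, 0.5)
print(dphi)  # roughly 0.74 rad; per (A7) the FoV width should not exceed this
```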

Author Contributions

Conceptualization, A.N.; Methodology, A.N.; Software, A.N.; Validation, H.S.; Formal analysis, H.S.; Investigation, A.N.; Resources, H.S.; Writing—original draft, A.N.; Writing—review & editing, H.S.; Visualization, H.S. and A.N.; Supervision, H.S.; Project administration, H.S.; Funding acquisition, H.S. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The source code and data of this paper can be found in [43].

Conflicts of Interest

The authors declare no conflicts of interest.

Funding Statement

This work was supported by Qatar Research Development and Innovation Council (QRDI) under grant ARG01-0511-230129.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

  • 1.Ficili I., Giacobbe M., Tricomi G., Puliafito A. From sensors to data intelligence: Leveraging IoT, cloud, and edge computing with AI. Sensors. 2025;25:1763. doi: 10.3390/s25061763. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Bourechak A., Zedadra O., Kouahla M.N., Guerrieri A., Seridi H., Fortino G. At the confluence of artificial intelligence and edge computing in IoT-based applications: A review and new perspectives. Sensors. 2023;23:1639. doi: 10.3390/s23031639. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.IEEE Future Networks. Edge Platforms and Services Evolving into 2030. 2021. [(accessed on 23 July 2025)]. Available online: https://futurenetworks.ieee.org/podcasts/edge-platforms-and-services-evolving-into-2030.
  • 4.Grand View Research. Edge AI Market Size, Share & Growth|Industry Report, 2030. 2024. [(accessed on 23 July 2025)]. Available online: https://www.grandviewresearch.com/industry-analysis/edge-ai-market-report.
  • 5.MarketsandMarkets Edge AI Hardware Industry Worth $58.90 Billion by 2030. 2025. [(accessed on 23 July 2025)]. Available online: https://www.marketsandmarkets.com/PressReleases/edge-ai-hardware.asp.
  • 6.Yang J., Chen Y., Lin Z., Tian D., Chen P. Distributed Computation Offloading in Autonomous Driving Vehicular Networks: A Stochastic Geometry Approach. IEEE Trans. Intell. Veh. 2024;9:2701–2713. doi: 10.1109/TIV.2023.3290369. [DOI] [Google Scholar]
  • 7.Gu Y., Yao Y., Li C., Xia B., Xu D., Zhang C. Modeling and Analysis of Stochastic Mobile-Edge Computing Wireless Networks. IEEE Internet Things J. 2021;8:14051–14065. doi: 10.1109/JIOT.2021.3068382. [DOI] [Google Scholar]
  • 8.Tran D.A., Do T.T., Zhang T. A stochastic geo-partitioning problem for mobile edge computing. IEEE Trans. Emerg. Top. Comput. 2020;9:2189–2200. doi: 10.1109/TETC.2020.2978229. [DOI] [Google Scholar]
  • 9.Hmamouche Y., Benjillali M., Saoudi S., Yanikomeroglu H., Renzo M.D. New Trends in Stochastic Geometry for Wireless Networks: A Tutorial and Survey. Proc. IEEE. 2021;109:1200–1252. doi: 10.1109/JPROC.2021.3061778. [DOI] [Google Scholar]
  • 10.Zhang Y., Chen G., Du H., Yuan X., Kadoch M., Cheriet M. Real-time remote health monitoring system driven by 5G MEC-IoT. Electronics. 2020;9:1753. doi: 10.3390/electronics9111753. [DOI] [Google Scholar]
  • 11.Li T., Sahu A.K., Talwalkar A., Smith V. Federated learning: Challenges, methods, and future directions. IEEE Signal Process. Mag. 2020;37:50–60. doi: 10.1109/MSP.2020.2975749. [DOI] [Google Scholar]
  • 12.Solans D., Heikkila M., Vitaletti A., Kourtellis N., Anagnostopoulos A., Chatzigiannakis I. Non-i.i.d data in federated learning: A survey with taxonomy, metrics, methods, frameworks and future directions. arXiv. 2024 arXiv:2411.12377 [Google Scholar]
  • 13.Lu Z., Pan H., Dai Y., Si X., Zhang Y. Federated learning with non-iid data: A survey. IEEE Internet Things J. 2024;11:19188–19209. doi: 10.1109/JIOT.2024.3376548. [DOI] [Google Scholar]
  • 14.Jimenez-Gutierrez D.M., Hassanzadeh M., Anagnostopoulos A., Chatzigiannakis I., Vitaletti A. A thorough assessment of the non-iid data impact in federated learning. arXiv. 2025 doi: 10.48550/arXiv.2503.17070. arXiv:2503.17070 [DOI] [Google Scholar]
  • 15.Su W., Li L., Liu F., He M., Liang X. AI on the edge: A comprehensive review. Artif. Intell. Rev. 2022;55:6125–6183. doi: 10.1007/s10462-022-10141-4. [DOI] [Google Scholar]
  • 16.Shi Y., Yang K., Jiang T., Zhang J., Letaief K.B. Communication-efficient edge AI: Algorithms and systems. IEEE Commun. Surv. Tutor. 2020;22:2167–2191. doi: 10.1109/COMST.2020.3007787. [DOI] [Google Scholar]
  • 17.Letaief K.B., Shi Y., Lu J., Lu J. Edge artificial intelligence for 6G: Vision, enabling technologies, and applications. IEEE J. Sel. Areas Commun. 2021;40:5–36. doi: 10.1109/JSAC.2021.3126076. [DOI] [Google Scholar]
  • 18.Lin F.P.C., Hosseinalipour S., Michelusi N., Brinton C.G. Delay-aware hierarchical federated learning. IEEE Trans. Cogn. Commun. Netw. 2023;10:674–688. doi: 10.1109/TCCN.2023.3329024. [DOI] [Google Scholar]
  • 19.Wang S., Tuor T., Salonidis T., Leung K.K., Makaya C., He T., Chan K. Adaptive federated learning in resource constrained edge computing systems. IEEE J. Sel. Areas Commun. 2019;37:1205–1221. doi: 10.1109/JSAC.2019.2904348. [DOI] [Google Scholar]
  • 20.Xiao H., Xu C., Ma Y., Yang S., Zhong L., Muntean G.M. Edge intelligence: A computational task offloading scheme for dependent IoT application. IEEE Trans. Wirel. Commun. 2022;21:7222–7237. doi: 10.1109/TWC.2022.3156905. [DOI] [Google Scholar]
  • 21.Qiao D., Guo S., Zhao J., Le J., Zhou P., Li M., Chen X. ASMAFL: Adaptive staleness-aware momentum asynchronous federated learning in edge computing. IEEE Trans. Mob. Comput. 2024;24:3390–3406. doi: 10.1109/TMC.2024.3510135. [DOI] [Google Scholar]
  • 22.Fan W., Chen Z., Hao Z., Wu F., Liu Y. Joint Task Offloading and Resource Allocation for Quality-Aware Edge-Assisted Machine Learning Task Inference. IEEE Trans. Veh. Technol. 2023;72:6739–6752. doi: 10.1109/TVT.2023.3235520. [DOI] [Google Scholar]
  • 23.Fan W., Li S., Liu J., Su Y., Wu F., Liu Y. Joint Task Offloading and Resource Allocation for Accuracy-Aware Machine-Learning-Based IIoT Applications. IEEE Internet Things J. 2023;10:3305–3321. doi: 10.1109/JIOT.2022.3181990. [DOI] [Google Scholar]
  • 24.Khalili A., Zarandi S., Rasti M. Joint Resource Allocation and Offloading Decision in Mobile Edge Computing. IEEE Commun. Lett. 2019;23:684–687. doi: 10.1109/LCOMM.2019.2897008. [DOI] [Google Scholar]
  • 25.Kuang Z., Li L., Gao J., Zhao L., Liu A. Partial Offloading Scheduling and Power Allocation for Mobile Edge Computing Systems. IEEE Internet Things J. 2019;6:6774–6785. doi: 10.1109/JIOT.2019.2911455. [DOI] [Google Scholar]
  • 26.Zhang S., Gu H., Chi K., Huang L., Yu K., Mumtaz S. DRL-Based Partial Offloading for Maximizing Sum Computation Rate of Wireless Powered Mobile Edge Computing Network. IEEE Trans. Wirel. Commun. 2022;21:10934–10948. doi: 10.1109/TWC.2022.3188302. [DOI] [Google Scholar]
  • 27.Malik U.M., Javed M.A., Frnda J., Rozhon J., Khan W.U. Efficient matching-based parallel task offloading in iot networks. Sensors. 2022;22:6906. doi: 10.3390/s22186906. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Bolat Y., Murray I., Ren Y., Ferdosian N. Decentralized Distributed Sequential Neural Networks Inference on Low-Power Microcontrollers in Wireless Sensor Networks: A Predictive Maintenance Case Study. Sensors. 2025;25:4595. doi: 10.3390/s25154595. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Zhao Y., Li M., Lai L., Suda N., Civin D., Chandra V. Federated learning with non-i.i.d data. arXiv. 2018 arXiv:1806.00582 [Google Scholar]
  • 30.Lai P., He Q., Xia X., Chen F., Abdelrazek M., Grundy J., Hosking J., Yang Y. Dynamic User Allocation in Stochastic Mobile Edge Computing Systems. IEEE Trans. Serv. Comput. 2022;15:2699–2712. doi: 10.1109/TSC.2021.3063148. [DOI] [Google Scholar]
  • 31.Lyu Z., Xiao M., Xu J., Skoglund M., Di Renzo M. The larger the merrier? Efficient large AI model inference in wireless edge networks. arXiv. 2025 doi: 10.48550/arXiv.2505.09214. arXiv:2505.09214 [DOI] [Google Scholar]
  • 32.Wu Y., Zheng J. Modeling and Analysis of the Uplink Local Delay in MEC-Based VANETs. IEEE Trans. Veh. Technol. 2020;69:3538–3549. doi: 10.1109/TVT.2020.2970551. [DOI] [Google Scholar]
  • 33.Wu Y., Zheng J. Modeling and Analysis of the Local Delay in an MEC-Based VANET for a Suburban Area. IEEE Internet Things J. 2022;9:7065–7079. doi: 10.1109/JIOT.2021.3116195. [DOI] [Google Scholar]
  • 34.Cheng Q., Cai G., He J., Kaddoum G. Design and Performance Analysis of MEC-Aided LoRa Networks with Power Control. IEEE Trans. Veh. Technol. 2025;74:1597–1609. doi: 10.1109/TVT.2024.3459046. [DOI] [Google Scholar]
  • 35.Dhillon H.S., Ganti R.K., Baccelli F., Andrews J.G. Modeling and Analysis of K-Tier Downlink Heterogeneous Cellular Networks. IEEE J. Sel. Areas Commun. 2012;30:550–560. doi: 10.1109/JSAC.2012.120405. [DOI] [Google Scholar]
  • 36.Andrews J.G., Baccelli F., Ganti R.K. A Tractable Approach to Coverage and Rate in Cellular Networks. IEEE Trans. Commun. 2011;59:3122–3134. doi: 10.1109/TCOMM.2011.100411.100541. [DOI] [Google Scholar]
  • 37.Savi M., Tornatore M., Verticale G. Impact of processing-resource sharing on the placement of chained virtual network functions. IEEE Trans. Cloud Comput. 2019;9:1479–1492. doi: 10.1109/TCC.2019.2914387. [DOI] [Google Scholar]
  • 38.Mu Y., Garg N., Ratnarajah T. Federated learning in massive MIMO 6G networks: Convergence analysis and communication-efficient design. IEEE Trans. Netw. Sci. Eng. 2022;9:4220–4234. doi: 10.1109/TNSE.2022.3196463. [DOI] [Google Scholar]
  • 39.Wang H., Kaplan Z., Niu D., Li B. Optimizing federated learning on non-i.i.d data with reinforcement learning; Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications; Toronto, ON, Canada. 6–9 July 2020; pp. 1698–1707. [Google Scholar]
  • 40.He C., Annavaram M., Avestimehr S. Group knowledge transfer: Federated learning of large cnns at the edge. Adv. Neural Inf. Process. Syst. 2020;33:14068–14080. [Google Scholar]
  • 41.Deng L. The MNIST database of handwritten digit images for machine learning research [best of the web] IEEE Signal Process. Mag. 2012;29:141–142. doi: 10.1109/MSP.2012.2211477. [DOI] [Google Scholar]
  • 42. MNIST Dataset on Kaggle. [(accessed on 23 July 2025)]. Available online: https://www.kaggle.com/datasets/hojjatk/mnist-dataset.
  • 43. Saeedi H., Nouruzi A. Stochastic-Geometric-based Modeling for Partial Offloading Task Computing in Edge AI Systems. GitHub Repository. [(accessed on 21 September 2025)]. Available online: https://github.com/alinouruzi/Stochastic-Geometric-based-Modeling-for-Partial-Offloading-Task-Computing-in-Edge-AI-Systems.
  • 44. Xie J., Girshick R., Farhadi A. Unsupervised deep embedding for clustering analysis; Proceedings of the International Conference on Machine Learning, PMLR; New York, NY, USA. 16–24 June 2016; pp. 478–487.
  • 45. 3GPP. 5G; Service Requirements for the 5G System (3GPP TS 22.261 Version 16.14.0 Release 16). Technical Specification ETSI TS 122 261 V16.14.0, ETSI. 2021. [(accessed on 29 July 2025)]. Available online: https://www.etsi.org/deliver/etsi_ts/122200_122299/122261/16.14.00_60/ts_122261v161400p.pdf.
  • 46. 5G Hub. 5G Ultra Reliable Low Latency Communication (URLLC). 2023. [(accessed on 29 July 2025)]. Available online: https://5ghub.us/5g-ultra-reliable-low-latency-communication-urllc/
  • 47. Wang Q., Haga Y. Research on Structure Optimization and Accuracy Improvement of Key Components of Medical Device Robot; Proceedings of the 2024 International Conference on Telecommunications and Power Electronics (TELEPE); Frankfurt, Germany. 29–31 May 2024; pp. 809–813.
  • 48. Zrubka Z., Holgyesi A., Neshat M., Nezhad H.M., Mirjalili S., Kovács L., Péntek M., Gulácsi L. Towards a single goodness metric of clinically relevant, accurate, fair and unbiased machine learning predictions of health-related quality of life; Proceedings of the 2023 IEEE 27th International Conference on Intelligent Engineering Systems (INES); Nairobi, Kenya. 26–28 July 2023; pp. 000285–000290.
  • 49. Cressie N. Statistics for Spatial Data. John Wiley & Sons; Hoboken, NJ, USA: 2015.
  • 50. Dai R., Akyildiz I.F. A spatial correlation model for visual information in wireless multimedia sensor networks. IEEE Trans. Multimed. 2009;11:1148–1159. doi: 10.1109/TMM.2009.2026100.
  • 51. Haenggi M. Stochastic Geometry for Wireless Networks. Cambridge University Press; Cambridge, UK: 2013.

Associated Data


Data Availability Statement

The source code and data of this paper can be found in [43].


Articles from Sensors (Basel, Switzerland) are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)
