2025 Aug 22;15:30845. doi: 10.1038/s41598-025-13983-4

Table 4.

Comparative evaluation of agents in terms of stability, convergence, and energy efficiency.

Agent	Reward (mean ± σ)	Convergence time (steps)	Inference energy (mJ)	Memory footprint (KB)
Delay shift	0.914 ± 0.017	72	0.87	128
Rule-based	0.803 ± 0.034	–	0.21	41
Softmax	0.866 ± 0.025	104	1.42	162
QoS heuristic	0.842 ± 0.049	118	0.98	97