Abstract
The rapid advancement of edge artificial intelligence (AI) has unlocked transformative applications across various domains. However, it also poses significant challenges in efficiently updating models on edge devices, which are often constrained by limited computational and communication resources. Here, we present a low-rank adaptation method for Edge AI (LoRAE). Leveraging low-rank decomposition of convolutional neural network (CNN) weight matrices, LoRAE reduces the number of updated parameters to approximately 4% of traditional full-parameter updates, effectively mitigating the computational and communication challenges associated with model updates. Extensive experiments across image classification, object detection, and image segmentation tasks demonstrate that LoRAE significantly decreases the scale of trainable parameters while maintaining or even enhancing model accuracy. Using the YOLOv8x model, LoRAE achieves parameter reductions of 86.1%, 98.6%, and 94.1% across the three tasks, respectively, without compromising accuracy. These findings highlight the potential of LoRAE as an efficient and precise solution for resource-constrained edge AI systems.
Keywords: Edge AI, Low-rank adaptation, Model update efficiency, Parameter reduction
Subject terms: Computer science, Computational science
Introduction
The rapid advancement of artificial intelligence (AI), the Internet of Things (IoT), and edge computing has significantly expanded the scale and complexity of intelligent devices. Edge AI has emerged as a transformative technology, enabling localized data processing on edge devices and reducing reliance on cloud computing1–3. This approach not only enhances data privacy and real-time responsiveness but also improves system efficiency through decentralization. It supports a wide range of applications, including smart cities and autonomous vehicles, where real-time decision-making is critical.
Edge AI, despite its many advantages, faces significant challenges, particularly in scenarios requiring frequent model updates. The large parameter sizes and high computational demands of deep learning models make on-device training on resource-constrained edge devices infeasible4. Fine-tuning models for specific tasks often involves modifying a substantial number of parameters, resulting in considerable computational and bandwidth costs. These limitations render traditional full-parameter update methods unsuitable for Edge AI systems, especially in dynamic environments characterized by constantly varying computational capabilities, fluctuating network transmission conditions, and evolving data distributions. The key challenge lies in reducing computational and communication costs during model updates while maintaining accuracy and performance5,6. Approximate computing, which trades a controlled amount of precision for efficiency while preserving functional correctness, offers a promising way to reduce computational demands, energy consumption, and communication latency. However, the adoption of low-rank approximation techniques in Edge AI remains limited7. While early efforts have focused on optimizing neural network size and computational complexity, methods such as Low-Rank Adaptation (LoRA)8, which employ matrix decomposition to reduce redundancy while preserving model performance, represent a nascent and underexplored area of research.
In recent years, LoRA8 has garnered significant attention for its exceptional parameter efficiency and fine-tuning performance in large language models and Transformer architectures. By adding and training low-rank decomposition update matrices alongside frozen pre-trained weights, LoRA substantially reduces the number of trainable parameters without sacrificing performance. However, LoRA’s effectiveness and potential challenges in CNNs still require in-depth exploration. CNNs, through their unique convolutional kernel structures, effectively capture local patterns and hierarchical spatial features in images, making the preservation of spatial sensitivity (i.e., the spatial positions and interrelationships of features) critically important49. Directly applying the original form of LoRA to vision models often fails to supply the local spatial-correlation inductive bias that convolutions provide. This is primarily because its low-rank decomposition, which is not tailored to convolutional operations, may distort or oversimplify this crucial spatial information, thereby weakening CNNs’ ability to capture fine details50.
To tackle these challenges, we propose an innovative low-rank adaptation method for Edge AI (LoRAE), which offers an efficient approach to substantially cut down the number of parameters updated during model adjustments. By harnessing the low-rank decomposition of model weight matrices, LoRAE effectively mitigates communication and computational burdens, updating only a small subset of parameters to efficiently capture the most crucial model variations.
The study’s main contributions are twofold. First, we introduce LoRAE, a low-rank adaptive decomposition method that enhances the efficiency of edge AI. By capitalizing on convolutional properties, LoRAE compresses parameter updates to roughly 4%, significantly reducing both computational and communication costs, thereby providing an innovative solution for achieving efficient model updates in resource-constrained environments. Second, we conduct a systematic evaluation of LoRAE across 5 public datasets and 49 vision models, with the results offering valuable references for optimizing edge AI models.
The paper is structured as follows. Section II reviews related work on model updates and efficient fine-tuning in Edge AI, emphasizing limitations of traditional methods and recent trends. Section III introduces the design and implementation of LoRAE, detailing its algorithmic principles and optimization strategies. Section IV outlines the experimental setup, including datasets, metrics, and comparative analyses. Section V concludes with key contributions and future research directions.
Related work
Edge AI
Edge AI and Cloud AI constitute two prominent paradigms for AI deployment, each possessing distinct characteristics, benefits, and challenges12. Cloud AI leverages centralized data centers to provide robust computing capabilities. This approach is particularly suited for processing extensive datasets and intricate AI models that demand substantial computational resources13. However, the centralized architecture of Cloud AI often leads to higher latency and increased bandwidth requirements. These issues become pronounced when frequent data transfers occur between edge devices and the cloud14.
In contrast, Edge AI performs computations on local devices with relatively limited computational power. This paradigm emphasizes low latency, bandwidth conservation, and enhanced data privacy2. Edge AI is particularly advantageous in applications requiring real-time responses, such as those in the IoT domain, smart devices, autonomous vehicles, and smart cities. In these contexts, it enables devices to process sensor data promptly, achieving millisecond-level response times. This capability significantly enhances user experience and supports critical decision-making tasks, such as traffic management and public safety, without latency interference15–18. However, achieving high detection accuracy in these applications is challenging, especially in dynamic and practical scenarios where detection types change in real-time. Frequent model parameter updates are essential to maintain optimal performance in such conditions.
Despite its advantages, Edge AI faces several significant challenges. A primary limitation is the restricted bandwidth, which severely impacts the efficiency of model updates and data transmission. The relatively low bandwidth between edge devices and the cloud slows down frequent model updates, elongating the time required for training and deployment. Furthermore, the storage capacity of edge devices is typically limited. This restricts their ability to store large-scale models and datasets locally, thereby impeding their capacity to maintain diverse and up-to-date AI models19. Additionally, the computational power of edge devices is often insufficient compared to cloud servers, making complex model training unsuitable for edge environments. Bandwidth constraints further hinder timely parameter updates, especially for deep neural network tasks that demand substantial computational resources20.
To address these challenges, researchers have proposed various optimization techniques. Model compression significantly reduces the size of AI models, facilitating their transfer and deployment10. Incremental updates focus on transmitting only the changes in the model, thereby conserving bandwidth21. Edge-cloud collaboration combines the strengths of edge and cloud computing. In this approach, computationally intensive training tasks are offloaded to the cloud, while edge devices focus on real-time inference and decision-making22.
Parameter-efficient fine-tuning for AI models in resource-constrained environments
Fine-tuning is a pivotal technique in deep learning that adjusts the parameters of pre-trained models for specific tasks. It is particularly valuable when labeled data is limited or computational resources are constrained10. By leveraging the generalized features of pre-trained models, fine-tuning reduces computational costs while enhancing performance on new tasks, making it indispensable for Edge AI applications on resource-limited devices.
To optimize models for resource-constrained environments like TinyML, traditional methods often include model compression techniques such as pruning, knowledge distillation, and quantization. Pruning minimizes model size by removing less critical connections or neurons, enhancing memory and energy efficiency10. Knowledge distillation transfers knowledge from larger teacher models to smaller student models, improving student accuracy while maintaining compactness22. Quantization reduces model precision by converting high-precision floating-point weights to lower-bit integers. This technique significantly lowers storage requirements and computational costs, making it particularly advantageous for deployment on edge devices. Foundational research, such as the work on Quantized Neural Networks by Courbariaux et al.23, demonstrated the feasibility of training models with extremely low-precision weights and activations. Comprehensive surveys, like that by Nagel et al.24, further explore various quantization methods including Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT), highlighting their importance for efficient deployment on resource-limited devices25.
Recent advancements in large language models (LLMs) have made fine-tuning increasingly resource-intensive26–29, primarily due to the exponential growth in model scale (with billions or even trillions of parameters), the ever-expanding size of training datasets, and the adoption of more complex training paradigms such as large-scale pre-training and instruction tuning. To address this, Parameter-Efficient Fine-tuning (PEFT) methods like LoRA have been proposed. LoRA employs low-rank matrix decomposition to adjust linear layer weights, drastically reducing trainable parameters and improving training efficiency without introducing additional inference overhead8. Variants like VeRA30, FedPara31, GS-LoRA32, HydraLoRA33, and MTLoRA34, along with integrations with Mixture of Experts (MoE) architectures35, further extend LoRA’s applicability. For instance, Conv-LoRA50 integrates ultra-lightweight convolutional parameters into the LoRA framework, specifically to inject image-related inductive biases into plain Vision Transformer (ViT) encoders (e.g., in the Segment Anything Model, SAM).
While general PEFT methods, including LoRA, have excelled in Transformer-based architectures, their direct application to CNNs often faces challenges due to the unique spatial characteristics of convolutional layers. CNNs inherently capture local patterns and hierarchical spatial features, and a naive low-rank decomposition might disrupt these crucial inductive biases. Consequently, specialized low-rank adaptation techniques and other PEFT methods have been developed for CNNs and vision models, aiming to maintain efficiency while preserving spatial sensitivity.
Motivation and methodology
As the application scenarios for Edge AI become more diverse and demanding, the challenge of frequent model updates becomes particularly prominent. Traditional methods rely on transferring and updating large numbers of parameters, yet edge devices have limited computational resources and communication bandwidth. It is therefore crucial to minimize the number of parameters that must be transferred and updated while maintaining high performance and adaptability.
According to the Scaling Law, the size of deep learning models has been progressively increasing. However, pre-trained models typically operate within significantly smaller intrinsic dimensions, indicating that only a small portion of the parameter space is essential for effective fine-tuning. Fine-tuning this low-dimensional space can yield performance comparable to full-parameter updates, a concept that has been validated in large models8. This raises the question of whether AI models with relatively small parameter sizes can achieve competitive performance by fine-tuning their intrinsic dimensions on edge devices. It also remains to be seen whether parameter updates can be minimized while maintaining high accuracy in high-frequency update scenarios. Although LoRA has shown success in Transformer-based models, its effectiveness in convolutional neural networks (CNNs) remains unclear. Preliminary studies suggest that ConvNets, which focus on local patterns and spatial structures, do not benefit significantly from LoRA’s low-rank compression, leading to slower convergence and diminished performance.
To address the challenge of frequently updating device models under tight resource constraints, we introduce LoRAE, a low-rank adaptation method tailored for convolutional networks in edge AI. In each convolutional layer, we insert two learnable modules, the LoRAextractor and the LoRAmapper, both based on low-rank decomposition, to effectively extract and map key update directions, thereby significantly reducing the number of parameters that must be updated. During training, as shown in Fig. 1a, the original weights of the backbone network remain frozen while only these low-rank matrices are updated by backpropagation, compressing the training parameter budget to approximately 4% and greatly enhancing training efficiency. At inference time (see Fig. 1b), the low-rank matrices can be fused once with the backbone weights (reparameterization) before deployment, ensuring that the computation graph and total multiply-accumulate operations are identical to those of a fully fine-tuned model, with no added latency. Alternatively, the low-rank matrices can be loaded dynamically and added at run-time without fusion, incurring negligible extra computation compared to the original backbone. Moreover, LoRAE offers deployment flexibility on edge devices: the backbone model is loaded only once, after which the compact LoRAE weights can be switched or loaded on demand, significantly reducing model loading latency and context-switch overhead, an especially valuable advantage in multitask or multi-user scenarios.
Fig. 1.
LoRAE training and inference workflow diagram.
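The fusion step in Fig. 1b can be made concrete with a minimal sketch. The PyTorch snippet below (our own illustrative naming: `w0`, `w_ext`, `w_map`; not the authors' released code) folds a low-rank update formed by a k × k extractor convolution followed by a 1 × 1 channel mapping back into the frozen backbone kernel, so the deployed layer has the same shape and cost as the original convolution.

```python
import torch

# Illustrative shapes only; names are ours, not from the paper.
c_in, c_out, k, r = 64, 128, 3, 8

w0    = torch.randn(c_out, c_in, k, k)   # frozen backbone kernel W0
w_ext = torch.randn(r, c_in, k, k)       # extractor: k x k conv, C_in -> r
w_map = torch.randn(c_out, r)            # mapper: 1 x 1 channel mixing, r -> C_out

# A k x k convolution followed by a 1 x 1 channel mixing is itself a k x k
# convolution, so the low-rank branch collapses into a single delta kernel.
delta_w = torch.einsum('or,rikl->oikl', w_map, w_ext)
w_fused = w0 + delta_w                   # same shape and MACs as the original conv
```

When fusion is not desired, only the small `w_ext`/`w_map` tensors need to be transmitted and stored per task, and they can be added to (or removed from) the resident backbone at run-time.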
Spatially sensitive low-rank adaptation for convolutional neural networks
LoRA has emerged as an effective technique for fine-tuning large-scale models by leveraging low-rank matrix decomposition. This approach reduces the scale of parameter updates, significantly lowering storage and computational requirements while enabling rapid adaptation of pretrained models. The weight update in LoRA is represented as:
$$W = W_0 + \Delta W = W_0 + BA, \tag{1}$$

where $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$. Here, $k$ and $d$ denote the input and output feature dimensions, respectively, and $r \ll \min(d, k)$ represents the rank of the decomposition. The pretrained weight matrix $W_0$ remains frozen, with only the low-rank matrices $B$ and $A$ being updated.
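For readers less familiar with LoRA, the following minimal PyTorch sketch shows how Eq. (1) is typically realized for a linear layer; the class name, the zero initialization of $B$, and the common $\alpha/r$ scaling are our assumptions rather than details given in this paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA layer implementing W = W0 + (alpha / r) * B A."""
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 8.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)              # W0 stays frozen
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # A in R^{r x k}
        self.B = nn.Parameter(torch.zeros(d_out, r))        # B in R^{d x r}, zero init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Equivalent to x @ (W0 + scale * B A)^T
        return self.base(x) + self.scale * ((x @ self.A.t()) @ self.B.t())
```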
The formulation in Eq. (1), however, treats the weights as a dense matrix and does not account for the local spatial structure that convolutional kernels rely on. To address this limitation, LoRAE introduces LoRAextractor and LoRAmapper, which integrate low-rank decomposition while preserving essential spatial features, as illustrated in Fig. 2. The weight update in LoRAE is refined as:

$$Y = W_0 \ast X + W_{\text{map}}\,(W_{\text{ext}} \ast X), \tag{2}$$

where $\ast$ denotes convolution, $X$ and $Y$ are the input and output feature maps, and $W_{\text{ext}}$ (LoRAextractor) and $W_{\text{map}}$ (LoRAmapper) are low-rank operators designed specifically for convolutional layers to preserve spatial sensitivity. The operator $W_{\text{ext}}$ captures the spatial structures of the input, while $W_{\text{map}}$ maps the reduced features back to the full-dimensional output space. This design ensures the model retains its ability to learn local patterns while benefiting from the efficiency of low-rank decomposition.
Fig. 2.

Spatially sensitive low-rank adaptation with parameter reconstruction.
LoRAextractor: Low-rank convolutional dimensionality reduction
LoRAextractor replaces traditional low-rank fully connected matrices with a low-rank convolutional approach to achieve dimensionality reduction. By preserving the input channel count and setting the output channel count to the rank $r$, LoRAextractor employs convolutional operations to compress channel dimensions while extracting spatial features:
$$X_r = W_{\text{ext}} \ast X, \tag{3}$$

where $X_r \in \mathbb{R}^{r \times H \times W}$ represents the dimension-reduced feature map, and $W_{\text{ext}} \in \mathbb{R}^{r \times C_{\text{in}} \times k \times k}$ is the low-rank convolutional kernel. This method effectively compresses the parameter count while maintaining the spatial structure of the input image.
LoRAmapper: Feature reconstruction with low-rank mapping
LoRAmapper reconstructs the dimension-reduced feature map $X_r$ into the output space of the original weight matrix. This is achieved through matrix multiplication with a low-rank matrix:

$$\Delta Y = W_{\text{map}} X_r, \tag{4}$$

where $W_{\text{map}} \in \mathbb{R}^{C_{\text{out}} \times r}$ is the low-rank matrix, applied across the channel dimension at every spatial location (equivalent to a $1 \times 1$ convolution). This ensures the restoration of spatial information and maintains the quality of the original feature map.
By reconfiguring LoRA with spatial sensitivity, LoRAE significantly reduces the scale of parameter updates while retaining spatial information critical to convolutional operations. This design makes it particularly suitable for resource-constrained environments, offering a novel solution for efficient model updates in edge AI applications.
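A compact PyTorch sketch of how Eqs. (3) and (4) can be wired into an existing convolution is given below. It is an illustrative re-implementation under our own naming (`LoRAEConv2d`, `extractor`, `mapper`) that assumes a plain, non-grouped convolution; it is not the authors' released code.

```python
import torch
import torch.nn as nn

class LoRAEConv2d(nn.Module):
    """Illustrative LoRAE-adapted convolution.

    Eq. (3): X_r = W_ext * X   (k x k conv, C_in -> r channels)
    Eq. (4): dY  = W_map X_r   (1 x 1 channel mapping, r -> C_out)
    Output:  Y   = W_0 * X + dY, with the pretrained kernel W_0 frozen.
    """
    def __init__(self, base_conv: nn.Conv2d, r: int = 8):
        super().__init__()
        assert base_conv.groups == 1, "sketch assumes a plain (non-grouped) conv"
        self.base = base_conv
        for p in self.base.parameters():                 # freeze W_0 (and bias)
            p.requires_grad_(False)
        self.extractor = nn.Conv2d(                      # LoRAextractor: C_in -> r
            base_conv.in_channels, r,
            kernel_size=base_conv.kernel_size,
            stride=base_conv.stride,
            padding=base_conv.padding,
            dilation=base_conv.dilation,
            bias=False,
        )
        self.mapper = nn.Conv2d(r, base_conv.out_channels, kernel_size=1, bias=False)
        nn.init.zeros_(self.mapper.weight)               # start with dW = 0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.mapper(self.extractor(x))
```

Because only `extractor` and `mapper` carry gradients, wrapping each convolution of a backbone in this way keeps the trainable-parameter budget per layer on the order of Eq. (6) below.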
Optimizations in LoRAE
LoRAE introduces optimizations that exploit the redundancy in convolutional operations, limiting weight updates to a low-dimensional subspace. Key benefits include:
Parameter scale reduction
The parameter count for traditional convolution is given by:
$$P_{\text{conv}} = C_{\text{in}} \times C_{\text{out}} \times k \times k. \tag{5}$$
LoRAE reduces this to:
$$P_{\text{LoRAE}} = C_{\text{in}} \times r \times k \times k + r \times C_{\text{out}}, \tag{6}$$

where $C_{\text{in}}$ and $C_{\text{out}}$ denote the numbers of input and output channels, $k$ is the kernel size, and $r \ll \min(C_{\text{in}}, C_{\text{out}})$.
The reduction ratio is approximately:
$$\frac{P_{\text{LoRAE}}}{P_{\text{conv}}} = \frac{r}{C_{\text{out}}} + \frac{r}{C_{\text{in}} \times k^{2}} \approx \frac{r}{C_{\text{out}}}. \tag{7}$$
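As a concrete, illustrative instance of Eqs. (5)-(7) (the layer configuration is our assumption, not taken from the paper), consider a 3 × 3 convolution with 256 input and 256 output channels and rank r = 8:

```python
c_in, c_out, k, r = 256, 256, 3, 8            # hypothetical layer configuration

p_conv  = c_in * c_out * k * k                # Eq. (5): 589,824 parameters
p_lorae = c_in * r * k * k + r * c_out        # Eq. (6): 18,432 + 2,048 = 20,480
ratio   = p_lorae / p_conv                    # Eq. (7): ~0.035, i.e. ~3.5%
```

For this illustrative setting the update carries roughly 3.5% of the original parameters, in line with the approximately 4% trainable-parameter budget reported for LoRAE.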
Computational complexity reduction
The computational complexity is reduced from:
$$\mathcal{O}\!\left(C_{\text{in}} \times C_{\text{out}} \times k^{2} \times H \times W\right) \tag{8}$$
to:
$$\mathcal{O}\!\left(\left(C_{\text{in}} \times r \times k^{2} + r \times C_{\text{out}}\right) \times H \times W\right), \tag{9}$$

where $H$ and $W$ denote the height and width of the output feature map.
These optimizations render LoRAE highly effective for resource-constrained edge AI tasks.
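Reading Eqs. (8) and (9) as the per-layer multiply-accumulate (MAC) cost of the update branch, the same illustrative layer gives the following comparison (the feature-map size is our assumption):

```python
h = w = 56                                     # hypothetical output feature-map size
c_in, c_out, k, r = 256, 256, 3, 8

macs_conv  = c_in * c_out * k * k * h * w                # Eq. (8)
macs_lorae = (c_in * r * k * k + r * c_out) * h * w      # Eq. (9)

print(f"full conv MACs:  {macs_conv:,}")                 # 1,849,688,064
print(f"low-rank branch: {macs_lorae:,} ({macs_lorae / macs_conv:.1%})")
```

After fusion (Fig. 1b) the deployed layer reverts to the cost of Eq. (8), so this saving applies to training and to transmitting updates rather than to fused inference.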
Experiments
Experimental design and configuration
Task definition and datasets
To address challenges in resource-constrained edge AI scenarios, LoRAE is evaluated on image classification, object detection, and image segmentation tasks, encompassing both general vision tasks and domain-specific applications to comprehensively assess its performance and adaptability. The datasets used include ImageNet44 for image classification, VOC45 and GlobalWheat202046 (wheat head detection in smart agriculture) for object detection, and two domain-specific datasets for segmentation: Crack-seg47 for crack detection in buildings and Carparts-seg48 for car part segmentation in intelligent driving. These datasets provide diverse scenarios to validate LoRAE’s effectiveness under highly dynamic environments and stringent resource constraints.
Models and hardware configuration
To comprehensively evaluate LoRAE, mainstream models across the three tasks were selected, spanning traditional convolutional neural networks to state-of-the-art algorithms to provide a robust evaluation framework. For image classification, YOLO11-cls, YOLOv8-cls, EfficientNet, and ResNet were chosen; for object detection, YOLO11, YOLOv8, YOLOv5, and Faster R-CNN; and for image segmentation, YOLO11-seg, YOLOv8-seg, U-Net, and DeepLabV3+. r values of 2, 4, 8, 16, 32, and 64 were tested to investigate their impact on performance metrics, including training time, accuracy, and computational efficiency. For reproducibility, the hyperparameter settings for YOLOv8 are summarized in Table 1. The experiments were conducted on a high-performance system with two Intel® Xeon® Gold 6248R processors (24 cores each) and two NVIDIA GeForce RTX 3090 GPUs (24 GB memory each) to ensure reliable and reproducible results.
Table 1.
Hyperparameters of YOLOv8 for three vision tasks.
| Config | Vision task | ||
|---|---|---|---|
| Image classification | Object detection | Image segmentation | |
| Datasets | Cifar100 | VOC | Carparts-seg |
| Optimizer | Adam | Adam | Adam |
| Training epochs | 100 or 300 | 100 or 300 | 100 or 300 |
| Initial learning rate | 0.001 | 0.001 | 0.001 |
| Batch size | 1024 | 16 | 16 |
| Image size | 32 | 640 | 640 |
| Weight decay | 0.0005 | 0.0005 | 0.0005 |
| Momentum | 0.937 | 0.937 | 0.937 |
| Warmup epochs | 3.0 | 3.0 | 3.0 |
| Dropout rate | 0.0 | – | – |
| Box loss gain | – | 7.5 | – |
| Class loss gain | – | 0.5 | – |
| IoU threshold | – | 0.7 | – |
| Max detections | – | 300 | – |
| Mask ratio | – | – | 4.0 |
| Overlap mask | – | – | True |
| Rank (r) | 2, 4, 8, 16, 32, 64 | 2, 4, 8, 16, 32, 64 | 2, 4, 8, 16, 32, 64 |
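For illustration, the object-detection column of Table 1 can be expressed as an Ultralytics-style training call; `apply_lorae()` is a hypothetical helper standing in for the adapter injection described in the methodology, not an existing library function, and the remaining argument names follow the public Ultralytics configuration keys.

```python
from ultralytics import YOLO

for r in (2, 4, 8, 16, 32, 64):                     # rank sweep used in the experiments
    model = YOLO("yolov8x.pt")
    # apply_lorae(model.model, rank=r)               # hypothetical: wrap convs with LoRAE
    model.train(
        data="VOC.yaml", epochs=100, optimizer="Adam",
        lr0=0.001, batch=16, imgsz=640,
        weight_decay=0.0005, momentum=0.937, warmup_epochs=3.0,
        box=7.5, cls=0.5,
    )
```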
Performance evaluation of LoRAE
This section presents a detailed quantitative and qualitative analysis of the LoRAE method across three tasks: image classification, object detection, and image segmentation. The evaluation systematically examines LoRAE’s effectiveness in enhancing model accuracy and optimizing trainable parameter scales. Key aspects of the analysis include validation accuracy, model parameter scale, loss and accuracy curves.
Tables 2, 3, 4 present a comparative analysis of training with and without LoRAE. In Tables 2, 3, 4 throughout this section, bold values represent the recommended rank (r) settings of LoRAE for corresponding model-task combinations (non-bold values are non-recommended), selected based on balanced parameter reduction and accuracy. The comparison emphasizes validation accuracy and changes in trainable parameters to demonstrate LoRAE’s effectiveness and applicability. The experimental results reveal that LoRAE significantly reduces trainable parameter scales. At the same time, it maintains or surpasses the accuracy levels achieved by conventional training methods across all three tasks. For image classification, LoRAE achieves consistent Top-1 and Top-5 accuracy on the CIFAR-100 dataset. It also significantly reduces the number of updated parameters. In object detection, LoRAE demonstrates substantial parameter reductions on the VOC and GlobalWheat2020 datasets. It maintains high mAP@50(B) and mAP@50-95(B) across both small models (e.g., YOLOv8n) and large models (e.g., YOLO11x), showcasing its adaptability. For image segmentation, LoRAE exhibits resource optimization advantages on the Crack-seg and Carparts-seg datasets. While small models experience minor accuracy drops within acceptable ranges, large models display superior performance.
Table 2.
Comparison of model accuracy and parameter counts in image classification.
| Model | r | CIFAR-100 Acc-top1 | CIFAR-100 Acc-top5 | Upd. Params (M) |
|---|---|---|---|---|
| YOLO11n-cls | – | 0.552 | 0.821 | 1.66 |
| | 64 | 0.533 | 0.822 | 0.99 |
| YOLOv8n-cls | – | 0.540 | 0.822 | 1.99 |
| | 64 | 0.542 | 0.813 | 0.84 |
| EfficientNet-B1 | – | 0.561 | 0.816 | 7.79 |
| | 64 | 0.542 | 0.813 | 2.14 |
| ResNet-152 | – | 0.461 | 0.737 | 58.3 |
| | 64 | 0.442 | 0.703 | 6.77 |
| YOLO11x-cls | – | 0.729 | 0.919 | 28.4 |
| | 64 | 0.714 | 0.922 | 6.57 |
| YOLOv8x-cls | – | 0.696 | 0.904 | 60.4 |
| | 64 | 0.684 | 0.906 | 8.4 |
Table 3.
Comparison of model accuracy and parameter counts in object detection.
| Model | r | VOC mAP@50(B) | VOC mAP@50-95(B) | GlobalWheat2020 mAP@50(B) | GlobalWheat2020 mAP@50-95(B) | Upd. Params (M) |
|---|---|---|---|---|---|---|
| YOLO11n | – | 0.843 | 0.647 | 0.974 | 0.662 | 2.59 |
| | 8 | 0.867 | 0.676 | 0.964 | 0.621 | 0.38 |
| YOLOv8n | – | 0.820 | 0.622 | 0.969 | 0.640 | 3.01 |
| | 8 | 0.829 | 0.627 | 0.952 | 0.588 | 0.29 |
| YOLOv5s | – | 0.842 | 0.648 | 0.974 | 0.663 | 9.13 |
| | 8 | 0.881 | 0.694 | 0.964 | 0.625 | 0.50 |
| Faster R-CNN | – | 0.851 | 0.647 | 0.972 | 0.668 | 41.53 |
| | 8 | 0.854 | 0.667 | 0.962 | 0.938 | 9.18 |
| YOLO11x | – | 0.896 | 0.735 | 0.985 | 0.714 | 56.90 |
| | 8 | 0.936 | 0.795 | 0.980 | 0.686 | 5.62 |
| YOLOv8x | – | 0.881 | 0.717 | 0.984 | 0.708 | 68.20 |
| | 8 | 0.932 | 0.783 | 0.976 | 0.694 | 1.93 |
Table 4.
Comparison of model accuracy and parameter counts in image segmentation.
| Model | r | Crack-seg mAP@50(M) | Crack-seg mAP@50-95(M) | Carparts-seg mAP@50(M) | Carparts-seg mAP@50-95(M) | Upd. Params (M) |
|---|---|---|---|---|---|---|
| YOLO11n-seg | – | 0.688 | 0.235 | 0.690 | 0.566 | 2.84 |
| | 8 | 0.679 | 0.218 | 0.664 | 0.499 | 0.75 |
| YOLOv8n-seg | – | 0.660 | 0.213 | 0.680 | 0.559 | 3.27 |
| | 8 | 0.673 | 0.212 | 0.555 | 0.417 | 0.34 |
| U-Net | – | 0.636 | 0.201 | 0.632 | 0.534 | 24.44 |
| | 8 | 0.621 | 0.194 | 0.628 | 0.505 | 8.02 |
| DeepLabV3+ | – | 0.667 | 0.222 | 0.663 | 0.574 | 58.63 |
| | 8 | 0.639 | 0.207 | 0.652 | 0.540 | 12.19 |
| YOLO11x-seg | – | 0.708 | 0.241 | 0.675 | 0.583 | 62.05 |
| | 8 | 0.719 | 0.244 | 0.715 | 0.592 | 4.09 |
| YOLOv8x-seg | – | 0.704 | 0.236 | 0.704 | 0.607 | 71.77 |
| | 8 | 0.715 | 0.232 | 0.693 | 0.564 | 2.12 |
Figures 3 and 4 illustrate the training loss and validation accuracy curves across the three tasks. In Fig. 3a, the Top-5 accuracy stabilizes within approximately 20 epochs for all models. This indicates rapid feature learning. Validation accuracy improves with increased parameter scales, such as from YOLOv8n-cls to YOLO11x-cls. This demonstrates the enhanced feature representation capability of larger models. In Fig. 3b, the validation accuracy curves show that mAP@50 (Box) stabilizes after around 40 epochs. Models using LoRAE, such as YOLO11n, outperform their counterparts. This highlights LoRAE’s effectiveness in enhancing feature and positional information learning. Furthermore, Fig. 4b demonstrates that the training loss curves exhibit faster convergence and improved stability with LoRAE. Figures 3c and 4c reveal significant fluctuations in validation accuracy and loss curves during the first 100 epochs. These initial fluctuations are often observed in the early stages of training, particularly when adapting to new or complex tasks. They can be attributed to the model’s exploratory learning phase, where it is still adjusting to the specific nuances and distribution characteristics of the new task’s data, sometimes coupled with the dynamics of the learning rate scheduler. As training progresses, the model gradually converges, with accuracy increasing and loss decreasing overall. This demonstrates that LoRAE effectively captures distinguishing features of automotive parts. Despite a substantial reduction in trainable parameters, LoRAE achieves performance comparable to, or even surpassing, that of fully fine-tuned models.
Fig. 3.
Accuracy trends across vision tasks.
Fig. 4.
Validation loss comparison across vision tasks.
Figure 5 provides further evidence for these findings by illustrating the relationship between updated parameters and accuracy across different model architectures. In image classification (Fig. 5a), LoRAE-enabled models sustain high Acc-top1 scores even with up to 77.0% parameter reduction, while larger models such as YOLO11x-cls exhibit only marginal accuracy degradation. For object detection (Fig. 5b), mAP@50(B) remains stable or even improves (e.g., YOLOv5s achieves gains with 97.3% fewer parameters) across various parameter scales. In image segmentation (Fig. 5c), consistent trends are observed, with models like YOLO11x-seg attaining higher mAP@50(M) while updating substantially fewer parameters. These results collectively confirm LoRAE’s effectiveness in balancing efficiency and performance across diverse tasks.
Fig. 5.
Performance evaluation of LoRAE across model architectures and configurations.
Overall, as task complexity increases, the required number of training epochs also grows. The LoRAE method demonstrates particularly strong performance in models with larger parameter scales. It significantly reduces the number of trainable parameters while achieving superior training accuracy.
Rank value analysis
This section investigates the impact of the rank r on model performance, emphasizing the relationship between validation accuracy and the number of updated parameters under different r settings. The variation of validation accuracy with respect to r is also analyzed. As summarized in Tables 5, 6, 7, the specific effects of r on image classification, object detection, and image segmentation tasks are discussed. In Tables 5, 6, 7 throughout this section, bold values represent the recommended rank (r) settings of LoRAE for corresponding model-task combinations (non-bold values are non-recommended), selected based on balanced parameter reduction and accuracy. This analysis serves as a reference for selecting r to balance model performance and resource efficiency. It provides guidance for determining optimal r configurations for LoRAE across various tasks.
Table 5.
Performance of LoRAE on image classification with varying rank values.
| Model | r | CIFAR-100 Acc-top1 | CIFAR-100 Acc-top5 | Upd. Params (M) |
|---|---|---|---|---|
| YOLO11n-cls | – | 0.552 | 0.821 | 1.66 |
| 64 | 0.533 3.4% |
0.822 0.1% |
1.00 39.9% |
|
| 32 | 0.514 6.9% |
0.805 2.0% |
0.51 69.5% |
|
| 16 | 0.508 8.0% |
0.807 1.7% |
0.26 84.2% |
|
| 8 | 0.484 12.3% |
0.7946 3.2% |
0.14 91.6% |
|
| 4 | 0.415 24.8% |
0.7267 11.5% |
0.08 95.3% |
|
| 2 | 0.341 38.2% |
0.639 22.2% |
0.05 97.1% |
|
| YOLOv8n-cls | – | 0.540 | 0.822 | 1.99 |
| 64 | 0.542 0.4% |
0.813 1.1% |
0.84 57.5% |
|
| 32 | 0.499 7.6% |
0.7972 3.0% |
0.42 78.7% |
|
| 16 | 0.428 20.7% |
0.7323 10.9% |
0.21 89.4% |
|
| 8 | 0.374 30.6% |
0.6733 18.1% |
0.10 94.7% |
|
| 4 | 0.309 43.0% |
0.5894 28.3% |
0.05 97.3% |
|
| 2 | 0.226 58.2% |
0.480 41.5% |
0.03 98.7% |
|
| EfficientNet-B1 | – | 0.561 | 0.816 | 7.79 |
| 64 | 0.542 3.3% |
0.813 0.4% |
2.14 72.4% |
|
| 32 | 0.511 8.9% |
0.7864 3.7% |
1.13 85.5% |
|
| 16 | 0.498 11.2% |
0.7624 6.6% |
0.60 92.4% |
|
| 8 | 0.423 24.7% |
0.6967 14.7% |
0.31 96.0% |
|
| 4 | 0.387 31.0% |
0.6241 23.6% |
0.17 97.9% |
|
| 2 | 0.241 57.0% |
0.5246 35.8% |
0.09 98.9% |
|
| ResNet-152 | – | 0.461 | 0.737 | 58.35 |
| 64 | 0.442 4.1% |
0.703 4.7% |
6.78 88.4% |
|
| 32 | 0.407 11.7% |
0.676 8.4% |
3.64 93.8% |
|
| 16 | 0.361 21.7% |
0.596 19.2% |
1.98 96.6% |
|
| 8 | 0.315 31.6% |
0.561 23.9% |
1.01 98.3% |
|
| 4 | 0.241 47.8% |
0.468 36.6% |
0.54 99.1% |
|
| 2 | 0.148 67.9% |
0.346 53.0% |
0.30 99.5% |
|
| YOLO11x-cls | – | 0.729 | 0.919 | 28.48 |
| 64 |
0.714 2.1%
|
0.922 0.3%
|
6.57 77.0%
|
|
| 32 | 0.708 2.9% |
0.918 0.1% |
3.43 88.0% |
|
| 16 | 0.676 7.3% |
0.909 1.1% |
1.87 93.4% |
|
| 8 | 0.650 10.8% |
0.893 2.8% |
1.08 96.2% |
|
| 4 | 0.613 15.9% |
0.878 4.5% |
0.69 97.6% |
|
| 2 | 0.574 21.3% |
0.849 7.6% |
0.50 98.3% |
|
| YOLOv8x-cls | – | 0.696 | 0.904 | 60.47 |
| 64 |
0.684 1.7%
|
0.906 0.2%
|
8.40 86.1%
|
|
| 32 | 0.668 4.0% |
0.902 0.2% |
4.20 93.0% |
|
| 16 | 0.644 7.5% |
0.892 1.3% |
2.10 96.5% |
|
| 8 | 0.605 13.1% |
0.866 4.2% |
1.05 98.3% |
|
| 4 | 0.559 19.7% |
0.826 8.6% |
0.53 99.1% |
|
| 2 | 0.480 31.2% |
0.763 15.6% |
0.26 99.6% |
|
Table 6.
Performance of LoRAE on object detection with varying rank values.
| Model | r | VOC mAP@50(B) | VOC mAP@50-95(B) | GlobalWheat2020 mAP@50(B) | GlobalWheat2020 mAP@50-95(B) | Upd. Params (M) |
|---|---|---|---|---|---|---|
| YOLOv5s | – | 0.842 | 0.648 | 0.974 | 0.663 | 9.13 |
| 64 | 0.885 5.1% |
0.690 6.5% |
0.967 0.7% |
0.640 3.5% |
2.01 78.0% |
|
| 32 |
0.887 5.3%
|
0.691 6.6% |
0.970 0.4% |
0.646 2.6% |
1.01 88.9% |
|
| 16 | 0.883 4.9% |
0.692 6.8% |
0.974 0.0%
|
0.658 0.8%
|
0.50 94.5% |
|
| 8 | 0.881 4.6% |
0.694
7.1%
|
0.964 1.0% |
0.625 5.7% |
0.25 97.3% |
|
| 4 | 0.873 3.7% |
0.677 4.5% |
0.968 0.6% |
0.630 4.9% |
0.13 98.6% |
|
| 2 | 0.867 3.0% |
0.667 3.0% |
0.963 1.1% |
0.614 7.4% |
0.06 99.3% |
|
| Faster R-CNN | – | 0.851 | 0.647 | 0.972 | 0.668 | 41.53 |
| 64 | 0.851 0.0% |
0.654 1.1% |
0.941 3.2% |
0.581 13.0% |
38.42 7.5% |
|
| 32 | 0.853 0.2% |
0.658 1.7% |
0.958 1.4% |
0.621 7.0% |
30.36 26.9% |
|
| 16 |
0.867 1.9%
|
0.672 3.9%
|
0.970 0.2%
|
0.649 2.8%
|
16.24 60.8% |
|
| 8 | 0.854 0.4% |
0.667 3.1% |
0.962 1.0% |
0.638 4.5% |
9.18 77.9% |
|
| 4 | 0.857 0.7% |
0.659 1.9% |
0.962 0.1% |
0.619 7.3% |
5.65 86.4% |
|
| 2 | 0.829 2.6% |
0.628 3.0% |
0.924 4.9% |
0.554 17.1% |
3.88 90.7% |
|
| YOLOv8n | – | 0.820 | 0.622 | 0.969 | 0.640 | 3.01 |
| 64 | 0.840 2.4% |
0.642 3.2% |
0.962 0.7%
|
0.619 3.3%
|
2.30 23.7% |
|
| 32 |
0.843 2.8%
|
0.642
3.2%
|
0.958 1.1% |
0.611 4.5% |
1.15 61.7% |
|
| 16 | 0.833 1.6% |
0.636 2.3% |
0.955 1.4% |
0.599 6.4% |
0.58 80.7% |
|
| 8 | 0.829 1.1% |
0.627 0.8% |
0.952 1.8% |
0.588 8.1% |
0.29 90.4% |
|
| 4 | 0.790 3.7% |
0.583 6.3% |
0.941 3.0% |
0.561 12.3% |
0.14 95.4% |
|
| 2 | 0.736 10.2% |
0.523 15.9% |
0.927 4.3% |
0.541 15.5% |
0.07 97.7% |
|
| YOLOv8x | – | 0.881 | 0.717 | 0.984 | 0.708 | 68.17 |
| 64 | 0.931 5.7% |
0.771 7.5% |
0.976 0.8% |
0.670 5.4% |
15.46 77.3% |
|
| 32 | 0.930 5.6% |
0.778 8.5% |
0.974 1.0% |
0.675 4.7% |
7.73 88.7% |
|
| 16 | 0.932 5.8% |
0.773 7.8% |
0.978 0.6%
|
0.675 4.7% |
3.87 94.3% |
|
| 8 | 0.932 5.8% |
0.783 9.2%
|
0.976 0.8% |
0.694
2.0%
|
1.93 97.2% |
|
| 4 |
0.932 5.8%
|
0.782 9.1% |
0.974 1.0% |
0.656 7.4% |
0.97 98.6% |
|
| 2 | 0.930 5.6% |
0.778 8.5% |
0.977 0.8% |
0.661 6.6% |
0.48 99.3% |
|
| YOLO11n | – | 0.843 | 0.647 | 0.974 | 0.662 | 2.59 |
| 64 | 0.839 0.5% |
0.644 0.5% |
0.971 0.3%
|
0.651 1.7%
|
2.22 14.3% |
|
| 32 | 0.841 0.2% |
0.644 0.5% |
0.970 0.4% |
0.643 2.9% |
1.17 54.8% |
|
| 16 |
0.870 3.2%
|
0.678 4.8%
|
0.968 0.6% |
0.639 3.5% |
0.65 74.9% |
|
| 8 | 0.867 2.9% |
0.676 4.5% |
0.964 1.0% |
0.621 6.2% |
0.38 85.3% |
|
| 4 | 0.866 2.7% |
0.675 4.3% |
0.960 1.4% |
0.610 7.9% |
0.25 90.3% |
|
| 2 | 0.865 2.6% |
0.668 3.2% |
0.954 2.1% |
0.591 10.7% |
0.19 92.7% |
|
| YOLO11x | – | 0.896 | 0.735 | 0.985 | 0.714 | 56.90 |
| 64 | 0.932 4.0% |
0.791 7.6% |
0.982 0.3% |
0.707 1.0%
|
15.46 72.9% |
|
| 32 | 0.936 4.5% |
0.792 7.7% |
0.982 0.3%
|
0.692 3.1% |
9.15 83.9% |
|
| 16 | 0.937 4.6% |
0.787 7.1% |
0.981 0.4% |
0.686 4.0% |
5.62 90.1% |
|
| 8 | 0.936 4.5% |
0.795 8.2% |
0.980 0.5% |
0.681 4.6% |
3.86 93.2% |
|
| 4 | 0.925 3.2% |
0.762 3.7% |
0.978 0.7% |
0.671 5.9% |
2.54 95.5% |
|
| 2 |
0.939 4.8%
|
0.796 8.3%
|
0.974 1.1% |
0.661 7.4% |
1.27 97.8% |
|
Table 7.
Performance of LoRAE on image segmentation with varying rank values.
| Model | r | Crack-seg mAP@50(M) | Crack-seg mAP@50-95(M) | Carparts-seg mAP@50(M) | Carparts-seg mAP@50-95(M) | Upd. Params (M) |
|---|---|---|---|---|---|---|
| U-Net | – | 0.636 | 0.201 | 0.632 | 0.534 | 24.44 |
| 64 | 0.642 0.9% |
0.204 1.5% |
0.640 1.3% |
0.529 0.9%
|
20.12 17.7% |
|
| 32 |
0.645 1.4%
|
0.205 2.0%
|
0.648 2.5%
|
0.524 1.9% |
16.06 34.3% |
|
| 16 | 0.638 0.3% |
0.203 1.0% |
0.635 0.5% |
0.518 3.0% |
12.03 50.8% |
|
| 8 | 0.621 2.4% |
0.194 3.5% |
0.628 0.6% |
0.505 5.4% |
8.02 67.2% |
|
| 4 | 0.577 9.3% |
0.182 9.5% |
0.620 1.9% |
0.498 6.7% |
4.01 83.6% |
|
| 2 | 0.519 18.4% |
0.163 18.9% |
0.607 4.0% |
0.471 11.8% |
2.00 91.8% |
|
| DeepLabV3+ | – | 0.667 | 0.222 | 0.663 | 0.574 | 58.63 |
| 64 | 0.672 0.8% |
0.225 1.4% |
0.668 0.8% |
0.569 0.9%
|
47.19 19.5% |
|
| 32 |
0.678 1.6%
|
0.228 2.7%
|
0.672 1.4%
|
0.563 1.9% |
35.75 39.0% |
|
| 16 | 0.662 0.8% |
0.217 2.3% |
0.660 0.5% |
0.550 4.2% |
24.38 58.4% |
|
| 8 | 0.639 4.2% |
0.207 6.8% |
0.652 1.7% |
0.540 6.0% |
12.19 79.2% |
|
| 4 | 0.621 6.9% |
0.200 9.9% |
0.643 3.0% |
0.531 7.5% |
6.09 89.6% |
|
| 2 | 0.576 13.6% |
0.183 17.6% |
0.635 4.2% |
0.522 9.1% |
3.05 94.8% |
|
| YOLOv8n-seg | – | 0.660 | 0.213 | 0.680 | 0.559 | 3.27 |
| 64 |
0.684 3.6%
|
0.230 8.0%
|
0.664 2.3%
|
0.515 7.9%
|
2.71 17.0% |
|
| 32 | 0.667 1.1% |
0.212 0.5% |
0.646 5.0% |
0.509 9.0% |
1.36 58.5% |
|
| 16 | 0.653 1.1% |
0.212 0.5% |
0.628 7.6% |
0.481 14.0% |
0.68 79.2% |
|
| 8 | 0.673 2.0% |
0.212 0.5% |
0.555 18.4% |
0.417 25.4% |
0.34 89.6% |
|
| 4 | 0.676 2.4% |
0.217 1.9% |
0.542 20.3% |
0.403 28.0% |
0.17 94.8% |
|
| 2 | 0.669 1.4% |
0.206 3.3% |
0.415 38.9% |
0.310 44.5% |
0.08 97.4% |
|
| YOLOv8x-seg | – | 0.704 | 0.236 | 0.704 | 0.607 | 71.77 |
| 64 | 0.673 4.4% |
0.219 7.2% |
0.726 3.1% |
0.602 0.8% |
17.00 76.3% |
|
| 32 | 0.575 18.3% |
0.188 20.3% |
0.728 3.4%
|
0.603 0.7%
|
8.49 88.2% |
|
| 16 | 0.712 1.1% |
0.222 5.9% |
0.719 2.1% |
0.593 2.3% |
4.25 94.1% |
|
| 8 | 0.715 1.6% |
0.232 1.7% |
0.693 1.6% |
0.564 7.1% |
2.12 97.0% |
|
| 4 | 0.717 1.8% |
0.235 0.4% |
0.695 1.6% |
0.567 6.6% |
1.06 98.5% |
|
| 2 |
0.724 2.8%
|
0.238 0.8%
|
0.687 2.4% |
0.563 7.2% |
0.53 99.3% |
|
| YOLO11n-seg | – | 0.688 | 0.235 | 0.690 | 0.566 | 2.84 |
| 64 | 0.671 2.5% |
0.216 8.1% |
0.708 2.6%
|
0.559 1.2%
|
2.85 0.1% |
|
| 32 |
0.717 4.2%
|
0.232
1.3%
|
0.698 1.2% |
0.543 4.1% |
2.64 7.3% |
|
| 16 | 0.687 0.1% |
0.228 3.0% |
0.689 0.1% |
0.536 5.3% |
1.38 51.5% |
|
| 8 | 0.679 1.3% |
0.218 7.2% |
0.664 3.8% |
0.499 11.8% |
0.75 73.6% |
|
| 4 | 0.674 2.0% |
0.216 8.1% |
0.604 12.5% |
0.447 21.0% |
0.44 84.7% |
|
| 2 | 0.674 2.0% |
0.210 10.6% |
0.588 14.8% |
0.425 24.9% |
0.28 90.2% |
|
| YOLO11x-seg | – | 0.708 | 0.241 | 0.675 | 0.583 | 62.05 |
| 64 | 0.668 5.7% |
0.221 8.3% |
0.706 4.6% |
0.587 0.7% |
18.03 71.0% |
|
| 32 | 0.672 5.1% |
0.225 6.6% |
0.719 6.5% |
0.595 2.1% |
10.06 83.8% |
|
| 16 | 0.701 1.0% |
0.236 2.1% |
0.727 7.7%
|
0.606 3.9%
|
6.08 90.2% |
|
| 8 |
0.719 1.6%
|
0.244 1.2% |
0.715 6.0% |
0.592 1.6% |
4.09 93.4% |
|
| 4 | 0.712 0.6% |
0.240 0.4% |
0.710 5.2% |
0.579 0.7% |
3.09 95.0% |
|
| 2 | 0.714 0.8% |
0.252 4.6%
|
0.681 1.0% |
0.558 4.3% |
2.60 95.8% |
|
In image classification, the results on the CIFAR-100 dataset (Table 5) demonstrate a gradual decline in model accuracy as r decreases. This decline is primarily attributed to the reduced feature representation capacity. However, even at r = 64, the YOLO11x-cls and YOLOv8x-cls models achieve substantial parameter reductions of 86.1% and 98.3%, respectively. Notably, both models show improvements in Top-5 accuracy by 0.3% and 0.2%, with only minor decreases in Top-1 accuracy. These results highlight the effectiveness of LoRAE in balancing performance and efficiency. For object detection, Table 6 indicates that LoRAE maintains competitive performance even at r = 8. The YOLOv5s model achieves mAP@50 and mAP@50-95 improvements of 4.6% and 7.1% on the VOC dataset, respectively, while reducing trainable parameters by 97.3%. Larger models, such as YOLO11x, exhibit stable performance at lower r. In contrast, smaller models are more sensitive to r reductions, particularly on complex datasets like GlobalWheat2020. In image segmentation, Table 7 summarizes performance trends under varying r. For example, the YOLOv8x-seg model achieves a 0.1% mAP@50 improvement on the Crack-seg dataset, along with a 66.8% parameter reduction. Larger models show greater stability at lower r. However, smaller models, such as YOLOv8n-seg, experience more pronounced performance drops.
Overall, the results demonstrate that r has a significant impact on model performance and parameter reduction. While lower r values reduce trainable parameters, they may degrade performance, particularly for smaller models. Medium r values strike a balance between performance and resource efficiency. Larger models exhibit greater stability at lower r. Additionally, task and dataset characteristics influence sensitivity to r. Proper selection of r enables efficient model updates and performance optimization in resource-constrained scenarios.
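The qualitative guidance above can be condensed into a simple starting-point heuristic; the thresholds below are our own illustrative reading of Tables 5-7, not values prescribed by the method, and any chosen r should still be validated per task and dataset.

```python
def suggest_rank(model_params_millions: float, task: str) -> int:
    """Illustrative starting point for choosing r, to be refined by validation."""
    if task == "classification":
        return 64                          # Table 5: accuracy drops quickly at low ranks
    if model_params_millions >= 40:        # large backbones stay stable at low ranks
        return 8
    if model_params_millions >= 10:        # mid-sized backbones
        return 16
    return 32                              # small backbones are more rank-sensitive
```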
Visualization analysis
To further validate the performance of the LoRAE method across tasks, this study presents a visualization analysis of object detection and image segmentation, demonstrating its applicability in edge AI scenarios. As shown in Fig. 6, detailed experimental designs and dataset splits were conducted for both tasks. The object detection task involves multi-object scenes (e.g., cat, dog, person, sofa), while the image segmentation task focuses on vehicle components (e.g., windows, doors, wheels). Red dashed boxes in images highlight detection/segmentation errors for intuitive performance comparison.
Fig. 6.
Visualization analysis of LoRAE’s performance in object detection and image segmentation tasks.
In object detection (Fig. 6a ), two methods were compared: Retrain and LoRAE. Quantitative and qualitative analyses focused on detection accuracy, false detection rate, and complex-scene adaptability. Under the 8-15-5 split, LoRAE demonstrated superior detection capability: Retrain failed to detect cats in Image I (indices 4-6), while LoRAE successfully identified them. In Image II (indices 4-6), LoRAE correctly detected sofas, whereas Retrain missed them. Retrain also exhibited cat miss-detections and sofa false-detections in Image II (indices 4, 7), while LoRAE produced precise results. Overall, LoRAE significantly reduced false annotation rates and improved detection accuracy.
In image segmentation (Fig. 6b ), LoRAE outperformed Retrain in boundary processing and complex-scene detail segmentation. Retrain failed to identify front mirrors in Image I (index 7), while LoRAE achieved accurate segmentation. In Image II, Retrain showed inaccuracies in wheel segmentation and door differentiation, whereas LoRAE provided precise door and wheel segmentation with minor wheel-region deviations. Experimental results confirmed that LoRAE reduced class error rates and enhanced segmentation accuracy and reliability, particularly in boundary details.
Combined results show LoRAE outperforming Retrain in both tasks. LoRAE achieved efficient model optimization with minimal parameter updates, reducing false annotation rates in object detection and improving boundary segmentation in image tasks. These findings validate LoRAE’s practicality for resource-constrained edge AI. Compared to Retrain, LoRAE maintains high performance with significantly fewer parameter updates. Future research may explore its application to larger datasets and additional tasks.
Conclusion and future work
This study introduces an innovative low-rank adaptation method for Edge AI (LoRAE), specifically designed to address the challenges of efficient model updates in resource-constrained edge AI scenarios. The approach leverages low-rank decomposition of weight matrices to minimize the number of updated parameters, achieving approximately 4% of the parameter updates required by traditional full-parameter methods. This effectively mitigates computational and communication burdens during model adjustments. LoRAE significantly reduces the scale of trainable parameters while maintaining accuracy comparable to, and in some cases exceeding, full-parameter update methods. Extensive experiments across image classification, object detection, and image segmentation tasks validate its performance. For example, in object detection using the YOLOv8x model with r = 4, LoRAE reduces parameter updates by 98.6% while improving mAP@50(B) by 5.8%. Even at low rank settings, LoRAE incurs a mere 1.6% decrease in mAP@50(B) compared to traditional retraining, demonstrating its robustness. Currently, the effectiveness of LoRAE has primarily been validated on 2D visual tasks such as image classification, object detection, and image segmentation. However, its applicability and performance in broader domains, including 3D object recognition, multimodal learning, and other non-visual tasks, require further in-depth investigation and validation. Furthermore, while LoRAE demonstrates excellent performance in efficiently reducing model update parameters, its potential as an aggressive direct model compression method for creating extremely compact models still necessitates more comprehensive exploration and empirical research.
Future work will focus on two primary directions. First, while this study emphasizes visual tasks (e.g., object detection, image segmentation), subsequent research will explore the applicability of LoRAE to other domains, such as natural language processing and multimodal tasks. Second, inspired by the significant intrinsic low-rank characteristics observed in small models, future efforts will investigate leveraging LoRAE for direct model compression. The goal is to develop extremely compact models that surpass original performance, offering a novel optimization approach for resource-constrained environments. These advancements will further solidify LoRAE’s role as an efficient solution for edge AI systems with stringent computational and communication constraints.
Author contributions
Zhixue Wang, the corresponding author, conceived the LoRAE method, designed the low-rank decomposition framework and convolutional layer optimization modules, and led the experimental design and manuscript writing; Hongyao Ma participated in algorithm modeling and multi-model validation across image classification, object detection, and segmentation tasks to analyze rank value impacts; Jiahui Zhai analyzed edge device resource constraints, handled dataset processing and experiment reproduction.
Funding
No funding support was received for this research.
Data availability
The datasets used in our experiments are all public datasets. The CIFAR-100 can be accessed at https://www.cs.toronto.edu/~kriz/cifar.html, the VOC is available at http://host.robots.ox.ac.uk/pascal/VOC/, the Global Wheat 2020 can be found at https://www.kaggle.com/c/global-wheat-detection/data, the Crack-seg is accessible at https://universe.roboflow.com/university-bswxt/crack-bphdr, and the Carparts-seg is available at https://universe.roboflow.com/gianmarco-russo-vt9xr/car-seg-un1pm.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Singh, R. & Gill, S. S. Edge AI: A survey. Internet Things Cyber Phys. Syst.3, 71–92 (2023).
- 2.Shi, Y. et al. Communication-efficient edge AI: Algorithms and systems. IEEE Commun. Surv. Tutorials22(4), 2167–2191 (2020).
- 3.Martin, J. et al. Embedded vision intelligence for the safety of smart cities. J. Imaging8(12), 326 (2022).
- 4.Hyysalo, J. et al. Smart mask-Wearable IoT solution for improved protection and personal health. Internet Things18, 100511 (2022).
- 5.Daghero, F., Pagliari, D. J. & Poncino, M. Energy-efficient deep learning inference on edge devices. Adv. Comput.122, 247–301 (2021).
- 6.Capra, M. et al. Hardware and software optimizations for accelerating deep neural networks: Survey of current trends, challenges, and the road ahead. IEEE Access8, 225134–225180 (2020).
- 7.Damsgaard, H. J. et al. Adaptive approximate computing in edge AI and IoT applications: A review. J. Syst. Architect.150, 103114 (2024).
- 8.Hu, E. J., Shen, Y., & Wallis, P., et al. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
- 9.Sufian, A. et al. A survey on deep transfer learning to edge computing for mitigating the COVID-19 pandemic. J. Syst. Architect.108, 101830 (2020).
- 10.Qi, C. et al. An efficient pruning scheme of deep neural networks for Internet of Things applications. EURASIP J. Adv. Sig. Process.2021(1), 31 (2021).
- 11.Gupta, S., & Agrawal, A. Gopalakrishnan K, et al. Deep learning with limited numerical precision. In International Conference on Machine Learning, 1737-1746 (PMLR, 2015).
- 12.Wang, X., Han, Y., & Leung, V. C. M., et al. Edge AI: Convergence of Edge Computing and Artificial Intelligence. (Springer, 2020).
- 13.Dastjerdi, A. V. & Buyya, R. Fog computing: Helping the Internet of Things realize its potential. Computer49(8), 112–116 (2016).
- 14.Cui, L. et al. A survey on application of machine learning for Internet of Things. Int. J. Mach. Learn. Cybernet.9, 1399–1417 (2018).
- 15.Teoh, Y. K., Gill, S. S. & Parlikad, A. K. IoT and fog-computing-based predictive maintenance model for effective asset management in Industry 4.0 using machine learning. IEEE Internet Things J.10(3), 2087–2094 (2021).
- 16.Kamruzzaman, M. M. New opportunities, challenges, and applications of edge-AI for connected healthcare in smart cities. IEEE Globecom Workshops (GC Wkshps) 1–6 (IEEE, 2021).
- 17.Soro, S. TinyML for ubiquitous edge AI. arXiv preprint arXiv:2102.01255, (2021).
- 18.Lovén, L., Leppänen, T., & Peltonen, E. et al. EdgeAI: A vision for distributed, edge-native artificial intelligence in future 6G networks. 6G Wireless Summit, March 24-26, 2019 Levid, (2019).
- 19.Sipola, T. et al. 31st Conference of Open Innovations Association (FRUCT). 320–331 (IEEE, 2022).
- 20.Marculescu, R., Marculescu, D., & Ogras, U. Edge AI: Systems design and ML for IoT data analytics. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 3565–3566 (2020).
- 21.Han, S., Pool, J., & Tran, J. et al. Learning both weights and connections for efficient neural network. Advances in Neural Information Processing Systems. 28 (2015)
- 22.Heo, B., Kim, J., & Yun, S. et al. A comprehensive overhaul of feature distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision 1921–1930 (2019).
- 23.Courbariaux, M., Bengio, Y. & David, J. Quantized neural networks: Training neural networks with low precision weights and activations. J. Mach. Learn. Res.18, 1–31 (2016).
- 24.Nagel, M., Fournarakis, M., Amjad, R. A., Bondarenko, Y., van Baalen, M., & Blankevoort, T. A White Paper on Neural Network Quantization. arXiv preprint arXiv:2106.08295. (2021).
- 25.Wang, C. H., Huang, K. Y., & Chen, J. C. et al. Heterogeneous federated learning through multi-branch network. In 2021 IEEE International Conference on Multimedia and Expo (ICME) 1–6 (IEEE, 2021).
- 26.Chowdhery, A. et al. Palm: Scaling language modeling with pathways. J. Mach. Learn. Res.24(240), 1–113 (2023).
- 27.Hoffmann, J., Borgeaud, S., & Mensch, A. et al. Training compute-optimal large language models. arXiv preprint arXiv:2203.15556, 10 (2022).
- 28.Touvron, H., Lavril, T., & Izacard, G. et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, (2023).
- 29.Touvron, H., Martin, L., & Stone, K. et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, (2023).
- 30.Kopiczko, D. J., Blankevoort, T., & Asano, Y. M. Vera: Vector-based random matrix adaptation. arXiv preprint arXiv:2310.11454, (2023).
- 31.Hyeon-Woo, N., Ye-Bin, M., & Oh, T.H. Fedpara: Low-rank hadamard product for communication-efficient federated learning. arXiv preprint arXiv:2108.06098, (2021).
- 32.Zhao, H., Ni, B., & Fan, J., et al. Continual forgetting for retrained vision models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 28631–28642 (2024).
- 33.Tian, C., Shi, Z. & Guo, Z. et al. HydraLoRA: An asymmetric LoRA architecture for efficient fine-tuning. arXiv preprint arXiv:2404.19245, (2024).
- 34.Agiza, A., Neseem, M. & Reda, S. MTLoRA: Low-rank adaptation approach for efficient multi-task learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 16196–16205 (2024).
- 35.Dou, S., Zhou, E. & Liu, Y., et al. LoRAMoE: Alleviating world knowledge forgetting in large language models via MoE-style plugin. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1932–1945 (2024).
- 36.Kirkpatrick, J. et al. Overcoming catastrophic forgetting in neural networks. Proceed. Natl. Acad. Sci.114(13), 3521–3526 (2017).
- 37.Chen, J., Wang, Y. & Wang, P., et al. DiffusePast: Diffusion-based generative replay for class incremental semantic segmentation. arXiv preprint arXiv:2308.01127, (2023).
- 38.Isele, D. & Cosgun, A. Selective experience replay for lifelong learning. In Proceedings of the AAAI Conference on Artificial Intelligence 32(1) (2018).
- 39.Liu, Y., Schiele, B. & Vedaldi, A., et al. Continual detection transformer for incremental object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 23799–23808 (2023).
- 40.Zhu, F. et al. Class-incremental learning via dual augmentation. Adv. Neural Inf. Process. Syst.34, 14306–14318 (2021).
- 41.Hospedales, T. et al. Meta-learning in neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell.44(9), 5149–5169 (2021).
- 42.Douillard, A., Ramé, A. & Couairon, G., et al. Dytox: Transformers for continual learning with dynamic token expansion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 9285–9295 (2022).
- 43.Liu, Y., Schiele, B. & Sun, Q. Adaptive aggregation networks for class-incremental learning. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition 2544–2553 (2021).
- 44.Deng, J., Dong, W. & Socher, R., et al. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).
- 45.Everingham, M. et al. The Pascal visual object classes (VOC) challenge. Int. J. Comput. Vis.88, 303–338 (2010).
- 46.David, E., Madec, S. & Sadeghi-Tehran, P. et al. Global wheat head detection (GWHD) dataset: A large and diverse dataset of high-resolution RGB-labelled images to develop and benchmark wheat head detection methods. Plant Phenomics, (2020).
- 47.Crack-Bphdr Dataset. (2022). Open Source Dataset by University. Roboflow Universe, Roboflow. Retrieved December, 2022, from https://universe.roboflow.com/university-bswxt/crack-bphdr
- 48.Gianmarco, Russo. (2023). Car-seg Dataset [Open Source Dataset]. Roboflow Universe, Roboflow. Retrieved January 24, 2024, from https://universe.roboflow.com/gianmarco-russo-vt9xr/car-seg-un1pm
- 49.He, K., Zhang, X. & Ren, S. et al. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 770–778 (2016).
- 50.Zhong, Z., Tang, Z. & He, T. et al. Convolution meets lora: Parameter efficient finetuning for segment anything model. arXiv preprint arXiv:2401.17868, (2024).
