Abstract
This paper addresses real-time performance issues in assistive mobility systems by presenting a qualitative histogram of oriented gradients (HOG)-based visual servoing (QHOGVS) approach for autonomous wheelchair navigation. A computationally efficient HOG-based visual servoing implementation is proposed, which enables real-time performance on low-power hardware while preserving feature visibility during navigation. The suggested technique improves speed by a factor of 68, from 0.08 FPS to 5.5 FPS on a Raspberry Pi, while maintaining stable trajectory tracking. This is accomplished by optimizing the pipeline using Python vectorization and incorporating an adaptive activation function to modulate error convergence. Experimental validation, including real-world sequential target reaching and corridor following tasks, demonstrates the system’s functionality on a physical wheelchair platform. By providing insights into the trade-offs between computational efficiency and navigation precision in embedded systems, this work bridges the gap between theoretical visual servoing principles and practical assistive technology applications. The findings demonstrate the feasibility of the approach and indicate clear avenues for further study in the area of adaptive visual navigation for people with limited mobility.
Keywords: Visual servoing, Wheelchair navigation, HOG features, Qualitative control, Assistive robotics, Real-time systems, Python optimization
Subject terms: Engineering, Mathematics and computing
Introduction
At the nexus of mobile robotics and assistive technology, the creation of autonomous navigation systems for powered wheelchairs is a crucial challenge1–3. The need for intelligent wheelchair systems has never been higher due to the aging of the world’s population and the rise in the prevalence of mobility impairments4,5. Conventional wheelchair navigation usually depends on simple sensor-based obstacle avoidance or direct human control, both of which are insufficient for users with severe motor limitations6. While visual servoing presents a promising solution by enabling navigation through visual feedback from onboard cameras7, recent work by Santana et al.8 has shown how intuitive interfaces can significantly improve wheelchair usability.
Since it was originally proposed in the 1990s, visual servoing for mobile platforms has undergone substantial development9. Espiau et al. developed the classical approach, which is based on minimizing the error between desired and observed visual features in image space10. Although these techniques work well for industrial manipulators with limited workspaces, they have significant drawbacks when used for wheelchair navigation in dynamic human environments11. Applications for wheelchairs must be more resistant to environmental fluctuations and occlusions. Some of these problems were already tackled by Hafez12 using hybrid objective functions that take visibility constraints into account. The fundamental challenge of maintaining robust visual tracking under changing conditions has been partially addressed by recent work on mixed-frame visual servoing13. Prior efforts to adapt visual servoing to wheelchairs have generally sacrificed either functionality or performance, relying on computational resources that are not available on low-cost wheelchair platforms or on simplified feature representations that are not robust14.
Visual servoing in assistive mobility contexts benefits greatly from the histogram of oriented gradients (HOG) descriptor, which was initially made popular in pedestrian detection applications15,16. HOG representations capture the structural distribution of gradients across an image, offering inherent robustness to partial occlusions and illumination changes, in contrast to point features that are easily lost. However, the computational complexity of HOG extraction and matching has historically prevented its use in real-time control applications. While Hafez et al.17 successfully integrated visual servoing and navigation with convex optimization for improved path planning, their work did not address the real-time processing demands required for direct visual servoing on low-power hardware.
Existing studies report that 87% of wheelchair users prefer smooth, predictable motion over raw speed18 and that 92% need dependable operation in variable lighting conditions19, which highlights the clinical relevance of this work. Conventional navigation systems frequently fall short of these requirements, either requiring carefully regulated environments or producing jerky movements20. Through its robust visual processing and adaptive error handling, the developed qualitative approach specifically addresses these user requirements11.
This work presents three key developments in visual servoing for wheelchair navigation. First, a new qualitative control framework is introduced that employs an activation-based method in place of conventional error minimization21,22. The suggested approach creates confidence intervals that better suit the requirements of assistive navigation, whereas traditional visual servoing aims for precise convergence to a target image23. This is especially helpful when addressing the intrinsic unpredictability of real-world settings, where ideal feature matching is neither strictly required nor feasible11. Even when features are partially obscured, navigation remains smooth because the control law automatically adapts its behavior based on the quality of visual information available10.
Second, a thorough optimization pipeline is created to enable HOG-based visual servoing on low-power wheelchair hardware. By methodically analyzing the computational bottlenecks in traditional implementations, this paper identifies and addresses three major limitations: inefficient histogram computation, suboptimal memory access patterns, and needless precision in intermediate calculations. While retaining the accuracy needed for stable control, the optimized Python implementation outperforms the naive implementation by a factor of 68, achieving 5.5 frames per second on Raspberry Pi 4 hardware24. This represents the first demonstration of a real-time capable HOG-based visual servoing system on low-cost, wheelchair-class hardware.
Third, building on recent developments in low-cost assistive systems25, a comprehensive system integration is presented addressing real-world deployment challenges26. The design integrates robust motor control through custom TCP protocols, low-latency streaming with Flask, and effective video capture with Raspberry Pi cameras27. Special focus is placed on graceful degradation and fail-safe operation, which are crucial specifications for assistive devices where system failures could have major repercussions28. The implementation shows how sophisticated visual servoing algorithms can be modified for wheelchair platforms in the real world without sacrificing dependability or safety11.
While qualitative visual servoing was introduced by Remazeilles et al.21, and histogram-based visual servoing was advanced by Bateux & Marchand14, this work presents the first integration of a qualitative (activation-based) control framework with HOG features for assistive navigation. Essentially, this work contributes significantly to the field of visual servoing research from a theoretical standpoint7. By adding activation-based error modulation, the qualitative control framework expands on conventional servoing theory and offers a logical method for managing ambiguous visual information21. By adding spatial gradient information, the HOG interaction matrix derivation improves histogram-based servoing14, and the optimization methods set new standards for real-time performance29. Extensive experimental results demonstrating consistent convergence across various navigation scenarios validate these theoretical advancements11.
The main contributions of this work can be summarized as follows:
Activation-based qualitative HOG-based visual servoing framework - Introduction of a novel control approach that replaces conventional error minimization with activation-based modulation, enabling robust navigation even under partial occlusion or uncertain visual conditions.
Computationally optimized qualitative HOG-based visual servoing for low-cost hardware - Design and implementation of an optimized processing pipeline for HOG features, achieving real-time performance (5.5 FPS) on Raspberry Pi 4 while retaining the stability required for assistive navigation.
End-to-end integration on a power wheelchair platform - Development of a complete, deployable assistive navigation system incorporating robust motor control, low-latency video streaming, and fail-safe mechanisms, ensuring reliability and safety in real-world conditions.
Theoretical and empirical advancements in visual servoing - Extension of conventional servoing theory to handle ambiguous visual information through activation-based error modulation, derivation of a spatial-gradient-enhanced HOG interaction matrix, and extensive experimental validation across diverse navigation scenarios.
The remainder of this paper is structured as follows. Section 2 offers a thorough review of related work in wheelchair navigation and visual servoing. The theoretical underpinnings of the proposed qualitative HOG-based methodology are described in Section 3. The system integration and computational optimization framework are presented in Section 4. Experimental results from both controlled tests and real-world deployments are reported in Section 5. Finally, Section 6 concludes with the implications for assistive technology and suggests promising avenues for further study.
Related work
Classical visual servoing foundations
The seminal work of Espiau, Chaumette, and Rives9 laid the theoretical groundwork for visual servoing in the early 1990s, and more recent extensions such as mixed-frame visual servoing have shown promise for assistive applications13. Current perspectives on visual servoing schemes highlight that the selection of robust features and stability under varying environmental constraints remain the most critical issues in system construction30. The crucial concept of the interaction matrix, which links feature motion in the image plane to camera velocity, was introduced with the development of image-based visual servoing (IBVS)7. By probabilistically combining 2D image features with 3D depth cues, Hafez and Jawahar31 expanded upon these foundations; however, their approach was less suitable for wheelchair navigation because it assumed structured environments. Although these techniques work well for industrial manipulators working in controlled settings, they have serious drawbacks when used on mobile platforms such as wheelchairs. The main difficulty is keeping features visible while moving, which is made worse in dynamic human settings where occlusions are common11. Although recent work on monocular depth estimation offers potential solutions2, position-based alternatives prove equally problematic due to their sensor requirements6. Furthermore, modern control approaches have introduced fixed-time and prescribed-time IBVS to handle asymmetric time-varying output constraints, ensuring more predictable convergence32.
Wheelchair-specific navigation systems
Numerous sensor modalities, each with unique tradeoffs, have been investigated in research on autonomous wheelchair navigation. A comprehensive systematic review of the past five years emphasizes that while 3D localization has improved, cost-effective vision-based solutions remain a primary research frontier for unstructured environments33. For obstacle avoidance, early systems such as the NavChair28 used ultrasonic arrays; later versions added laser scanners for increased precision4. With Pasteau et al.11 showcasing efficient corridor navigation through vanishing point tracking, vision-only systems became a viable substitute. In their later work, Bateux and Marchand14 presented histogram-based visual servoing for wheelchairs, which demonstrated greater resilience to changes in lighting than point-feature approaches. Although recent efforts to use deep learning have produced impressive results, real-time wheelchair control is still computationally prohibitive24. Pradeep et al.22 recently used semantic segmentation to demonstrate effective sidewalk perception, and Santana et al.8 demonstrated how wheelchair control can be improved by intuitive interfaces. Nevertheless, these systems frequently sacrifice either functionality or performance24, underscoring the need for reliable yet effective solutions. Recent breakthroughs demonstrate that integrating deep learning with visual servoing on low-power platforms like the Raspberry Pi can effectively resolve these trade-offs by optimizing detection and tracking algorithms34.
Histogram-based techniques in robotics
Due to its intrinsic resilience to changes in illumination and partial occlusions, the HOG descriptor, which was first created for pedestrian detection15, was later modified for robotic applications14. Because HOG-based techniques encode structural information about the entire scene instead of depending on discrete features, they provide clear advantages over traditional methods in visual servoing. Hafez et al.17 integrated these advantages with convex optimization for path planning; however, the problem of real-time HOG processing for direct servo control was not resolved. The adoption of HOG extraction and matching in real-time control systems has been limited due to its computational complexity24. Prior implementations of this technique usually required powerful workstations that were inappropriate for embedded wheelchair platforms27.
Identified research gaps
The literature that is currently available reveals two important gaps. First, real-time implementation on low-power hardware has not been possible due to the computational demands of HOG processing29. Even optimized C++ implementations have trouble achieving adequate frame rates on single-board computers, as Park et al.24 showed. Second, precise feature matching is frequently superfluous and potentially disruptive in assistive navigation, where classical control approaches lack the flexibility required21. Remazeilles et al.’s qualitative visual servoing framework21 partially addresses this, but it has not been modified for wheelchair constraints or combined with HOG features11.
Based on the above discussion, the proposed qualitative HOG-based visual servoing (QHOGVS) system directly addresses these gaps through three significant contributions. First, an activation function is introduced to provide flexible error minimization that is suited to the needs of wheelchair navigation. Second, the Python optimization pipeline overcomes earlier computational limitations to achieve real-time performance (5.5 FPS) on Raspberry Pi hardware. Finally, the full system integration shows practical feasibility, with experimental results confirming the theoretical developments through consistent convergence across a range of navigation scenarios.
Methodology
System overview
The proposed qualitative HOG-based visual servoing (QHOGVS) architecture, as illustrated in Figure 1, consists of three integrated modules:
Vision Module: Captures monocular images via a Raspberry Pi Camera and extracts HOG features using the optimized pipeline.
Control Module: Implements the qualitative visual servoing law that computes velocity commands based on the HOG feature distance and activation-based error modulation.
Optimization & Real-Time Execution Module: Manages memory-efficient histogram computation, vectorized operations, and low-latency communication with the motor drivers, ensuring real-time performance on embedded hardware.
Fig. 1.
System architecture of the proposed qualitative HOG-based visual servoing wheelchair navigation system.
This architecture addresses the dual challenges of robust visual tracking and computational efficiency.
Vision module: HOG features
Visual features representation
An image is represented as a rectangular array of pixels with horizontal and vertical coordinates (x, y). The time derivative of visual feature s in relation to camera velocity v is written as:
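$$\dot{\mathbf{s}} = \mathbf{L}(\mathbf{s}, Z)\,\mathbf{v} \tag{1}$$
in which L(s, Z) represents the interaction matrix related to s. For image coordinates $\mathbf{x} = (x, y)$ of a three-dimensional point $\mathbf{X} = (X, Y, Z)$, one gets the interaction matrix as10:
$$\mathbf{L}_{\mathbf{x}} = \begin{bmatrix} -\dfrac{1}{Z} & 0 & \dfrac{x}{Z} & xy & -(1 + x^{2}) & y \\ 0 & -\dfrac{1}{Z} & \dfrac{y}{Z} & 1 + y^{2} & -xy & -x \end{bmatrix} \tag{2}$$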
A histogram is a statistical object that connects each pixel information value in a picture to a certain ’bin’. This pixel value can be either a vector if the picture is made up of numerous planes, as in color histograms, or a scalar, as in the intensity and HOG histograms. Consequently, the histogram represents the statistical distribution of pixel values within the image. For instance, the intensity histogram may be written as follows14:
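$$p_I(i) = \frac{1}{N_{\mathbf{x}}} \sum_{\mathbf{x}} \delta\big(I(\mathbf{x}) - i\big) \tag{3}$$
wherein x represents the two-dimensional pixel position in the image plane, the pixel intensity $i \in [0, 255]$ for 8-bit grayscale images, $N_{\mathbf{x}}$ is the number of pixels in the image I(x), and $\delta(\cdot)$ denotes the Kronecker delta function, which is described as follows:
$$\delta(x) = \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$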
The HOG descriptor extends this concept by capturing the distribution of gradient orientations within local image cells, providing robustness to illumination changes and partial occlusions15,16. The extraction pipeline includes gradient computation, orientation binning, and block normalization.
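The three stages named above (gradient computation, orientation binning, and normalization) can be sketched for a single cell as follows. This is a minimal NumPy illustration rather than the optimized pipeline of this work: the function name `cell_orientation_histogram`, the central-difference gradient, and the hard (non-interpolated) voting are assumptions made for brevity.

```python
import numpy as np

def cell_orientation_histogram(cell, n_bins=9, eps=1e-6):
    """Unsigned-orientation gradient histogram for one grayscale cell (illustrative)."""
    cell = np.asarray(cell, dtype=np.float32)
    # Gradient computation: central differences along rows (gy) and columns (gx).
    gy, gx = np.gradient(cell)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation folded into [0, 180) degrees.
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    # Orientation binning: each pixel votes with its gradient magnitude.
    bin_width = 180.0 / n_bins
    bin_index = np.minimum((orientation / bin_width).astype(int), n_bins - 1)
    hist = np.bincount(bin_index.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    # Normalization: L2-normalize to reduce sensitivity to contrast changes.
    return hist / np.sqrt(np.sum(hist ** 2) + eps ** 2)

# A horizontal intensity ramp has purely horizontal gradients (orientation 0 degrees),
# so all votes are expected to fall into the first bin.
ramp = np.tile(np.arange(8, dtype=np.float32), (8, 1))
h = cell_orientation_histogram(ramp)
```

A full descriptor would repeat this per cell and normalize over overlapping blocks; the single-cell version is shown only to make the voting step concrete.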
Histogram distance metric: Matusita distance
Visual servoing is commonly understood as identifying the motion that enables a dynamic system to reduce the deviation between the current view and the desired view of the scene7. Carrying this out requires a distance that compares the two views. Because the images are characterized here as histograms, an appropriate histogram distance must be chosen. Histograms are traditionally compared bin-wise, and the Matusita distance is adopted owing to its statistical properties and suitability for histogram comparison14:
$$D(I, I^{*}) = \sum_{i=1}^{N_c} \left( \sqrt{p_I(i)} - \sqrt{p_{I^{*}}(i)} \right)^{2} \tag{5}$$
where pI(i) and $p_{I^{*}}(i)$ denote the probability densities of the $i$-th bin for the current and desired histograms, respectively, and Nc is the number of bins.
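As a concrete illustration, the Matusita distance sums, over all bins, the squared difference between the square roots of the two bin probabilities. The sketch below assumes histograms given as non-negative count vectors that are normalized inside the function; the name `matusita_distance` is ours.

```python
import numpy as np

def matusita_distance(p_current, p_desired):
    """Matusita distance between two histograms, each normalized to sum to 1."""
    p = np.asarray(p_current, dtype=np.float64)
    q = np.asarray(p_desired, dtype=np.float64)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# Identical histograms give 0; histograms with disjoint support give the maximum, 2.
d_same = matusita_distance([1, 2, 3, 4, 0], [1, 2, 3, 4, 0])
d_disjoint = matusita_distance([1, 0, 0, 0, 0], [0, 0, 0, 0, 1])
```

For normalized histograms the value is bounded between 0 and 2, which makes a fixed switching or tolerance threshold easier to choose.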
Control module: qualitative visual servoing
Classical visual servoing framework
The classical visual servoing control law aims to minimize an error defined as the difference between the current and desired values of a set of visual features, which are measured by the vision sensor. Let e be the error function between the actual and targeted sensor-based values. Its time variation is related to the camera motion by7:
$$\dot{\mathbf{e}} = \mathbf{L}_e \mathbf{v} \tag{6}$$
Here, v denotes the camera’s instantaneous velocity, and $\mathbf{L}_e$ represents the interaction matrix that indicates the way e changes in relation to velocity v. To obtain exponential convergence, we consider:
$$\dot{\mathbf{e}} = -\lambda \mathbf{e} \tag{7}$$
By combining these equations, we get the classical control law7:
$$\mathbf{v} = -\lambda\, \widehat{\mathbf{L}_e^{+}}\, \mathbf{e} \tag{8}$$
in which $\widehat{\mathbf{L}_e^{+}}$ represents the pseudo-inverse approximation of $\mathbf{L}_e$ and $\lambda$ is a real positive number for tuning convergence speed.
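A minimal sketch of this classical law for point features is given below, assuming normalized image coordinates, known depths, and the standard 2×6 point interaction matrix. The function names, the gain value, and the four-point example are illustrative choices and are not taken from the wheelchair implementation.

```python
import numpy as np

def point_interaction_matrix(x, y, Z):
    """2x6 interaction matrix of a normalized image point (x, y) at depth Z."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def classical_control(points, desired, depths, gain=0.5):
    """Velocity command v = -gain * pinv(L) @ e for stacked point features."""
    points = np.asarray(points, dtype=float)
    desired = np.asarray(desired, dtype=float)
    error = (points - desired).ravel()
    L = np.vstack([point_interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(points, depths)])
    return -gain * np.linalg.pinv(L) @ error

desired = np.array([[-0.1, -0.1], [0.1, -0.1], [0.1, 0.1], [-0.1, 0.1]])
current = desired + 0.02          # all features shifted by the same image offset
v = classical_control(current, desired, depths=[1.0] * 4)
v_rest = classical_control(desired, desired, depths=[1.0] * 4)  # zero error, zero command
```

With this command, the predicted error rate `L @ v` equals `-gain * e` whenever the error lies in the range of the interaction matrix, which is the exponential decay targeted by the control law.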
HOG-based visual servoing and interaction matrix
The interaction matrix links the camera motion to the variation of the visual features. Using this approach, the general expression for the interaction matrix is given as
, where r represents the camera position. It relates the camera motion to the variation of the cost function D(.). Following that, it is described by:
![]() |
9 |
in which
particularly relies on the considered histogram, each of which is identified by its probability density descriptor
. The derivative
is the core term linking HOG probabilities to camera motion. It is obtained by applying the chain rule to the image formation and HOG computation pipeline:
![]() |
10 |
Here,
is the vector of image gradients within the relevant HOG cell,
is the standard image motion Jacobian, and
encodes how changes in local gradients affect the vote count in the i-th orientation bin of the HOG histogram. This term is derived from the bilinear interpolation and orientation voting process inherent to HOG descriptor calculation.
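Then, by expanding the derivative of the distance metric, one gets14:
$$\frac{\partial D}{\partial \mathbf{r}} = \sum_{i=1}^{N_c} \left( 1 - \sqrt{\frac{p_{I^{*}}(i)}{p_I(i)}} \right) \frac{\partial p_I(i)}{\partial \mathbf{r}} \tag{11}$$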
Applying the chain rule and using the expression for
derived above, we obtain the complete representation of the interaction matrix for histogram-based servoing14:
![]() |
12 |
This interaction matrix enables control of one degree of freedom. To formulate a complete control law, this matrix is incorporated into a nonlinear minimization framework, such as the Levenberg-Marquardt algorithm, yielding:
$$\mathbf{v} = -\lambda \left( \mathbf{H} + \mu\, \mathrm{diag}(\mathbf{H}) \right)^{-1} \mathbf{L}_D^{\top} D \tag{13}$$
where $\mathbf{L}_D$ is the interaction matrix of the distance D, $\mathbf{H} = \mathbf{L}_D^{\top} \mathbf{L}_D$, and $\lambda$ and $\mu$ are positive scalars. It is important to remember that $\mathbf{L}_D$ has a size of $1 \times 6$. When included in a control scheme, this enables us to regulate just 1 degree of freedom (DoF) due to the under-constrained nature of the minimization problem. In order to manage n degrees of freedom, the image is split into several equal sections, each of which is linked to a
histogram. The resultant interaction matrices and error vectors must be stacked. After that, the global interaction matrix is written as14:
![]() |
14 |
where
represents the interaction matrix, determined by the i-th histogram distance, utilizing the control rule described in the preceding section. This makes
and the cost term
in equation (13) a
vector. In fact, a six degrees of freedom robot may be controlled by utilizing six or more histograms from the picture. The under-constrained nature of the equation (13) minimization problem has now been eliminated14.
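The stacking step can be sketched as follows, assuming the damped Gauss-Newton (Levenberg-Marquardt) form that is common in histogram-based servoing, v = -λ (H + μ diag(H))⁻¹ Lᵀ D with H = Lᵀ L. The random interaction rows and distance values are placeholders for illustration only; in practice each row would come from one image region.

```python
import numpy as np

def levenberg_marquardt_velocity(L_stack, distances, gain=1.0, mu=1e-2):
    """v = -gain * (H + mu * diag(H))^-1 @ L^T @ D, with H = L^T L."""
    L = np.atleast_2d(np.asarray(L_stack, dtype=float))   # shape (n_regions, dof)
    D = np.asarray(distances, dtype=float).reshape(-1)    # shape (n_regions,)
    H = L.T @ L
    damped = H + mu * np.diag(np.diag(H))                 # Levenberg-Marquardt damping
    return -gain * np.linalg.solve(damped, L.T @ D)

rng = np.random.default_rng(0)
# Hypothetical data: one 1x6 interaction row and one distance per image region.
L_rows = rng.normal(size=(8, 6))
D_vals = rng.uniform(0.1, 0.5, size=8)
v = levenberg_marquardt_velocity(L_rows, D_vals)
```

Because the damped matrix is positive definite, the resulting velocity is a descent direction for the stacked cost, that is, `D_vals @ (L_rows @ v)` is negative.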
Visual navigation as a visual servoing task
The path following phase utilizes image-based visual servoing to drive the wheelchair along the planned path successively from one image to the next. In histogram-based visual servoing, a distance function
between the histogram
of the current image and the histogram
of the desired image is minimized. To achieve this minimization, the Jacobian matrix is computed as:
![]() |
15 |
where
represents the wheelchair pose. The Levenberg-Marquardt algorithm is used for minimization. The desired histogram
is always obtained from the next intermediate image
to enable the control law to follow the planned sequence of images.
Qualitative visual servoing is adopted to maintain smooth trajectories and visibility constraints. To implement qualitative visual servoing, an activation function
is designed with
when sufficiently close to the desired next image
. The function
and the derived Jacobian matrix
generate the control signal that regulates the distance function
while driving the vehicle toward the desired image. Formally, the control signal
is expressed as:
![]() |
16 |
where
is the next desired image in the path and I is the current observed image. The function
implements the qualitative visual servoing paradigm for HOG-based features, a novel contribution of this work.
The smoothness and visibility constraints require that between any two consecutive images
and
along the planned path, a minimum number of common visual features must be maintained. This constraint ensures the visual servoing control law can reliably drive the vehicle between successive positions.
Figure 2 illustrates the image transition process. Navigation begins with current image
and desired image
. When the distance function D reaches a defined threshold and sufficient common features exist between
and the current image I, the target switches to
. This automatic switching continues until reaching the final target image
.
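The switching rule described above can be sketched as a small state update. The threshold values and the way common features are counted are assumptions for illustration; only the two conditions (distance below a threshold, enough shared features) come from the text.

```python
def next_target_index(index, distance, n_common, n_targets,
                      distance_threshold=0.05, min_common=20):
    """Advance to the next intermediate image once both switching conditions hold."""
    close_enough = distance <= distance_threshold
    enough_overlap = n_common >= min_common
    if close_enough and enough_overlap and index < n_targets - 1:
        return index + 1
    return index

# Simulated run over a path with 3 target images:
# each tuple is (distance to the current target, number of common features).
index = 0
log = []
for distance, n_common in [(0.30, 50), (0.04, 10), (0.04, 35), (0.02, 40), (0.01, 60)]:
    index = next_target_index(index, distance, n_common, n_targets=3)
    log.append(index)
```

In this run the target is held while the distance is large or the overlap is insufficient, advances twice, and then stays on the final image.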
Fig. 2.
Image transition process.
Qualitative visual servoing control with activation-based modulation
Qualitative visual servoing extends classical approaches by introducing error tolerance through confidence intervals. The method requires the distance metric D to reach a confidence interval bounded by
, rather than exact convergence21. The control law:
![]() |
17 |
is modified to handle visibility constraints that define regions rather than exact poses. The integration of qualitative activation proceeds in two mathematical steps. First, the activation matrix
modulates the error vector, creating the qualitative error
. Second, it also modulates the interaction matrix, creating the qualitative interaction matrix
. This dual modulation seamlessly incorporates the activation into the control law’s structure. The error condition is satisfied when:
![]() |
18 |
The qualitative error function incorporates an activation matrix:
![]() |
19 |
![]() |
20 |
where the activation function
provides smooth transitions:
![]() |
21 |
The parameters
(smoothness) and
(transition range) control the activation characteristics. Figure 3 plots the activation function h(x) for four values of the smoothness parameter
(right to left). The horizontal axis represents the excess distance
, while the vertical axis shows the activation level from 0 (inactive) to 1 (fully active). The curve for
exhibits a gradual, smooth transition over a wide range of x, which helps dampen control sensitivity under noisy visual conditions. In contrast,
produces a near-step response, suitable for environments where binary visibility decisions are acceptable. The fixed transition range
defines the interval over which activation rises from 0 to 1.
Fig. 3.

The activation function for different values of
, from right to left.
The interaction matrix approximation:
![]() |
22 |
leads to the final control law21:
![]() |
23 |
where
. This formulation naturally handles the 1–2 degrees of freedom typical in wheelchair navigation tasks.
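The dual modulation described in this section (the activation matrix weighting both the error and the interaction matrix) can be sketched as below. The activation used here is a cubic smoothstep chosen purely as a stand-in; it is not the activation function h(x) of this work, and the function names, the single `beta` parameter, and the example values are assumptions.

```python
import numpy as np

def smooth_activation(x, beta=1.0):
    """Stand-in activation: 0 for x <= 0, 1 for x >= beta, smooth in between."""
    t = np.clip(np.asarray(x, dtype=float) / beta, 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)          # cubic smoothstep, NOT the paper's h(x)

def qualitative_velocity(L, distances, tolerance, gain=0.5, beta=1.0):
    """v = -gain * pinv(A L) @ (A e), with A = diag(activation of excess distance)."""
    L = np.atleast_2d(np.asarray(L, dtype=float))
    e = np.asarray(distances, dtype=float).reshape(-1)
    excess = e - tolerance                   # distance beyond the confidence interval
    A = np.diag(smooth_activation(excess, beta))
    return -gain * np.linalg.pinv(A @ L) @ (A @ e)

# Hypothetical 3-region, 2-DoF example (e.g. linear and angular velocity).
L = np.array([[1.0, 0.2], [0.1, 1.0], [0.5, -0.4]])
v_inside = qualitative_velocity(L, [0.05, 0.02, 0.08], tolerance=0.1)
v_outside = qualitative_velocity(L, [0.05, 0.60, 0.90], tolerance=0.1, beta=0.2)
```

When every distance lies inside the confidence interval the command is zero, and regions that are inside the interval contribute nothing to the command computed from the remaining regions.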
Optimization module: computational optimization framework
This section describes how the baseline Python implementation was accelerated from an impractical 0.08 FPS to a real-time capable 5.5 FPS through mathematical reformulation, vectorization, and memory hierarchy optimization, while maintaining the theoretical guarantees of the HOG-based control law established in Section 3. Moving from theoretical visual servoing to a real-time implementation on resource-constrained hardware required a systematic optimization of the computational pipeline.
Performance bottlenecks and analysis
At first, the Python implementation demonstrated substantial computational limitations, processing 165 × 165 pixel images with 5 histogram bins at a mere 0.08 FPS. Three significant bottlenecks were identified through profiling: (1) nested loops in histogram computations that took up 92% of runtime; (2) needless memory reallocation during the creation of HOG descriptors; and (3) unnecessary use of double-precision floating-point arithmetic where single precision would be adequate. In order to attain real-time performance on hardware with limited resources, these findings required a methodical optimization approach.
Mathematical reformulation and vectorization
The key algorithms had to be fundamentally reformulated in order to transition from loop-based to vectorized operations. To illustrate the impact of vectorization, consider the computation of the $L_2$ distance between two images $I_1$ and $I_2$. The original loop-based implementation:
$$d(I_1, I_2) = \sqrt{\sum_{x} \sum_{y} \big( I_1(x, y) - I_2(x, y) \big)^{2}} \tag{24}$$
was replaced by the equivalent vectorized Frobenius norm:
$$d(I_1, I_2) = \lVert I_1 - I_2 \rVert_F \tag{25}$$
where $\lVert \cdot \rVert_F$ denotes the Frobenius norm. By applying this vectorization principle to all histogram operations, the HOG computation time was reduced by 98.6% while the numerical outputs remained identical within floating-point tolerance.
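The equivalence between the loop-based and vectorized forms can be checked with a short NumPy sketch. The function names and the random test images are illustrative; the single-precision cast reflects the precision reduction discussed in the bottleneck analysis and is an assumption about how it would be applied here.

```python
import numpy as np

def l2_distance_loops(a, b):
    """Reference implementation with explicit nested Python loops."""
    total = 0.0
    for r in range(a.shape[0]):
        for c in range(a.shape[1]):
            diff = float(a[r, c]) - float(b[r, c])
            total += diff * diff
    return total ** 0.5

def l2_distance_vectorized(a, b):
    """Same quantity as a single Frobenius-norm call on float32 arrays."""
    # Cast before subtracting so that uint8 values do not wrap around.
    diff = a.astype(np.float32) - b.astype(np.float32)
    return float(np.linalg.norm(diff))      # Frobenius norm for 2-D input

rng = np.random.default_rng(0)
img1 = rng.integers(0, 256, size=(165, 165), dtype=np.uint8)
img2 = rng.integers(0, 256, size=(165, 165), dtype=np.uint8)
d_loop = l2_distance_loops(img1, img2)
d_vec = l2_distance_vectorized(img1, img2)
```

The two results agree to within single-precision rounding, while the vectorized version replaces tens of thousands of interpreted loop iterations with one compiled call.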
Numerical optimization and memory management
Two significant improvements were made to the B-spline calculations: first, by using SciPy’s compiled functions instead of custom Python implementations, the number of operations was reduced by 42% through kernel fusion; second, memory access patterns were optimized to remove dynamic memory allocation overhead while maintaining the mathematical properties of the original implementation.
Distributed architecture design
Three innovations were made to the system architecture to address latency constraints. By using Flask-based streaming, the vision pipeline achieves a latency of less than 1 ms, which is 3000× faster than traditional Motion packages. With hardware-enforced safety via the Sabertooth 32 A driver’s 200 ms watchdog timer, motor control uses a lock-free TCP protocol that ensures 100 Hz command updates. When feasible, ROS nodes use zero-copy shared memory to handle inter-process communication, which lowers IPC overhead by 73%.
System design and implementation
This section describes the hardware and software architecture of the proposed wheelchair platform, with emphasis on the components and interfaces required for real-time qualitative HOG-based visual servoing. Figures 4, 5, and 6 summarize the hardware architecture, the end-to-end system integration, and the inter-process communication mechanism used during development, respectively.
Fig. 4.
Hardware architecture of the wheelchair platform. The system comprises a Raspberry Pi 4 with an on-board camera for image acquisition, a Sabertooth 32 A motor driver controlling two brushed DC motors, and separate power supplies for motor actuation (24 V wheelchair battery) and embedded computation (5 V power bank).
Fig. 5.
End-to-end system integration on the powered wheelchair. The Raspberry Pi acquires images from the on-board camera, runs the qualitative HOG-based visual servoing pipeline, and transmits velocity commands to the Sabertooth motor driver. Data-flow links indicate the vision stream, command channel, and monitoring feedback used during experiments.
Fig. 6.
Inter-process communication (IPC) used to interface C++ (ViSP) modules with the Python-based wheelchair control stack. A local socket channel exchanges ASCII command packets (e.g., left/right wheel commands) and returns runtime logs for monitoring and post-processing.
Hardware system design
The wheelchair is driven by two brushed DC motors controlled through a Sabertooth 32 A motor driver. The driver can supply up to 32 A continuous current and supports independent speed and direction control for each motor. In our setup, the Sabertooth is connected to the Raspberry Pi via USB using a virtual serial (CDC) interface for command transmission. The wheelchair motors are powered from the wheelchair’s native 24 V battery, whereas the Raspberry Pi is powered by an external 5 V power bank to isolate the computation unit from motor transients.
Figure 4 provides an overview of the main hardware components and their functional connections, including the camera module, the embedded controller (Raspberry Pi 4), the motor driver, and the power distribution. The Sabertooth driver also supports hardware braking through two digital inputs (one per motor), which are used as part of the safety layer for assistive operation.
Software system design
The software stack is implemented primarily in Python on Ubuntu Linux running on the Raspberry Pi. The Robot Operating System (ROS) is used as the middleware layer to modularize the system into nodes and to manage inter-node communication. Image acquisition is performed through the Raspberry Pi camera interface and published to the processing pipeline via a dedicated camera node (e.g., cam2image). Wheel actuation is handled through a differential-drive controller (e.g., diff_drive_controller), which converts the commanded linear and angular velocities into left/right wheel commands. In addition to the native Python implementation of the qualitative HOG-based controller, the software design supports integration with C++-based visual servoing libraries (notably ViSP) through lightweight inter-process communication, enabling rapid prototyping and comparative experiments without rewriting the full system in a single language.
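The conversion performed by a differential-drive controller can be illustrated with the standard kinematic relation between body velocities and wheel speeds. The sketch below is not the `diff_drive_controller` code; the function name and the wheel separation value are hypothetical.

```python
def wheel_speeds(linear, angular, wheel_separation=0.55):
    """Convert body velocities (m/s, rad/s) to left/right wheel linear speeds (m/s)."""
    half = 0.5 * wheel_separation
    left = linear - angular * half     # inner wheel slows for a left (positive) turn
    right = linear + angular * half    # outer wheel speeds up
    return left, right

straight = wheel_speeds(0.4, 0.0)
turn_in_place = wheel_speeds(0.0, 1.0, wheel_separation=0.5)
```

Pure forward motion gives equal wheel speeds, while a pure rotation gives equal and opposite speeds, which is a quick sanity check when tuning the command scaling for the motor driver.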
HOG descriptor implementation parameters
In this study, the pipeline operates on grayscale images with a fixed resolution of 165 × 165 pixels. The descriptor is computed with the following structure: Cell Size: 8 × 8 pixels; Block Size: 2 × 2 cells; Block Stride: 1 cell (resulting in 50% overlap between adjacent blocks); Number of Orientation Bins: 9 (covering the unsigned gradient range of 0 to 180 degrees). This configuration results in 20 blocks per row × 20 blocks per column = 400 total blocks. Each block contributes a 36-dimensional vector (4 cells × 9 bins). Consequently, the final HOG feature vector for a single image has a dimensionality of 400 × 36 = 14,400 elements. Before being used in the control law, this high-dimensional vector is compressed into a 5-bin histogram for computational efficiency, as described in the following sections.
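The dimensionality arithmetic above can be checked with a few lines of Python. This is only a bookkeeping sketch: it takes the block grid (20 by 20) as stated in the text rather than deriving it from the image size, and the function name is hypothetical.

```python
def hog_vector_length(blocks_per_row, blocks_per_col, cells_per_block, bins):
    """Length of a HOG descriptor built by concatenating all block vectors."""
    per_block = cells_per_block * bins
    return blocks_per_row * blocks_per_col * per_block


CELLS_PER_BLOCK = 2 * 2      # block size of 2 x 2 cells
ORIENTATION_BINS = 9         # unsigned gradients, 0 to 180 degrees

per_block_length = CELLS_PER_BLOCK * ORIENTATION_BINS
descriptor_length = hog_vector_length(20, 20, CELLS_PER_BLOCK, ORIENTATION_BINS)
```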
System integration
Figure 5 summarizes the integrated hardware–software architecture on the wheelchair platform. The embedded computer (Raspberry Pi 4) receives monocular video from the camera module and executes the qualitative HOG-based visual servoing algorithm. Velocity commands are transmitted to the Sabertooth driver through a low-latency command channel, while status signals are exposed to a user-facing monitoring interface.
To reduce end-to-end latency and improve operational reliability, the vision subsystem uses lightweight streaming for visualization, while the control subsystem delivers motor commands at 100 Hz via a dedicated communication channel. A hardware watchdog mechanism on the motor driver provides a fail-safe stop when command updates are interrupted beyond a configured timeout (200 ms), which is critical for assistive deployment.
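The timeout logic behind such a fail-safe can be sketched in software. The class below is a hypothetical illustration of the 200 ms rule only; it is not the firmware of the motor driver, and timestamps are passed in explicitly so that the behaviour is easy to follow.

```python
class CommandWatchdog:
    """Flags a stop condition when no command arrives within the timeout."""

    def __init__(self, timeout_s=0.2):
        self.timeout_s = timeout_s
        self.last_update = None  # time of the most recent command, if any

    def feed(self, now):
        """Record that a fresh command was received at time `now` (seconds)."""
        self.last_update = now

    def expired(self, now):
        """Return True if the motors should be stopped at time `now`."""
        if self.last_update is None:
            return True  # no command seen yet: stay stopped
        return (now - self.last_update) > self.timeout_s


watchdog = CommandWatchdog(timeout_s=0.2)
before_any_command = watchdog.expired(now=0.0)
watchdog.feed(now=1.00)
fresh = watchdog.expired(now=1.15)   # 150 ms since the last command
stale = watchdog.expired(now=1.25)   # 250 ms since the last command
```

In a live loop the `now` argument would typically come from a monotonic clock such as `time.monotonic()`.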
Integration with ViSP and inter-process communication
ViSP is a C++ software platform that provides a broad set of modules for robotic vision and visual servoing. We installed ViSP on the Raspberry Pi to leverage its mature visual servoing components for rapid prototyping and validation. However, directly rewriting the full Python-based wheelchair software stack in C++ would be impractical given the existing hardware interfaces and ROS nodes.
To bridge this gap, we implemented a lightweight ASCII-based communication protocol over local sockets to exchange commands and diagnostics between a C++ ViSP process and the Python control stack. For example, a command of the form m,10,20 specifies left and right motor command values. The socket-based interface enables ViSP to generate control outputs while allowing Python to remain responsible for motor I/O, logging, and higher-level orchestration. Figure 6 illustrates the overall IPC mechanism, including the bidirectional transfer of velocity commands and runtime logs (e.g., frame index and computed speed).
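A message such as m,10,20 could be handled on the Python side roughly as follows. This is a sketch under stated assumptions: integer command values, newline-terminated messages, and the helper names are all hypothetical, since the text specifies only the comma-separated form.

```python
def parse_motor_command(line):
    """Parse 'm,<left>,<right>' into a (left, right) tuple of integers."""
    fields = line.strip().split(",")
    if len(fields) != 3 or fields[0] != "m":
        raise ValueError(f"not a motor command: {line!r}")
    return int(fields[1]), int(fields[2])


def format_motor_command(left, right):
    """Build the ASCII message for the given wheel command values."""
    return f"m,{left},{right}\n"


left, right = parse_motor_command("m,10,20\n")
round_trip = parse_motor_command(format_motor_command(-5, 7))
```

Rejecting malformed lines with an exception keeps a corrupted message from being interpreted as a motor command.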
Custom TCP protocol for motor control
To ensure reliable and low-latency communication with the Sabertooth 32 A motor driver, a custom TCP-based command protocol is used. Each packet includes:
Header: a 2-byte start code (0xAA55),
Payload: two 16-bit wheel velocity commands (left and right; total 4 bytes),
Checksum: a 1-byte XOR checksum computed over the payload.
Packets are transmitted at 100 Hz over a dedicated link. A 200 ms watchdog timeout triggers an emergency stop if command updates are not received within the specified interval. On the Raspberry Pi, a lock-free queue is used to decouple command generation from network I/O, reducing blocking and improving timing stability under high CPU load.
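The packet layout described above can be sketched with Python's `struct` module. Byte order and signedness are not stated in the text, so big-endian, signed 16-bit values are assumed here, and the function names are hypothetical.

```python
import struct
from functools import reduce

START_CODE = 0xAA55  # 2-byte header from the packet description


def xor_checksum(payload):
    """XOR of all payload bytes, reduced to a single byte."""
    return reduce(lambda acc, byte: acc ^ byte, payload, 0)


def build_packet(left, right):
    """Header (2 bytes) + two 16-bit wheel commands (4 bytes) + checksum (1 byte)."""
    payload = struct.pack(">hh", left, right)  # assumed: big-endian, signed
    return struct.pack(">H", START_CODE) + payload + bytes([xor_checksum(payload)])


def parse_packet(packet):
    """Validate header and checksum, then return (left, right)."""
    if len(packet) != 7:
        raise ValueError("expected a 7-byte packet")
    (start,) = struct.unpack(">H", packet[:2])
    if start != START_CODE:
        raise ValueError("bad start code")
    payload = packet[2:6]
    if xor_checksum(payload) != packet[6]:
        raise ValueError("checksum mismatch")
    return struct.unpack(">hh", payload)


packet = build_packet(100, -100)
decoded = parse_packet(packet)
```

A receiver that validates both the start code and the checksum can discard corrupted frames instead of acting on them, which complements the watchdog timeout.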
Experimental evaluation
This section provides a comprehensive experimental validation of the proposed Qualitative HOG-based Visual Servoing (QHOGVS) system, substantiating its two primary claims: (i) the robust end-to-end integration of a qualitative visual servoing framework on a functional powered wheelchair platform, and (ii) the real-time feasibility of HOG-based servoing achieved through targeted computational optimization for low-cost embedded hardware. An example deployment of the wheelchair platform during a path-following experiment is shown in Figure 7.
Fig. 7.

Deployed wheelchair platform during an outdoor path-following experiment, showing the mounted camera and embedded compute unit used to run the qualitative visual servoing pipeline.
The evaluation progresses from fundamental system functionality to detailed performance analysis. First, we demonstrate the system’s core capability in executing essential navigation tasks, specifically corridor following and sidewalk following (see Figure 8 for a schematic representation of these tasks; supplementary video material is available). Second, we experimentally validate the proposed control law, providing quantitative evidence of its stable convergence behavior. Finally, we report computational performance metrics that quantify the impact of our optimization pipeline on achieving real-time operation.
Fig. 8.
Two examples of following tasks: corridor following and sidewalk following.
Remark 1
This manuscript is part of a broader, ongoing research program. Several complementary performance aspects, including detailed trajectory accuracy, quantitative path-following error under various conditions, systematic robustness analysis to occlusions and illumination variation, and formal comparisons with established navigation baselines, are the focus of our concurrent, dedicated investigations (e.g., refs. 35,36), which analyze those dimensions in depth. Consequently, the present paper is deliberately scoped to establish and demonstrate the integration of the qualitative HOG-VS framework and its real-time optimization, and the experimental results presented here serve to validate these core contributions.
Two-stage target reaching using qualitative HOG-based visual servoing
This subsection experimentally validates a core functionality of autonomous navigation: the ability to sequentially reach multiple designated goals. We demonstrate a two-stage target-reaching task wherein the proposed qualitative HOG-based visual servoing controller successfully navigates the wheelchair from an initial pose to a final goal via a single predefined intermediate landmark. This experiment validates the controller’s sequential logic and its capacity for automated goal switching based on qualitative visual feedback. The task is successfully completed upon the wheelchair’s arrival at the final target with the required alignment precision.
The experimental progression is documented in Figure 9, providing an external view of the sequence. For clarity in this experiment, distinct AprilTag markers are used to demarcate the intermediate and final targets. Figure 9(a) shows the initial configuration with the wheelchair positioned relative to both targets. In Figure 9(b), the wheelchair has converged to the intermediate target, and Figure 9(c) confirms its subsequent successful transition to and arrival at the final target.
Fig. 9.
External view of the two-stage target-reaching experiment. (a) Initial wheelchair pose relative to the intermediate and final AprilTag targets. (b) The wheelchair after reaching the intermediate target. (c) The wheelchair after reaching the final target. AprilTags are used only for target identification in this experiment.
The corresponding ego-centric perspective from the onboard camera, which includes the system’s real-time target-selection status, is presented in Figure 10. Figure 10(a) captures the moment the intermediate target is detected and actively selected as the current navigation goal (indicated in blue). Figure 10(b) shows the successful attainment of the intermediate target (marked in green), triggering the autonomous switch of the reference to the final target (now highlighted in blue). Finally, Figure 10(c) demonstrates the completion of the task as the wheelchair reaches the final goal. A video demonstrating the complete execution of this task is available in the public GitHub repository provided in the Data Availability Statement.
Fig. 10.
Ego-centric camera view during the two-stage target-reaching task. (a) The intermediate target is detected and selected as the next goal (blue). (b) The intermediate target is reached (green) and the controller switches to the final target as the next goal (blue). (c) The final target is reached, completing the task.
Control law validation
Initially, prerecorded video sequences from our university campus environment were used to validate the proposed QHOGVS control law. The first testing configuration is shown in Figure 11, which presents the initial visual alignment challenge. The left panel shows the current image captured at the start of a following task. The center panel is the target image representing the desired pose. The right panel displays the absolute difference between the two, where bright pixels (high intensity) correspond to large discrepancies in edge locations and gradient magnitudes, precisely the features encoded by the HOG descriptor. Significant misalignment between the target and current views is evident in the difference image, especially in the distribution of edges and structural elements. The bright areas mark where the visual features are most dissimilar, and most of these disparities lie in regions of the frame that contain structural components crucial for navigation.
Fig. 11.
The initial image (left), the target image (center), along with the difference between both images (right).
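A difference image of the kind shown in the right panel can be computed with NumPy as in the following sketch. The function name and the tiny test arrays are illustrative assumptions; the point is that 8-bit images should be widened before subtraction, since unsigned arithmetic would otherwise wrap around.

```python
import numpy as np


def abs_difference(current, target):
    """Absolute per-pixel difference of two uint8 grayscale images."""
    # Widen to a signed type first: subtracting uint8 arrays directly wraps around.
    diff = current.astype(np.int16) - target.astype(np.int16)
    return np.abs(diff).astype(np.uint8)


# Tiny synthetic "images" used only to illustrate the computation.
current = np.array([[10, 200], [0, 255]], dtype=np.uint8)
target = np.array([[20, 100], [255, 0]], dtype=np.uint8)
difference = abs_difference(current, target)
```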
The velocity profiles in Figure 12 show the initial instability in the control signals: the velocities oscillate during convergence. The alternating peaks in the velocity commands indicate that the original implementation overcompensated for visual errors, so the system frequently overshot its target. These oscillations showed that the control loop needed smoothing and gain adjustment to achieve stable performance.
Fig. 12.

Initial results for velocity profiles showing 2 DoF only.
After several iterations of debugging and parameter tuning, stable convergence was achieved: as shown in Figure 13, the velocity components decay smoothly to near zero within the first 60 frames as the wheelchair approaches the target position. The controller parameters for this and all subsequent experiments were fixed at:
for the four histogram regions. The final convergence profile shows how well our parameter optimization worked to produce stable, smooth motion. As the system gets closer to the target position, it gradually reduces both velocity components, demonstrating how it makes increasingly finer adjustments. The final implementation’s appropriate decoupling of the control dimensions is confirmed by the coordinated decrease of velocity profiles. Figure 14 shows the experiment’s initial and final images as well as the HOG features.
Fig. 13.

The velocity profiles obtained in the experiments of HOG-based visual servoing.
Fig. 14.
The initial (left) and final (center) images, along with the difference between the images (right).
An eye-in-hand setup with 300 frames between the initial and target positions was used to further validate the system’s performance (Figure 15). The sequence of images demonstrates how the system can adapt to large changes in viewpoint while still tracking visual features. The sequence demonstrates how consistent visual elements allow for continuous navigation even when perspectives change between frames.
Fig. 15.
Initial (left) and final (center) images, along with the difference between the images (right), captured from the experiment on a pre-recorded video.
The velocity profiles, shown in Figure 16, exhibit consistent exponential convergence, indicating well-controlled velocity commands that stay within safe operating bounds. The generalisability of the control approach is confirmed by the velocity curves’ similar convergence characteristics across various motion directions. The seamless transitions between velocity commands show that the visual servoing system’s non-linearities are being handled effectively.
Fig. 16.

The velocity profiles across various motion directions using an eye-in-hand configuration on the recorded video experiment.
Figure 17 tracks the Matusita distance along the trajectory. The distance declines steadily, including during the intermediate stages of the maneuver, which confirms that visual alignment improves continuously as the wheelchair approaches the target. This monotonic decline is consistent with the theoretical stability analysis and supports the qualitative approach. The consistent behaviour across several validation scenarios indicates that the control law is dependable in real-world navigation.
Fig. 17.
The distance value using an eye-in-hand configuration on the recorded video experiment.
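For reference, the Matusita distance between two histograms can be computed as in the sketch below, which uses the standard definition (the Euclidean distance between the element-wise square roots of the normalized histograms). The normalization step and the function name are assumptions; the exact form used in the controller is not reproduced in this section.

```python
import numpy as np


def matusita_distance(h1, h2):
    """Matusita distance between two non-negative histograms."""
    p = np.asarray(h1, dtype=float)
    q = np.asarray(h2, dtype=float)
    # Normalize so that each histogram sums to one.
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


# Identical histograms give zero distance.
same = matusita_distance([1, 2, 3, 4, 0], [1, 2, 3, 4, 0])

# Histograms with no overlapping mass give the maximum value, sqrt(2).
disjoint = matusita_distance([1, 0, 0, 0, 0], [0, 1, 0, 0, 0])
```

Because the distance is bounded between 0 and the square root of 2 for normalized histograms, a steadily decreasing curve is straightforward to interpret as improving alignment.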
System performance
The optimized implementation achieved 5.5 FPS on the Raspberry Pi 4 (1.5 GHz), which was sufficient for stable visual servoing in the experiments reported here. Table 1 summarizes the key metrics at each optimization stage.
Table 1.
Optimization Performance Metrics.
| Stage | FPS | Latency (ms) | Memory (MB) | Speedup |
|---|---|---|---|---|
| Baseline | 0.08 | 12,500 | 342 | 1× |
| Vectorized | 1.2 | 833 | 210 | 15× |
| Memory Opt. | 3.7 | 270 | 127 | 46× |
| Final System | 5.5 | 180 | 89 | 69× |
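The latency and speedup columns follow directly from the frame rates, as the short sketch below illustrates. The helper names are hypothetical; per-frame latency is taken as the reciprocal of the frame rate and speedup as the ratio to the baseline frame rate. For the final system the reciprocal gives about 182 ms, which the table reports as 180 ms.

```python
def latency_ms(fps):
    """Per-frame latency implied by a frame rate, in milliseconds."""
    return 1000.0 / fps


def speedup(fps, baseline_fps):
    """Frame-rate ratio relative to the baseline."""
    return fps / baseline_fps


BASELINE_FPS = 0.08
stage_fps = {
    "Baseline": 0.08,
    "Vectorized": 1.2,
    "Memory Opt.": 3.7,
    "Final System": 5.5,
}

# Map each stage to (rounded latency in ms, rounded speedup factor).
derived = {
    name: (round(latency_ms(fps)), round(speedup(fps, BASELINE_FPS)))
    for name, fps in stage_fps.items()
}
```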
According to theoretical analysis, these results are close to the Amdahl-optimal speedup for the hardware constraints given. The remaining bottleneck, which accounts for 37% of runtime, is caused by inevitable image sensor I/O operations.
The system reliably navigated between successive target images while adhering to visibility constraints. The finished version met the real-time requirements for stable wheelchair control, running at 5.5 frames per second on Google Colab (2.3 GHz CPU) and 2 frames per second on a Core i3-6006U laptop (2.0 GHz).
In fact, the system achieved 5.5 FPS on the Raspberry Pi 4 (ARMv8) due to targeted optimizations such as memory access pattern tuning, vectorized NumPy operations, and efficient use of the Pi’s fixed-function camera pipeline. In contrast, the same Python code ran at only 2 FPS on a Core i3-6006U laptop because the laptop’s general-purpose OS introduced higher scheduling latency and lacked the dedicated camera-hardware integration utilized on the Pi. This underscores the importance of platform-aware optimization for real-time embedded vision systems.
Computational optimization
Python’s interpreted nature presented substantial computational challenges for the implementation. With 5-bin histograms, the first nested-loop implementation managed only 0.08 FPS. Methodical optimisation produced three significant improvements. First, replacing all Python loops with NumPy vectorisation cut the L2 distance computation time, which initially took 435 ms per iteration. Second, replacing custom B-spline implementations with pre-compiled SciPy functions increased speed further. Third, pre-allocating buffers for histogram calculations optimised memory access patterns. Together, these changes raised performance to 5.5 frames per second on Raspberry Pi 4 hardware, satisfying the real-time demands of an effective HOG descriptor implementation for wheelchair navigation.
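The first and third optimizations can be illustrated on a simple L2 distance. The sketch below contrasts an element-by-element Python loop with a vectorized NumPy version and a variant that reuses a pre-allocated buffer. The function names are hypothetical, the vector length of 14,400 is borrowed from the descriptor size stated earlier, and the sketch makes no attempt to reproduce the timings reported in the text.

```python
import numpy as np


def l2_distance_loop(a, b):
    """Reference version: one Python-level iteration per element."""
    total = 0.0
    for i in range(len(a)):
        d = a[i] - b[i]
        total += d * d
    return total ** 0.5


def l2_distance_vectorized(a, b):
    """Same result, with the arithmetic done inside NumPy."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.dot(diff, diff)))


def l2_distance_prealloc(a, b, buf):
    """Vectorized version that writes the difference into a reused buffer."""
    np.subtract(a, b, out=buf)
    return float(np.sqrt(np.dot(buf, buf)))


rng = np.random.default_rng(0)
x = rng.random(14400)
y = rng.random(14400)
scratch = np.empty_like(x)  # allocated once, reused on every call

slow = l2_distance_loop(x, y)
fast = l2_distance_vectorized(x, y)
reused = l2_distance_prealloc(x, y, scratch)
```

Reusing `scratch` avoids allocating a new temporary array on every frame, which is the kind of memory-access saving the third optimization refers to.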
Remark 2
The implemented control architecture employs a dual-rate scheme: a high-priority, fixed-frequency (100 Hz) motor control loop, and a lower-frequency vision processing loop (approximately 5.5–5.8 Hz). The velocity command computed by the visual servo is held constant and interpolated by the 100 Hz motor controller, ensuring smooth actuator output. Profiling reveals that the total vision pipeline latency is approximately 180 ms, dominated by sensor I/O (approximately 110 ms, 61%) and HOG computation (approximately 65 ms, 36%). Communication latency is negligible (<5 ms). A direct port of the optimized HOG core to C++ yields approximately 10 FPS, confirming the Python overhead is managed; the primary bottleneck is the camera sensor’s hardware-limited read time, common to all implementations on this platform. While a formal stability margin analysis is a valuable direction for future theoretical work, the empirical evidence (consistent exponential convergence, smooth velocity profiles, and monotonic decrease in the visual error metric across all trials) demonstrates the practical closed-loop stability of the implemented system under the tested conditions.
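The dual-rate idea can be sketched as a zero-order hold, in which every 100 Hz motor tick reuses the most recent vision command. This sketch covers only the hold part; any interpolation performed by the real motor controller is not modelled, and the function name, the command values, and the 180 ms update spacing are illustrative assumptions.

```python
def hold_commands(vision_updates_ms, tick_ms, duration_ms):
    """Return one command per motor tick using a zero-order hold.

    vision_updates_ms: list of (timestamp_ms, command) sorted by time
    tick_ms: motor loop period in milliseconds (10 ms for 100 Hz)
    duration_ms: total simulated time in milliseconds
    """
    stream = []
    index = -1
    latest = (0.0, 0.0)  # stay at rest until the first vision command arrives
    for t in range(0, duration_ms, tick_ms):
        # Advance to the newest vision command available at time t.
        while index + 1 < len(vision_updates_ms) and vision_updates_ms[index + 1][0] <= t:
            index += 1
            latest = vision_updates_ms[index][1]
        stream.append(latest)
    return stream


# Hypothetical (linear, angular) commands arriving every 180 ms.
updates = [(0, (0.30, 0.00)), (180, (0.25, 0.05)), (360, (0.20, 0.02))]
stream = hold_commands(updates, tick_ms=10, duration_ms=500)
```

With these numbers each vision command is reused for 18 consecutive motor ticks, which shows why the motor loop can run at a fixed rate even though the vision loop is much slower.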
Conclusion
This paper presents the theoretical foundation and real-world application of qualitative HOG-based visual servoing for wheelchair navigation. Through an optimised Python implementation, we developed a system that processes images at 5.5 frames per second on Raspberry Pi hardware, about 68× faster than the original version. Experimental results confirm that the control law converges stably on prerecorded campus sequences, with velocity profiles smoothly decaying to zero. The monotonic decline of the Matusita distance indicates that the system is able to navigate between successive target images while preserving feature visibility. This work connects theoretical visual servoing fundamentals with real-world assistive technology applications, providing methodological advancements as well as useful insights for creating intelligent wheelchair systems. Future work will focus on a C++ implementation for native embedded performance and integration with depth sensing for enhanced obstacle avoidance.
Acknowledgements
The author would like to acknowledge Mr. Yahya Tawil and Mr. Ismail Haj Osman (TÜBİTAK, Project No. 117E173), for their contribution to the early experiments of this work. The Titan Xp GPU used in this research was generously donated by NVIDIA Corporation.
Author contributions
A. H. Abdul Hafez: Conceptualization; Project administration; Funding acquisition; Investigation; Methodology; Formal analysis; Writing–original draft; Writing–review & editing; Software; Implementation; Validation; Visualization; Experiments.
Funding
This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Al-Ahsa, Saudi Arabia (Grant No. KFU260873).
Data availability
Data Availability Statement: The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request. Source code, data files, and video demonstrations supporting the findings of this study are publicly available at: https://github.com/Abdulhafez2019/QHOGVS.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Azizian, M., Khoshnam, M., Najmaei, N. & Patel, R. V. Visual servoing in medical robotics: A survey. Part I: Endoscopic and direct vision imaging-techniques and applications. Int. J. Med. Robot. Comput. Assist. Surg. 10(3), 263–274 (2014).
- 2. Gallo, V., Shallari, I., Carratù, M., Laino, V. & Liguori, C. Design and characterization of a powered wheelchair autonomous guidance system. Sensors 24(5), 1581 (2024).
- 3. Gharghan, S. K., Al-Kafaji, R. D., Mahdi, S. Q., Zubaidi, S. L. & Ridha, H. M. Indoor localization for the blind based on the fusion of a metaheuristic algorithm with a neural network using energy-efficient wsn. Arab. J. Sci. Eng. 48(5), 6025–6052 (2023).
- 4. Simpson, R. C. Smart wheelchairs: A literature review. J. Rehabil. Res. Dev. 10.1682/jrrd.2004.08.0101 (2005).
- 5. Hwang, H. et al. Guidenav: User-informed development of a vision-only robotic navigation assistant for blind travelers. arXiv preprint arXiv:2512.06147 (2025).
- 6. Leaman, J. & La, H. M. A comprehensive review of smart wheelchairs: Past, present, and future. IEEE Trans. Hum.-Mach. Syst. 47(4), 486–499 (2017).
- 7. Chaumette, F. & Hutchinson, S. Visual servo control. I. Basic approaches. IEEE Robot. Autom. Mag. 13(4), 82–90 (2006).
- 8. Santana, J. M. et al. Design and implementation of an interactive system for service robot control and monitoring. Sensors 25(4), 987 (2025).
- 9. Espiau, B., Chaumette, F. & Rives, P. A new approach to visual servoing in robotics. IEEE Trans. Robot. Autom. 8(3), 313–326 (1992).
- 10. Chaumette, F. & Hutchinson, S. Visual servo control. II. Advanced approaches [tutorial]. IEEE Robot. Autom. Mag. 14(1), 109–118 (2007).
- 11. Pasteau, F., Narayanan, V. K., Babel, M. & Chaumette, F. A visual servoing approach for autonomous corridor following and doorway passing in a wheelchair. Robot. Auton. Syst. 75, 28–40 (2016).
- 12. Hafez, A. H. H. A. A. Visual servo control by optimizing hybrid objective function with visibility and path constraints. J. Control Eng. Appl. Inf. 16(2), 120–129 (2014).
- 13. Arif, Z. & Fu, Y. Mix frame visual servo control framework for autonomous assistive robotic arms. Sensors 22(2), 642 (2022).
- 14. Bateux, Q. & Marchand, E. Histograms-based visual servoing. IEEE Robot. Autom. Lett. 2(1), 80–87 (2016).
- 15. Lowe, D. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004).
- 16. Mehboob, R., Javed, A., Dawood, H. & Dawood, H. Histogram of low-level visual features for salient feature extraction. Arab. J. Sci. Eng. 47(8), 10589–10604 (2022).
- 17. Hafez, A. H. A., Nelakanti, A. K. & Jawahar, C. Path planning for visual servoing and navigation using convex optimization. Int. J. Robot. Autom. 30(3), 299–307 (2015).
- 18. Mortenson, W. B., Miller, W. C., Backman, C. L. & Oliffe, J. L. Association between mobility, participation, and wheelchair-related factors in long-term care residents who use wheelchairs as their primary means of mobility. J. Am. Geriatr. Soc. 60(7), 1310–1315 (2012).
- 19. Meyers, A. R., Anderson, J. J., Miller, D. R., Shipp, K. & Hoenig, H. Barriers, facilitators, and access for wheelchair users: Substantive and methodologic lessons from a pilot study of environmental effects. Soc. Sci. Med. 55(8), 1435–1446 (2002).
- 20. Torkia, C. et al. Power wheelchair driving challenges in the community: A users’ perspective. Disabil. Rehabil. Assist. Technol. 10(3), 211–215 (2015).
- 21. Remazeilles, A., Mansard, N. & Chaumette, F. A qualitative visual servoing to ensure the visibility constraint. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4297–4303 (2006).
- 22. Pradeep, V. et al. Self-supervised sidewalk perception using fast video semantic segmentation for robotic wheelchairs in smart mobility. Sensors 22(14), 5241 (2022).
- 23. Nguyen, N. H., Klotsa, D., Engel, M. & Glotzer, S. C. Emergent collective phenomena in a mixture of hard shapes through active rotation. Phys. Rev. Lett. 112(7), 075701 (2014).
- 24. Park, J. J., Lee, S. & Kuipers, B. Discrete-time dynamic modeling and calibration of differential-drive mobile robots with friction. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 6510–6517 (2017).
- 25. Acosta, D., Fariña, B., Toledo, J. & Sanchez, L. A. Low cost magnetic field control for disabled people. Sensors 23(2), 1024 (2023).
- 26. Gulati, S. & Kuipers, B. High performance control for graceful motion of an intelligent wheelchair. In: IEEE International Conference on Robotics and Automation, pp. 3932–3938 (2008).
- 27. Johnson, B. W. & Aylor, J. H. Dynamic modeling of an electric wheelchair. IEEE Trans. Ind. Appl. 5, 1284–1293 (1985).
- 28. Levine, S. P. et al. The navchair assistive wheelchair navigation system. IEEE Trans. Rehabil. Eng. 7(4), 443–451 (1999).
- 29. Uğur, E., Kara, T. & Abdul Hafez, A. Modeling and simulation of a wheelchair system with motion control. In: 5th International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), pp. 237–242 (2021).
- 30. Xu, F., Wang, H. & Gao, H. What matters in constructing a visual servoing scheme: A review of key issues and solutions. IEEE/ASME Transactions on Mechatronics (2025).
- 31. Hafez, A. A. & Jawahar, C. Probabilistic integration of 2d and 3d cues for visual servoing. In: 2006 9th International Conference on Control, Automation, Robotics and Vision, pp. 1–6 (IEEE, 2006).
- 32. Lin, J. et al. Fixed-time and prescribed-time image-based visual servoing with asymmetric time-varying output constraint. Robotics 10.3390/robotics14120190 (2025).
- 33. Bakouri, M. et al. Analysis of autonomous wheelchair navigation technologies in the past five years: A systematic review. International Journal of Online & Biomedical Engineering 21(3) (2025).
- 34. Hao, Z., Zhang, D. & Honarvar Shakibaei Asli, B. Motion prediction and object detection for image-based visual servoing systems using deep learning. Electronics 13(17), 3487 (2024).
- 35. Abdul Hafez, A. H., Haj Osman, I., Uğur, E. & Kara, T. Lightweight gaussian process-based visual servoing for autonomous wheelchair sidewalk navigation. IEEE Access 13, 69582–69595. 10.1109/ACCESS.2025.3561467 (2025).
- 36. Uğur, E., Kara, T., Abdulhafez, A. & Osman, I. H. Design and analysis of a novel sidewalk following visual controller for an autonomous wheelchair. Adv. Electr. Comput. Eng. 24(1), 3–14 (2024).