Abstract
Sensing and localization are envisioned to play a key role in shaping sixth-generation (6G) wireless networks by enhancing their ability to make intelligent decisions. Existing research efforts have focused on utilizing radio-frequency (RF) signals to facilitate sensing tasks, which, due to shared hardware and frequency bands, adds strain to the already congested spectrum. Additionally, with the explosive growth in the number of connected devices and the diversity of sensing scenarios, exploring alternative frequency bands becomes crucial. Within this context, it has been demonstrated that Light Fidelity (LiFi), which utilizes existing lighting infrastructure, stands out as a promising technology due to its highly accurate 3D sensing capabilities. With this motivation, this paper explores a forward-looking perspective wherein LiFi-empowered wireless networks integrate illumination, communication, and sensing capabilities. Specifically, we review LiFi-based sensing and localization principles, highlighting technologies set to enhance their performance. Subsequently, we introduce the concept of LiFi-empowered multi-modal sensing as a promising technological advancement achieved by integrating LiFi with other sensory data sources. This fusion empowers sensing systems to adapt to environmental conditions. Finally, we shed light on potential research directions and challenges on the path to realizing the potential of LiFi-empowered multi-modal sensing.
Subject terms: Electrical and electronic engineering, Information theory and computation
Introduction
With the rapid development of haptic applications and the tactile internet, the emergence of immersive sensing is expected to shape the future of wireless networks. This is particularly evident in the rise of physical-cyber platforms, which demand capabilities far beyond the traditional data sensing achieved by standard sensors. Furthermore, recent location-aware services, such as asset tracking, navigation, context-aware marketing, transportation, and logistics systems, are witnessing unprecedented positioning accuracy requirements, ranging from tens of meters to the decimeter level. Although fifth-generation (5G) networks are being rolled out globally, it is evident that 5G will not be able to fully support these emerging applications and services. Consequently, sensing and localization are expected to be key areas of innovation in the sixth generation (6G) of mobile communication networks, supporting new applications and use cases that were not possible in the 5G era1. It is worth emphasizing that achieving massive wireless connectivity and accurate localization and sensing for such applications and use cases comes at the expense of placing an additional burden on the existing spectrum. Toward that end, there is a pressing need to explore higher frequency bands, such as the mmWave, Terahertz (THz), and visible light spectrum, to support future communication, localization, and sensing functionalities.
While RF communication has received great attention, Light Fidelity (LiFi) has recently emerged as an attractive candidate for future 6G networks. It exploits the visible light part of the electromagnetic (EM) spectrum, complementing the classical, congested radio-frequency (RF) systems by offering high-data-rate connectivity2. Through intensity modulation and direct detection (IM/DD), LiFi enables efficient data transmission from LED-based transmitters to photodetector (PD)-equipped receivers, resulting in low-latency communication, simultaneous multi-user communication, and high spatial reuse. Moreover, light propagation is highly confined, so LiFi inherently exhibits limited inter-cell interference, improved physical-layer security, and the ability to alleviate congestion in many RF bands. The communication capabilities of LiFi are very attractive for dense environments, RF-sensitive structures, and situations where a predictable quality of service is important. In the realm of localization and sensing, existing technologies such as WiFi, infrared, radar, satellites, and cameras face certain limitations. For example, satellite-based navigation and positioning systems fall short of delivering satisfactory performance in indoor settings. Other technologies encounter challenges such as degraded accuracy, high usage or installation costs, spectrum congestion, EM interference, and privacy infringements3. Unlike sensing and localization techniques based on RF signals, which are naturally susceptible to the adverse effects of multipath propagation, recent investigations on LiFi sensing such as LiSense, StarLight, and Li-Tect have demonstrated that capturing the intensity of the incident light signal and converting it into a form readable by a measuring device has the potential to achieve highly accurate 3D sensing4.
Specifically, LiSense achieved a precise reconstruction of five major body joints with a low mean angular error by employing a shadow-based RSS technique, using 324 floor-mounted PDs arranged on a testbed and five modulated LEDs mounted on the ceiling5. Similarly, StarLight uses 20 PDs on the floor and 16 LEDs mounted on the ceiling6, and provides a slightly higher error for the five major body joints (3D reconstruction) compared to the first iteration of LiSense. In simulation, Li-Tect reconstructs cubic and spherical objects using 96 evenly distributed visible light sensing units, achieving an MSE of 0.05 m under the optimal transmitter and receiver placement4. In addition to fine-grained localization and pose prediction, LiFi-based sensing has also proven to be highly accurate in occupancy detection tasks. For instance, CeilingSee is a passive sensing system that utilizes ceiling-mounted visible light sensing units in which reverse-biased LED luminaires serve as receivers7. Using reflection-based RSS measurements and support vector regression, the system achieves over 90% accuracy in static and dynamic indoor occupancy estimation with as few as four sensing units. This sensing performance stems from the superior line-of-sight (LoS) propagation properties of LiFi signals. Additionally, as light cannot penetrate walls, LiFi-based sensing is more robust to interference from external sources. Nevertheless, the above-mentioned solutions require the installation of a large number of dedicated transmitters and receivers, rendering them impractical and expensive.
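To make the regression step of occupancy estimation concrete, the following sketch fits a support vector regressor to synthetic reflection-based RSS features, in the spirit of CeilingSee. The data-generation model, feature construction, and hyperparameters are illustrative assumptions, not taken from the original work.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)

# Synthetic stand-in for reflection-based RSS readings: four sensing
# units whose received power drops roughly linearly with occupancy,
# plus measurement noise (purely illustrative, not CeilingSee data).
n_samples, n_units = 200, 4
occupancy = rng.integers(0, 10, n_samples)  # ground-truth head count
rss = (1.0 - 0.05 * occupancy[:, None]
       + 0.01 * rng.standard_normal((n_samples, n_units)))

# Support vector regression maps the RSS feature vector to an
# occupancy estimate without an explicit propagation model.
model = SVR(kernel="rbf", C=10.0)
model.fit(rss, occupancy)

estimate = model.predict(rss[:5])  # occupancy estimates for 5 samples
```

In a real deployment, the features would be measured RSS variations caused by reflections from occupants, and the model would be trained on labeled calibration data.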
As future wireless systems are expected to feature denser deployments and more efficient spectrum use, separating communication and sensing into dedicated infrastructures becomes increasingly infeasible. Resource limitations in spectrum, energy, and hardware complexity motivate the reuse of conventional communication signals for new functionalities. In such an environment, integrating sensing into communication systems presents an appealing solution that enhances network performance without increasing system overhead. Importantly, the same LiFi signals used for communication can be leveraged for localization and sensing, yielding a unified framework for illumination, communication, and sensing without any extra hardware or spectrum resources. This joint operation inherently introduces a trade-off, as both functions share common optical, spectral, and signal-processing resources. In practice, favoring one function generally results in a modest, manageable performance degradation of the other. Such a balance can be reached either by partitioning wireless resources (frequency, time, space, or code) between communication and sensing, or by jointly utilizing all available resources via a unified waveform design (i.e., a single, jointly designed signal structure that serves both communication and sensing simultaneously, rather than separate waveforms or time/frequency resources for each function) and signal processing. Such a concept is known in the open literature as integrated sensing and communication (ISAC). Representative work suggests that tuning the priority between communication and sensing provides a flexible performance balance between the two.
Specifically, shifting the operating point from a balanced regime into a communication-dominant regime leads to an increase of 40–70% in the aggregate sum rate and a corresponding decrease of 65–80% in sensing mutual information, whereas shifting toward a sensing-dominant regime enhances sensing performance at the expense of communication throughput8. It is important to highlight that the IM/DD operation of LiFi systems limits joint sensing and communication waveforms to real and non-negative signals, making the majority of complex-valued RF waveforms unsuitable for direct optical use. These constraints can be addressed with pulsed, constant-envelope, and multi-carrier waveform adaptations. Notably, such integration levels differ from communication-aided sensing, where the sensing task is performed opportunistically by capturing the multipath communication signals reflected from the object. In these situations, optical sensing depends solely on the intensity of the incident light. Consequently, there is no requirement to redesign the optical communication waveforms, allowing for the seamless integration of device-less sensing with LiFi communication functionality. Therefore, this integration will not only provide sensing on top of communication, but it will also enable LiFi networks to exploit their monitoring capabilities, e.g., orientation, range, environment maps, etc., to autonomously reconfigure and boost communication performance. We envision future multi-service lighting infrastructures that illuminate, connect, and enable monitoring for seamless user-infrastructure interactions, enhancing the quality of life, sustainability, and economic competitiveness of future smart cities9.
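The real, non-negative signal constraint imposed by IM/DD can be illustrated with a minimal DCO-OFDM sketch: Hermitian symmetry across the subcarriers forces the IFFT output to be real, and a DC bias with clipping makes it non-negative. The FFT size, bias rule, and symbol mapping below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# DCO-OFDM: enforce Hermitian symmetry on the subcarriers so the
# IFFT output is real, then add a DC bias and clip so the resulting
# intensity waveform is non-negative, as IM/DD front ends require.
N = 64  # IFFT size (illustrative)
qpsk = (rng.choice([-1.0, 1.0], N // 2 - 1)
        + 1j * rng.choice([-1.0, 1.0], N // 2 - 1)) / np.sqrt(2)

X = np.zeros(N, dtype=complex)
X[1:N // 2] = qpsk                    # data subcarriers
X[N // 2 + 1:] = np.conj(qpsk[::-1])  # Hermitian-symmetric mirror
# X[0] and X[N//2] stay zero (no data on DC / Nyquist bins)

x = np.fft.ifft(X).real               # real-valued time-domain signal
bias = 2.0 * np.std(x)                # DC bias (a common rule of thumb)
s = np.clip(x + bias, 0.0, None)      # non-negative LED drive signal
```

Because `s` is real and non-negative, it can directly modulate the LED intensity, whereas the underlying complex-valued symbol stream could not.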
Despite the potential of LiFi-based sensing and localization, obstacles such as LoS requirements, limited sensing range, and interference from ambient light hinder its effectiveness. Furthermore, camera-based communications, while common for multi-object detection, suffer from limited imaging range and sensitivity to lighting conditions, affecting image quality and consequently reducing object detection accuracy. Based on this, we envision that the fusion of data from various sensing modalities, known as multi-modal sensing, can effectively tackle the challenges associated with LiFi-based sensing. For instance, integrating LiFi data with acoustic or RF signals compensates for LoS blockage. Radar signals, which penetrate walls, can expand the sensing range when combined with LiFi data. In high ambient light scenarios, infrared sensors complement LiFi systems. Therefore, the integration of diverse sensing modalities empowers the system to adapt to changing environmental conditions and facilitates efficient data fusion to provide a more comprehensive and accurate representation of the environment.
While several articles have discussed the utilization of LiFi for sensing and localization applications, e.g.,9–16 and the references therein, all of these articles have considered the application of LiFi for either sensing only or joint design methods for integrated positioning and communication. Furthermore, none of the existing articles have conducted a thorough review of the literature on LiFi-based sensing and localization principles and highlighted technologies poised to enhance their performance. Additionally, the literature has not previously explored the potential of integrating LiFi-based sensing with other data modalities to address the limitations of LiFi and offer a more comprehensive and accurate depiction of the environment. Hence, there is a noticeable gap in the existing literature regarding the utilization of LiFi communication infrastructure for sensing applications, particularly in its integration with various data modalities to support multi-modal sensing. To the best of our knowledge, this article marks the pioneering attempt to address the role of LiFi in sensing and localization systems and its seamless integration with multi-modal data sources. Furthermore, the article discusses techniques aimed at optimizing the performance of LiFi-based sensing systems, while also shedding light on the challenges hindering its deployment and integration into future wireless networks.
LiFi-based localization and sensing
In 6G networks, localization and sensing data play pivotal roles in various enhancements. Location information improves spatial utilization, enhancing beamforming and interference management. Accurate location data is essential for augmented reality, virtual reality, and immersive gaming. Additionally, it enhances network functionalities by enabling proactive location-based backhaul routing, resource allocation optimization, and load balancing. Sensing capabilities extend environmental monitoring, including object detection, vital signs monitoring, intruder detection, human activity recognition, and user behavior analysis. This comprehensive sensing enables context-aware networks, facilitating real-time decision-making and fostering innovative applications. In this context, LiFi is a promising technology poised to revolutionize sensing and communication in future networks. It harnesses the high bandwidth and low latency of LiFi for both purposes. Off-the-shelf LEDs convey information or sensing beacons by modulating emitted light intensity, while Photodetectors (PDs) detect intensity fluctuations, enabling indoor environment monitoring. Sensing in future networks is anticipated to introduce new performance metrics like detection probability, resolution, and accuracy in range, velocity, and angles17.
Localization
Localization aims to predict an object or device’s exact 2D or 3D position in an area using information received from multiple transmitters18. Conventional localization technologies face an inherent trade-off between accuracy and cost: low-cost solutions such as WiFi typically provide only meter-level positioning accuracy, whereas high-precision systems like UWB achieve finer resolution at the expense of increased hardware cost and system complexity19,20. By exploiting the highly directional and deterministic nature of optical propagation, LiFi-based localization enables more reliable signal measurements with limited multipath effects and reduced susceptibility to interference. This means LiFi systems can achieve centimeter-level localization accuracy while utilizing the existing illumination infrastructure at little additional cost. Nevertheless, localization accuracy in LiFi systems is impacted by the relative positioning of the transmitters and receivers, the number of LiFi LEDs, the number of targets, the environmental conditions, and the employed localization algorithms21. Different localization algorithms are reported according to the needs of the system, the available hardware, and the prevailing conditions14:
Proximity: Objects are positioned based on their proximity to an LED access point (AP), with their location considered collocated with the AP, covering the entire area it serves. Specifically, each transmitter continuously broadcasts a unique identifier tied to its known spatial position stored in a database22. Upon reception, the receiver decodes the identifier, retrieves the corresponding transmitter location, and takes it as an estimate of its own location. For multi-LED AP systems, the position of the object is determined by the strongest signal among the available APs, because the received signal strength (RSS) is directly related to the transmitter-receiver distance in LiFi systems. Proximity-based localization is appealing for its simple design and practicality, but it provides fairly low spatial precision, dependent on how densely the LEDs are placed.
Triangulation: This approach estimates the coordinates of an object using known distances (lateration) and/or angles (angulation) relative to multiple transmitters, thus formulating localization as a classical geometric problem23. It can be broadly categorized into three types: RSS, time-of-arrival (ToA)/time-difference-of-arrival (TDoA), and angle-of-arrival (AoA) algorithms. RSS and ToA/TDoA are lateration methods, whereas AoA is an angulation method19. RSS-based localization uses optical channel gain models to estimate the distance between the object and several LEDs from the received signal strength, which decays predictably with the propagation distance. These distance estimates are subsequently employed in lateration to estimate the receiver’s position. While RSS techniques are attractive because of their low hardware complexity and centimeter-level accuracy under ideal conditions, they are sensitive to environmental conditions, LED-receiver displacement, optical device parameters, and user mobility. In contrast, ToA/TDoA-based localization estimates distances from the propagation time of signals from multiple transmitters, or from the relative differences in their arrival times. Although ToA methods require accurate synchronization between transmitter and receiver, TDoA only requires synchronization among the transmitters and is consequently more practical. Phase-based and amplitude-based TDoA techniques have shown high localization accuracy, yet they demand accurate timing, stable synchronization, and sophisticated signal processing to mitigate noise and interference. Finally, AoA-based localization estimates the receiver position from the angles of arrival of the optical signals from the LEDs, measured with a PD array or a tilted PD. By exploiting the orientation of the optical channels and the Lambertian radiation pattern, AoA-based techniques can achieve a high degree of positioning accuracy with low sensitivity to power variations. Nonetheless, extracting such angular and orientation information is computationally complex and requires specialized receiver hardware.
Fingerprinting: This method generates fingerprints from real-time signal observations, which are compared against a pre-established reference map of location-dependent signal characteristics. In LiFi systems, optical fingerprints are created from attributes such as received power distributions, spectral properties, or temporal signal patterns. During operation, the measured optical signatures are matched against the stored fingerprint database using statistical, distance-based, or learning-based matching methods. Since fingerprinting does not require explicit geometric analysis, it is especially robust to complex propagation effects and copes well with varying LoS conditions and device orientations.
Although achieving high accuracy and precision in localization is of paramount importance, other objectives such as system complexity, coverage, and mobility are equally pivotal for ensuring the overall success and effectiveness of the localization system.
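As a concrete sketch of the RSS-based lateration pipeline described above, the following noise-free toy inverts an assumed Lambertian LoS channel model to obtain range estimates, then solves a linearized least-squares problem for the 2D receiver position. The geometry, Lambertian order, and lumped gain are illustrative assumptions.

```python
import numpy as np

# Illustrative Lambertian LoS model: LED and PD face each other
# vertically with a known height gap h, so cos(phi) = cos(psi) = h / d
# and the received power is Pr = C * h**(m+1) / d**(m+3).
m, h, C = 1.0, 2.0, 1.0  # Lambertian order, height gap (m), lumped gain
leds = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])  # ceiling (x, y)
true_xy = np.array([1.2, 2.5])  # receiver position to recover

d_true = np.sqrt(np.sum((leds - true_xy) ** 2, axis=1) + h ** 2)
pr = C * h ** (m + 1) / d_true ** (m + 3)  # simulated noise-free RSS

# Step 1: invert the channel model to get range estimates per LED.
d_est = (C * h ** (m + 1) / pr) ** (1.0 / (m + 3))

# Step 2: linearized lateration. Subtracting the first squared-range
# equation from the others cancels the quadratic terms, giving A x = b.
r2 = d_est ** 2 - h ** 2  # squared horizontal ranges
A = 2.0 * (leds[1:] - leds[0])
b = r2[0] - r2[1:] + np.sum(leds[1:] ** 2, axis=1) - np.sum(leds[0] ** 2)
xy_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
```

With noisy RSS, the same least-squares step yields an approximate position, and the residual grows with the distance-estimation error; four or more LEDs make the system overdetermined and improve robustness.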
Sensing
Sensing is the ability to detect changes in the environment by estimating the object/target’s relative range, velocity, dimensions, configuration, orientation, and material characteristics from the reflected light24. Thus, by exploring light propagation dynamics and how factors such as propagation links, dimensions, colors, and material densities impact the signals, this line of work aims to exploit the valuable information contained within the reflections from objects. LiFi-based sensing systems can be mapped onto a three-layer structure comprising a sensing layer, a data preprocessing and analysis layer, and an application layer, as illustrated in Fig. 1. First, the system’s available resources and sensing parameters are configured to meet the sensing objectives, such as accuracy and sensing resolution. The sensing layer comprises diverse object types functioning as sensing targets; its main role is the concurrent tracking of physical objects of interest, either through dedicated sensing signals (active sensing) or through communication-aided sensing (passive sensing). The data processing layer analyzes the data obtained from the previous layer, such as light intensity and color (ToA, AoA, and RSS in the case of localization), to enable an array of smart applications within the application layer. These include domains such as smart homes, healthcare, manufacturing and industry, retail solutions, and immersive entertainment.
Figure 1.
Illustration of different use-cases and layers in LiFi-based sensing.
Device-based and device-less sensing
LiFi-based sensing is categorized into device-based and device-less sensing, as shown in Fig. 2. In device-based sensing, the target object utilizes a PD to monitor changes in illumination, gathering information about its environment. Alternatively, objects equipped with cameras capture images of the light sources and analyze the information carried in the space/time/color domain of the light for sensing purposes. In device-less sensing, the target lacks a PD, and sensing is achieved by strategically placing PDs in the environment25. The gathered information is sent back to the network infrastructure for processing using advanced techniques. As mentioned earlier, an important issue in LiFi-based sensing systems is whether the light source is active or passive with respect to the sensing process. An active source is controlled so as to probe the environment or a specific target object, so the signal (corresponding to the black pulsed waveform in Fig. 2b) can be easily shaped and controlled according to the sensing and information requirements. In contrast, a source is passive if it was not specifically designed to extract information about the object of interest, as shown in Fig. 2a. For example, a transmitter sending data (corresponding to the red pulsed waveform) to a communication device acts as a passive source from the sensing point of view if the object is detected opportunistically from the reflections or distortions of that light. Passive sources, such as ambient light or communication-based transmissions, enable relatively unobtrusive sensing, but they also deliver restricted and less controllable data, demanding robust signal processing techniques. A summary of the taxonomy of LiFi-based sensing is provided in Fig. 3.
Figure 2.
Categories of LiFi-based sensing (a) Device-less sensing, passive transmitter. (b) Device-based sensing, active transmitter.
Figure 3.
Taxonomy of the LiFi-based sensing.
LiFi networked sensing
LiFi networked sensing, with multiple nodes collaborating to achieve high-resolution sensing, offers a promising paradigm for applications reliant on accurate sensing and tracking. This approach aims to transform a cluster of LiFi APs into a comprehensive network-wide sensor, thereby enhancing sensing capabilities. It is worth noting that LiFi networked sensing can be particularly valuable in mission-critical applications, where accurate and real-time monitoring of physical parameters is essential, such as in healthcare, manufacturing, and public safety. Nevertheless, synchronization among LiFi APs and minimizing latency are key challenges in LiFi-based networked sensing to maintain data accuracy and real-time responsiveness.
Sensing and localization challenges in LiFi
Despite the progress, LiFi-based sensing and localization systems still present several challenges that hinder their robustness and large-scale deployment. These issues stem from environmental dynamics, object diversity, interference, and practical cost-performance trade-offs in real-world environments. We highlight the following representative challenges:
Multi-target interference and obstruction sensitivity: Sensing performance decreases when two or more targets are in close proximity to each other as their effects are harder to separate. Furthermore, unexpected obstacles and dynamic conditions may interfere with LoS paths, causing sporadic or unreliable sensing.
Limited sensing range and object dependency: The effective sensing range is fundamentally limited by optical signal attenuation. Furthermore, object-specific characteristics, including shape, size, surface properties, color, and motion, have a pronounced effect on recognition quality; thus, no universal object model covers all situations.
Impact of ambient light and optical interference: Interference from artificial and natural lighting sources, together with environmental scattering, can severely degrade signal quality. These effects may saturate PDs, decrease the SNR, distort received signals, and necessitate frequent re-calibration in practical deployments.
Cost, hardware, and scalability trade-offs: Reliable and continuous LiFi sensing coverage generally necessitates the dense deployment of PDs and careful hardware placement, which leads to higher system cost, power consumption, and installation complexity. Sensing accuracy and scalability are also limited by practical constraints of the sensing hardware: limited field of view, dynamic range and bandwidth, analog front-end noise, calibration requirements, and hardware aging. These problems can be worsened in dynamic environments, where sensor performance may vary over time due to many moving objects or people, and more hardware resources must be employed to ensure stability.
Techniques for enhanced LiFi-based sensing
IRS-assisted LiFi systems
Recently, metasurface-based, wall-integrated intelligent reflecting surfaces (IRSs) have emerged as a means to deliberately control the propagation environment23. These surfaces are dynamically adjusted to optimize LiFi coverage in single-LED rooms, compensating for signal loss and improving coverage at the cell edge. Additionally, they can enhance the received signal-to-noise ratio (SNR) by strategically reflecting light around obstacles, particularly benefiting users encountering blockage and random orientation challenges. While there has been growing interest in the potential of IRSs for LiFi communication, leveraging IRS capabilities in the context of LiFi-based sensing remains nascent, necessitating further in-depth investigations. Therefore, it is essential to gain a comprehensive understanding of the unique advantages that IRSs can offer to LiFi-based sensing. In scenarios where IRSs are attached to mobile objects, a key challenge is the uncertainty in the position and orientation of such IRSs. Thus, there is a compelling opportunity for research in exploring the integration of mobility considerations into IRS-assisted LiFi-based sensing systems.
Hybrid RF/LiFi system
A LiFi AP primarily serves a limited region, and the sensing range and accuracy degrade significantly when objects move outside the transmitter’s field-of-view (FoV). Given that the EM spectrum of LiFi does not overlap with its RF counterpart, the two can coexist without causing interference to each other. In this context, the high sensing accuracy of LiFi technology, combined with the ubiquitous coverage of RF cells, can ensure uninterrupted sensing functionality over larger distances and support a larger number of devices. In such configurations, the optical and RF APs are connected to a central controller to support various sensing services. Thus, it is important to design a hybrid RF/LiFi network by first optimizing the locations of the LiFi and RF APs to ensure ubiquitous coverage and minimize dead zones. Due to the mobility and dynamic service requests in such hybrid networks, it is necessary to develop handover models that satisfy both connectivity and sensing requirements. Specifically, both vertical handover models between the RF and optical APs in case of link blockage, and horizontal handover models for seamless switching between optical APs, need to be investigated. It is important to highlight that efficient algorithms for data fusion and processing are also required to combine information from both RF and LiFi APs.
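As a toy illustration of the vertical-handover logic discussed above, the following sketch applies an SNR threshold with a hysteresis margin to decide between the LiFi and RF links. The thresholds, margin, and decision rule are illustrative assumptions, not taken from any standard or the cited literature.

```python
def select_ap(lifi_snr_db, rf_snr_db, current="LiFi",
              lifi_min=10.0, hysteresis=3.0):
    """Toy vertical-handover rule for a hybrid RF/LiFi link.

    Stay on LiFi while its SNR is adequate; fall back to RF on
    blockage; return to LiFi only once its SNR clears the threshold
    plus a hysteresis margin, to avoid ping-pong handovers.
    All numeric values are illustrative.
    """
    if current == "LiFi":
        return "LiFi" if lifi_snr_db >= lifi_min else "RF"
    # Currently on RF: require an extra margin before switching back.
    return "LiFi" if lifi_snr_db >= lifi_min + hysteresis else "RF"
```

For example, a LiFi SNR of 11 dB keeps an existing LiFi session alive but is not enough to pull a user back from RF, since returning requires clearing the 13 dB hysteresis point. A practical controller would additionally weigh sensing requirements, load, and predicted blockage duration.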
Communication-aided LiFi sensing
As mentioned earlier, LiFi communication signals can support passive sensing, known as communication-aided sensing. In such scenarios, sensing occurs opportunistically by capturing reflected communication signals without dedicated resource allocation. This approach assumes prior knowledge of the wireless standard’s physical layer and frame structure to extract environmental information. However, the focus remains on optimizing LiFi communication performance, neglecting sensing metrics in signal design. While this method minimizes overhead, it poses challenges to sensing accuracy and requires complex signal processing.
Compared to conventional mathematical models and signal processing methods, machine learning (ML) techniques can extract and leverage temporal/spatial patterns, approximate complex models, and solve intricate optimization problems, capabilities that can be harnessed to efficiently address such complicated tasks. For instance, ML can be utilized to optimize the signal processing design for sensing without the need to explicitly model the complicated hardware and channels. Thus, the utilization of ML techniques is of paramount importance in optimizing the performance of LiFi-based passive sensing.
Proposed method: from LiFi-based to multi-modal sensing
Future wireless devices are expected to possess a variety of sensors and communication tools, with RF signals, LiFi signals, acoustic signals, and images among the sources of data these devices collect. Although LiFi provides high-resolution sensing, a single sensing modality is often insufficient to ensure robust object identification and situational awareness, especially in dynamic or cluttered environments. LiFi-based sensing yields fine-resolution spatial information; however, under shadowing, obstruction, intense ambient light interference, or non-line-of-sight conditions, RF sensing, although it suffers from lower spatial resolution and multipath fading, provides greater coverage and penetration capabilities. Camera-based sensing provides rich semantic and visual context, yet it is also sensitive to lighting levels and susceptible to privacy concerns and obstruction. Accordingly, fusing LiFi with other sensing modalities (i.e., multi-modal sensing) allows broader and richer environment mapping and, therefore, enhances communication, sensing, and localization. For instance, in indoor localization, LiFi can provide centimeter-level accuracy under LoS conditions, RF can maintain coarse positioning during light blockage, and cameras provide visual landmarks for map-based correction. For virtual trackpads and fine-grained interaction, LiFi-based reflection sensing provides accurate motion tracking, RF sensing adds robustness against hand occlusion, and cameras supply gesture classification and intent recognition. Likewise, for occupancy detection and gesture recognition, LiFi can detect subtle changes in illumination caused by human presence, RF can detect motion through obstacles or in low-light conditions, and cameras can discriminate among activities or postures when privacy restrictions permit.
Artificial intelligence (AI) stands at the forefront of this evolution. With its prowess in merging multi-modal data and dissecting contextual nuances, AI is set to revolutionize multi-modal sensing, granting future networks a more profound and human-like environmental understanding26. Figure 4 showcases how cameras, RF, and LiFi, when combined with AI, can provide object tracking and human sensing services. Subsequent subsections delve into AI’s pivotal role in multi-modal sensing.
Figure 4.
Integration of LiFi and other data modalities for enabling ISAC applications.
Multi-modal data fusion
In multi-modal sensing, understanding the environment and generating actionable insights hinge on effectively combining data from diverse sources. Yet, integrating various data modalities into a unified representation poses significant challenges. These challenges arise from disparities in measurement scales (metrics), varied distributions, uneven data availability or granularity, differences in feature spaces (or dimensions), and asynchronous data capture intervals or sequences. A pivotal question in this research area is whether to merge data at higher processed or abstract levels or at the raw data level. In light of this, different data fusion methods have been identified, namely feature-level fusion, decision-level fusion, and hybrid fusion.
Feature-level fusion capitalizes on statistical dependencies between modalities by merging data from multiple sources after preprocessing and feature extraction, but before the classification process. For instance, in indoor localization, features derived from LiFi-based received signal strength or angle information can be combined with RF fingerprints to enhance positioning accuracy under partial LoS obstruction. Likewise, in gesture or fine-grained interaction sensing, LiFi-based reflection features may be combined with visual features from cameras to improve motion discrimination. Despite its effectiveness, feature-level fusion requires careful consideration of feature alignment, synchronization, and handling of missing or imbalanced data across modalities.
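The feature-level fusion described above can be sketched with a deliberately minimal example: hypothetical LiFi received-signal-strength features and RF fingerprints are normalized per modality (to reconcile their different measurement scales), concatenated into one joint vector, and fed to a nearest-centroid classifier. All dimensions, signal levels, and zone labels are assumed for illustration, not drawn from any real deployment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: LiFi RSS features (4 LEDs, dBm) and RF
# fingerprints (3 anchors, dBm) for two indoor zones, 50 samples each.
lifi = np.vstack([rng.normal(-40, 2, (50, 4)), rng.normal(-55, 2, (50, 4))])
rf = np.vstack([rng.normal(-70, 5, (50, 3)), rng.normal(-60, 5, (50, 3))])
labels = np.array([0] * 50 + [1] * 50)

def zscore(x, mean, std):
    """Per-modality normalization to reconcile disparate measurement scales."""
    return (x - mean) / std

# Feature-level fusion: normalize each modality, then concatenate into a
# single joint feature vector BEFORE classification.
stats = [(m.mean(0), m.std(0)) for m in (lifi, rf)]
fused = np.hstack([zscore(m, mu, sd) for m, (mu, sd) in zip((lifi, rf), stats)])

# Minimal nearest-centroid classifier operating on the fused representation.
centroids = np.stack([fused[labels == c].mean(0) for c in (0, 1)])

def classify(lifi_x, rf_x):
    x = np.hstack([zscore(lifi_x, *stats[0]), zscore(rf_x, *stats[1])])
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

zone = classify(rng.normal(-40, 2, 4), rng.normal(-70, 5, 3))
```

In a real system the concatenation step is where alignment and synchronization issues surface: the two modalities must describe the same instant and the same target before their features can be merged.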
Decision-level fusion models and classifies each data modality independently. The individual classifiers or models are then amalgamated using various techniques such as expert rules, simple operators (e.g., majority vote and weighted voting), and probabilistic fusion, to name a few. Such an approach is particularly suitable when modalities operate at different sampling rates or when direct feature alignment is impractical. For example, in occupancy detection systems, LiFi-based presence detection, RF-based motion sensing, and camera-based activity recognition can each generate independent decisions that are fused to improve robustness against occlusions or environmental changes. While decision-level fusion enhances fault tolerance, it typically does not fully exploit inter-modal correlations.
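A minimal sketch of the decision-level approach follows, using the occupancy-detection example: each modality produces an independent binary decision, and a reliability-weighted vote fuses them. The decisions and weights are assumed values for illustration only.

```python
import numpy as np

# Hypothetical per-modality occupancy decisions (1 = occupied) and
# reliability weights reflecting trust in each sensor under current conditions.
decisions = {"lifi": 1, "rf": 1, "camera": 0}
weights = {"lifi": 0.5, "rf": 0.3, "camera": 0.2}  # assumed, sum to 1

def weighted_vote(decisions, weights, threshold=0.5):
    """Decision-level fusion: modalities vote independently; votes are
    combined by a reliability-weighted majority rule."""
    score = sum(weights[m] * decisions[m] for m in decisions)
    return int(score >= threshold), score

occupied, score = weighted_vote(decisions, weights)
# LiFi and RF agree (combined weight 0.8 >= 0.5): room declared occupied,
# even though the (possibly occluded) camera disagrees.
```

Because each modality is classified before fusion, a failed or occluded sensor simply contributes a low-weight vote rather than corrupting a joint feature vector, which is the fault-tolerance property noted above.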
Hybrid fusion takes advantage of both the feature-level and decision-level fusion techniques by combining features extracted from multi-modal data with decision-level fusion techniques. Thus, it offers the flexibility of decision-level approaches by employing different classifiers for each modality. It simultaneously harnesses part of the information from every data modality while taking advantage of statistical inter-dependencies, similar to feature-level methods. Such a strategy is deemed to be effective in complex scenarios such as multimodal indoor monitoring, where LiFi provides fine-grained spatial sensitivity, RF ensures coverage continuity, and cameras contribute semantic context.
Cross-modal learning
Cross-modal learning focuses on transferring knowledge between modalities or inferring relationships between data sources. Its primary objective is to enable an AP to predict or infer missing information in the primary LiFi modality by leveraging the information available in another modality (e.g., RF). Several challenges arise in cross-modal learning, including modality misalignment, heterogeneity in data distributions, and potential domain shift between training and testing scenarios27. In what follows, we highlight some methodologies to address these challenges.
Representation learning: learning a shared or joint representation across modalities ensures that the learned representations capture semantic concepts common to all modalities. Methods such as canonical correlation analysis (CCA) and its variants are promising candidates for investigation.
Cross-modal translation: mapping entities from one modality to another while preserving their semantic meaning. Translation approaches must reconcile modality-specific characteristics with the factors shared across modalities.
Co-learning: transferring knowledge acquired through one or more modalities to tasks involving other modalities. Co-learning proves valuable in scenarios featuring low-resource target tasks, missing data, or noisy modalities.
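Cross-modal translation can be sketched in its simplest possible form: a linear least-squares map from the RF feature space to the LiFi feature space, learned from paired observations, then used to infer a blocked LiFi measurement from RF alone. The linear relationship, feature dimensions, and noise level are all assumptions of this toy setup; practical systems would use nonlinear (e.g., deep) translators.

```python
import numpy as np

rng = np.random.default_rng(1)

# Paired training data: RF features (3-dim) and LiFi features (4-dim)
# observed simultaneously; a linear relation plus noise is assumed.
W_true = rng.normal(size=(3, 4))
rf_train = rng.normal(size=(200, 3))
lifi_train = rf_train @ W_true + 0.05 * rng.normal(size=(200, 4))

# Cross-modal translation: least-squares map from RF space to LiFi space.
W, *_ = np.linalg.lstsq(rf_train, lifi_train, rcond=None)

# At inference time, a blocked LiFi measurement is inferred from RF.
rf_obs = rng.normal(size=(3,))
lifi_pred = rf_obs @ W

err = np.linalg.norm(W - W_true)  # small if the linear assumption holds
```

The same paired-data requirement drives the misalignment challenge noted above: the map is only as good as the correspondence between the two modalities' samples.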
Data collection
The accuracy of ML models is intrinsically tied to the quality, informativeness, and volume of available data. While model-based ML algorithms have robust theoretical foundations, they can face scalability challenges and may not fully address the breadth of real-world use cases. This is particularly pronounced in multi-modal sensing, which necessitates training on expansive, high-quality datasets featuring a wide range of scenarios. The traditional approach of manual labeling, albeit reliable, often hinges on the availability of domain expertise, making it less feasible for extensive datasets. Therefore, there is a pressing need for innovative and scalable data acquisition and labeling methods specifically designed for multi-modal sensing applications28. Self-supervised learning can be leveraged to reduce the reliance on manual labeling. Domain adaptation techniques and synthetic data generation can be employed to adapt learned features to new emerging scenarios and to generate new data points where data collection is challenging. Active learning techniques can also be used to optimize the data collection process by selectively querying the data points deemed most informative.
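The active-learning idea mentioned above can be illustrated with classic uncertainty sampling: given a model's predicted probabilities over a pool of unlabeled samples, the labeling budget is spent on the samples closest to the decision boundary. The pool, probabilities, and budget here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Pool of unlabeled multi-modal samples with a (hypothetical) model's
# predicted occupancy probabilities.
probs = rng.uniform(0, 1, size=100)

def uncertainty_sampling(probs, budget):
    """Active learning: query the samples the model is least certain about,
    i.e., those with predicted probability closest to 0.5."""
    uncertainty = 1.0 - np.abs(probs - 0.5) * 2  # 1 at p=0.5, 0 at p in {0,1}
    return np.argsort(uncertainty)[::-1][:budget]

query_idx = uncertainty_sampling(probs, budget=5)
# The selected indices point at the samples nearest the decision boundary,
# which are then sent for (expensive) manual labeling.
```

Other acquisition functions (entropy, query-by-committee) follow the same pattern: score the pool, label the top of the ranking.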
Future research trends and open problems
Generative AI for multi-modal sensing
While real datasets are the ideal seed for training accurate AI models for sensing and localization, data acquired from real environments may fail to cover a wide range of parameters for multiple targets. Specifically, real sensing datasets are limited to particular geographical areas, constrained by the nature of the surrounding environment, and restricted to a small number of targets. Consequently, only limited sensing patterns can be extracted, and the trained AI models will not generalize to arbitrary environments. In this regard, generative AI (GAI) models, such as generative adversarial networks (GANs), normalizing flows, and variational autoencoders, have been gaining momentum due to their proficiency in data modeling and analysis and their ability to capture complex, high-dimensional data distributions. They thus represent an efficient means of creating a generalized framework that covers a wide range of network conditions and diverse sensing features. For instance, in a GAN, a generator, realized by a deep neural network, is trained to produce close-to-real data, while a discriminator quantifies the learning accuracy29. By leveraging GANs, limited datasets can be augmented with realistic synthetic samples that capture a broad diversity of LiFi sensing signal patterns. This enables the development of more robust physical-layer designs for LiFi-based sensing by allowing generative models to learn and represent the underlying signal variations encountered in practical deployments. Additionally, GAI's ability to reduce the dimensionality of high-dimensional data, through encoding and decoding, enables efficient data compression, storage, and transmission within sensing systems. Similarly, GAI can be employed to boost the efficiency of sensing systems at both the network and application layers.
For instance, at the network layer, GAI can contribute to resource allocation, scheduling, and incentive mechanisms, while it can also play a role in data generation, analysis, and feature extraction for diverse sensing applications. Therefore, harnessing GAI capabilities to enhance LiFi-empowered multimodal sensing applications is a promising research direction.
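The augmentation principle can be illustrated with a deliberately simple stand-in for a full generative model: fit a parametric distribution to a small set of LiFi RSS measurements, then sample synthetic points from it. All values are hypothetical, and a GAN or VAE would replace the fit-and-sample step with learned generator training; only the augmentation workflow is the same.

```python
import numpy as np

rng = np.random.default_rng(3)

# Limited real dataset: LiFi RSS measurements (dBm) from a single room.
real = rng.normal(loc=-45.0, scale=3.0, size=60)

# Fit a simple parametric model of the empirical distribution. A GAN would
# instead train a generator network against a discriminator; the principle,
# sampling synthetic data that matches the learned distribution, is the same.
mu, sigma = real.mean(), real.std(ddof=1)

def generate(n):
    """Draw synthetic RSS samples from the fitted model to augment training."""
    return rng.normal(mu, sigma, size=n)

# Augmented dataset: 60 real samples plus 500 synthetic ones.
augmented = np.concatenate([real, generate(500)])
```

The value of a learned generative model over this Gaussian stand-in is precisely its ability to capture the complex, high-dimensional, multi-modal distributions that real LiFi sensing signals exhibit.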
Privacy and security
Privacy preservation is a key concern in critical sensing applications, as fine-grained LiFi-based sensing and localization inherently involve context-sensitive information such as user location, activity patterns, and environmental dynamics. If not managed properly, this information can facilitate unauthorized tracking, inference, or intrusion. As LiFi sensing increasingly relies on data-driven and learning-enabled inference, particularly large-scale multi-modal learning models, large quantities of sensing and communication data are produced across the entire data lifecycle, from detection through transmission, storage, and model training. This adds a further layer of privacy risk and requires strong anonymization strategies, encryption, and access control to protect against inadvertent exposure. Privacy-aware sensing strategies are particularly useful in sensitive areas such as museums, factories, and smart buildings, where presence or activity can be detected without revealing individuals' identities or fine-grained object details. In this sense, resolution control and feature-based sensing allow the system to balance sensing utility against privacy. Moreover, privacy-preserving learning strategies, such as federated learning, enable multiple APs to train sensing models jointly without sharing raw measurements, greatly reducing the risk of data leakage. Likewise, split learning partitions deep neural networks across sensing devices and network infrastructure, preventing any single entity from possessing the full sensing information.
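The federated-learning pattern mentioned above can be sketched in a few lines: each AP runs gradient descent on its private data, and only the model weights are averaged at a server, so raw sensing measurements never leave the devices. The linear-regression task, data, and hyperparameters below are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Each AP holds private sensing data; only model weights are ever shared.
# Assumed task: linear regression from 2-D features to a scalar target.
w_true = np.array([1.5, -0.7])
aps = []
for _ in range(3):
    X = rng.normal(size=(40, 2))
    y = X @ w_true + 0.01 * rng.normal(size=40)
    aps.append((X, y))

def local_update(w, X, y, lr=0.1, steps=20):
    """One round of local gradient descent on an AP's private data."""
    for _ in range(steps):
        w = w - lr * (2 / len(y)) * X.T @ (X @ w - y)
    return w

# Federated averaging: APs train locally; the server averages the weights.
w = np.zeros(2)
for _ in range(10):  # communication rounds
    w = np.mean([local_update(w.copy(), X, y) for X, y in aps], axis=0)
```

After a few communication rounds the averaged model approaches the solution a centralized learner would find, without any AP ever exposing its raw data.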
Beyond privacy, security is essential for keeping LiFi sensing platforms reliable, authentic, and robust. Only authenticated and authorized devices should participate in sensing and communication operations, which requires secure trust establishment, device authentication, and authorization protocols. At the physical layer, LiFi sensing signals are susceptible to eavesdropping, spoofing, and signal manipulation, especially in open or shared settings. As a result, waveform and signaling designs should be carefully engineered to provide reliable sensing while limiting information leakage, for example through feature-specific signaling, controlled sensing resolution, and carefully modulated transmission schemes. In addition, noise injection, spatial masking, and temporal obfuscation can be employed to thwart malicious eavesdropping. At the learning and inference levels, security threats are exacerbated as learning-based models are embedded into LiFi-based ISAC systems. Such models may be targeted by adversarial attacks, malicious fine-tuning, or model inversion aimed at degrading performance or extracting confidential information. Countering these threats demands robust model validation and continuous security assessment to ensure the stability and resilience of learning-based LiFi sensing systems.
Synchronization
One of the primary challenges in sensing and localization systems, especially in dynamic environments, is maintaining robust synchronization between the transmitter and receiver. This is vital for determining when to listen for the transmitted signals and for coordinating transmissions to manage interference effectively. When synchronization is lacking, it manifests as timing offset and carrier frequency offset at the receiver's end, which lead to ranging ambiguity and speed ambiguity, respectively. Typically, synchronization is achieved by relying on a clock signal, which may not always be readily available in modern communication systems. One potential alternative to clock-based methods is a trigger-based approach, in which the transmitter initiates communication by emitting a triggering or notification signal to alert the receiver of its intent to transmit LiFi packets for sensing purposes. Upon reception of this signal, the receiver acknowledges the impending transmission and prepares to receive the forthcoming sensing packets. Therefore, achieving precise synchronization between the transmitter and receiver is often necessary to ensure efficient sensing.
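A common realization of such trigger-based timing recovery is preamble correlation: the receiver cross-correlates the incoming stream against a known preamble and takes the correlation peak as the packet start, resolving the timing offset. The preamble length, delay, and noise level below are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Known preamble (trigger signal) prepended to each LiFi sensing packet.
preamble = rng.choice([-1.0, 1.0], size=64)

# Received stream: unknown delay plus noise; the receiver must recover the
# packet start (timing offset) before demodulating the sensing payload.
true_delay = 37
rx = np.concatenate([np.zeros(true_delay), preamble, np.zeros(50)])
rx += 0.2 * rng.normal(size=rx.size)

# Cross-correlate with the known preamble; the peak locates the packet start.
corr = np.correlate(rx, preamble, mode="valid")
est_delay = int(np.argmax(corr))
```

Carrier frequency offset would require an additional estimation stage (e.g., from the phase drift across repeated preamble segments); this sketch addresses the timing component only.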
Scalability and generalizability
In multi-modal sensing systems with a massive number of devices, it is important to carefully handle and process data from multiple sensors/devices across different modalities efficiently. This is critical to ensure low-complexity and interpretable models and to preserve the quantity and quality of accessible data. Consequently, methodologies such as distributed processing and parallel computation come into play, enabling the dispersion of processing tasks across numerous computing nodes and concurrent threads, respectively. On the other hand, in dynamic environments/scenarios or when dealing with limited or incomplete data in one modality, it is imperative to develop ML models and advanced inference techniques capable of learning from limited and diverse datasets, even in the presence of noisy and unstructured data. This is important to ensure the reliability and robustness of the system in real-world sensing applications. In this regard, it is pivotal to leverage transfer learning (TL), which enables pre-trained models to adapt to smaller datasets, enhancing the system's performance in scenarios with constrained data availability. In the context of multi-modal sensing, with the aid of TL, the performance of ML models trained on one data modality can be improved when dealing with data from a different modality.
Sensing-assisted communication
Utilizing sensing and localization functionalities in communication systems could significantly enhance communication performance and efficiency, particularly in dynamic or challenging environments. This approach diverges from conventional communication systems, which typically transmit data without considering environmental conditions. By leveraging information such as environmental conditions, channel state information, interference levels, and user locations, the communication system can dynamically adjust transmitted signals in real-time. For example, in a LiFi setup, the AP could utilize the location and mobility patterns of mobile users to predict LoS link blockage and adjust system configurations, such as users’ handover and precoders, accordingly. Hence, the exploration of LiFi-based sensing-aided communication represents a compelling avenue for future research, necessitating more in-depth investigation and analysis.
Potential of multi-cell architecture
In traditional cellular communication architecture, inter-cell interference is considered detrimental, particularly affecting users at cell edges. To mitigate this interference, frequency reuse methods and coordinated multi-point transmission are typically employed. Conversely, in sensing applications, signals received from multiple cells have the potential to extend the detection range and improve SNR and detection probability. By employing fusion techniques to combine data from multiple cells, a more comprehensive understanding of the environment is achieved, enhancing the accuracy and reliability of environmental sensing and object localization. Furthermore, the use of a multi-cell configuration facilitates distributed sensing by leveraging the widespread deployment of APs and mobile devices. This distributed sensing approach enables real-time monitoring of large areas. However, employing a cellular system design solely for communication may not be optimal for LiFi-based sensing, as interference from adjacent cells could potentially contain valuable information for object detection rather than being eliminated. Hence, it is imperative to reframe the objective in multi-cell configurations to consider both sensing and communication functionalities to ensure that the system is optimized to achieve its objectives effectively in diverse scenarios and applications.
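The contrast drawn above, interference in communication versus useful signal in sensing, can be made concrete with a toy calculation: if the reflections observed by neighbouring cells are combined rather than suppressed, the effective SNR is the sum of the per-cell SNRs, and the detection probability rises accordingly. The SNR values and the monotone detection model below are assumptions for illustration only.

```python
import numpy as np

# Per-cell linear SNRs of reflections from the same object, as observed by
# three neighbouring LiFi cells (assumed values).
snr = np.array([2.0, 0.8, 1.5])

# Treating each cell's reflection as signal rather than interference:
# coherent combining sums the per-cell SNRs (maximum-ratio-combining style).
combined_snr = snr.sum()

def detection_prob(snr_lin, threshold=3.0):
    """Toy monotone detection model: higher SNR, higher detection probability."""
    return 1.0 - np.exp(-snr_lin / threshold)

p_single = detection_prob(snr.max())   # best single cell acting alone
p_fused = detection_prob(combined_snr)  # all three cells fused
# Fusing the cells raises detection probability above any single cell's.
```

A communication-only design would have cancelled the two weaker cells' contributions as interference; the sensing objective inverts that choice, which is why the multi-cell objective must be reframed jointly.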
Flexible multiple access schemes
Flexible multiple-access schemes that can accommodate simultaneous heterogeneous sensing, communication, and illumination needs should be explored in future research. Currently, LiFi communication and sensing primarily rely on legacy multiple-access schemes, namely time division multiple access (TDMA), frequency division multiple access (FDMA), code division multiple access (CDMA), and random access, to isolate signals to different luminaires or users. Although successful, these mechanisms require synchronization, do not scale easily, or compromise positioning accuracy, sensing resolution, and communication throughput. Recent advances in flexible coherent passive optical networks (PONs) offer new possibilities for addressing these challenges, in particular hybrid multiple-access schemes that incorporate frequency- and time-domain multiplexing into coherent point-to-multipoint (P2MP) architectures to dynamically balance capacity, connectivity, and latency across heterogeneous services30. While these schemes were mostly developed for fixed optical access networks, their underlying principles, namely adaptive resource allocation, service-aware multiplexing, and cross-layer optimization, are directly relevant to LiFi-based ISAC systems. Future LiFi-based ISAC systems may thus draw inspiration from flexible coherent PON architectures, supporting illumination, communication, and sensing through hybrid and adaptive multiple-access frameworks.
Conclusion
This paper presented a vision for leveraging diverse data modalities to enhance localization and sensing, highlighting LiFi as a transformative solution. We proposed using existing lighting infrastructure to combine illumination with sensing functions. A comprehensive overview of LiFi-based sensing and localization principles was provided, with a focus on technologies for improved performance. Additionally, we explored the potential of multi-modal sensing through LiFi integration with other sensory data, enabling richer environmental characterization. Finally, we addressed challenges in integrating LiFi with other modalities, paving the way for advanced, versatile wireless networks to meet future connectivity demands.
Author contributions
Shimaa Naser (Corresponding Author): Conceptualization, Methodology, Writing Original Draft, Supervision. She developed the core idea of the research, designed the methodology, and supervised the project. Additionally, she wrote the original draft of the manuscript. Omar Alhussein (Co-Author): Formal Analysis, Visualization, Investigation, Writing-Review & Editing. He was responsible for the formal analysis and validation of the results. Also, he contributed to creating visualizations, writing, reviewing, and editing the manuscript. Sami Muhaidat (Co-Author): Validation, Visualization, Investigation, Funding Acquisition, Project Administration. He contributed to the study by validating the results, creating visualizations, and conducting specific investigations. Also, he contributed to the review and editing of the manuscript.
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Bourdoux, A. et al. 6G white paper on localization and sensing. arXiv (2020).
- 2. Islim, M. et al. Towards 10 Gb/s orthogonal frequency division multiplexing-based visible light communication using a GaN violet micro-LED. Photon. Res. 5, 35–43 (2017).
- 3. Misra, P. & Enge, P. Global Positioning System: Signals, Measurements, and Performance (Ganga-Jamuna Press, Lincoln, MA, USA, 2001).
- 4. Alizadeh Jarchlo, E. et al. Li-Tect: 3-D monitoring and shape detection using visible light sensors. IEEE Sens. J. 19, 940–949. 10.1109/JSEN.2018.2879398 (2019).
- 5. Li, T., An, C., Tian, Z., Campbell, A. T. & Zhou, X. Human sensing using visible light communication. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking, MobiCom ’15, 331–344. 10.1145/2789168.2790110 (Association for Computing Machinery, New York, NY, USA, 2015).
- 6. Li, T., Liu, Q. & Zhou, X. Practical human sensing in the light. In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys ’16, 71–84. 10.1145/2906388.2906401 (Association for Computing Machinery, New York, NY, USA, 2016).
- 7. Yang, Y., Hao, J., Luo, J. & Pan, S. J. CeilingSee: Device-free occupancy inference through lighting infrastructure based LED sensing. In 2017 IEEE International Conference on Pervasive Computing and Communications (PerCom), 247–256. 10.1109/PERCOM.2017.7917871 (2017).
- 8. Saghir, Z. & Aboagye, S. Movable access point-aided integrated visible light communication and sensing networks. IEEE Wirel. Commun. Lett. 14, 731–735. 10.1109/LWC.2024.3521192 (2025).
- 9. Wang, F. et al. Internet of lamps for future ubiquitous communications: Integrated sensing, hybrid interconnection, and intelligent illumination. China Commun. 19, 132–144. 10.23919/JCC.2022.03.009 (2022).
- 10. Wang, J.-Y., Yang, H.-N., Wang, J.-B., Lin, M. & Shi, P. Joint optimization of slot selection and power allocation in integrated visible light communication and sensing systems. IEEE Internet Things J., Early Access, 1–1. 10.1109/JIOT.2023.3303137 (2023).
- 11. Abuella, H. & Ekin, S. Non-contact vital signs monitoring through visible light sensing. IEEE Sens. J. 20, 3859–3870. 10.1109/JSEN.2019.2960194 (2020).
- 12. Chakraborty, A., Singh, A., Bohara, V. A. & Srivastava, A. On estimating the location and the 3-D shape of an object in an indoor environment using visible light. IEEE Photon. J. 14, 1–11. 10.1109/JPHOT.2022.3186793 (2022).
- 13. Ma, S. et al. Waveform design and optimization for integrated visible light positioning and communication. IEEE Trans. Commun. 71, 5392–5407. 10.1109/TCOMM.2023.3287536 (2023).
- 14. Luo, J., Fan, L. & Li, H. Indoor positioning systems based on visible light communication: State of the art. IEEE Commun. Surveys Tutorials 19, 2871–2893. 10.1109/COMST.2017.2743228 (2017).
- 15. Warmerdam, K., Pandharipande, A., Caicedo, D. & Zuniga, M. Visible light communications for sensing and lighting control. IEEE Sens. J. 16, 6718–6726. 10.1109/JSEN.2016.2585199 (2016).
- 16. Liang, C. et al. Integrated sensing, lighting and communication based on visible light communication: A review. Digital Signal Process. 145, 104340. 10.1016/j.dsp.2023.104340 (2024).
- 17. Huang, Y., Safari, M., Haas, H. & Tavakkolnia, I. Optical wireless 3-D positioning and device orientation estimation. IEEE Open J. Commun. Soc. 5, 4519–4530. 10.1109/OJCOMS.2024.3423420 (2024).
- 18. Xiao, H., Sun, C. & Wang, W. UAV route planning and light searching method based on optical sensing. IEEE Internet Things J., Early Access, 1–1. 10.1109/JIOT.2025.3550126 (2025).
- 19. Zhu, Z. et al. A survey on indoor visible light positioning systems: Fundamentals, applications, and challenges. IEEE Commun. Surveys Tutorials 27, 1656–1686. 10.1109/COMST.2024.3471950 (2025).
- 20. Zhu, B., Cheng, J., Wang, Y., Yan, J. & Wang, J. Three-dimensional VLC positioning based on angle difference of arrival with arbitrary tilting angle of receiver. IEEE J. Sel. Areas Commun. 36, 8–22. 10.1109/JSAC.2017.2774435 (2018).
- 21. Zhang, S., Zhang, L., Liu, K. & Wang, Y. A Bayesian model based on link distribution features for multi-target passive localization in visible light sensing. IEEE Sens. J., 1–1. 10.1109/JSEN.2025.3561317 (2025).
- 22. del Campo-Jimenez, G., Perandones, J. M. & Lopez-Hernandez, F. J. A VLC-based beacon location system for mobile applications. In 2013 International Conference on Localization and GNSS (ICL-GNSS), 1–4. 10.1109/ICL-GNSS.2013.6577276 (2013).
- 23. Guo, Y. et al. Optical IRS assisted-visible light positioning in indoor non-LoS IoVs scenarios. IEEE Internet Things J., Early Access, 1–1. 10.1109/JIOT.2025.3563896 (2025).
- 24. Liu, Z. et al. ReflexGest: Recognizing hand gestures under VLC-capable lamps. IEEE Trans. Mobile Comput., Early Access, 1–13. 10.1109/TMC.2025.3545340 (2025).
- 25. Alijani, M., Cock, C. D., Joseph, W. & Plets, D. Device-free visible light sensing: A survey. IEEE Commun. Surveys Tutorials, Early Access. 10.1109/COMST.2025.3546166 (2025).
- 26. Cheng, X. et al. Intelligent multi-modal sensing-communication integration: Synesthesia of machines. IEEE Commun. Surveys Tutor. 26, 258–301. 10.1109/COMST.2023.3336917 (2024).
- 27. Baltrušaitis, T., Ahuja, C. & Morency, L.-P. Multimodal machine learning: A survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41, 423–443. 10.1109/TPAMI.2018.2798607 (2019).
- 28. Roh, Y., Heo, G. & Whang, S. E. A survey on data collection for machine learning: A big data-AI integration perspective. IEEE Trans. Knowl. Data Eng. 33, 1328–1347. 10.1109/TKDE.2019.2946162 (2021).
- 29. Wang, J. et al. Generative AI for integrated sensing and communication: Insights from the physical layer perspective. arXiv:2310.01036 (2023).
- 30. Guo, D. et al. Flexible coherent PON for 6G fixed networks: Technologies and solutions. IEEE Commun. Mag. 63, 94–100. 10.1109/MCOM.002.2400622 (2025).