Adversarial AI applied to cross-user inter-domain and intra-domain adaptation in human activity recognition using wireless signals

Muhammad Hassan; Tom Kelsey; Fahrurrozi Rahman

doi:10.1371/journal.pone.0298888

2024 Apr 18;19(4):e0298888. doi: 10.1371/journal.pone.0298888


Editor: Sunder Ali Khowaja

PMCID: PMC11025916 PMID: 38635837

Abstract

In recent years, researchers have successfully recognised human activities using commercially available WiFi (Wireless Fidelity) devices. The channel state information (CSI) can be gathered at the access point with the help of a network interface controller (NIC). These CSI streams are sensitive to human body motion and produce abrupt changes (fluctuations) in their magnitude and phase values when a moving object interacts with a transmitter and receiver pair. This sensing methodology is gaining popularity compared to traditional approaches involving wearable technology, as it is contactless: no cumbersome sensing equipment is fitted on the target, and privacy is preserved since no personal information about the subject is collected. In previous investigations, internal validation statistics have been promising. However, external validation results have been poor, because models are applied to different subjects in markedly different environments. To address this problem, we propose an adversarial Artificial Intelligence (AI) model that learns and utilises domain-invariant features. We analyse model results in terms of suitability for inter-domain and intra-domain alignment techniques, to identify which is better at robustly matching the source to the target domain, and hence at improving recognition accuracy in cross-user conditions for human activity recognition (HAR) using wireless signals. We evaluate model performance on different target training data percentages to assess model reliability under data scarcity. After extensive evaluation, our architecture shows improved predictive performance across target training data proportions when compared to a non-adversarial model for nine cross-user conditions, with comparatively less simulation time. We conclude that inter-domain alignment is preferable for HAR applications using wireless signals, and confirm that the dataset used is suitable for investigations of this type. Our architecture can form the basis of future studies using other datasets and/or investigating combined cross-environmental and cross-user features.

Introduction

Commercial off-the-shelf (COTS) WiFi devices were originally designed for wireless communication and local area networking using wireless networking protocols. Owing to the ubiquity of WiFi technologies, tens of billions of devices are now connected in networks. Today, we are surrounded by various types of wireless signals such as WiFi, LoRa, and LTE. Earlier research has shown that radio signals travel along multiple paths and superpose at the receiver, and that this can be used to identify the presence, location and movement of surrounding objects. The pervasive nature of radio signals and their capacity to capture activity in the surrounding environment open the way for a new wireless sensing technology. WiFi sensing is hence the use of commercially available WiFi devices to capture information about users' behavior.

The field of wireless sensing builds on the key concepts of Multiple Input Multiple Output (MIMO) [1] and Orthogonal Frequency Division Multiplexing (OFDM). MIMO is a technology that creates multiple versions of the same signal using multiple antennas at both the source and destination. These multiple versions help to both increase the signal-to-noise ratio and reduce signal fading, since multiple copies of the same signal increase the chances of the signal arriving at the receiving end successfully [1]. MIMO in WiFi devices can supply diverse and rich data on how signals carry information about the surrounding environment, which we refer to as channel state information (CSI). OFDM is a modulation technique that supports a large number of carriers, each orthogonal to the others. It is less susceptible to selective fading, interference, and multi-path effects [2]. Modern WiFi devices following the IEEE 802.11n/ac standards use OFDM with MIMO systems. In OFDM, data is transmitted over multiple orthogonal sub-carriers, each with a narrow bandwidth. Each sub-carrier therefore experiences flat fading, which is not very severe, while co-channel interference is also avoided to a great extent. CSI data has benefits compared to the received signal strength indicator (RSSI) [3]. RSSI measures the signal power on the receiver side and associates it with the distance from either the reflecting object or the transmitter. This signal strength is susceptible to multi-path fading. When the transmitted signal is emitted into the environment, it is obstructed by surrounding objects such as buildings, vehicles and humans, and so takes multiple paths before reaching the receiver. Different copies of the signal travel different path lengths, and thus suffer from fading and delay. This results in a reduction of the received signal power.

When radio signals emerge from a COTS WiFi device and spread out into the surrounding environment, they follow multi-path propagation, which induces a pattern of channel state information (CSI) at the receiving end. As a target (object or human) performs an activity within the wireless environment, it creates fluctuations in the CSI pattern, and different movements produce fluctuations with distinct characteristics. These distinct fluctuating patterns are used to train a deep learning model to predict specific activities. Fig 1 illustrates the concept of wireless sensing along with the phasor representation of a target moving from location A to a new location B, covering a distance d. Target activity between A and B is reflected by the dynamic movement of vectors in the I-Q plane at the receiving end. In terms of a phasor diagram, the radio waves emitted by a device are broadly classified into three main vectors. The reflection and diffraction from static objects such as walls or furniture, together with the line-of-sight (LOS) path between transmitter and receiver, form a static vector. In the I-Q plane, V_s (in blue) is the static vector; its length represents the magnitude, and the angle from the I-axis to V_s is its phase. The direct reflection from the target forms a dynamic vector. As the target moves, it changes the magnitude and phase of the dynamic vector. In the same I-Q plane, V_d (in red) is the dynamic vector, shown for the two target positions A and B. Its length represents its magnitude, and the angles from the I-axis to V_d at locations A and B are its phase values at those locations. Since this vector is dynamic, the phase and magnitude differences between the dynamic vectors at locations A and B can be used to track the target movement. The sum of the static and dynamic vectors forms a composite vector [4].

Fig 1 — Left: Concept of wireless sensing. Right: Phasor representation.
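
The phasor picture described above can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes a common textbook model in which the phase of the dynamic vector rotates by 2π for every wavelength of change in the reflected path length, and the wavelength, amplitudes and path lengths are hypothetical values chosen only for demonstration.

```python
import cmath
import math

WAVELENGTH_M = 0.06  # assumed: roughly the wavelength of a 5 GHz WiFi carrier


def dynamic_vector(amplitude, path_length_m, wavelength_m=WAVELENGTH_M):
    """Dynamic phasor V_d; its phase rotates with the reflected path length (assumed model)."""
    phase = -2.0 * math.pi * path_length_m / wavelength_m
    return cmath.rect(amplitude, phase)


def composite_vector(v_static, v_dynamic):
    """Composite phasor: the sum of the static and dynamic vectors."""
    return v_static + v_dynamic


# Hypothetical static vector V_s: magnitude 1.0, phase 30 degrees.
v_s = cmath.rect(1.0, math.radians(30.0))

# The target moves so that the reflected path grows from 4.00 m to 4.06 m,
# i.e. by one assumed wavelength, in 5 mm steps.
path_lengths = [4.00 + 0.005 * k for k in range(13)]
composite_magnitudes = [
    abs(composite_vector(v_s, dynamic_vector(0.3, length)))
    for length in path_lengths
]

for length, magnitude in zip(path_lengths, composite_magnitudes):
    print(f"path = {length:.3f} m  |V_s + V_d| = {magnitude:.3f}")
```

Under these assumptions the composite magnitude oscillates between roughly |V_s| − |V_d| and |V_s| + |V_d| as the target moves, which is the kind of fluctuation a receiver observes in the CSI magnitude.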

Since the fluctuations in CSI data depend upon surrounding objects, and the characteristics of the target can also severely affect model performance, it is challenging to generalize a model to different cross-user conditions. This study therefore proposes an adversarial model and provides detailed evidence to support the use of such models in this context. Our key contributions are:

We apply inter-domain and intra-domain adaptation to an adversarial model for nine cross-user conditions using a publicly available WiFi dataset. We achieve this by using the maximum mean discrepancy (MMD) loss and the local maximum mean discrepancy (LMMD) loss.
We evaluate the proposed model on different target training data proportions and show that its performance is less susceptible to reductions in the number of target training data samples.
The model's average micro-F1 score for nine different cross-user conditions with varying target training data proportions is 68.53% with MMD loss and 66.58% with LMMD loss.
The model's average macro-F1 score for nine different cross-user conditions with varying target training data proportions is 64.28% with MMD loss and 62.6% with LMMD loss.
The model's average simulation time for nine different cross-user conditions with varying target training data proportions is no more than two to three minutes, which indicates that it is a lightweight model with a simple configuration.

Related work

The field of human activity recognition has gained popularity due to its valuable applications in activity recognition, mobile health monitoring and patient rehabilitation. The typical challenge is model performance in cross-domain conditions such as cross-user (a classifier is trained on known users and tested on unknown users), cross-environment (a classifier is trained on a seen environment and tested on an unseen environment), and a combination of the two. Previously proposed models performed well when tested under the same conditions used during training. Unfortunately, their performance degrades sharply when they are tested on environments and subjects other than those used for training. The activity patterns of new users and environments differ from those in the training data, which makes the model less effective at predicting activities in cross-domain conditions. Additionally, training a classifier for unseen users and environments is time consuming and requires substantial computational resources [5]. Domain adaptation [6], a sub-field of transfer learning [7], is considered an appropriate solution: it adjusts a model's parameters (weights and biases) to transfer them from one domain, referred to as the source domain, to another, the target domain, where the two domains have domain-variant features (source and target features differ from each other). In recent years researchers have turned to unsupervised domain adaptation (UDA) [8], where adversarial learning is applied to transfer domain-independent features from a source domain with labelled data so that they match the features of a new, unseen target domain whose data samples are unlabelled. Virtual sample generation via geometric modelling [9] constructs a translation function between source and target configurations. 
The translation function is a mathematical model that generates virtual samples for target movements at different locations and orientations, thus saving the time needed to collect new training data for a user's new locations and orientations. However, this method is not always effective because it requires initial estimates of essential parameters such as users' moving speeds and directions in both configurations and their initial locations and orientations. Signals reflected by static objects in a specific environment are considered to be domain-dependent features. These components are removed by modelling the user's motion and velocity profile across different domains, so that the dynamic components of the target movement can be retrieved. These dynamic components are domain-invariant features, as the velocity profiles of different users show unique kinetic characteristics that do not change with cross-environmental conditions. The velocity profiles of movements also differ between users. The relation v = (fλ)/2 between a user's velocity v and the frequency of movement f, where λ is the signal wavelength, can be used to estimate velocity changes during the target movement [10, 11]. Transfer learning [12–14] uses the transferable knowledge of a domain already trained under specific conditions (known user and environment) to train a new domain with few data samples, which saves computational cost. There are two types of transfer learning: parameter transfer and feature-representation transfer. In parameter transfer [15], pre-trained models are fine-tuned on the new testing domain without the need to train the entire network from scratch. The pre-trained models are used to fix the initial learned parameters for the new domain, as the corresponding layers generate features focused only on abstraction and do not contribute directly to the model's final output. A few samples from the new testing domain are used to fine-tune only particular layers of the network. 
In feature-representation transfer [13, 14], a shared space is created between the extracted features of the training and testing domains by mitigating the features that distinguish them. The Domain Adversarial Neural Network (DANN) [8], a type of feature-representation transfer, is one of the pioneering approaches in domain adaptation and has been applied to many cross-domain deep learning problems, including device-free WiFi sensing. It is trained in an adversarial fashion, pitting a generator against a domain discriminator. The generator reaches its optimal performance when the discriminator fails to predict domain labels. EI [16] used the DANN [8] architecture to extract subject- and environment-independent features. Its authors imposed three constraints to make the model effective and resistant to over-fitting. The Confidence Control Constraint prevents the model from getting stuck in a local optimum. The Smoothing Constraint prevents the model's predictions from differing significantly on neighbouring samples. The Balance Constraint comes into play when the model tends to assign the same label to different but similar types of activity. They varied the source domain and showed that in all cases their model's accuracy is higher than that of the baseline models (VADA [17], RF [18]). Few-shot learning is a classification problem of identifying the similarities and differences between training and testing domains using very few labelled samples from the training data. Fidora [19] is a wireless-based localization system that can locate an object's location fingerprints without being subject to WiFi fingerprint inconsistency caused by factors such as the body shapes of new users, objects in the background and daily changes in the environment. Synthetic data fingerprints are generated from labelled data fingerprints using a data augmenter (a Variational Auto-Encoder, VAE) [20]. 
The advantage of VAEs over traditional auto-encoders is their ability to generate augmented data samples from a Gaussian distribution N(0, γkI) of the original data fingerprints. The baseline models considered in the original paper are AutoFi [21], VAE-only, and FiDo [22], which were tested in cross-user and cross-environmental conditions against Fidora [19]. Evaluation results show that its average F1 score is 17.8% and 23.1% better than the benchmark for an unlabelled user and a varied environment respectively. WiGR [23] is a lightweight few-shot learning based gesture recognition system using WiFi devices. The network's key ability is transferable learning under domain shift to new domains. Few-shot learning [24, 25] uses supervised learning to generalize a model to new tasks using only a few data samples. The model was tested against WiGeR [26], WiCatch [27], SignFi [28] and Siamese-LSTM [29] in cross-user, cross-environment and cross-location evaluations. It outperformed the baseline models in all of these conditions. The authors also analyzed model complexity in terms of the number of parameters and the computation required. In terms of model complexity, it outperforms other few-shot learning models such as [29–32]. JADA [33] is an unsupervised domain adaptation scheme proposed to tackle vulnerability to spatial dynamics. Evaluation results show that the model achieves 87.8% and 90.3% average recognition accuracy in cross-environmental conditions between large and small conference rooms respectively. The model also outperforms two state-of-the-art adversarial methods (DIFA [34] and ADDA [35]) under spatial dynamics. CrossGR [36] is a low-cost cross-target gesture recognition model that uses a generative adversarial network (GAN) to generate synthetic data samples from a small set of real-world data collected from a specific number of users. 
After data augmentation, it uses the labelled and synthetic data samples to eliminate user-related information and obtain gesture-related features. During back-propagation, these gesture-related features help the model learn to recognize new users' activities. Contrastive supervision that considers "where" to contrast is a novel approach that applies contrastive loss to time-series wearable sensor data for HAR. Its key contribution is to tackle the data augmentation problem introduced by information loss at different depths of a neural network. By applying contrastive loss to the intermediate layers of a network, the authors pull positive augmented invariant pairs together and push negative pairs far apart [37]. DSAN [38] is a non-adversarial model that minimizes the local sub-domain discrepancies within the same class of the source and target domains using the local maximum mean discrepancy (LMMD) loss. DASAN [39] is an adversarial variant of DSAN [38] proposed for fault diagnosis in the rotating parts of machines. It performs global adaptation by using a discriminator for domain alignment, and sub-domain alignment by calculating the LMMD loss between source and target activations. For the LMMD loss calculation, it introduces pseudolabel learning [40] to generate pseudolabels for unlabelled target data.

Preliminaries

Channel state information

Suppose there are M Tx antennae and N Rx antennae in a MIMO system. Let H be the CSI matrix, also called the channel fading factor matrix,

\begin{matrix} H = \begin{bmatrix} h_{1, 1} & h_{1, 2} & \cdots & h_{1, M} \\ h_{2, 1} & h_{2, 2} & \cdots & h_{2, M} \\ \vdots & \vdots & \ddots & \vdots \\ h_{N, 1} & h_{N, 2} & \cdots & h_{N, M} \end{bmatrix} \end{matrix}

Each term in H is a complex value representing the magnitude and phase shift of an OFDM sub-carrier in the CSI stream [41],

\begin{matrix} h_{i, j} (f_{k}) = | h_{i, j} (f_{k}) | e^{j ∠ h_{i, j} (f_{k})}, \end{matrix}

(1)

where |h_i,j(f_k)| and ∠h_i,j(f_k) are the magnitude and phase shift of an individual OFDM sub-carrier respectively, and f_k is the central frequency of that sub-carrier.

With H, the transmitted and received signals can be represented as

\begin{matrix} B (t) = H A (t) + n (t), \end{matrix}

(2)

where A(t) and B(t) are the signal matrices at the transmitting and receiving antennae of the MIMO system respectively, and n(t) is the additive white Gaussian noise matrix.
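
As a minimal numerical sketch of Eq (2), the snippet below builds a random N × M channel matrix and applies it to a block of transmitted samples. The antenna counts, the number of time samples, the noise level and the function name `received_signal` are all hypothetical and serve only to show how the shapes fit together.

```python
import numpy as np


def received_signal(H, A, noise_std=0.0, rng=None):
    """Return B = H A + n for an N x M channel matrix H and an M x T transmit matrix A."""
    rng = np.random.default_rng() if rng is None else rng
    shape = (H.shape[0], A.shape[1])
    # Complex additive white Gaussian noise with standard deviation noise_std.
    noise = noise_std * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    return H @ A + noise


rng = np.random.default_rng(0)
M_TX, N_RX, T = 3, 2, 4  # hypothetical: Tx antennae, Rx antennae, time samples

H = rng.standard_normal((N_RX, M_TX)) + 1j * rng.standard_normal((N_RX, M_TX))
A = rng.standard_normal((M_TX, T)) + 1j * rng.standard_normal((M_TX, T))

B_clean = received_signal(H, A)                          # noise-free case
B_noisy = received_signal(H, A, noise_std=0.1, rng=rng)  # with AWGN
print(B_clean.shape)
```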

CSI is effective in providing precise information about the channel state. CSI streams are generated by the multiple antenna pairs of a transmitter and a receiver, working on different OFDM sub-channels. Each sub-channel operates at its own frequency and is associated with CSI amplitude and phase measurements. The CSI collected over time is a 4D matrix M_T,C,N,M, where T is the number of WiFi signal packets, C is the number of sub-carriers, and N and M are the numbers of Rx and Tx antennae respectively. From each packet, we can extract CSI features into a magnitude and phase vector of dimension N * M * C. These sub-carriers produce different patterns for different activities, thus forming a good foundation for human activity recognition.
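
The layout of the CSI tensor can be illustrated with a short sketch. The packet count and antenna numbers below are hypothetical, and random complex values stand in for measured CSI; the point is only to show how magnitude and phase are extracted and flattened into one vector of length N * M * C per packet.

```python
import numpy as np

# Hypothetical sizes: packets, sub-carriers, Rx antennae, Tx antennae.
T, C, N, M = 100, 52, 2, 3

rng = np.random.default_rng(0)
# Random complex numbers stand in for measured CSI values.
csi = rng.standard_normal((T, C, N, M)) + 1j * rng.standard_normal((T, C, N, M))

magnitude = np.abs(csi)   # |h_{i,j}(f_k)|
phase = np.angle(csi)     # angle of h_{i,j}(f_k), in radians

# One feature vector of dimension N * M * C per packet.
magnitude_vectors = magnitude.reshape(T, C * N * M)
phase_vectors = phase.reshape(T, C * N * M)

# Magnitude and phase together recover the complex CSI values, as in Eq (1).
reconstructed = magnitude * np.exp(1j * phase)
print(magnitude_vectors.shape, phase_vectors.shape)
```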

Maximum mean discrepancy (MMD) loss

The maximum mean discrepancy (MMD) loss [39] measures the global distribution discrepancy between the source mean embedding and the target mean embedding in the reproducing kernel Hilbert space (RKHS), comparing the marginal probability distributions of the source and target. It takes two inputs, the feature representations of the source and target domains generated by the classifier layers, as shown in Fig 2. It is calculated as,

\begin{matrix} L_{MMD} (p_{s}, p_{q}) \equiv \| E_{p_{s}} [ϕ (x^{s})] - E_{p_{q}} [ϕ (x^{t})] \|_{H}^{2} \end{matrix}

(3)

where p_s is the source marginal probability distribution, p_q is the target marginal probability distribution, H is the reproducing kernel Hilbert space (RKHS) endowed with a characteristic kernel k, and ϕ(.) is a mapping function which maps the features into the RKHS. ϕ(.) is associated with the characteristic kernel k(x^s, x^t) = ⟨ϕ(x^s), ϕ(x^t)⟩, where ⟨., .⟩ represents the standard inner product of vectors. According to the theoretical results in [42], the source marginal probability distribution is equal to the target marginal probability distribution if, and only if, L_MMD(p_s, p_q) = 0.
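
In practice the expectations in Eq (3) are replaced by sample means and the inner products by kernel evaluations. The sketch below is one common empirical (biased) estimator, given here only as an illustration: the Gaussian kernel, its bandwidth and the synthetic feature batches are assumptions and are not taken from the source.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=4.0):
    """Gaussian kernel matrix k(x_i, y_j); the bandwidth is a hypothetical choice."""
    sq_dists = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd_loss(source, target, bandwidth=4.0):
    """Biased empirical estimate of the squared MMD between two feature batches."""
    k_ss = gaussian_kernel(source, source, bandwidth)
    k_tt = gaussian_kernel(target, target, bandwidth)
    k_st = gaussian_kernel(source, target, bandwidth)
    return float(k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean())


rng = np.random.default_rng(0)
source = rng.standard_normal((64, 8))            # stand-in source features
same_dist = rng.standard_normal((64, 8))         # target drawn from the same distribution
shifted = rng.standard_normal((64, 8)) + 2.0     # target with a shifted mean

mmd_same = mmd_loss(source, same_dist)
mmd_shifted = mmd_loss(source, shifted)
print(f"same distribution: {mmd_same:.4f}, shifted: {mmd_shifted:.4f}")
```

The estimate is close to zero when both batches come from the same distribution and grows as the target distribution moves away from the source, which is the behaviour the loss relies on for global alignment.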

Local maximum mean discrepancy (LMMD) loss

The local maximum mean discrepancy (LMMD) loss [38] is a variant of the MMD loss that measures the distribution discrepancies of relevant sub-domains between the source mean embedding and the target mean embedding in the reproducing kernel Hilbert space (RKHS). Unlike the MMD loss, it focuses on aligning the relevant features of two sub-domains within the same activity class. It weights each sample according to the class to which it belongs. It takes four inputs: the feature representations of the source and target domains generated by the classifier layers, the source true labels and the target predicted labels, as shown in Fig 3. Mathematically, it is calculated as,

\begin{matrix} L_{LMMD} (p^{(c)}, q^{(c)}) \equiv E_{c} \| E_{p^{(c)}} [ϕ (x^{s})] - E_{q^{(c)}} [ϕ (x^{t})] \|_{H}^{2} \end{matrix}

(4)

where p^(c) and q^(c) are the distributions of sub-domains $D_{s}^{(c)}$ and $D_{t}^{(c)}$, E_c is the expectation over classes c, and x^s and x^t are samples from the source and target domains D_s and D_t, respectively.
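
A class-weighted estimate of Eq (4) can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the authors' code: sample weights are obtained by normalising the class memberships per class (the weighting used in the DSAN formulation [38]), a Gaussian kernel with a hypothetical bandwidth is used, and classes absent from either domain are skipped.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=4.0):
    """Gaussian kernel matrix; the bandwidth is a hypothetical choice."""
    sq_dists = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def class_weights(memberships):
    """Normalise each class column so that the weights of a class sum to 1."""
    memberships = np.asarray(memberships, dtype=float)
    totals = memberships.sum(axis=0, keepdims=True)
    return np.divide(memberships, totals, out=np.zeros_like(memberships), where=totals > 0)


def lmmd_loss(source, target, source_onehot, target_probs, bandwidth=4.0):
    """Class-wise weighted MMD, averaged over classes present in both domains."""
    w_s = class_weights(source_onehot)   # shape (n_s, C), from true source labels
    w_t = class_weights(target_probs)    # shape (n_t, C), from predicted target labels
    k_ss = gaussian_kernel(source, source, bandwidth)
    k_tt = gaussian_kernel(target, target, bandwidth)
    k_st = gaussian_kernel(source, target, bandwidth)
    per_class = (
        np.einsum("ic,ij,jc->c", w_s, k_ss, w_s)
        + np.einsum("ic,ij,jc->c", w_t, k_tt, w_t)
        - 2.0 * np.einsum("ic,ij,jc->c", w_s, k_st, w_t)
    )
    present = (w_s.sum(axis=0) > 0) & (w_t.sum(axis=0) > 0)
    return float(per_class[present].mean()) if present.any() else 0.0


rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
onehot = np.eye(2)[labels]
source = rng.standard_normal((40, 4)) + 3.0 * labels[:, None]  # two separated classes

aligned = lmmd_loss(source, source, onehot, onehot)            # identical sub-domains
misaligned = lmmd_loss(source, source + 3.0, onehot, onehot)   # every class shifted
print(f"aligned: {aligned:.4f}, misaligned: {misaligned:.4f}")
```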

Problem definition

Key challenges remain for the widespread deployment of WiFi-based sensing systems, in particular in real-world environments involving users of different age, gender, height, body movement speed, and location and orientation with respect to the WiFi transmitter and receiver. These aspects can severely affect WiFi signal features and characteristics such as amplitude, phase and Doppler Frequency Shift (DFS). Consequently, if any of these factors changes between training and testing, system performance inevitably degrades, because the fluctuations in CSI measurements for the same activities differ between training and testing data samples. This creates a need to re-train the model for each new domain, with the extra burdens of new data collection and re-learning of model parameters and hyperparameters. Moreover, data annotation is cumbersome and time consuming because each domain carries its own specific information related to multi-path wireless propagation. Therefore, re-training a model for every new domain is neither feasible nor practical [15]. To tackle this problem, researchers have relied on global and sub-domain alignment in adversarial or non-adversarial models, as shown in Fig 4. These models converge easily on inter-domain alignment tasks by matching the source and target domains globally. Unfortunately, global domain adaptation neglects the fine-grained information of sub-domains within the same class of different domains. Conversely, converging these models on intra-domain alignment tasks using several loss functions is time consuming. This leads to poor transfer learning performance [38]. Cross-user transfer learning in HAR using wireless signals is a sub-domain alignment task within the same class of different activities, yet it is still unknown which type of alignment is best suited to CSI-image based WiFi data. 
A global alignment may be the better choice for learning domain-invariant features: it minimizes the distribution discrepancy between the source and target domains, and CSI data for different activities appear quite similar, without significant domain shift within the data. Such data are therefore unlikely to align perfectly on the relevant sub-domain distributions. In this study we adapt an existing adversarial AI architecture to analyze the suitability of global and intra-class alignment for HAR domain-shift applications using wireless signals.

Fig 4 — Left: domain adaptation with global alignment. Right: sub-domain adaptation with intra-class alignment [38, 39].

Materials and methods

Proposed method

We accessed a public dataset available at [43] on 8th June 2023. We did not have access to information that could identify individual participants during or after data collection. The dataset contains CSI magnitude values obtained from 52 sub-carriers. High-frequency content is filtered out of these raw measurements as noise. The architecture of the feature generator can play a crucial role, depending on the nature of the processed data. Researchers have focused mostly on recurrent neural networks (RNNs), which process CSI data as a time-series input and use memory cells to keep track of past inputs, because CSI data are continuous and sequential. RNNs handle temporal data well and are a good option for extracting key features from input CSI measurements, but they have high memory requirements and long processing times. To exploit CNNs alongside time-series models for extracting shift-invariant features together with temporal information, there are many 1D-CNN variants. However, these CNNs are merged with RNNs to achieve high precision, which increases model complexity and hence simulation time. The input to our 2D-CNN is a three-channel RGB 64×64 CSI-image representation: an array of colored cells whose intensity varies with the magnitude values. Convolutional Neural Networks (ConvNets) have been widely used in many applications and have revolutionized the field of computer vision because of their low pre-processing requirements and remarkable results on image recognition tasks. Such networks can adjust their filter parameters, and are thus useful for finding spatial and temporal dependencies in an image. ConvNets can also deal with huge datasets thanks to their ability to reduce data dimensions. 
Our proposed model does not depend on any memory cell to keep track of past inputs; it is a very simple yet robust adversarial model that is suitable for applying global and sub-domain alignment to a multi-class problem. The model was chosen specifically to investigate the impact of different alignments on cross-user domain-shift tasks using wireless sensing.

Our proposed architecture is inspired by the work presented in [39]. The main idea is to examine the effects of global as well as sub-domain adaptation on HAR using device-free sensing. The proposed model, Deep Adversarial Sub-Domain Adaptation (DASAN), works in three adversarial training steps. Our model architecture is shown in Fig 5, with its parameters listed in Table 1. The domain-shared feature extractor is a 2D CNN. This module extracts high-level features from the raw source and target domain data samples. Since it is shared between source and target, it maps source samples x_s and target samples x_t using a mapping function F_f with mapping parameter θ_f such that Z_s = F_f(x_s; θ_f) and Z_t = F_f(x_t; θ_f), where Z_s, Z_t ∈ R^(M×D) are the corresponding source and target output features, M is the mini-batch size and D is the feature dimension. Next come a label classifier and a domain discriminator, whose input is the features extracted by the previous module. The domain discriminator predicts the domain of the source and target data features. The label classifier predicts the label category of the extracted source and target domain features. The classifier is a mapping function C_c with mapping parameter θ_c which maps the generated features to the predicted label $\hat{y}$ such that $\hat{y} = C_{c} (Z_{s}; θ_{c})$. Finally, the LMMD and MMD loss functions are calculated to quantify the distribution discrepancy between the source and target activations. The LMMD loss measures the distribution discrepancy among relevant sub-domains, whereas the MMD loss measures the distribution discrepancy between the source and target distributions globally.

Fig 5 — The network is constructed of three modules: feature extractor, label classifier and domain discriminator. Step 1 is the training of feature extractor and classifier to obtain discriminative features. Target unlabelled samples are also used to generate pseudolabels. Step 2 is the training of feature extractor, classifier and discriminator using gradient reversal layer. Step 3 is the classification of activities on new target data samples.

Table 1. Structure parameters.

Networks	Layers	Operations
Feature extractor	Conv-Pool-1	Kernel 64-5×5, Stride 1, Padding 0; BN; ReLU; Max-Pool 3×3, Stride 2; Dropout
	Conv-Pool-2	Kernel 64-5×5, Stride 1, Padding 0; BN; ReLU; Max-Pool 3×3, Stride 2; Dropout
	Conv-Pool-3	Kernel 128-5×5, Stride 1, Padding 0; BN; ReLU; Max-Pool 3×3, Stride 2; Dropout
	Conv-Pool-4	Kernel 256-3×3, Stride 1, Padding 0; ReLU
	Flatten	Nodes 256
Label classifier	Linear-1	Node 3072; ReLU
	Linear-2	Node 2048; ReLU
	Linear-3	Node 7; Softmax
Domain discriminator	Linear-1	Node 1024; ReLU
	Linear-2	Node 1024; ReLU
	Linear-3	Node 1; Sigmoid
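
The feature-extractor rows of Table 1 can be checked with simple shape arithmetic. The sketch below traces a 64×64 input through the four convolutional blocks, assuming the usual floor-mode output-size formula for convolution and pooling and no padding in the pooling layers (the pooling padding is not stated in the table, so this is an assumption).

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=3, stride=2):
    """Spatial output size of max-pooling with no padding (assumed, floor mode)."""
    return (size - kernel) // stride + 1


# (output channels, convolution kernel size, followed by max-pool?) read from Table 1.
BLOCKS = [(64, 5, True), (64, 5, True), (128, 5, True), (256, 3, False)]


def trace_shapes(input_size=64):
    """Return the (channels, height, width) shape after each block."""
    size = input_size
    shapes = []
    for channels, kernel, has_pool in BLOCKS:
        size = conv_out(size, kernel)
        if has_pool:
            size = pool_out(size)
        shapes.append((channels, size, size))
    return shapes


shapes = trace_shapes(64)
flatten_nodes = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
for name, shape in zip(["Conv-Pool-1", "Conv-Pool-2", "Conv-Pool-3", "Conv-Pool-4"], shapes):
    print(name, shape)
print("Flatten nodes:", flatten_nodes)
```

Under these assumptions the last block outputs a 256×1×1 tensor, which agrees with the 256 flatten nodes listed in Table 1.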


The label classifier is trained using the labelled source domain samples. The cross-entropy loss between the true and predicted source labels is minimized in order to maximize activity recognition accuracy on the source domain, and is defined as,

\begin{matrix} L_{cls} = - \frac{1}{M} \sum_{i = 1}^{M} \sum_{c = 1}^{C} I [y_{i_{s}} = c] \log ( C_{c} ( F_{f} (x_{i_{s}}; θ_{f}); θ_{c} ) ) \end{matrix}

(5)

It also leverages pseudolabel learning to reduce the prediction uncertainty on unlabelled target data samples. The pseudolabel learning loss is calculated as,

\begin{matrix} L_{Pseudo} = - \frac{1}{M} \sum_{j = 1}^{M} \sum_{m = 1}^{C} p [{\hat{y}}_{j_{t}} = m | x_{j_{t}}] \log ( p [{\hat{y}}_{j_{t}} = m | x_{j_{t}}] ) \end{matrix}

(6)

Also, the predicted labels of the label classifier for the target domain unlabelled data samples are used to calculate the LMMD and MMD losses. Thus, the objective function of label classifier can be defined as,

\begin{matrix} L_{c} = L_{cls} + α L_{Pseudo} + β (L_{MMD} \text{ or } L_{LMMD}) \end{matrix}

(7)

where α and β are the tradeoff parameters, and the last term is either the MMD loss or the LMMD loss, depending on the alignment being used.
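
Eqs (5) to (7) can be sketched on arrays of predicted class probabilities as follows. This is a minimal illustration, not the authors' implementation: the small constant `EPS`, the function names and the example tradeoff values are assumptions introduced for the demonstration.

```python
import numpy as np

EPS = 1e-12  # assumed small constant to avoid log(0)


def cross_entropy(probs, labels):
    """Eq (5): mean negative log-probability of the true source label."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + EPS)))


def pseudolabel_entropy(probs):
    """Eq (6): mean entropy of the predicted target label distribution."""
    return float(-np.mean(np.sum(probs * np.log(probs + EPS), axis=1)))


def classifier_objective(l_cls, l_pseudo, l_discrepancy, alpha, beta):
    """Eq (7): l_discrepancy is either the MMD or the LMMD loss."""
    return l_cls + alpha * l_pseudo + beta * l_discrepancy


NUM_CLASSES = 7  # seven activity classes, as in the label classifier output

# Confident and correct source predictions give a cross-entropy near zero.
source_labels = np.array([0, 1, 2])
confident = np.eye(NUM_CLASSES)[source_labels]
l_cls = cross_entropy(confident, source_labels)

# Maximally uncertain target predictions give the largest entropy, log(7).
uniform = np.full((5, NUM_CLASSES), 1.0 / NUM_CLASSES)
l_pseudo = pseudolabel_entropy(uniform)

# Hypothetical discrepancy value and tradeoff parameters.
l_c = classifier_objective(l_cls, l_pseudo, l_discrepancy=0.2, alpha=0.5, beta=0.1)
print(f"L_cls = {l_cls:.4f}, L_Pseudo = {l_pseudo:.4f}, L_c = {l_c:.4f}")
```

Minimizing the entropy term pushes target predictions away from the uniform case towards confident pseudolabels.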

The purpose of the domain discriminator is to minimize the global distribution discrepancy by learning domain-invariant features. Its adversarial role takes the form of a two-player minimax game. As the first player, the domain discriminator tries to differentiate between the source and target domains. As the second player, the feature extractor is trained to fool the domain discriminator. The domain discriminator is a mapping function D_d with mapping parameter θ_d which maps the generated features f to a domain label d = D_d(f; θ_d), where d_i = 1 if x_i ∈ D_s and d_j = 0 if x_j ∈ D_t. Its adversarial loss is defined as,

\begin{matrix} L_{adv} = - \frac{1}{M} \sum_{i = 1}^{M} d_{i} \log [D_{d} (F_{f} (x_{i_{s}}; θ_{f}); θ_{d})] - \frac{1}{M} \sum_{j = 1}^{M} (1 - d_{j}) \log [1 - D_{d} (F_{f} (x_{j_{t}}; θ_{f}); θ_{d})] \end{matrix}

(8)
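
The adversarial loss can be sketched on discriminator outputs as below. The sketch assumes the standard binary cross-entropy form, in which source samples carry domain label 1, target samples carry label 0 and the target term uses log(1 − D); it also assumes equally sized source and target mini-batches. The function name and `EPS` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # assumed small constant to avoid log(0)


def adversarial_loss(d_source, d_target):
    """Binary cross-entropy of the domain discriminator.

    d_source, d_target: sigmoid outputs of the discriminator for source
    samples (domain label 1) and target samples (domain label 0).
    """
    source_term = -np.mean(np.log(d_source + EPS))
    target_term = -np.mean(np.log(1.0 - d_target + EPS))
    return float(source_term + target_term)


# A discriminator that separates the domains well has a low loss.
well_separated = adversarial_loss(np.full(8, 0.99), np.full(8, 0.01))

# A fooled discriminator outputs 0.5 everywhere, giving a loss of 2 * log(2).
fooled = adversarial_loss(np.full(8, 0.5), np.full(8, 0.5))

print(f"well separated: {well_separated:.4f}, fooled: {fooled:.4f}")
```

The discriminator is trained to lower this value, while the feature extractor is trained to raise it, which is what drives the features towards domain invariance.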

The total loss of the model can be calculated as,

\begin{matrix} L_{total} = L_{cls} - γ L_{adv} + β L_{LMMD} + α L_{Pseudo} \quad \text{(in case of LMMD loss)} \end{matrix}

(9)

\begin{matrix} L_{total} = L_{cls} - γ L_{adv} + β L_{MMD} + α L_{Pseudo} \quad \text{(in case of MMD loss)} \end{matrix}

(10)

where L_cls is the classifier loss, L_adv is the discriminator adversarial loss, and α, β and γ are the tradeoff parameters.
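
The combination in Eqs (9) and (10) can be sketched as a single function, where `l_align` stands for whichever alignment loss (LMMD or MMD) is in use. All numeric values below are hypothetical and serve only to make the signs of the terms explicit; the tradeoff values used in the paper are not given in this section.

```python
def total_loss(l_cls, l_adv, l_align, l_pseudo, alpha, beta, gamma):
    """Total objective (cf. Eqs 9 and 10).

    l_align is the LMMD loss for Eq (9) or the MMD loss for Eq (10).
    The adversarial term enters with a negative sign because the feature
    extractor is trained to fool the domain discriminator.
    """
    return l_cls - gamma * l_adv + beta * l_align + alpha * l_pseudo

# Hypothetical loss values and tradeoff parameters.
value = total_loss(l_cls=1.0, l_adv=0.5, l_align=0.2, l_pseudo=0.1,
                   alpha=0.1, beta=0.5, gamma=1.0)
```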

Experimental results

Dataset

We assess model performance on a public dataset available at [43], referred to here as the Parisafm dataset. The dataset was collected from 3 volunteers, which makes it suitable for cross-user domain adaptation. The participants performed 7 different activities (walk, run, fall, lie down, sit down, stand up, and bend) in an experimental environment. Each activity was repeated for 20 trials. In total, there are 420 labelled data samples (3 subjects × 7 activities × 20 trials), equally divided among the three subjects. For adversarial training, the source domain always holds labelled samples from a particular subject or combination of subjects, while the target domain is treated as unlabelled samples from the other subject or combination of subjects during model training. A Raspberry Pi was used as the WiFi-enabled platform for packet reception, and the Nexmon tool [44] was employed for data collection. Each subcarrier has a complex CSI representation, whose magnitude and phase carry information about a specific activity. For model simulation, only the CSI magnitude values are used. A low-pass filter reduces the high-frequency content, which is treated as noise. The filtered values are normalized between 0 and 255 for a colored image representation. The RGB images are then generated as a MATLAB pseudocolor plot, shown in Fig 6, which produces an array of colored cells also known as faces. Each image is resized to 64×64 pixels.
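
The first preprocessing steps described above (magnitude extraction, low-pass filtering and normalization to 0 to 255) can be sketched as follows. The moving-average filter, its window length, the function name and the toy CSI values are assumptions of this example, since the filter design is not specified in this section. The pseudocolor rendering and the resize step are omitted.

```python
def csi_to_gray_levels(csi, window=5):
    """Convert complex CSI to integer levels in [0, 255].

    csi    : list of packets (time steps), each a list of complex subcarrier values
    window : length of the moving-average low-pass filter (assumed, not from the paper)
    """
    # Step 1: keep only the magnitude of each complex CSI value.
    mags = [[abs(v) for v in packet] for packet in csi]

    # Step 2: causal moving average along time for every subcarrier,
    # which attenuates high-frequency content treated as noise.
    n_t, n_sc = len(mags), len(mags[0])
    smooth = []
    for t in range(n_t):
        lo = max(0, t - window + 1)
        smooth.append([sum(mags[k][s] for k in range(lo, t + 1)) / (t + 1 - lo)
                       for s in range(n_sc)])

    # Step 3: min-max normalization to the range 0..255.
    flat = [v for row in smooth for v in row]
    vmin, vmax = min(flat), max(flat)
    span = (vmax - vmin) or 1.0  # avoid division by zero for constant input
    return [[round(255 * (v - vmin) / span) for v in row] for row in smooth]

# Hypothetical CSI: three packets, two subcarriers.
csi = [[3 + 4j, 1 + 0j], [0 + 0j, 6 + 8j], [3 + 4j, 0 + 2j]]
levels = csi_to_gray_levels(csi, window=2)
```

The returned array has one row per packet and one column per subcarrier, and can then be mapped through a colormap to obtain the image.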

Model evaluation

To evaluate inter- and intra-domain adaptation for HAR using wireless sensor data comprehensively, we consider two variations of the proposed model and compare their transfer results with another model, the Deep Subdomain Adaptation Network (DSAN) [38]. DSAN [38] is a non-adversarial model with a simple architecture consisting of a shared feature extractor and a classifier. Features generated by the extractor for labelled source data and unlabelled target data are fed to the classifier layers one at a time. Thereafter, maximum mean discrepancy (MMD) and local maximum mean discrepancy (LMMD) losses are calculated between these source and target activations to examine the effects of global and subdomain alignment, respectively. The proposed model is also tested for global and local subdomain adaptation using the same loss functions, giving DASAN-MMD and DASAN-LMMD. Finally, these two variations of the proposed model are compared with DSAN-MMD and DSAN-LMMD on cross-user domain-shifting tasks in terms of activity recognition micro- and macro-F1 scores, where the F1 score is the harmonic mean of precision and recall. The micro-F1 score aggregates the contributions of all instances, whereas the macro-F1 score computes the metric independently for each class and then takes the average [45]. Since we have an imbalanced dataset, we also report the macro-F1 score, which weights majority and minority classes equally to achieve objective results. Simulation time for each model is additionally measured for comparison.
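
The distinction between the two averaging schemes can be made concrete with a small sketch for single-label multi-class predictions. The function name and the toy labels are assumptions chosen for illustration; they are not results from the paper.

```python
def f1_scores(y_true, y_pred, n_classes):
    """Return (micro-F1, macro-F1) for single-label multi-class predictions."""
    tp = [0] * n_classes
    fp = [0] * n_classes
    fn = [0] * n_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_c, fp_c, fn_c):
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
        denom = 2 * tp_c + fp_c + fn_c
        return 2 * tp_c / denom if denom else 0.0

    micro = f1(sum(tp), sum(fp), sum(fn))  # pool counts over all instances
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in range(n_classes)) / n_classes
    return micro, macro

# Hypothetical imbalanced example: class 0 dominates and class 1 is always missed.
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 2]
micro, macro = f1_scores(y_true, y_pred, n_classes=3)
```

Here the micro-F1 score stays high because most instances belong to the majority class, while the macro-F1 score drops because the missed minority class contributes an F1 of zero with equal weight.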

The dataset used for model evaluation involves three subjects performing seven activities. We test each model on nine domain-shifting tasks, in which subjects 1, 2 and 3 are used interchangeably as source and target domains. To report our simulation results, we follow the evaluation approach described in [45]. Each case is run ten times and the average is calculated for an unbiased model comparison. We also compute and report 95% confidence intervals for each performance metric. The cross-user domain-shifting task measures the accuracy of adapting an activity model trained on one user (male/female) with certain physical characteristics (e.g., weight, height, age) to another user with different physical characteristics.
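
One common way to summarise ten runs is the sample mean with a Student-t 95% confidence interval, sketched below. The use of the t-distribution is an assumption of this example, since the interval construction is not detailed in this section, and the run scores are hypothetical.

```python
import statistics

# Two-sided 95% Student-t critical value for 9 degrees of freedom (ten runs).
T_CRIT_DF9 = 2.262

def mean_and_ci95(scores):
    """Mean and 95% confidence interval for exactly ten run scores."""
    n = len(scores)
    if n != 10:
        raise ValueError("T_CRIT_DF9 is only valid for ten runs")
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / n ** 0.5  # standard error of the mean
    half_width = T_CRIT_DF9 * sem
    return mean, (mean - half_width, mean + half_width)

# Hypothetical micro-F1 scores from ten runs of one cross-user task.
runs = [0.70, 0.68, 0.71, 0.69, 0.67, 0.72, 0.70, 0.69, 0.68, 0.66]
mean, (ci_low, ci_high) = mean_and_ci95(runs)
```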

Models comparison of micro- and macro-F1 scores

Tables 2–9 report the micro- and macro-F1 scores of DASAN and the baseline technique with MMD and LMMD losses on nine cross-user experiments, with the proportion of target training samples varying from 100% to 10%. The scores reported in the tables are F1 scores averaged over 10 runs of each of the nine cross-user experiments. DASAN-MMD obtains the highest micro- and macro-F1 scores when averaged over the nine cross-user domain-shifting tasks and the varying target training proportions: 0.69 and 0.64, which are 0.019 and 0.017 higher than those of DASAN-LMMD, the second-best technique. In addition, DASAN-MMD outperforms DSAN-MMD by 0.094 and 0.105 in micro- and macro-F1 scores, and is 0.118 and 0.14 higher in micro- and macro-F1 scores than DSAN-LMMD, the lowest-performing technique. DASAN-MMD also remains reliable with reduced target training data: its averaged micro- and macro-F1 scores are no less than 0.62 and 0.57, even in the worst case of only 10% of the target training samples. We conclude that global adaptation is the better option for HAR using wireless signals in terms of achieving higher micro- and macro-F1 scores. To look more closely at the individual cross-user tasks on the Parisafm dataset, we plot the averaged micro- and macro-F1 scores for varying target training proportions in Figs 7 and 8.