2024 Oct 31;24(21):7053. doi: 10.3390/s24217053
Algorithm 1. Pseudocode of the 2SpamH algorithm, which has two stages: (1) prototype selection in the feature space of device use and sensor activity levels, which labels data points as “missing” or “non-missing” with some confidence based on a threshold, and (2) a k-nearest neighbors (KNN) step that labels the non-prototype data points based on their proximity to the labeled prototypes. The algorithm returns a “missing” or “non-missing” label for every data point.
2SpamH Algorithm
Input: Sensor activity matrix $W$, Device usage matrix $Z$, Prototype selection percentiles $\{\theta_{\text{lower}}, \theta_{\text{upper}}\}$, Number of nearest neighbors $k$
Output: Missing label matrix M
Stage 1: Prototype Selection
1. Perform PCA on Z and W to obtain the principal components:
$C^Z = \mathrm{PCA}(Z), \quad C^W = \mathrm{PCA}(W)$
where $C^Z$ and $C^W$ are vectors of length $T$ containing the first principal components of $Z$ and $W$, respectively. If ncol(Z) = 1, then $C^Z = Z$; if ncol(W) = 1, then $C^W = W$.
2. Construct the feature space $F$ as the set of points $f_t = (C_t^Z, C_t^W)$ for each $t$:
$F = \{ f_t \mid t = 1, \ldots, T \}$
where $f_t = (C_t^Z, C_t^W)$ represents the coordinates of the $t$th data point in the constructed feature space.
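3. Compute the lower and upper quantiles for $C^Z$ and $C^W$:
$q_{\text{lower}}^Z, \; q_{\text{upper}}^Z = Q(C^Z, \theta_{\text{lower}}), \; Q(C^Z, \theta_{\text{upper}})$
$q_{\text{lower}}^W, \; q_{\text{upper}}^W = Q(C^W, \theta_{\text{lower}}), \; Q(C^W, \theta_{\text{upper}})$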
4. Identify the set of missing prototypes in the feature space $F$:
$P_{\text{missing}} = \{ f_t \in F \mid C_t^Z < q_{\text{lower}}^Z \text{ and } C_t^W < q_{\text{lower}}^W \}$
5. Identify the set of non-missing prototypes in the feature space $F$:
$P_{\text{non-missing}} = \{ f_t \in F \mid C_t^Z > q_{\text{upper}}^Z \text{ and } C_t^W > q_{\text{upper}}^W \}$
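6. For each data point $f_t \in F$:
7. Assign labels to rows of $M$ based on whether data points fall within the prototype regions:
$M_{t,:} = \begin{cases} \text{Missing} & \text{if } f_t \in P_{\text{missing}} \\ \text{Non-missing} & \text{if } f_t \in P_{\text{non-missing}} \\ \text{NA} & \text{otherwise} \end{cases}$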
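Stage 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: Z and W each have a single column, so that $C^Z = Z$ and $C^W = W$ and no PCA is needed; the quantile function Q uses linear interpolation; and non-missing prototypes are taken to be the points above the upper quantile on both axes. The names `quantile` and `select_prototypes` are hypothetical.

```python
def quantile(values, p):
    """Quantile by linear interpolation between order statistics.

    Assumption: the source does not say which definition of Q is used.
    """
    xs = sorted(values)
    h = (len(xs) - 1) * p          # fractional index into the sorted values
    lo = int(h)                    # floor, since h >= 0
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def select_prototypes(cz, cw, theta_lower, theta_upper):
    """Return one label per time point: "missing", "non-missing", or None (NA)."""
    qz_lo, qz_hi = quantile(cz, theta_lower), quantile(cz, theta_upper)
    qw_lo, qw_hi = quantile(cw, theta_lower), quantile(cw, theta_upper)
    labels = []
    for z, w in zip(cz, cw):
        if z < qz_lo and w < qw_lo:
            labels.append("missing")        # low device use and low sensor activity
        elif z > qz_hi and w > qw_hi:
            labels.append("non-missing")    # high on both axes (assumed)
        else:
            labels.append(None)             # left for Stage 2
    return labels


# Toy data: nine time points, one device-use and one sensor-activity value each.
cz = [0, 1, 2, 3, 4, 5, 6, 7, 8]
cw = [1, 0, 3, 2, 4, 5, 8, 6, 7]
labels = select_prototypes(cz, cw, 0.25, 0.75)
print(labels)
```

On this toy data only the first two points fall below both lower quartiles and only the last point exceeds both upper quartiles, so the remaining six points stay unlabeled.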
Stage 2: Labeling Unlabeled Data Using KNN
8. For each unlabeled data point $f_t \in F$ that was not assigned a label in Stage 1:
9. Implement KNN with $K = k$ and Euclidean distance function
          $d(f_t, f_{t'}) = \sqrt{(C_t^Z - C_{t'}^Z)^2 + (C_t^W - C_{t'}^W)^2}$
to label the remaining unlabeled data points:
          $M_{t,:} := \mathrm{KNN}(f_t \mid P_{\text{missing}}, P_{\text{non-missing}}, k)$
Return: Missing label matrix M.
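The KNN step of Stage 2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: each unlabeled point takes the majority label among its k nearest prototypes under Euclidean distance, and a tied vote goes to the label of the nearer prototype (the source does not specify tie handling). The function name `knn_fill` is hypothetical.

```python
import math
from collections import Counter


def knn_fill(points, labels, k):
    """Fill None labels by majority vote among the k nearest labeled prototypes."""
    prototypes = [(p, lab) for p, lab in zip(points, labels) if lab is not None]
    if not prototypes:
        raise ValueError("Stage 1 produced no prototypes to learn from")
    filled = []
    for p, lab in zip(points, labels):
        if lab is not None:
            filled.append(lab)              # prototypes keep their Stage 1 label
            continue
        # Sort prototypes by Euclidean distance to p and keep the k closest.
        nearest = sorted(prototypes, key=lambda pl: math.dist(p, pl[0]))[:k]
        votes = Counter(label for _, label in nearest)
        filled.append(votes.most_common(1)[0][0])
    return filled


# Toy feature space: two prototypes of each class and two unlabeled points.
points = [(0, 1), (1, 0), (2, 3), (7, 6), (8, 7), (7, 8)]
labels = ["missing", "missing", None, None, "non-missing", "non-missing"]
filled = knn_fill(points, labels, k=3)
print(filled)
```

With two classes, an odd k avoids tied votes. Here the point (2, 3) sits near the two missing prototypes and (7, 6) near the two non-missing ones, so each takes the label of its neighbors.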