Algorithm 1. Pseudocode of the 2SpamH algorithm, which has two stages: (1) prototype selection in the feature space of device use and sensor activity levels, which labels data points as “missing” or “non-missing” with some confidence based on a threshold, and (2) a k-nearest neighbors (KNN) step that labels the non-prototype data points in the feature space based on their proximity to the labeled prototypes. The algorithm returns a “missing” or “non-missing” label for every data point.
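2SpamH Algorithm
Input: Sensor activity matrix $S$ (with $T$ rows), device usage matrix $U$ (with $T$ rows), prototype selection percentiles $(\theta_l, \theta_u)$, number of nearest neighbors $k$
Output: Missing label matrix $M$
Stage 1: Prototype Selection
1. Perform PCA on $S$ and $U$ to obtain the principal components:
   $$PC_S = \mathrm{PCA}(S), \qquad PC_U = \mathrm{PCA}(U)$$
   where $PC_S$ and $PC_U$ are vectors of length $T$ of the first principal components of $S$ and $U$. If ncol($S$) = 1, then $PC_S = S$; if ncol($U$) = 1, then $PC_U = U$.
2. Construct the feature space $F$ as the set of points for each $t$:
   $$F = \{ f_t = (PC_{S,t},\, PC_{U,t}) : t = 1, \dots, T \}$$
   where $f_t$ represents the coordinates of the $t$th data point in the constructed feature space.
3. Compute the lower and upper quantiles for $PC_S$ and $PC_U$:
   $$q_l^S = Q(PC_S, \theta_l), \quad q_u^S = Q(PC_S, \theta_u), \quad q_l^U = Q(PC_U, \theta_l), \quad q_u^U = Q(PC_U, \theta_u)$$
4. Identify the set of missing prototypes in the feature space $F$:
   $$P_{\text{missing}} = \{ f_t \in F : PC_{S,t} \le q_l^S \ \text{and}\ PC_{U,t} \le q_l^U \}$$
5. Identify the set of non-missing prototypes in the feature space $F$:
   $$P_{\text{non-missing}} = \{ f_t \in F : PC_{S,t} \ge q_u^S \ \text{and}\ PC_{U,t} \ge q_u^U \}$$
6. For each data point $f_t \in F$:
7. Assign labels to rows of $M$ based on whether data points fall within the prototype regions:
   $$M_t = \begin{cases} \text{missing} & \text{if } f_t \in P_{\text{missing}} \\ \text{non-missing} & \text{if } f_t \in P_{\text{non-missing}} \\ \text{unlabeled} & \text{otherwise} \end{cases}$$
Stage 2: Labeling Unlabeled Data Using KNN
8. For each unlabeled data point $f_t$ that was not assigned a label in Stage 1:
9. Implement KNN with $k$ and Euclidean distance function to label the remaining unlabeled data points:
   $$M_t = \operatorname{mode}\{ M_j : f_j \in N_k(f_t) \}$$
   where $N_k(f_t)$ is the set of the $k$ labeled prototypes closest to $f_t$ in Euclidean distance.
Return: Missing label matrix $M$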
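
The two stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`first_pc`, `quantile`, `two_spam_h`), the default percentiles and `k`, the use of power iteration to obtain the first principal component, the sign convention for that component, the choice of "low sensor activity and low device use" as the missing region, and the toy data are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def first_pc(matrix):
    """Scores on the first principal component; a single column is used as-is."""
    n, p = len(matrix), len(matrix[0])
    if p == 1:
        return [row[0] for row in matrix]
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p
    for _ in range(500):  # power iteration for the leading eigenvector
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    if sum(v) < 0:  # assumption: orient so larger raw values give larger scores
        v = [-x for x in v]
    return [sum(r[j] * v[j] for j in range(p)) for r in centered]


def quantile(values, q):
    """Quantile with linear interpolation, q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def two_spam_h(sensor, usage, lower=0.2, upper=0.8, k=3):
    """Return one 'missing' / 'non-missing' label per row (hypothetical API)."""
    # Stage 1: build the 2-D feature space and select prototypes.
    pc_s, pc_u = first_pc(sensor), first_pc(usage)
    points = list(zip(pc_s, pc_u))
    s_lo, s_hi = quantile(pc_s, lower), quantile(pc_s, upper)
    u_lo, u_hi = quantile(pc_u, lower), quantile(pc_u, upper)
    labels = []
    for s, u in points:
        if s <= s_lo and u <= u_lo:      # assumed missing region: low on both axes
            labels.append("missing")
        elif s >= s_hi and u >= u_hi:    # assumed non-missing region: high on both
            labels.append("non-missing")
        else:
            labels.append(None)          # left for Stage 2
    prototypes = [(pt, lab) for pt, lab in zip(points, labels) if lab is not None]
    if not prototypes:
        raise ValueError("no prototypes selected; adjust the percentiles")
    # Stage 2: majority vote among the k nearest prototypes (Euclidean distance).
    for i, pt in enumerate(points):
        if labels[i] is None:
            nearest = sorted(prototypes, key=lambda pl: math.dist(pt, pl[0]))[:k]
            labels[i] = Counter(lab for _, lab in nearest).most_common(1)[0][0]
    return labels


# Toy data: one sensor column and one device-usage column, seven time points.
sensor = [[0], [1], [2], [50], [90], [100], [110]]
usage = [[0], [0], [1], [40], [80], [95], [100]]
labels = two_spam_h(sensor, usage, lower=0.2, upper=0.8, k=3)
print(labels)
```

On this toy input the first two rows become missing prototypes and the last two become non-missing prototypes; the three rows in between are then labeled by their three nearest prototypes. The sketch does not rescale the two axes before computing distances, so in practice the KNN result may depend on the relative scale of the two components.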