Evaluation of a hybrid neural network model for classifying five types of activities from accelerometer data. Top-left: Confusion matrix reporting recall for each activity class (‘other’, ‘eating’, ‘exercise’, ‘medication’, and ‘smoking’). Top-right: Scatter plot of macro F1 score versus average network confidence; the red line and the Pearson correlation coefficient (r = 0.71) indicate a positive association. Bottom-left: Box plots of confidence, F1 score, precision, and recall across cross-validation folds, showing how much each metric varies between folds. Bottom-right: Average F1 score as a function of the confidence threshold, showing that F1 is highest at higher thresholds. Together, the panels summarize the model’s classification performance and its potential utility in real-world health monitoring and behavior analysis.
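
The caption does not state how the bottom-right curve was computed. A minimal sketch of one plausible reading follows, under assumptions that are not taken from the source: confidence is the maximum softmax probability of each prediction, samples below the threshold are excluded before scoring, and macro F1 is averaged over the classes that still appear among the retained samples. The names `macro_f1` and `f1_vs_threshold` are hypothetical.

```python
import numpy as np


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 over classes present in y_true or y_pred."""
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            scores.append(2 * tp / denom)
    return float(np.mean(scores))


def f1_vs_threshold(probs, y_true, thresholds):
    """Return (threshold, macro F1, coverage) for each confidence threshold.

    Assumption: confidence is the top class probability, and only samples
    with confidence >= threshold are scored. Coverage is the retained fraction.
    """
    probs = np.asarray(probs, dtype=float)
    y_true = np.asarray(y_true)
    confidence = probs.max(axis=1)
    y_pred = probs.argmax(axis=1)
    n_classes = probs.shape[1]

    results = []
    for t in thresholds:
        keep = confidence >= t
        if not keep.any():
            # No sample is confident enough; F1 is undefined at this threshold.
            results.append((t, float("nan"), 0.0))
            continue
        f1 = macro_f1(y_true[keep], y_pred[keep], n_classes)
        results.append((t, f1, float(keep.mean())))
    return results
```

Reporting coverage next to F1 matters under this reading: a higher threshold scores fewer samples, so a rising F1 curve should be interpreted together with the fraction of predictions that remain.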