The table summarizes the performance of our PAT models versus several baseline models (LSTM, CNN, ConvLSTM, and 3D CNN) across four tasks: predicting SSRI usage, history of any sleep disorder, abnormal sleep patterns, and depression. Each model is trained on 500, 1,000, and 2,500 participants and on all available data (5,769 for SSRI usage, 3,429 for sleep disorder and sleep abnormalities, and 2,800 for depression), and is evaluated using AUC on a held-out test set of 2,000 participants. The score reported for each model is its AUC averaged over these four training set sizes. The suffix “smoothing” after a model name denotes that the model was trained on smoothed data. PAT-S, PAT-M, and PAT-L denote the Small, Medium, and Large PAT models. An underline indicates the best baseline model. A bolded PAT model performed better than the best baseline, and a bolded and underlined PAT model achieved the best performance overall. The results suggest that PATs outperform baseline models in various actigraphy understanding tasks and at various dataset sizes.
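To make the reported score concrete, the following is a minimal sketch of how one AUC per training set size could be collapsed into a single averaged value. The function names (`auc`, `averaged_auc`) and all numeric values are hypothetical placeholders, not results from the table, and the pairwise AUC shown here is a standard formulation rather than necessarily the implementation used for these experiments.

```python
from statistics import mean


def auc(labels, scores):
    """AUC as the probability that a random positive example is scored
    above a random negative example (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def averaged_auc(auc_by_train_size):
    """Average one model's test AUC over all training set sizes."""
    return mean(auc_by_train_size.values())


# Hypothetical placeholder AUCs for one model on one task,
# keyed by training set size ("all" = all available data).
example = {500: 0.60, 1000: 0.62, 2500: 0.66, "all": 0.68}
print(round(averaged_auc(example), 3))
```

Under this reading, every training set size contributes equally to the reported score, so a model that does well only when trained on all available data is not favored over one that does well with 500 participants.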