[Preprint]. 2025 Jan 14:arXiv:2411.15240v3. [Version 3]

Table 3. PAT-M Experiments.

All models were pretrained and then fine-tuned end-to-end on the benzodiazepine training data. For these experiments, 1,000 participants were removed from the training data to form an evaluation set; these participants are separate from the held-out test set used in Table 1. The “Avg Score” metric is the mean AUC on the evaluation set over the medium models trained on dataset sizes “500”, “1,000”, “2,500”, and “4,769”. (a) PAT-M pretrained with MSE loss computed on every data point, at four mask ratios. The highest mask ratio (0.90) gave the best result, although the trend is not strictly monotonic (0.50 scored below 0.25). (b) PAT-M pretrained with 90% masking and MSE loss on all data, with and without smoothing. Smoothing did not improve performance. (c) PAT-M pretrained with 90% masking. Computing the MSE on only the masked patches reduced performance drastically (0.541 versus 0.773).

(a) MODEL (Mask Ratio)	Avg Score*
Medium 0.25	0.737
Medium 0.50	0.707
Medium 0.75	0.743
Medium 0.90	0.773

(b) MODEL (Smoothing)	Avg Score*
Medium (Smoothed)	0.741
Medium (Not Smooth)	0.773

(c) MODEL (Loss Function)	Avg Score*
Medium (MSE Only Masked)	0.541
Medium (MSE All)	0.773
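
The difference between the two loss variants in panel (c) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`random_patch_mask`, `mse_all`, `mse_masked_only`), the patch layout, and the toy data are assumptions made for illustration only.

```python
import numpy as np


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a boolean mask over patches; True marks a masked (hidden) patch."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask


def mse_all(target, recon):
    """MSE over every data point, masked and visible patches alike."""
    return float(np.mean((target - recon) ** 2))


def mse_masked_only(target, recon, mask):
    """MSE restricted to the patches that were masked."""
    return float(np.mean((target[mask] - recon[mask]) ** 2))


# Toy data (hypothetical): 20 patches of 4 time steps each.
rng = np.random.default_rng(0)
num_patches, patch_size = 20, 4
target = rng.normal(size=(num_patches, patch_size))

# 90% masking, matching the ratio used in panels (b) and (c).
mask = random_patch_mask(num_patches, 0.90, rng)

# Pretend the reconstruction is perfect on visible patches and off by 1.0
# on every value in a masked patch.
recon = target.copy()
recon[mask] += 1.0

loss_all = mse_all(target, recon)
loss_masked = mse_masked_only(target, recon, mask)
```

In this toy setup the masked-only loss averages over the 18 masked patches alone, while the all-points loss also averages in the 2 visible patches, so the visible patches contribute to the training signal only in the second variant.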