[Preprint]. 2024 Nov 26:arXiv:2411.15240v2. [Version 2]

Table 3. PAT-M Experiments.

All models are pretrained and then fine-tuned end-to-end on the benzodiazepine training data. For these experiments, 1,000 participants were removed from the training data to create an evaluation set; these participants are separate from the held-out test set in Table 1. The “Avg Score” metric is the average AUC on the evaluation set after the medium model (PAT-M) was trained on dataset sizes of 500, 1,000, 2,500, and 4,769. (a) We tested PAT-M pretrained with MSE loss on every data point at four mask ratios. The highest mask ratio (0.90) gave the best result, although the trend is not strictly monotonic (0.50 scored below 0.25). (b) We tested PAT-M pretrained with 90% masking and MSE loss on all data, with and without smoothing, and found that smoothing does not improve performance. (c) We tested PAT-M pretrained with 90% masking and found that computing MSE on only the masked patches decreased performance drastically (0.541 versus 0.773).
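The “Avg Score” metric described above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `roc_auc` and `avg_score` are hypothetical, AUC is computed with a simple pairwise (Mann-Whitney) definition, and the per-size AUC values in the usage lines are placeholders rather than results from the paper.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the probability that a random positive example is scored
    above a random negative example (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def avg_score(auc_by_train_size):
    """Average the evaluation-set AUCs obtained at each training-set size."""
    return mean(auc_by_train_size.values())


# Toy check of the AUC helper on four evaluation examples.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# Placeholder AUCs (NOT values from the paper), keyed by training-set size.
placeholder_aucs = {500: 0.70, 1000: 0.75, 2500: 0.80, 4769: 0.85}
print(avg_score(placeholder_aucs))
```

Averaging over several training-set sizes gives a single number per pretraining configuration, so one configuration is not favored just because it happens to do well at a single data scale.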

(a) Model | Mask Ratio | Avg Score*
Medium | 0.25 | 0.737
Medium | 0.50 | 0.707
Medium | 0.75 | 0.743
Medium | 0.90 | 0.773

(b) Model | Smoothing | Avg Score*
Medium | Smoothed | 0.741
Medium | Not Smoothed | 0.773

(c) Model | Loss Function | Avg Score*
Medium | MSE Only Masked | 0.541
Medium | MSE All | 0.773
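The difference between the two loss variants in (c) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the helper names (`random_patch_mask`, `mse_all`, `mse_masked_only`), the patch length of 2, and the toy values are all assumptions made for the example.

```python
import random


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a list of booleans; True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    masked_idx = set(rng.sample(range(num_patches), num_masked))
    return [i in masked_idx for i in range(num_patches)]


def mse_all(pred, target):
    """Mean squared error over every value in every patch ("MSE All")."""
    sq = [(p - t) ** 2 for pp, tp in zip(pred, target) for p, t in zip(pp, tp)]
    return sum(sq) / len(sq)


def mse_masked_only(pred, target, mask):
    """Mean squared error over the masked patches only ("MSE Only Masked")."""
    sq = [
        (p - t) ** 2
        for pp, tp, m in zip(pred, target, mask)
        if m
        for p, t in zip(pp, tp)
    ]
    if not sq:
        raise ValueError("mask selects no patches")
    return sum(sq) / len(sq)


# Toy signal split into 4 patches of length 2 (hypothetical values).
target = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
pred = [[1.0, 2.0], [3.0, 5.0], [5.0, 6.0], [9.0, 8.0]]
mask = [False, True, False, True]

print(mse_all(pred, target))                # 0.625
print(mse_masked_only(pred, target, mask))  # 1.25

# With a 0.90 mask ratio, 9 of 10 patches are hidden.
print(sum(random_patch_mask(10, 0.90, random.Random(0))))  # 9
```

The only difference between the two losses is which patches contribute to the average: the "all" variant also penalizes reconstruction error on the visible patches, while the masked-only variant ignores them.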