Fig. 1. Performance of XGBoost models in predicting a penumbra-to-core ratio ≥ 1.8 across different text-cutoff thresholds.
Receiver operating characteristic (ROC) curves for models trained on structured features only (red), document embeddings only (green), and both structured features and document embeddings (blue). Panels (a), (b), and (c) correspond to models trained on text data generated with cutoffs of 500, 1000, and 5000 characters, respectively. The dashed line represents the performance of a random classifier (AUROC = 0.5).
