Model Performance. Performance metrics on the test set are shown for the machine learning models using manually and automatically segmented masks. Accuracy, sensitivity, and specificity for the CNN models are reported with 95% confidence intervals in parentheses. Confidence intervals were calculated using the adjusted Wald method. Performance metrics were compared between models using the McNemar test for paired proportions. Statistically significant p-values (p<0.05) are highlighted in bold. CNN: convolutional neural network; RSF: random survival forest; AUC: area under the receiver operating characteristic curve.
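For readers who want to reproduce the statistics named in this caption, the following is a minimal sketch, not the authors' code. It assumes the adjusted Wald interval in its Agresti-Coull form and an exact (binomial) two-sided McNemar test; the caption does not state which McNemar variant was used, so that choice is an assumption. The function names and all counts in the usage note are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def adjusted_wald_ci(successes, n, confidence=0.95):
    """Adjusted Wald (Agresti-Coull) interval for a proportion.

    Adds z^2/2 pseudo-successes and z^2 pseudo-trials before applying
    the ordinary Wald formula, then clamps the result to [0, 1].
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: cases model A classified correctly and model B incorrectly
    c: cases model B classified correctly and model A incorrectly
    (Assumed variant: exact binomial test on b + c with p = 0.5.)
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

As a usage example with made-up numbers, `adjusted_wald_ci(45, 50)` gives the 95% interval for a model that classified 45 of 50 test cases correctly, and `mcnemar_exact_p(2, 10)` compares two models evaluated on the same test cases that disagreed on 12 of them. Only the discordant pairs enter the McNemar test, which is why it suits paired comparisons on a shared test set.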