All included images from the North American, Nepali, and combined datasets were split so that 90% of the images went to the training set (A) and 10% to the test set (B). Each split was stratified to maintain similar ratios of images with and without stage in the training and test sets, and was performed at the patient level so that no patient contributed images to both the training and test sets. Each training set (A) was further divided into 5 cross-validation splits that retained the underlying distribution of stage. Models were trained on the North American dataset alone, the Nepali dataset alone, or both datasets combined. A) Training set, divided into 5 splits for 5-fold cross-validation.
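The splitting procedure described above can be illustrated with a minimal sketch. The source does not say which tooling was used, so this is only one possible reading, written with the Python standard library. The function name `split_patients`, the record layout `(image_id, patient_id, has_stage)`, and the rule that a patient counts as "with stage" if any of their images shows stage are all assumptions made for illustration. Because the split is made over patients, the test set holds roughly 10% of patients, which only approximates 10% of images when patients contribute similar numbers of images.

```python
import random
from collections import defaultdict


def split_patients(records, test_frac=0.1, n_folds=5, seed=0):
    """Split a list of (image_id, patient_id, has_stage) tuples.

    Returns (folds, test): a list of n_folds training folds and a test list.
    Hypothetical helper; not the authors' implementation.
    """
    # Assumption: a patient is "with stage" if any of their images shows stage.
    patient_label = defaultdict(bool)
    for _, patient, has_stage in records:
        patient_label[patient] |= bool(has_stage)

    rng = random.Random(seed)
    test_patients = set()
    fold_of = {}
    # Handle each stratum separately so both keep a similar class ratio.
    for label in (False, True):
        stratum = sorted(p for p, l in patient_label.items() if l == label)
        rng.shuffle(stratum)
        n_test = round(len(stratum) * test_frac)
        test_patients.update(stratum[:n_test])
        # Deal the remaining patients round-robin into the CV folds.
        for i, patient in enumerate(stratum[n_test:]):
            fold_of[patient] = i % n_folds

    # Every image follows its patient, so no patient spans two partitions.
    test = [r for r in records if r[1] in test_patients]
    folds = [[] for _ in range(n_folds)]
    for r in records:
        if r[1] in fold_of:
            folds[fold_of[r[1]]].append(r)
    return folds, test


# Synthetic example: 100 patients, 3 images each, every 4th patient with stage.
records = [
    (f"img{p}_{k}", f"pt{p:03d}", p % 4 == 0)
    for p in range(100)
    for k in range(3)
]
folds, test = split_patients(records)
```

For 5-fold cross-validation, each fold in turn serves as the validation set while the other four are pooled for training; the test list is left untouched until final evaluation.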