Author manuscript; available in PMC: 2025 Jan 27.
Published in final edited form as: JCO Clin Cancer Inform. 2023 Aug;7:e2300049. doi: 10.1200/CCI.23.00049

Table 1.

Dataset details for each model

Dataset
| Study | Size | Median follow-up time (years) | Age range | Distribution | Source | Availability |
|---|---|---|---|---|---|---|
| [16] | 586 | 12 | 39–88 | - | Istituto Nazionale dei Tumori of Milan, Italy (INT) | - |
| [17] | 1868 | 5.6 | 25–89 | R:11%, NR:89%¹ | Memorial Sloan-Kettering Cancer Center | - |
| [18] | 378 | 9 | - | - | Central Pathology Office in Milan, Italy | - |
| [19] | 679 | 8 | - | R:71%, NR:29% | Korean tertiary teaching hospital | - |
| [20] | 579 | 16 | 37–57 | R:19%, NR:81% | Isfahan Sayed-o-Shohada cancer research center | - |
| [21] | 679 | 7.1 | 21–83 | R:29%, NR:71% | Korean tertiary teaching hospital | - |
| [22] | 15314 | 6.09 | 19–98 | R:12%, NR:88% | University of Texas MD Anderson Cancer Center | - |
| [23] | 217 | 10 | 32–85 | 27.64%, 23.50%, 23.50%, 25.34%² | Omid Hospital of Mashhad | - |
| [34] | 4735 | 9.8 | 57–71 | R:7%, NR:93% | ATAC (Arimidex, Tamoxifen, Alone or in Combination) dataset | Dataset link |
| [24] | 192 | 5 | 25–67 | LR:22%, IR:64%, HR:14%³ | Single institute in Korea | - |
| [25] | 217 | - | - | R:40%, NR:60% | Grade-A3 hospital in eastern China | - |
| [26] | 6447 | 3.99 | - | R:7%, NR:93% | Northwestern Memorial Hospital | - |
| [27] | 320 | 6 | 30–84 | LR:45%, IR:44%, HR:11% | 3 French hospitals: Besançon, Belfort and Dijon | - |
| [35] | 1984 | 6 | - | R:24%, NR:76% | UCI Machine Learning Repository | Dataset link |

¹ R: Recurrent cases, NR: Non-recurrent cases.
² Four classes distributed according to the time to recurrence (0–149 months).
³ LR: Low risk of recurrence, IR: Intermediate risk, HR: High risk.
⁴ The study used two datasets for model training, but we only considered the dataset on which the model performed the best.

Dataset
| Study | Size | Median follow-up time (years) | Age | Distribution | Source | Availability |
|---|---|---|---|---|---|---|
| [28] | 575 | 7 | - | | Mizoram Cancer Institute | - |
| [29] | 13117 | 6 | 43–55 | R:9%, NR:91%¹ | Samsung Medical Center | - |
| [36] | 127 | 8 | - | R:25%, NR:75% | Cancer Hospital of the Chinese Academy of Medical Sciences | Dataset link |
| [30] | 8956 | 7.46 | 41–67 | - | Oncoshare breast cancer database from Stanford University | - |
| [31] | 138 | - | 27–86 | LR:60%, HR:40%² | Taipei Veterans General Hospital | - |
| [14] | 3532 | 9 | 45–70 | - | Five North American institutions | - |
| [32] | 4757 | 4 | 20–70+ | R:4%, NR:96% | Japanese Association for Theranostics | - |
| [15] | 64044 | | 15–85+ | | SEER database, U.S. National Cancer Institute | Dataset link |
| [33] | 6486 | 4.75 | 20–90 | R:8%, NR:92% | 29 hospitals affiliated with the Chinese Society of Breast Surgery | - |

¹ R: Recurrent cases, NR: Non-recurrent cases.
² LR: Low risk of recurrence, HR: High risk.