2021 Jul 19;11:14645. doi: 10.1038/s41598-021-94243-z

Table 10.

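Distribution of patient categorical features over the training, validation, and testing sets. Percentages are relative to the patients with a recorded value for that feature in each set and are rounded down, so a column can sum to less than 100%. "No data" counts the patients whose value is missing.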

Feature / category      Train          Validation     Testing
                        n (%)          n (%)          n (%)
Lymph node
  Patients with data    326            186            200
  NEG                   113 (34%)      56 (30%)       67 (33%)
  POS                   213 (65%)      130 (69%)      133 (66%)
  No data               4              4              2
Stage
  Patients with data    240            141            155
  0                     1 (0%)         0 (0%)         0 (0%)
  1                     50 (20%)       25 (17%)       27 (17%)
  2                     161 (67%)      91 (64%)       100 (64%)
  3                     27 (11%)       25 (17%)       25 (16%)
  4                     1 (0%)         0 (0%)         3 (1%)
  No data               90             49             47
Grade
  Patients with data    323            185            197
  1                     19 (5%)        5 (2%)         9 (4%)
  2                     97 (30%)       60 (32%)       61 (30%)
  3                     207 (64%)      120 (64%)      127 (64%)
  No data               7              5              5
Subtype
  Patients with data    328            187            202
  Normal                27 (8%)        15 (8%)        10 (4%)
  Basal                 75 (22%)       27 (14%)       32 (15%)
  Her2                  52 (15%)       27 (14%)       39 (19%)
  LumB                  67 (20%)       50 (26%)       42 (20%)
  Claudin-low           37 (11%)       23 (12%)       29 (14%)
  LumA                  70 (21%)       45 (24%)       50 (24%)
  No data               2              3              0
Surgery
  Patients with data    326            186            200
  MASTECTOMY            209 (64%)      118 (63%)      130 (65%)
  BREAST-CONSERVING     117 (35%)      68 (36%)       70 (35%)
  No data               4              4              2
Histology
  Patients with data    330            190            202
  IDC+ILC               11 (3%)        14 (7%)        10 (4%)
  IDC-MUC               6 (1%)         6 (3%)         5 (2%)
  ILC                   25 (7%)        9 (4%)         15 (7%)
  OTHER-INVASIVE        1 (0%)         1 (0%)         0 (0%)
  OTHER                 1 (0%)         0 (0%)         0 (0%)
  IDC-MED               6 (1%)         2 (1%)         3 (1%)
  INVASIVE-TUMOUR       3 (0%)         0 (0%)         1 (0%)
  IDC-TUB               6 (1%)         3 (1%)         3 (1%)
  DCIS                  1 (0%)         0 (0%)         0 (0%)
  IDC                   270 (81%)      155 (81%)      165 (81%)
  No data               0              0              0
Menopause
  Patients with data    330            190            202
  Pre                   111 (33%)      56 (29%)       60 (29%)
  Post                  219 (66%)      134 (70%)      142 (70%)
  No data               0              0              0
HER2 SNP6
  Patients with data    330            190            202
  NEUT                  224 (67%)      131 (68%)      137 (67%)
  LOSS                  20 (6%)        8 (4%)         8 (3%)
  GAIN                  86 (26%)       51 (26%)       57 (28%)
  No data               0              0              0
Laterality
  Patients with data    306            181            189
  Right                 140 (45%)      92 (50%)       87 (46%)
  Left                  166 (54%)      89 (49%)       102 (53%)
  No data               24             9              13
Cluster
  Patients with data    330            190            202
  1                     24 (7%)        17 (8%)        10 (4%)
  2                     8 (2%)         12 (6%)        8 (3%)
  3                     33 (10%)       22 (11%)       23 (11%)
  4                     17 (5%)        5 (2%)         12 (5%)
  4.5                   34 (10%)       19 (10%)       21 (10%)
  5                     49 (14%)       27 (14%)       30 (14%)
  6                     10 (3%)        7 (3%)         10 (4%)
  7                     20 (6%)        11 (5%)        8 (3%)
  8                     32 (9%)        18 (9%)        28 (13%)
  9                     25 (7%)        16 (8%)        19 (9%)
  10                    78 (23%)       36 (18%)       33 (16%)
  No data               0              0              0
Cohort
  Patients with data    330            190            202
  1                     95 (28%)       58 (30%)       57 (28%)
  2                     44 (13%)       29 (15%)       36 (17%)
  3                     115 (34%)      61 (32%)       73 (36%)
  4                     49 (14%)       23 (12%)       22 (10%)
  5                     27 (8%)        19 (10%)       14 (6%)
  No data               0              0              0
ER IHC
  Patients with data    328            189            199
  Neg                   150 (45%)      65 (34%)       72 (36%)
  Pos                   178 (54%)      124 (65%)      127 (63%)
  No data               2              1              3
ER/HER2 status
  Patients with data    292            167            185
  HER2+                 53 (18%)       24 (14%)       31 (16%)
  ER−/HER2−             86 (29%)       37 (22%)       43 (23%)
  ER+/HER2− High-Prolif 82 (28%)       65 (38%)       65 (35%)
  ER+/HER2− Low-Prolif  71 (24%)       41 (24%)       46 (24%)
  No data               38             23             17
Cellularity
  Patients with data    320            186            200
  High                  180 (56%)      92 (49%)       106 (53%)
  Moderate              108 (33%)      70 (37%)       75 (37%)
  Low                   32 (10%)       24 (12%)       19 (9%)
  No data               10             4              2
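Every percentage in the table is reproduced by dividing a category count by the number of patients with a recorded value for that feature in the same set and rounding down. This is also why some columns sum to less than 100% (for example, Stage 1 in training is 50/240 = 20.8%, listed as 20%). The Python sketch below assumes that convention and checks it on the Stage rows; it also confirms that the recorded counts plus "No data" add up to each set's size (330, 190, and 202 patients). The names `STAGE`, `SPLIT_SIZES`, `summarize`, and `check_totals` are hypothetical, introduced only for illustration, and are not taken from the paper's code.

```python
# Minimal sketch, assuming the Table 10 percentages are computed over patients
# with a recorded value (excluding "No data") and then rounded down.
# STAGE and SPLIT_SIZES are hypothetical names; the counts are copied from the
# "Stage" rows of the table.

STAGE = {
    "Train":      {"counts": {"0": 1, "1": 50, "2": 161, "3": 27, "4": 1}, "no_data": 90},
    "Validation": {"counts": {"0": 0, "1": 25, "2": 91, "3": 25, "4": 0}, "no_data": 49},
    "Testing":    {"counts": {"0": 0, "1": 27, "2": 100, "3": 25, "4": 3}, "no_data": 47},
}

# Set sizes, read off features with no missing values (e.g. "Histology").
SPLIT_SIZES = {"Train": 330, "Validation": 190, "Testing": 202}


def summarize(counts: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {category: (n, pct)} with pct = floor(100 * n / n_recorded)."""
    n_recorded = sum(counts.values())
    # Integer floor division gives the truncated (rounded-down) percentage
    # without any floating-point rounding issues.
    return {cat: (n, 100 * n // n_recorded) for cat, n in counts.items()}


def check_totals(feature: dict) -> None:
    """Assert that recorded counts plus "No data" equal each set's size."""
    for split, row in feature.items():
        n_recorded = sum(row["counts"].values())
        assert n_recorded + row["no_data"] == SPLIT_SIZES[split], split


check_totals(STAGE)
for split, row in STAGE.items():
    print(split, summarize(row["counts"]))
```

Passing the counts of any other feature gives the same n (%) pairs as its row; for instance, `summarize({"NEG": 113, "POS": 213})` for lymph-node status in training returns `(113, 34)` and `(213, 65)`.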