Abstract
Purpose
Oncotype DX (ODX) predicts the risk of recurrence and benefits of adding chemotherapy for patients with estrogen receptor positive (ER+)/human epidermal growth factor receptor 2 negative (HER2−) early-stage breast cancer. We aimed to develop a simplified scoring system using readily available clinicopathological parameters to predict a high-risk ODX recurrence score (RS) while minimizing reproducibility issues regarding Ki-67 index evaluation methods.
Methods
We enrolled 300 patients with ER+/HER2− early breast cancer for whom ODX RS data were available as the test set. Using the QuPath image analysis platform, we systematically evaluated the average, hotspot, and hottest spot Ki-67 scores in the test set. Logistic regression analyses were conducted to establish a predictive scoring system for high-risk ODX RS. An independent validation set comprising 117 patients from a different period was established.
Results
Factors such as age ≤ 50 years, invasive ductal carcinoma tumor type, histologic grade 2 or 3, tumor necrosis, progesterone receptor negativity, and a high Roche-analyzed Ki-67 score (> 20) were associated with high-risk ODX RS. These variables were incorporated into our scoring system. The area under the curve of the scoring system was 0.8057. When applied to both the test and validation sets with a cutoff value of 3, the sensitivity of our scoring system was 92%.
Conclusion
We successfully developed a scoring system based on the systematic evaluation of Ki-67 scoring methods. We believe that our user-friendly predictive scoring system for high-risk ODX RS could help clinicians identify patients who may or may not require additional ODX testing.
Keywords: Breast, Carcinoma, Drug Therapy, Ki-67 Antigen, Recurrence
INTRODUCTION
The Oncotype DX (ODX) test, a commercially available 21-gene breast cancer recurrence score (RS) assay, provides both prognostic and predictive information for estrogen receptor positive (ER+)/human epidermal growth factor receptor 2 negative (HER2−) early-stage breast cancer [1,2]. However, its cost and limited accessibility pose challenges in routine clinical practice.
Several studies have developed models that predict ODX RS from clinicopathological data easily obtained in routine clinical practice. However, most of these models were derived from Western populations [6,7,8], and only a few have been developed in Korea [3,4,5].
The ODX RS is largely influenced by the proliferation group score due to its considerable weighting in the calculation, contrasting with the relatively lower contributions from the ER and HER2 group scores [9]. Consequently, the Ki-67 labeling index (LI) determined through immunohistochemistry (IHC) may be a surrogate for ODX RS. Several studies have highlighted Ki-67 LI as the main contributor to ODX RS [2,10,11].
Despite this potential, Ki-67 LI analysis has not been widely adopted for clinical breast cancer management primarily because of reproducibility and standardization issues across observers and institutions. There is no consensus on whether the average counting performed for the entire slide image or only hotspot counting for the selected hotspot images should be used. There is also an ongoing debate about the exact cutoff point for high Ki-67 LI. For this reason, Ki-67 LI limitations are frequently cited in studies developing ODX RS predictive models, with most Western models excluding it as a parameter.
In a 2019 consensus meeting of the International Ki-67 in Breast Cancer Working Group (IKWG) [12], average counting showed better reproducibility than hotspot counting, although the difference between the methods was not statistically significant. Moreover, automated scoring methods were deemed comparable to manual visual scoring. Accordingly, Acs et al. [13] demonstrated that applying digital image analysis (DIA) to pathological evaluation could improve the standardization of Ki-67 assessment. Furthermore, among the DIA platforms used in that study, QuPath had the highest reproducibility and lowest variability [13].
To address Ki-67-related challenges and develop a robust ODX RS predictive model, we systematically evaluated three Ki-67 scoring methods: average, hotspot, and hottest spot Ki-67 scores, using the QuPath open-source image analysis software. Additionally, we correlated these scores with an automated Ki-67 scoring method used in actual clinical settings at our institution. Subsequently, we developed a predictive scoring system for high-risk ODX RS, integrating Ki-67 scores and readily available clinicopathological variables.
METHODS
Patient selection
Following approval by the Institutional Review Board (IRB) of Samsung Medical Center (2023-03-064), a cohort of 300 ER+/HER2− early breast cancer patients, diagnosed between January 2011 and December 2019 at Samsung Medical Center, was retrospectively enrolled for the test set. The requirement for informed consent was waived by the IRB due to the retrospective nature of the study. All patients underwent surgery. A comprehensive database was developed using the information obtained from ODX testing and other clinicopathological variables (age, tumor size, tumor type, tumor histologic grade, tumor nuclear grade, tumor necrosis, lymphovascular invasion, lymph node [LN] status, ER and progesterone receptor [PR] Allred scores, and HER2 status) available for these patients. Tumor necrosis (TN) is histologically defined as homogeneous clusters and sheets of dead and degraded tumor cells that merge into an amorphous coagulum, mixed with nuclear and cytoplasmic debris [14]. ER, PR, and HER2 interpretations were based on updated American Society of Clinical Oncology/College of American Pathologists recommendations [15,16]. ER and PR statuses were evaluated using Allred scores. ODX RS was categorized as low-risk (patients aged > 50 years with ODX RS ≤ 25 or patients aged ≤ 50 years with ODX RS ≤ 20) and high-risk (for all other cases) according to the results from the Trial Assigning Individualized Options for Treatment (TAILORx) trial [17].
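The age-dependent risk categorization can be expressed as a short rule. The following is a minimal sketch, assuming the inclusive thresholds listed in Tables 6 and 7 (RS ≤ 20 for age ≤ 50 years and RS ≤ 25 for age > 50 years count as low risk); the function name `odx_risk_category` is hypothetical and is not part of the software used in this study.

```python
def odx_risk_category(age_years: int, recurrence_score: int) -> str:
    """Return 'low' or 'high' using age-dependent RS thresholds.

    Assumption: thresholds are inclusive, as listed in Tables 6 and 7
    (age <= 50: RS <= 20 is low risk; age > 50: RS <= 25 is low risk).
    """
    low_risk_ceiling = 20 if age_years <= 50 else 25
    return "low" if recurrence_score <= low_risk_ceiling else "high"


if __name__ == "__main__":
    # The same RS can fall into different categories depending on age.
    print(odx_risk_category(45, 22))  # age <= 50 with RS above 20
    print(odx_risk_category(62, 22))  # age > 50 with RS not above 25
```

The example calls illustrate that an RS of 22 is treated differently on either side of the age boundary.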
Assessment of Ki-67 index
To determine the optimal Ki-67 scoring method, we used QuPath (version 0.4.3), an open-source image analysis software known for its high intra-platform reproducibility [13]. Additionally, we utilized an automated Ki-67 scoring method similar to the IKWG scoring method for Ki-67 [12], implemented in clinical settings at our institution using the Roche Ki-67 (30-9) image analysis platform (Roche Diagnostics, Indianapolis, USA). This platform calculates Ki-67 scores as percentages of positively stained cells among all tumor cells by counting at least 1,000 invasive cancer cells.
An overview of the Ki-67 analysis workflow using QuPath is shown in Figure 1, based on the methods described by Thakur et al. [18] and Paik et al. [19]. Square grids were created by a pathologist (J.M.K.) to cover the entire invasive tumor area, with a grid size of 500 μm. Grids containing ductal carcinoma in situ or lymphoid aggregates and those containing staining artifacts were manually excluded upon visual inspection. After identifying Ki-67 stained cells using a single threshold for positive cell detection, a case-specific random forest cell classifier was created by identifying representative tumor and stromal cells and applied to all grids. The annotated data for each grid were exported to a spreadsheet, and grids with fewer than 100 tumor cells were excluded. Ki-67 LI was calculated for all grids. The maximum Ki-67 LI was labeled as the hottest spot score. The average of the top five grids was labeled as the hotspot score. The average score was defined as the sum of Ki-67 stained tumor cells divided by the sum of tumor cells across all grids.
Figure 1. Overview of QuPath image analysis workflow. (A) Entire invasive tumor area is annotated as a region of interest using a hand drawing tool. (B) Grids measuring 500 × 500 µm are generated to fill the region of interest, and then positive cell detection within each grid is performed. (C) Representative tumor and stromal cells are annotated to train the cell classifier. (D) Magnified view of a single grid after the cell classifier is created based on annotation of tumor and stromal cells; positive tumor cell nuclei (red), negative tumor cell nuclei (blue), stromal cell nuclei (yellow green).
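The three grid-based scores defined above can be derived from the exported per-grid counts. The sketch below is illustrative only: the `Grid` record and the `ki67_scores` function are hypothetical names, the input is assumed to be the per-grid tumor cell counts from the spreadsheet export, and averaging over fewer than five grids when fewer are available is our assumption rather than a rule stated in the methods.

```python
from dataclasses import dataclass


@dataclass
class Grid:
    positive_tumor_cells: int  # Ki-67 stained tumor cells in the grid
    tumor_cells: int           # all tumor cells in the grid


def ki67_scores(grids, min_tumor_cells=100, top_n=5):
    """Return (average, hotspot, hottest spot) Ki-67 LI in percent."""
    # Exclude grids with fewer than `min_tumor_cells` tumor cells.
    kept = [g for g in grids if g.tumor_cells >= min_tumor_cells]
    if not kept:
        raise ValueError("no grid has enough tumor cells")

    # Per-grid labeling index, sorted from highest to lowest.
    per_grid = sorted(
        (100.0 * g.positive_tumor_cells / g.tumor_cells for g in kept),
        reverse=True,
    )
    hottest = per_grid[0]              # maximum Ki-67 LI over all grids
    top = per_grid[:top_n]             # top five grids (or fewer if unavailable)
    hotspot = sum(top) / len(top)

    # The average score pools cells across grids instead of averaging per-grid LIs.
    average = 100.0 * sum(g.positive_tumor_cells for g in kept) / sum(
        g.tumor_cells for g in kept
    )
    return average, hotspot, hottest
```

Pooling cells for the average score weights each grid by its tumor cell count, which is why it can differ from the mean of the per-grid indices.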
Simplified high risk ODX RS predictive scoring system development and statistical analysis
The primary outcome was the probability of the ODX RS being at high risk. Clinicopathological variables correlated with high-risk ODX RS were assessed using univariate logistic regression analysis. Variables with p < 0.05 from the univariate analysis were selected. Multivariable logistic regression analysis with significant variables was conducted to identify predictors of high-risk ODX RS. This model was used to develop the current high-risk ODX RS predictive scoring system. Correlations between the Ki-67 scoring methods were statistically tested using Pearson’s correlation coefficient (Pearson’s r and two-tailed p-values). All statistical tests were two-sided, and statistical significance was set at p < 0.05. Statistical analyses were performed using SAS version 9.4 (SAS Institute Inc., Cary, USA) and R 4.3.0 (R Foundation for Statistical Computing, Vienna, Austria; http://www.R-project.org/).
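For readers who want to reproduce the correlation step outside SAS or R, Pearson's r can be computed directly from its definition. This is a minimal sketch in Python, which was not the software used in this study; the function name `pearson_r` is hypothetical, and the p-value calculation is omitted.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations and the two sums of squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Applied to paired Ki-67 scores from two methods, a value near 1 indicates that the methods rank and scale cases similarly, whereas a value near 0 indicates no linear relationship.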
Simplified high risk ODX RS predictive scoring system validation
Performance of the scoring system was evaluated in terms of discrimination and calibration. Discrimination was evaluated using the area under the receiver operating characteristic (ROC) curve. Calibration was evaluated by plotting calibration curves against the actual ODX high-risk outcomes. The simplified high-risk ODX RS predictive scoring system was validated using an independent validation set. This validation set consisted of 117 patients with ER+/HER2− early breast cancer who underwent ODX at the same institution between January 2019 and December 2020. The same predictor and outcome variables from the test set were used to generate the logistic regression model for the validation set. The performance assessment for the validation set was conducted using the same method as that used for the test set.
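Discrimination as measured by the area under the ROC curve can be illustrated with a rank-based calculation. The sketch below is not the SAS or R routine used in this study; `roc_auc` is a hypothetical helper that computes the AUC as the probability that a randomly chosen high-risk case has a higher score than a randomly chosen low-risk case, counting ties as one half.

```python
def roc_auc(scores, labels):
    """AUC for integer or real scores; labels are 1 (high risk) or 0 (low risk)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties contribute one half
    return wins / (len(positives) * len(negatives))
```

Because the simplified score takes only a few integer values, ties are common, and the tie handling above matters for the resulting AUC.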
RESULTS
Baseline patient characteristics
The clinicopathological characteristics of the test and validation groups are summarized in Table 1. The median age of both groups was 48 years (range, 29–74 years in the test set and 31–73 years in the validation set). In the test group, 248 patients (82.7%) were classified as low-risk based on the ODX RS, and the remaining 52 (17.3%) were classified as high-risk. In the validation group, 92 (78.6%) patients had low-risk ODX RS, and 25 (21.4%) had high-risk RS. Most patients in both sets (88.0% and 87.2% in the test and validation groups, respectively) had invasive ductal carcinoma. The average tumor size was 1.9 cm (range, 0.4–11 cm) in the test set and 2.2 cm (range, 0.7–6.5 cm) in the validation set. LN metastasis was absent in 218 patients (72.7%) in the test group and in 95 (81.2%) in the validation group. The mean Roche-analyzed Ki-67 score and the mean QuPath-analyzed Ki-67 average score were both higher in the validation group than in the test group (22.02 ± 14.62 vs. 14.94 ± 11.12 and 13.11 ± 8.7 vs. 8.89 ± 6.62, respectively).
Table 1. Clinicopathologic characteristics of the test set and the validation set.
Characteristic | Test set (n = 300) | Validation set (n = 117) | p-value
---|---|---|---
ODX RS risk | | | 0.340
Low | 248 (82.7%) | 92 (78.6%) |
High | 52 (17.3%) | 25 (21.4%) |
Age (yr) | | | 0.143
≤ 50 | 212 (70.7%) | 74 (63.2%) |
> 50 | 88 (29.3%) | 43 (36.8%) |
Histologic type | | | 0.818
IDCa | 264 (88.0%) | 102 (87.2%) |
Other | 36 (12.0%) | 15 (12.8%) |
Tumor size (cm) | | | 0.030
≤ 2 | 196 (65.3%) | 63 (53.8%) |
> 2 | 104 (34.7%) | 54 (46.2%) |
LN status | | | 0.071
pN0 | 218 (72.7%) | 95 (81.2%) |
pN1 | 82 (27.3%) | 22 (18.8%) |
Nuclear grade | | | < 0.001
1 | 31 (10.3%) | 1 (0.9%) |
2 | 253 (84.3%) | 98 (83.8%) |
3 | 16 (5.3%) | 18 (15.4%) |
Histologic grade | | | < 0.001
1 | 76 (25.3%) | 9 (7.7%) |
2 | 210 (70.0%) | 95 (81.2%) |
3 | 14 (4.7%) | 13 (11.1%) |
Necrosis | | | 0.982
Absent | 269 (89.7%) | 105 (89.7%) |
Present | 31 (10.3%) | 12 (10.3%) |
LVI | | | 0.473
Absent | 194 (64.7%) | 80 (68.4%) |
Present | 106 (35.3%) | 37 (31.6%) |
ER (Allred score) | | | 0.802
5 | 1 (0.3%) | 0 (0.0%) |
6 | 1 (0.3%) | 1 (0.9%) |
7 | 26 (8.7%) | 9 (7.7%) |
8 | 272 (90.7%) | 107 (91.5%) |
PR status | | | 0.383
Negative | 12 (4.0%) | 7 (6.0%) |
Positive | 288 (96.0%) | 110 (94.0%) |
Ki-67 expression (Roche-analyzed) | | | < 0.001
Mean (range) | 14.94 (0.09–63.85) | 22.02 (0.79–71.39) |
Ki-67 expression (Roche-analyzed) | | | < 0.001
≤ 20 | 232 (77.3%) | 63 (53.8%) |
> 20 | 68 (22.7%) | 54 (46.2%) |
Ki-67 expression (Average QuPath) | | | < 0.001
Mean (range) | 8.89 (0.05–38.01) | 13.11 (0.47–42.49) |
Ki-67 expression (Average QuPath) | | | < 0.001
≤ 10 | 211 (70.3%) | 53 (45.3%) |
> 10 | 89 (29.7%) | 64 (54.7%) |
ODX RS = Oncotype DX recurrence score; IDCa = invasive ductal carcinoma; LN = lymph node; LVI = lymphovascular invasion; ER = estrogen receptor; PR = progesterone receptor.
Automated Ki-67 analysis
To avoid subjectivity in the selection of microscopic fields for scoring and achieve unbiased, objective Ki-67 index scoring results, we examined whole Ki-67 stained slides from all 300 cases in the test set. We performed only automated Ki-67 analysis and compared Roche-analyzed Ki-67 scores, according to the IKWG recommendations used for pathology reports at our institution, with QuPath-analyzed Ki-67 average, hotspot, and hottest spot scores.
We observed a strong correlation between the QuPath-analyzed Ki-67 average scores and Roche-analyzed Ki-67 scores according to IKWG recommendations (Pearson r = 0.95, p < 0.0001 as continuous variables and Pearson r = 0.521, p < 0.0001 as categorical variables). However, there was a poor correlation between the QuPath-analyzed Ki-67 hotspot scores and Roche-analyzed Ki-67 scores according to IKWG recommendations (Pearson r = 0.006, p = 0.946 as continuous variables and Pearson r = −0.064, p = 0.49 as categorical variables). We also observed a poor correlation between the QuPath-analyzed Ki-67 hottest spot scores and Roche-analyzed Ki-67 scores according to the IKWG recommendations (Pearson r = 0.021, p = 0.822 as continuous variables and Pearson r = −0.041, p = 0.658 as categorical variables).
Clinicopathologic factors associated with high risk ODX RS in the test set
Univariate logistic regression analysis revealed that age, tumor type, nuclear grade, histological grade, tumor necrosis, PR status, Roche-analyzed Ki-67 score according to the IKWG recommendations (continuous and categorical variables), and QuPath-analyzed Ki-67 average score (continuous and categorical variables) were significantly associated with high-risk ODX RS (Tables 2 and 3). In contrast, the QuPath-analyzed Ki-67 hotspot score and the QuPath-analyzed Ki-67 hottest spot score showed no statistically significant association with high-risk ODX RS (Table 3).
Table 2. Comparison of clinicopathologic variables in the low risk and high risk groups in the test set and univariate analysis results for predicting high risk Oncotype DX recurrence score in the test set.
Variable | Low risk (n = 248) | High risk (n = 52) | p-value
---|---|---|---
Age (yr) | | | 0.0076
≤ 50 | 167 (69.4%) | 45 (86.7%) |
> 50 | 81 (30.6%) | 7 (13.3%) |
Tumor type | | | 0.0405
IDCa | 215 (86.9%) | 49 (93.3%) |
Others | 33 (13.1%) | 3 (6.7%) |
Tumor size (cm) | | | 0.1994
≤ 2 | 158 (63.7%) | 38 (73.3%) |
> 2 | 90 (36.3%) | 14 (26.7%) |
LN status | | | 0.2740
pN0 | 177 (68.0%) | 41 (75.6%) |
pN1 | 71 (32.0%) | 11 (24.4%) |
Nuclear grade | | | 0.0074
1 or 2 | 239 (96.4%) | 45 (86.5%) |
3 | 9 (3.6%) | 7 (13.5%) |
Histologic grade | | | 0.0017
1 | 74 (29.8%) | 2 (3.9%) |
2 | 165 (66.5%) | 45 (86.5%) |
3 | 9 (3.7%) | 5 (9.6%) |
Necrosis | | | 0.0130
Absent | 225 (91.7%) | 44 (84.4%) |
Present | 23 (8.3%) | 8 (15.6%) |
ER Allred score | | | 0.9381
< 8 | 23 (9.3%) | 5 (9.6%) |
8 | 225 (90.7%) | 47 (90.4%) |
PR status | | | 0.0001
Positive | 244 (99.0%) | 44 (86.7%) |
Negative | 4 (1.0%) | 8 (13.3%) |
IDCa = invasive ductal carcinoma; LN = lymph node; ER = estrogen receptor; PR = progesterone receptor.
Table 3. Comparison of Ki-67 labeling index in the low risk and high risk groups in the test set and univariate analysis results for predicting high risk Oncotype DX recurrence score in the test set.
Variable | Low risk (n = 248) | High risk (n = 52) | p-value
---|---|---|---
IKWG scoring method (automated Roche) | 13.34 (0.18–60.42) | 22.55 (0.09–63.85) | < 0.0001
≤ 20 | 207 (85.4%) | 25 (48.9%) | < 0.0001
> 20 | 41 (14.6%) | 27 (51.1%) |
IKWG scoring method (automated Roche) | | |
≤ 14 | 155 (66.5%) | 21 (46.7%) | 0.0039
> 14 | 93 (33.5%) | 31 (53.3%) |
Average QuPath | 7.94 (0.11–35.96) | 13.40 (0.05–38.01) | < 0.0001
≤ 10 | 186 (79.1%) | 25 (48.9%) | 0.0002
> 10 | 62 (20.9%) | 27 (51.1%) |
Hotspot QuPath | 17.35 (2.96–49.15) | 13.85 (2.38–41.1) | 0.195
≤ 20 | 175 (70.4%) | 45 (84.4%) | 0.090
> 20 | 73 (29.6%) | 7 (15.6%) |
Hottest spot QuPath | 20.06 (4.13–56.74) | 17.62 (3.81–45.13) | 0.547
≤ 20 | 155 (60.7%) | 35 (62.2%) | 0.786
> 20 | 93 (39.3%) | 17 (37.8%) |
Continuous variable presented as mean (range) and categorical variable presented as number (%).
IKWG = International Ki-67 in Breast Cancer Working Group.
All significant variables from the univariate analysis were included in the multivariable logistic regression analysis. Age ≤ 50 years, invasive ductal carcinoma tumor type, histologic grade 2 or 3, tumor necrosis, PR negativity, and high Roche-analyzed Ki-67 score according to the IKWG recommendations (> 20) were statistically significant variables associated with high ODX RS by multivariable analysis. The odds ratio and β coefficient associated with each significant factor in the predictive model are shown in Table 4.
Table 4. The final simplified high risk Oncotype DX recurrence score predictive scoring system.
Variable | OR (multivariable) | 95% CI | p-value | β-coefficient | Score* (if applicable)
---|---|---|---|---|---
Age ≤ 50 | 6.138 | 2.019–18.660 | 0.0014 | 1.8146 | 1
IDCa tumor type | 3.317 | 0.738–14.914 | 0.0118 | 1.1991 | 1
Histologic grade 2 or 3 | 7.102 | 1.611–31.309 | 0.0096 | 1.9604 | 1
Necrosis | 1.684 | 0.615–4.632 | 0.0309 | 0.5232 | 1
PR negativity | 42.56 | 7.392–245.049 | < 0.0001 | 3.7509 | 1
Ki-67 LI > 20 | 5.053 | 2.433–10.492 | < 0.0001 | 1.6199 | 1
Max score | | | | | 6
OR = odds ratio; CI = confidence interval; IDCa = invasive ductal carcinoma; PR = progesterone receptor; Ki-67 LI = Ki-67 labeling index.
*The item point is 1 if the covariate is valid, otherwise 0. The score is calculated by summing the points associated with each variable that is present.
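The scoring rule in Table 4 amounts to counting how many of the six items are present. The following sketch is one possible way to encode it; the dictionary keys and function names are hypothetical, and the default cutoff of 3 is the value reported in this study.

```python
# Hypothetical field names for the six items in Table 4.
ITEMS = (
    "age_le_50",
    "idc_tumor_type",
    "histologic_grade_2_or_3",
    "necrosis",
    "pr_negative",
    "ki67_gt_20",
)


def simplified_score(patient: dict) -> int:
    """Sum one point for each of the six items that is present (range 0-6)."""
    return sum(1 for item in ITEMS if patient.get(item, False))


def predicted_high_risk(patient: dict, cutoff: int = 3) -> bool:
    """Flag a patient as predicted high risk when the score reaches the cutoff."""
    return simplified_score(patient) >= cutoff
```

For example, a 45-year-old patient with invasive ductal carcinoma of histologic grade 2, no necrosis, PR positivity, and a Ki-67 LI of 10 would have three items present and would therefore reach the cutoff.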
Development of a simplified high risk ODX RS predictive scoring system and validation
The predictive model was used to develop a simplified predictive scoring system for high-risk ODX RS. Three score-calculation methods were compared; they differed only in how item points were assigned to each variable, with the final score being the sum of these points. In the first method, the item points were the rounded estimates of the regression coefficients for the covariates in the predictive model. In the second method, the item points were the rounded estimates of the regression coefficients divided by the smallest regression coefficient among all covariates in the predictive model. In the third method, the item point was “1” if the covariate was present and “0” otherwise. Thus, we obtained three candidate scoring systems, whose predictive capabilities were determined using ROC curve analysis and a calibration plot (Supplementary Figure 1). While all three showed similar area under the curve (AUC) values, the third method demonstrated the best correspondence between predicted and observed probabilities in the test set (AUC, 0.8057; 95% confidence interval [CI], 0.7479–0.8635) (Figure 2A).
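The three ways of assigning item points can be illustrated with the β coefficients from Table 4. This is a sketch under two assumptions that are ours rather than the study's: "rounded" is taken to mean rounding to the nearest integer, and in the second method the division by the smallest coefficient is applied before rounding. The names `BETA` and `item_points` are hypothetical.

```python
# Beta coefficients as reported in Table 4.
BETA = {
    "Age <= 50": 1.8146,
    "IDCa tumor type": 1.1991,
    "Histologic grade 2 or 3": 1.9604,
    "Necrosis": 0.5232,
    "PR negativity": 3.7509,
    "Ki-67 LI > 20": 1.6199,
}


def item_points(beta: dict, method: int) -> dict:
    """Return the points assigned to each item under one of the three methods."""
    if method == 1:
        # Method 1: rounded regression coefficients.
        return {k: round(b) for k, b in beta.items()}
    if method == 2:
        # Method 2: coefficients divided by the smallest coefficient, then rounded
        # (order of operations is an assumption).
        smallest = min(beta.values())
        return {k: round(b / smallest) for k, b in beta.items()}
    if method == 3:
        # Method 3: one point per item that is present.
        return {k: 1 for k in beta}
    raise ValueError("method must be 1, 2, or 3")


if __name__ == "__main__":
    for m in (1, 2, 3):
        points = item_points(BETA, m)
        print(m, points, "max score:", sum(points.values()))
```

Under these assumptions the maximum attainable scores are 12, 20, and 6 for the three methods, which shows how much simpler the third scheme is to apply by hand.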
Figure 2. Receiver operating characteristic curves of our simplified high-risk Oncotype DX recurrence score predictive scoring system in the (A) test set and (B) validation set.
The performance of these scoring systems was independently validated in a cohort of 117 patients from 2019 to 2020 (Supplementary Figure 2). The third method again showed the best predictive capability (AUC, 0.7154; 95% CI, 0.6104–0.8205) (Figure 2B). Consequently, this predictive model was selected as the final simplified high-risk ODX RS predictive scoring system, with total scores for individual patients ranging from 0 to 6 (Table 4).
Table 5 displays the sensitivity and specificity at each cutoff value. In the test set, a cutoff score of 3 provided a sensitivity of 92% with a specificity of 49%. When applied to the validation group, the sensitivity at this cutoff was also 92%.
Table 5. Sensitivity and specificity according to each cutoff value in the test set and the validation set.
Score cut-off | Test set sensitivity | Test set specificity | Validation set sensitivity | Validation set specificity
---|---|---|---|---
0 | 1.00 | 0.00 | 1.00 | 0.00
1 | 1.00 | 0.00 | 1.00 | 0.01
2 | 1.00 | 0.13 | 1.00 | 0.02
3* | 0.92 | 0.49 | 0.92 | 0.32
4 | 0.60 | 0.87 | 0.56 | 0.79
5 | 0.08 | 0.99 | 0.12 | 0.96
*At a cutoff of 3, 92% of patients with a true high-risk Oncotype DX recurrence score had a score of 3 or higher (sensitivity).
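The entries in Table 5 follow from treating a score at or above the cutoff as a high-risk prediction. The sketch below shows that calculation on a small set of made-up scores; the data are hypothetical and are not patient data from this study, and the function name is ours.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff predicts high risk.

    labels: 1 for true high-risk ODX RS, 0 for true low-risk ODX RS.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one high-risk and one low-risk case")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical scores for four high-risk and four low-risk cases.
    toy_scores = [5, 4, 3, 2, 3, 2, 1, 4]
    toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
    for cutoff in range(0, 6):
        print(cutoff, sensitivity_specificity(toy_scores, toy_labels, cutoff))
```

Raising the cutoff trades sensitivity for specificity, which is the pattern seen across the rows of Table 5.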
Based on our simplified scoring system with a cut-off value of 3, patients scoring < 3 were classified as low-risk, and those with a score of ≥ 3 as high-risk. Comparing these categories with the actual ODX RS risk categories, 69 cases (23.0%) in the test set were categorized as low-risk. Among these, four (5.8%) had high-risk ODX RS, indicating that these patients might miss the potential benefit from chemotherapy if the simplified scoring system were used for decision-making. In the validation set, 12 cases (10.3%) were categorized as low-risk, with only one (8.3%) having a high-risk ODX RS. Overall, 5 of 81 patients (6.2%) would be falsely classified as low-risk if ODX were regarded as the gold standard.
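The percentages in this paragraph can be checked from the reported counts alone. The short sketch below repeats that arithmetic; the helper name `percent` is hypothetical.

```python
def percent(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


# Counts reported in the text: score-based low-risk cases and, among them,
# cases with a true high-risk ODX RS.
test_low, test_missed, test_total = 69, 4, 300
val_low, val_missed, val_total = 12, 1, 117

print(percent(test_low, test_total))     # 23.0
print(percent(test_missed, test_low))    # 5.8
print(percent(val_low, val_total))       # 10.3
print(percent(val_missed, val_low))      # 8.3
print(percent(test_missed + val_missed, test_low + val_low))  # 6.2
```

The pooled figure combines both sets: 5 missed high-risk cases among 81 score-based low-risk cases.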
DISCUSSION
In the present study, we identified several clinicopathological variables significantly associated with high-risk ODX RS, including age ≤ 50 years, absence of PR expression, histologic grade 2 or 3, Ki-67 LI > 20, invasive ductal carcinoma type, and tumor necrosis. Using these variables, we developed and validated a simplified scoring system to predict high-risk ODX RS. The scoring system demonstrated good performance in terms of both calibration and discrimination.
Given the high cost and time-consuming nature of the ODX test, some single-institution-based studies have proposed ODX RS predictive models using clinicopathological parameters available from routine pathology reports [3,4,6,7,8,10,20,21,22,23,24,25,26].
For instance, Klein et al. [10] developed the Magee equation, which includes the Nottingham score, ER H-score, PR H-score, tumor size, and Ki-67 percentage. However, this model used potentially unreliable Ki-67 scores from manual scoring and semiquantitative H-scores for ER and PR, which may not be part of routine pathological evaluation. Similarly, Eaton et al. [22] created a simplified risk score model that included ER, PR, tumor size, tumor nuclear grade, and tumor histological grade. The ER and PR scores in this model do not reflect the Allred score, which is widely used in clinical practice, but rather the staining percentage of ER and PR multiplied by the staining intensity. Furthermore, this model excluded Ki-67 score, which is a significant contributor to ODX RS, as a variable. Orucevic et al. [23] developed a nomogram including age, tumor size, tumor grade, PR status, and histological tumor type using multivariate logistic regression analysis in 84,339 patients, which is the largest cohort to date. Table 6 [10,22,23] shows a summary of selected published studies that developed the ODX RS predictive model using routine clinicopathological variables in Western countries and a comparison with our present study.
Table 6. Summary of the selected published studies that used clinicopathologic parameters to predict Oncotype DX recurrence score in Western countries and comparison to our current study.
Variable | Klein et al., 2013 [10] | Eaton et al., 2017 [22] | Orucevic et al., 2019 [23] | Current study
---|---|---|---|---
Patients | | | |
Training group | 817 | 766 | 65,754 | 300
Validation group | 255 | 299 | 18,585 | 117
Clinicopathological parameters included in predictive model | Tumor size, histologic grade (Nottingham score), ER (H-score), PR (H-score), HER2 (negative or positive), Ki-67 (%) | Tumor size, histologic grade (1–2 or 3), nuclear grade (1 or 2–3), ER (< 80% or ≥ 80%), PR (< 80% or ≥ 80%) | Age, histologic type (IDCa, ILCa, or others), tumor size, histologic grade (1, 2, or 3), PR (negative or positive) | Age (≤ 50 or > 50), histologic type (IDCa or others), histologic grade (1 or 2–3), necrosis, PR (negative or positive), Ki-67 (≤ 20% or > 20%)
ODX RS cutoff values | < 18 (LR), 18–30 (IR), > 30 (HR) (regardless of age) | < 18 (LR), 18–30 (IR), > 30 (HR) (regardless of age) | ≤ 25 (LR), > 25 (HR) (regardless of age) | Age ≤ 50: ≤ 20, Age > 50: ≤ 25 (LR); Age ≤ 50: > 20, Age > 50: > 25 (HR)
Ki-67 scoring method | No detailed explanation, Ki-67 LI (0–100) | Not involved | Not involved | Systematic evaluation of Ki-67 scoring methods: 1) Average vs. hotspot vs. hottest spot using QuPath; 2) Roche-analyzed Ki-67 scores according to IKWG
Type of prediction model | Automatic calculator based on equation; computational process required | Simplified risk score: summing the points associated with each risk factor that is present; no computational process required | Nomogram; computational process required | Simplified score: at least three of the six items are applicable from routine pathology report; no computational process required
IDCa = invasive ductal carcinoma; ER = estrogen receptor; ILCa = invasive lobular carcinoma; PR = progesterone receptor; HER2 = human epidermal growth factor receptor 2; ODX RS = Oncotype DX recurrence score; LR = low risk; IR = intermediate risk; HR = high risk; Ki-67 LI = Ki-67 labeling index; IKWG = International Ki-67 in Breast Cancer Working Group.
A study by Kim et al. [27] verified the Tennessee nomogram in 218 patients at a single institution in Korea and demonstrated that the C-index of the nomogram was much lower than that reported at the University of Tennessee Medical Center (0.642 vs. 0.890). Kim et al. [27] explained the cause of this discrepancy as a difference in the mean age and ethnic differences between patients with breast cancer in Western and Asian countries. Some studies have also reported that the differences between Western and Asian ODX RS predictive models are due to tumor biology and genetic differences between Western and Asian breast cancer patients [28,29]. Thus, even the ODX RS predictive model created using the largest cohort cannot be generalized to Asian patients.
To the best of our knowledge, there have been a total of three single-institution-based ODX RS predictive models developed in Korea before our ODX RS predictive scoring system. Table 7 [3,4,5] presents a summary of published studies that created an ODX RS predictive model using routine clinicopathological variables in Korea and a comparison with our present study.
Table 7. Summary of the published studies that used clinicopathologic parameters to predict Oncotype DX recurrence score in Korea and comparison to our current study.
Variable | Lee et al., 2019 [3] | Yoo et al., 2020 [4] | Kim et al., 2023 [5] | Current study
---|---|---|---|---
Patients | | | |
Training group | 340 | 191 | 175 | 300
Validation group | 145 | 264 | 122 | 117
Clinicopathological parameters included in predictive model | Nuclear grade (1, 2 or 3), ER (Allred score), PR (Allred score), LVI (present or absent), Ki-67 (%) | Nuclear grade (1–2 or 3), PR (negative or positive), Ki-67 (%) | Nuclear grade (1–2 or 3), PR (negative or positive), Ki-67 (%) | Age (≤ 50 or > 50), histologic type (IDCa or others), histologic grade (1 or 2–3), necrosis, PR (negative or positive), Ki-67 (≤ 20% or > 20%)
ODX RS cutoff values | ≤ 25 (LR), > 25 (HR) (regardless of age) | ≤ 25 (LR), > 25 (HR) (regardless of age) | ≤ 25 (LR), > 25 (HR) (regardless of age) | Age ≤ 50: ≤ 20, Age > 50: ≤ 25 (LR); Age ≤ 50: > 20, Age > 50: > 25 (HR)
Ki-67 scoring method | No detailed explanation, Ki-67 LI (0–100) | No detailed explanation, Ki-67 LI (0–100): percentage of positive cells by IHC | Ki-67 LI (0–100): IKWG scoring method using an automated image analysis program used for pathology reports at their institution | Systematic evaluation of Ki-67 scoring methods: 1) Average vs. hotspot vs. hottest spot using QuPath; 2) Roche-analyzed Ki-67 scores according to IKWG recommendations
Type of prediction model | Nomogram; computational process required | Nomogram; computational process required | Automatic calculator based on equation; computational process required | Simplified score: at least three of the six items are applicable from routine pathology report; no computational process required
PR = progesterone receptor; ER = estrogen receptor; IDCa = invasive ductal carcinoma; LVI = lymphovascular invasion; ODX RS = Oncotype DX recurrence score; LR = low risk; HR = high risk; Ki-67 LI = Ki-67 labeling index; IHC = immunohistochemistry; IKWG = International Ki-67 in Breast Cancer Working Group.
Lee et al. [3] developed a nomogram that included ER Allred score, PR Allred score, nuclear grade, lymphovascular invasion, and Ki-67 LI to predict a low-risk ODX RS subgroup. Yoo et al. [4] reported that high nuclear grade, absence of PR expression, and high Ki-67 LI were associated with a high-risk ODX RS group and created a nomogram based on these variables to predict high-risk ODX RSs. In another Korean study, Kim et al. [5] proposed an automatic calculator called the CPP model, which included three parameters: nuclear grade, PR status, and Ki-67 LI, to predict high-risk ODX RS subgroups. A comparison of published models for predicting the ODX RS with our current scoring system could be the best measure to assess the additional value offered by our scoring system.
First, our scoring system is the first predictive model for high-risk ODX RS that reflects the age-dependent criteria for low and high risk (age > 50 or ≤ 50 years) based on the results of the TAILORx trial [17]. All three predictive models in Korea were created by comparing patients with high-risk ODX RS (> 25) and those with low-risk ODX RS (≤ 25) regardless of patient age. Unlike these three models, our scoring system includes age as a parameter, in line with the current ODX cutoff categories. Because our scoring system reflects the ODX cutoff categories that are relevant to the use of chemotherapy in ER+/HER2− early breast cancer patients based on the most recently reported TAILORx trial results [17], it is more closely aligned with current clinical practice recommendations than the other published ODX RS predictive models in Korea.
Second, to compensate for the issue of consensus on scoring for Ki-67 LI, which has been demonstrated as a main contributor to ODX RS in several studies [2,10,11], we performed a systematic evaluation of Ki-67 LI scoring methods using not only Roche-analyzed Ki-67 scores according to IKWG guidelines, which are used in clinical settings at our institution, but also average, hotspot, and hottest spot Ki-67 scores analyzed by the open-source image analysis software QuPath, which is known to have the highest intra-platform reproducibility [13]. We also investigated the correlation between Roche-analyzed Ki-67 scores according to IKWG guidelines and average, hotspot, and hottest spot Ki-67 scores analyzed by QuPath, and created our scoring system by including all these Ki-67 scoring methods as variables for our logistic regression analysis. Lee et al. [3] did not provide a detailed explanation of their Ki-67 measurement method. Yoo et al. [4] did not report their Ki-67 measurement method in detail. They interpreted Ki-67 staining as the percentage of cells positive by IHC (0%–100%). Kim et al. [5] measured the Ki-67 LI using an image analysis program used by their institution and followed the recommendations of the IKWG. However, they did not perform a systematic validation of the Ki-67 scoring methods as in the present study. Notably, through a logistic regression analysis including all Ki-67 scores as individual variables, Roche-analyzed Ki-67 scores according to the IKWG guidelines were included as one of the parameters in our simplified scoring system to predict high-risk ODX RS. These findings suggest that the IKWG scoring method for Ki-67 might be the most reproducible Ki-67 scoring method, in line with a study by the IKWG in 2020 [12]. Our current scoring system has a differentiating strength from other predictive models in Korea in that Ki-67 LI selected through this systematic evaluation process was included as a parameter in the final scoring system.
Third, our scoring system is much easier and simpler to use in clinical practice than the other three predictive models. Lee et al. [3] and Yoo et al. [4] developed nomograms, and Kim et al. [5] created an automated calculator to predict ODX RS. These tools require specific computational steps to determine the risk score of an individual patient. In contrast, our simplified high-risk ODX RS predictive scoring system offers a straightforward approach for clinicians: they simply need to review the routine pathology report and check whether three or more of the six items in our scoring system apply to each patient.
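The checklist described above can be illustrated with a minimal sketch. It assumes that each of the six items reported in this study (age ≤ 50 years, invasive ductal carcinoma, histologic grade 2 or 3, tumor necrosis, progesterone receptor negativity, and Roche-analyzed Ki-67 score > 20) contributes one point and that a total of 3 or more predicts high-risk ODX RS. The class and function names are hypothetical and are used for illustration only; this is not the authors' software.

```python
from dataclasses import dataclass


@dataclass
class PathologyReport:
    """Hypothetical container for fields found in a routine pathology report."""
    age_years: int
    invasive_ductal_carcinoma: bool
    histologic_grade: int        # 1, 2, or 3
    tumor_necrosis: bool
    pr_negative: bool
    ki67_percent: float          # Roche-analyzed Ki-67 score (%)


def count_risk_items(report: PathologyReport) -> int:
    """Count how many of the six items apply (assumption: one point per item)."""
    items = [
        report.age_years <= 50,
        report.invasive_ductal_carcinoma,
        report.histologic_grade >= 2,
        report.tumor_necrosis,
        report.pr_negative,
        report.ki67_percent > 20,
    ]
    return sum(items)


def predicts_high_risk(report: PathologyReport, cutoff: int = 3) -> bool:
    """Return True when the number of applicable items reaches the cutoff."""
    return count_risk_items(report) >= cutoff


if __name__ == "__main__":
    # Illustrative patient, not taken from the study data.
    example = PathologyReport(45, True, 2, False, False, 10.0)
    print(count_risk_items(example), predicts_high_risk(example))
```

In this sketch the illustrative patient meets three items (age, tumor type, and grade), so the total reaches the cutoff of 3.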
One limitation of this study was its retrospective design, which implies that our scoring system might be subject to some degree of selection bias. In particular, the results could have been influenced by case selection bias, as not all patients diagnosed with ER+, HER2−, T1-3N0-1M0 breast cancer underwent ODX testing, for financial reasons. Another issue was that the RS risk cutoff criterion for patients < 50 years was 20 in our study. Additionally, patients with pN1 status accounted for approximately 20% of the test and validation sets. For patients aged < 50 years with an RS between 16 and 25, the choice between adjuvant chemotherapy and endocrine therapy remains a challenge. As there are no established standard adjuvant treatments for these patients, the approach to counseling them depends on the clinical status, patient or physician preference, and financial circumstances. The TAILORx trial [17] showed a notable disparity in invasive disease-free survival and freedom from recurrence between patients undergoing endocrine therapy and those receiving chemoendocrine therapy, specifically within the RS range of 21–25 among patients aged ≤ 50 years. Based on these findings, we took into account the clinical protocols of our institution. Notably, chemotherapy is typically not recommended for patients < 50 years with an RS of ≤ 20 at our facility, where the ODX test is commonly recommended for patients with T1-3N0-1M0 breast cancer. Consequently, we set the RS risk cutoff for patients < 50 years at 20 and included patients with pN1 status in the present study.
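The age-dependent definition of high-risk RS discussed above can be written as a short sketch. It assumes that the threshold is RS > 20 for the younger group and RS > 25 for patients older than 50 years, and it treats age 50 as belonging to the younger group (the text uses both "< 50" and "≤ 50"). The function name is hypothetical.

```python
def is_high_risk_rs(recurrence_score: int, age_years: int) -> bool:
    """Label an ODX recurrence score as high risk using an age-dependent cutoff.

    Assumptions: cutoff of 20 for the younger group, 25 for patients > 50 years,
    and age 50 counted with the younger group.
    """
    cutoff = 20 if age_years <= 50 else 25
    return recurrence_score > cutoff


if __name__ == "__main__":
    # The same RS can fall into different risk groups depending on age.
    print(is_high_risk_rs(22, 45), is_high_risk_rs(22, 60))
```

Under these assumptions, an RS of 22 is labeled high risk for a 45-year-old patient but not for a 60-year-old patient.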
In the present study, we created a simple high-risk ODX RS predictive scoring system based on a systematic evaluation of Ki-67 scoring methods using QuPath, the image analysis software with the highest reported intra-platform reproducibility, and an image analysis platform used in clinical practice. We believe that our simplified scoring system, which predicts high-risk ODX RS, could help clinicians select patients who may or may not require additional ODX testing. Larger prospective studies are warranted to clarify the utility of our scoring system in patients with ER+/HER2− early breast cancer.
Footnotes
Funding: The authors declare that no funds, grants, or other support was received during the preparation of this manuscript.
Data Availability: The data supporting the findings of this study are available from the corresponding author upon reasonable request.
Conflict of Interest: The authors declare that they have no competing interests.
- Conceptualization: Cho EY.
- Data curation: Kim JM.
- Formal analysis: Kim JM.
- Investigation: Kim JM.
- Methodology: Kim JM.
- Resources: Kim JM.
- Software: Kim JM.
- Validation: Kim JM.
- Visualization: Kim JM.
- Writing - original draft: Kim JM.
- Writing - review & editing: Kim JM.
SUPPLEMENTARY MATERIALS
Receiver operating characteristic curves and calibration plots of the three candidate scoring systems of the simplified high-risk Oncotype DX recurrence score predictive scoring system for the test set. (A) Multivariable model using the first score calculation method. (B) Multivariable model using the second score calculation method. (C) Multivariable model using the third score calculation method.
Receiver operating characteristic curves of the three candidate scoring systems of the simplified high-risk Oncotype DX recurrence score predictive scoring system in the validation set. (A) Multivariable model using the first score calculation method. (B) Multivariable model using the second score calculation method. (C) Multivariable model using the third score calculation method.
References
- 1. Paik S, Tang G, Shak S, Kim C, Baker J, Kim W, et al. Gene expression and benefit of chemotherapy in women with node-negative, estrogen receptor-positive breast cancer. J Clin Oncol. 2006;24:3726–3734. doi: 10.1200/JCO.2005.04.7985.
- 2. Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004;351:2817–2826. doi: 10.1056/NEJMoa041588.
- 3. Lee SB, Kim J, Sohn G, Kim J, Chung IY, Kim HJ, et al. A nomogram for predicting the Oncotype DX Recurrence score in women with T1-3N0-1miM0 hormone receptor–positive, human epidermal growth factor 2 (HER2)–negative breast cancer. Cancer Res Treat. 2019;51:1073–1085. doi: 10.4143/crt.2018.357.
- 4. Yoo SH, Kim TY, Kim M, Lee KH, Lee E, Lee HB, et al. Development of a nomogram to predict the recurrence score of 21-gene prediction assay in hormone receptor-positive early breast cancer. Clin Breast Cancer. 2020;20:98–107.e1. doi: 10.1016/j.clbc.2019.07.010.
- 5. Kim MC, Kwon SY, Choi JE, Kang SH, Bae YK. Prediction of Oncotype DX recurrence score using clinicopathological variables in estrogen receptor-positive/human epidermal growth factor receptor 2-negative breast cancer. J Breast Cancer. 2023;26:105–116. doi: 10.4048/jbc.2023.26.e19.
- 6. Geradts J, Bean SM, Bentley RC, Barry WT. The Oncotype DX recurrence score is correlated with a composite index including routinely reported pathobiologic features. Cancer Invest. 2010;28:969–977. doi: 10.3109/07357907.2010.512600.
- 7. Allison KH, Kandalaft PL, Sitlani CM, Dintzis SM, Gown AM. Routine pathologic parameters can predict Oncotype DX recurrence scores in subsets of ER positive patients: who does not always need testing? Breast Cancer Res Treat. 2012;131:413–424. doi: 10.1007/s10549-011-1416-3.
- 8. Orucevic A, Bell JL, McNabb AP, Heidel RE. Oncotype DX breast cancer recurrence score can be predicted with a novel nomogram using clinicopathologic data. Breast Cancer Res Treat. 2017;163:51–61. doi: 10.1007/s10549-017-4170-3.
- 9. Baxter E, Gondara L, Lohrisch C, Chia S, Gelmon K, Hayes M, et al. Using proliferative markers and Oncotype DX in therapeutic decision-making for breast cancer: the B.C. experience. Curr Oncol. 2015;22:192–198. doi: 10.3747/co.22.2284.
- 10. Klein ME, Dabbs DJ, Shuai Y, Brufsky AM, Jankowitz R, Puhalla SL, et al. Prediction of the Oncotype DX recurrence score: use of pathology-generated equations derived by linear regression analysis. Mod Pathol. 2013;26:658–664. doi: 10.1038/modpathol.2013.36.
- 11. Sahebjam S, Aloyz R, Pilavdzic D, Brisson ML, Ferrario C, Bouganim N, et al. Ki 67 is a major, but not the sole determinant of Oncotype DX recurrence score. Br J Cancer. 2011;105:1342–1345. doi: 10.1038/bjc.2011.402.
- 12. Nielsen TO, Leung SC, Rimm DL, Dodson A, Acs B, Badve S, et al. Assessment of Ki67 in breast cancer: updated recommendations from the International Ki67 in Breast Cancer Working Group. J Natl Cancer Inst. 2021;113:808–819. doi: 10.1093/jnci/djaa201.
- 13. Acs B, Pelekanou V, Bai Y, Martinez-Morilla S, Toki M, Leung SC, et al. Ki67 reproducibility using digital image analysis: an inter-platform and inter-operator study. Lab Invest. 2019;99:107–117. doi: 10.1038/s41374-018-0123-7.
- 14. Chen J, Li Z, Han Z, Kang D, Ma J, Yi Y, et al. Prognostic value of tumor necrosis based on the evaluation of frequency in invasive breast cancer. BMC Cancer. 2023;23:530. doi: 10.1186/s12885-023-10943-x.
- 15. Allison KH, Hammond ME, Dowsett M, McKernin SE, Carey LA, Fitzgibbons PL, et al. Estrogen and progesterone receptor testing in breast cancer: ASCO/CAP guideline update. J Clin Oncol. 2020;38:1346–1366. doi: 10.1200/JCO.19.02309.
- 16. Wolff AC, Hammond ME, Allison KH, Harvey BE, Mangu PB, Bartlett JM, et al. Human epidermal growth factor receptor 2 testing in breast cancer: American Society of Clinical Oncology/College of American Pathologists clinical practice guideline focused update. J Clin Oncol. 2018;36:2105–2122. doi: 10.1200/JCO.2018.77.8738.
- 17. Sparano JA, Gray RJ, Makower DF, Pritchard KI, Albain KS, Hayes DF, et al. Adjuvant chemotherapy guided by a 21-gene expression assay in breast cancer. N Engl J Med. 2018;379:111–121. doi: 10.1056/NEJMoa1804710.
- 18. Thakur SS, Li H, Chan AM, Tudor R, Bigras G, Morris D, et al. The use of automated Ki67 analysis to predict Oncotype DX risk-of-recurrence categories in early-stage breast cancer. PLoS One. 2018;13:e0188983. doi: 10.1371/journal.pone.0188983.
- 19. Paik S, Kwon Y, Lee MH, Kim JY, Lee DK, Cho WJ, et al. Systematic evaluation of scoring methods for Ki67 as a surrogate for 21-gene recurrence score. NPJ Breast Cancer. 2021;7:13. doi: 10.1038/s41523-021-00221-z.
- 20. Flanagan MB, Dabbs DJ, Brufsky AM, Beriwal S, Bhargava R. Histopathologic variables predict Oncotype DX recurrence score. Mod Pathol. 2008;21:1255–1261. doi: 10.1038/modpathol.2008.54.
- 21. Gage MM, Rosman M, Mylander WC, Giblin E, Kim HS, Cope L, et al. A validated model for identifying patients unlikely to benefit from the 21-gene recurrence score assay. Clin Breast Cancer. 2015;15:467–472. doi: 10.1016/j.clbc.2015.04.006.
- 22. Eaton AA, Pesce CE, Murphy JO, Stempel MM, Patil SM, Brogi E, et al. Estimating the OncotypeDX score: validation of an inexpensive estimation tool. Breast Cancer Res Treat. 2017;161:435–441. doi: 10.1007/s10549-016-4069-4.
- 23. Orucevic A, Bell JL, King M, McNabb AP, Heidel RE. Nomogram update based on TAILORx clinical trial results - Oncotype DX breast cancer recurrence score can be predicted using clinicopathologic data. Breast. 2019;46:116–125. doi: 10.1016/j.breast.2019.05.006.
- 24. Thibodeau S, Voutsadakis IA. Prediction of Oncotype DX recurrence score using clinical parameters: a comparison of available tools and a simple predictor based on grade and progesterone receptor. Hematol Oncol Stem Cell Ther. 2019;12:89–96. doi: 10.1016/j.hemonc.2019.02.001.
- 25. Batra A, Nixon NA, Roldan-Urgoiti G, Hannouf MB, Abedin T, Hugh J, et al. Developing a clinical-pathologic model to predict genomic risk of recurrence in patients with hormone receptor positive, human epidermal growth factor receptor-2 negative, node negative breast cancer. Cancer Treat Res Commun. 2021;28:100401. doi: 10.1016/j.ctarc.2021.100401.
- 26. Mattes MD, Mann JM, Ashamalla H, Tejwani A. Routine histopathologic characteristics can predict Oncotype DX™ recurrence score in subsets of breast cancer patients. Cancer Invest. 2013;31:604–606. doi: 10.3109/07357907.2013.849725.
- 27. Kim JM, Ryu JM, Kim I, Choi HJ, Nam SJ, Kim SW, et al. Verification of a western nomogram for predicting Oncotype DX™ recurrence scores in Korean patients with breast cancer. J Breast Cancer. 2018;21:222–226. doi: 10.4048/jbc.2018.21.2.222.
- 28. Bhoo-Pathy N, Yip CH, Hartman M, Saxena N, Taib NA, Ho GF, et al. Adjuvant! Online is overoptimistic in predicting survival of Asian breast cancer patients. Eur J Cancer. 2012;48:982–989. doi: 10.1016/j.ejca.2012.01.034.
- 29. Agarwal G, Pradeep PV, Aggarwal V, Yip CH, Cheung PS. Spectrum of breast cancer in Asian women. World J Surg. 2007;31:1031–1040. doi: 10.1007/s00268-005-0585-9.