Abstract
Assessing hormone receptor status is an essential part of breast cancer diagnosis, as this biomarker strongly predicts response to hormonal treatment strategies. As such, hormone receptor testing laboratories are strongly encouraged to participate in external quality control schemes to optimize their immunohistochemical assays. Nine Dutch pathology departments provided tissue blocks containing invasive breast cancers, all of which had previously been tested for estrogen receptor and/or progesterone receptor expression during routine practice. From these tissue blocks, tissue microarrays (TMAs) were constructed and tested for hormone receptor expression. When a discordant result was found between the local and TMA result, the original testing slide was reviewed and staining was repeated on a whole-tissue block. Sensitivity and specificity of individual laboratories for testing estrogen receptor expression were high, with an overall sensitivity and specificity of 99.7 and 95.4 %, respectively. Overall sensitivity and specificity of progesterone receptor testing were 94.8 and 92.6 %, respectively. Out of 96 discordant cases, 36 cases would have been concordant if the recommended cut-off value of 1 % instead of 10 % had been applied. Overall sensitivity and specificity of estrogen and progesterone receptor testing were high among participating laboratories. Continued enrollment of laboratories into quality control schemes is essential for achieving and maintaining the highest standard of care for breast cancer patients.
Electronic supplementary material
The online version of this article (doi:10.1007/s10549-015-3444-x) contains supplementary material, which is available to authorized users.
Keywords: Estrogen receptor, Progesterone receptor, Quality control, Tissue microarray, Breast cancer
Introduction
Testing estrogen receptor (ER) expression is mandatory for all breast carcinomas as this biomarker predicts response to estrogen-modulating therapy [1]. Adequate testing of ER expression via immunohistochemistry is considered the gold standard for selecting patients for neoadjuvant and adjuvant hormonal therapies [2]. The progesterone receptor (PR) has been assessed as a prognostic factor [3] and as a potential predictive marker [4, 5]. Initial studies on the quality of hormone receptor (HR) testing have shown cause for concern with a low percentage of laboratories showing acceptable performance [6]. An American Society of Clinical Oncology (ASCO) and College of American Pathologists (CAP) panel addressed the need for improving ER and PR testing and published a set of guidelines concerning this matter [7]. Recommendations were also made to lower the positivity threshold from 10 to 1 %. Unfortunately, a significant (although decreasing) number of laboratories still fail to achieve sufficient testing quality in the NordiQC and/or NEQAS ER and PR assessment runs.
The current study was designed to evaluate a tissue microarray (TMA)-based method for assessing ER and PR testing quality. This method allows pathology laboratories to evaluate the reproducibility of immunohistochemical (IHC) testing results by retesting a large number of ER and PR assays on TMAs. By comparing the original result with the result of the retested assay on TMAs, discordances between the local report and the retested tumors can be assessed easily and at large scale. Additionally, the effect of the recommended change in the positivity threshold from 10 to 1 % positive cells on testing reproducibility was investigated.
Methods
Tissues
Formalin-fixed paraffin-embedded (FFPE) tumor blocks were collected for TMA construction from nine laboratories in the Netherlands: the Academic Medical Center (AMC, Amsterdam), Netherlands Cancer Institute/Antoni van Leeuwenhoek (NKI/AVL, Amsterdam), Diakonessenhuis (Utrecht), Isala (Zwolle), Leiden University Medical Center (LUMC, Leiden), University Medical Center Groningen (UMCG, Groningen), Erasmus Medical Center (EMC, Rotterdam), Radboud University Medical Center (Radboud UMC, Nijmegen), and Laboratory Pathology Eastern Netherlands (LabPON) (Table S1). The tissue blocks contained invasive breast carcinomas that were previously tested for ER, PR, and/or HER2 expression by immunohistochemistry as part of routine pathological diagnostics. HER2 testing quality for a subset of the included tumors was investigated in a previous publication [8]. According to Dutch law, these tissue blocks can be freely used for research purposes after anonymization, provided that they are handled according to national ethical guidelines (‘Code for Proper Secondary Use of Human Tissue’, Dutch Federation of Medical Scientific Societies). TMA sections were stained with SP1 (for ER) and 1E2 (for PR) antibodies using the Benchmark XT autostainer (Ventana Medical Systems, Tucson, AZ, United States).
Comparison of ER and PR test results
The TMA cores were scored by determining the percentage of invasive tumor cells with nuclear staining, in increments of 10 % (staining intensity was not taken into account). ER and PR results from the original tests were retrieved from the local pathology reports. These ER and PR scores were compared to the results that were obtained from the TMA cores. For discordant cases, whole-tissue sections were cut and stained for ER and PR. This was done to rule out that discordant results were due to sampling errors introduced by the use of TMAs. If the whole-slide result was concordant with the local pathology report, the final result was considered concordant. If the whole-slide result was still discordant with the original pathology report, the tumor was considered truly discordant and the reason for the discordance was then investigated. For this purpose, the original slides used for the local ER and PR diagnosis were centrally reviewed. If review of the original testing slide by the central review panel revealed discordance with the local observer, the reason for the discordant result was considered to be observer inaccuracy. If the original testing slide showed positive nuclear staining on review, but this positive IHC result could not be reproduced on either the TMA or the subsequent whole-sized slides despite appropriate positive controls, the reason for the discordant result was considered to be a false-positive IHC procedure. In the opposite case (negative local IHC result with HR-positive results on TMA and whole-sized slides), the reason for discordance was considered to be inaccurate IHC leading to false-negative results. The workflow of the study is summarized in Fig. S1.
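The decision rules in this paragraph can be written out as a short function. The following Python sketch is only an illustration of the workflow as described in the text; the function name `classify_case`, the `Classification` type, and the Boolean encoding of positive/negative results are assumptions introduced here and are not part of the study.

```python
from typing import NamedTuple, Optional


class Classification(NamedTuple):
    conclusion: str          # "concordant", "false positive" or "false negative"
    reason: Optional[str]    # explanation for a TMA or true discordance, if any


def classify_case(
    local_positive: bool,
    tma_positive: bool,
    whole_slide_positive: Optional[bool] = None,
    review_positive: Optional[bool] = None,
) -> Classification:
    """Apply the discordance workflow to one tumor (illustrative sketch)."""
    # Step 1: local report and TMA agree, so no further testing is needed.
    if tma_positive == local_positive:
        return Classification("concordant", None)

    # Step 2: a TMA discordance has to be checked on a whole-tissue section.
    if whole_slide_positive is None:
        raise ValueError("whole-slide retest required for a TMA-discordant case")
    if whole_slide_positive == local_positive:
        return Classification("concordant", "TMA sampling error")

    # Step 3: truly discordant; direction is defined relative to central retesting.
    conclusion = "false positive" if local_positive else "false negative"

    # Step 4: central review of the original slide assigns the reason.
    if review_positive is None:
        reason = "unknown"           # original slide not available for review
    elif review_positive != local_positive:
        reason = "observer error"    # reviewers read the same slide differently
    else:
        reason = "IHC error"         # local staining could not be reproduced
    return Classification(conclusion, reason)
```

For example, `classify_case(True, False, False, False)` describes a locally positive tumor that is negative on TMA, on the whole slide, and on review of the original slide, and yields a false positive attributed to observer error.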
Adjustment from 10 to 1 % threshold for HR positivity
Because all of these tumors were originally tested before the 1 % threshold for HR positivity was recommended, we investigated the influence of changing this threshold from 10 to 1 % positive cells, as recommended by the ASCO/CAP guidelines. For all discordant cases, we investigated whether the discordance would still exist after applying the new cut-off.
Results
ER concordance
A total of 1736 invasive breast carcinomas that were tested for ER in nine different pathology laboratories were included in this study. Of these, 163 tumors were omitted from the analysis because the original ER result could not be retrieved, because TMA cores were lost during the staining procedure, or because no invasive breast cancer was present on the TMA cores. A further four tumors were excluded because material was not available for subsequent retesting after an initial discordant result was found between the TMA and the original testing result. The subsequent analysis was performed on the remaining cohort of 1569 breast tumors (Fig. 1). When comparing the local testing result with the TMA result, 52 tumors were considered to be discordant. For these tumors, whole-sized sections were stained for ER in order to assess the reason for discordance. If the whole-slide result was concordant with the original ER testing result, the discordance was attributed to sampling error from the use of a TMA and the final result was thus considered concordant (N = 36). If the discordance remained, this was considered a true discordant result (N = 16). Of the 16 discordant cases, 12 were false positive and 4 were false negative (Fig. 1; Table 1). Overall concordance was 99.0 %. For all ER tests performed by the nine centers combined, sensitivity was 99.7 % (range 98.7–100.0 %) and specificity was 95.4 % (range 83.3–100.0 %). Positive predictive value (PPV) and negative predictive value (NPV) for all centers combined were 99.1 % (range 97.4–100.0 %) and 98.4 % (range 90.9–100 %), respectively.
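As a check on the cohort numbers above, the flow from collected tumors to overall concordance can be reproduced with a few lines of arithmetic. This Python sketch uses only the ER counts reported in this paragraph; the variable and function names are illustrative.

```python
def overall_concordance(n_analyzed: int, n_true_discordant: int) -> float:
    """Percentage of analyzed tumors whose local result was confirmed."""
    return 100.0 * (n_analyzed - n_true_discordant) / n_analyzed


# ER cohort flow, using the counts reported in the text.
n_collected = 1736
n_omitted = 163            # no original result, lost cores, or no invasive tumor
n_no_material = 4          # TMA-discordant but no material left for retesting
n_analyzed = n_collected - n_omitted - n_no_material

n_tma_discordant = 52
n_sampling_error = 36      # whole slide agreed with the local result
n_true_discordant = n_tma_discordant - n_sampling_error

n_false_positive = 12
n_false_negative = 4

er_concordance = round(overall_concordance(n_analyzed, n_true_discordant), 1)
print(n_analyzed, n_true_discordant, er_concordance)
```

The false-positive and false-negative counts add up to the number of true discordances, and the computed concordance matches the reported 99.0 %.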
Table 1.
N | Local ER testing result | ER result after revision of original slide | TMA and whole-slide ER result | Conclusion | Reason for discordance |
---|---|---|---|---|---|
1 | Negative | Negative | Positive | False negative | IHC error |
3 | Negative | Positive | Positive | False negative | Observer error |
2 | Positive | Positive | Negative | False positive | IHC error |
9 | Positive | Negative | Negative | False positive | Observer error |
1 | Positive | Unknown | Negative | False positive | Unknown |
The next step was to investigate whether the discordant results were due to observer inaccuracy or inaccurate IHC procedures. To assess the possibility of observer error, the original slides were reviewed when available (N = 15). In 12 tumors, discordance between the local observer and the review panel was present, which can be considered to be observer inaccuracy. Three discordant cases were due to inaccurate IHC procedures. Two showed ER-positive staining in the local testing center (which was also verified on slide review), while no positive test result was obtained when the staining was repeated (example shown in Fig. 2). The opposite was true for the third discordant case. The reason for the discordant result could not be ascertained for the sole remaining tumor, since the unavailability of the original slide made it impossible to determine whether the discordance was due to inaccurate scoring or to the IHC procedure (Table 1).
PR concordance
A total of 1518 PR-tested cases were provided by the 8 laboratories that performed PR testing. Of these, 171 cases were excluded from the final analysis, leaving 1347 PR-tested tumors available for comparison with the TMA results (Fig. S2). A total of 150 tumors were discordant between the original PR testing result and the TMA, and for all these cases, the PR test was performed centrally on a whole-tissue block. True discordant results were seen in 80 cases, which led to an overall concordance of 94.1 %. Of these 80 discordant cases, 32 tumors were deemed false positive and 48 tumors were considered false negative (Table S2; Fig. S2). Overall sensitivity and specificity for PR testing were slightly lower than for ER testing, with overall sensitivity of 94.8 % and overall specificity of 92.6 %. Sensitivity and specificity values of individual laboratories ranged from 87.1 to 97.8 % and from 85.7 to 97.0 %, respectively. PPV and NPV overall were 96.4 % (range 92.6–98.7 %) and 89.3 % (range 80.0–96.6 %), respectively. With the aid of the review of the local PR test slide (available for 59 of the 80 tumors) and the whole-tissue retesting, the reason for discordant results was investigated. Observer inaccuracy was detected in 20 cases, and the IHC test was irreproducible in 39 cases (Table S2).
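The relation between the four reported measures and the underlying 2 × 2 table can be sketched as follows, taking the central (TMA and whole-slide) result as the reference. The false-positive and false-negative counts (32 and 48) are taken from the text. The true-positive and true-negative counts are not stated in the text; the values 867 and 400 used below are assumptions, back-calculated so that the four cells sum to the 1347 analyzed tumors, and would need to be verified against the study's own tables. The class name `ConfusionCounts` is likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Local result (test) versus central retest (reference)."""
    tp: int  # local positive, central positive
    fp: int  # local positive, central negative
    tn: int  # local negative, central negative
    fn: int  # local negative, central positive

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    def sensitivity(self) -> float:
        return 100.0 * self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return 100.0 * self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return 100.0 * self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return 100.0 * self.tn / (self.tn + self.fn)

    def concordance(self) -> float:
        return 100.0 * (self.tp + self.tn) / self.total


# fp and fn are reported in the text; tp and tn are ASSUMED (back-calculated).
pr = ConfusionCounts(tp=867, fp=32, tn=400, fn=48)

for name in ("sensitivity", "specificity", "ppv", "npv", "concordance"):
    print(f"{name}: {getattr(pr, name)():.1f} %")
```

Under these assumed counts, the rounded values agree with the percentages reported above, which suggests the reported figures are internally consistent.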
Consequence of threshold adjustment
All discordant cases were again reviewed to determine whether adjusting the original or retested ER or PR result, based on the 2010 ASCO/CAP guidelines, would influence the discordant result. For some cases, this required the availability of data regarding the number of HR-positive cells (if any) observed during the original, local HR testing. This is important in the case of a tumor that was determined to be negative at local testing according to the 10 % cut-off, since such tumors might either be completely negative or have some positive staining but less than 10 % overall. For some cases, this information was unavailable in the pathology report (N = 8). Regardless, out of 96 initially discordant results, applying the recommended 1 % cut-off leads to a concordant result for 36 tumors (further described in Table 2).
Table 2.
N | Percentage of HR-positive cells in local result | Local result at 1 % threshold | Local result at 10 % threshold (reported in pathology report) | Percentage of HR-positive cells at retesting | Retest result at 1 % threshold | Retest result at 10 % threshold | Discordant at 10 % threshold? | Discordant at 1 % threshold? |
---|---|---|---|---|---|---|---|---|
12 | <10 % but ≥ 1 % | Positive | Negative | ≥10 % | Positive | Positive | Yes | No |
24 | ≥10 % | Positive | Positive | <10 % but ≥ 1 % | Positive | Negative | Yes | No |
8 | Not reported | Unknown | Negative | ≥10 % | Positive | Positive | Yes | Unknown |
29 | ≥10 % | Positive | Positive | 0 % | Negative | Negative | Yes | Yes |
23 | 0 % | Negative | Negative | ≥10 % | Positive | Positive | Yes | Yes |
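The reclassification in Table 2 can be reproduced by comparing each pair of scores against both cut-offs. In the Python sketch below, each score is represented as a half-open interval of what is known about the percentage of positive cells, so that a locally negative result without a reported percentage yields an "unknown" status at the 1 % cut-off. The interval encoding, the treatment of "0 %" as "below 1 %", and all names are assumptions made for illustration; the case counts are those of Table 2.

```python
import math
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]  # [low, high) bounds on the % of positive cells

ZERO: Interval = (0.0, 1.0)                 # "0 %", modelled as below 1 % (assumption)
LOW: Interval = (1.0, 10.0)                 # "<10 % but >= 1 %"
HIGH: Interval = (10.0, math.inf)           # ">= 10 %"
NEGATIVE_UNKNOWN: Interval = (0.0, 10.0)    # negative at 10 %, percentage not reported


def status(score: Interval, cutoff: float) -> Optional[bool]:
    """True = positive, False = negative, None = cannot be decided."""
    low, high = score
    if low >= cutoff:
        return True
    if high <= cutoff:
        return False
    return None


def compare(local: Interval, retest: Interval, cutoff: float) -> str:
    a, b = status(local, cutoff), status(retest, cutoff)
    if a is None or b is None:
        return "unknown"
    return "concordant" if a == b else "discordant"


# (number of tumors, local score, retest score), one entry per row of Table 2.
TABLE_2 = [
    (12, LOW, HIGH),
    (24, HIGH, LOW),
    (8, NEGATIVE_UNKNOWN, HIGH),
    (29, HIGH, ZERO),
    (23, ZERO, HIGH),
]


def tally(cutoff: float) -> Dict[str, int]:
    counts = {"concordant": 0, "discordant": 0, "unknown": 0}
    for n, local, retest in TABLE_2:
        counts[compare(local, retest, cutoff)] += n
    return counts


print(tally(10.0))
print(tally(1.0))
```

At the 10 % cut-off all 96 cases are discordant; at the 1 % cut-off 36 become concordant, 52 remain discordant, and the 8 cases without a reported local percentage cannot be classified.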
Discussion
Our study assessed the reproducibility of immunohistochemical ER and PR testing performed in nine testing laboratories in the Netherlands. For this purpose, TMAs were used to facilitate retesting of relatively large numbers of previously tested tumors and thus provide an accurate assessment of the reproducibility of these IHC tests. We compared the original ER and PR results from the pathology archives with the results obtained on TMA. For discordant results, whole-tissue sections were tested to rule out the possibility of sampling error. If a tumor tested negative at a local center but showed positive HR expression on both TMA and whole-slide examination, this tumor is likely to truly express HR. If a tumor shows positive HR expression at the local center, but neither TMA nor whole-sized staining is able to replicate this result (despite appropriate internal and external controls), it is hard to say whether the first positive result was truly a false positive. Careful examination of the slide with knowledge of expected staining patterns might, however, be helpful (Fig. 2). Unfortunately, no gold standard exists that could have been used to determine which assessment is correct, which remains a weakness of this study design. Response to hormonal therapy should be the gold standard in these cases, but this also depends on other known and unknown variables, and information regarding hormonal response is not always available. Viale et al. showed that a group of tumors that were locally ER-positive while centrally ER-negative tended to follow the overall survival patterns of ER-negative tumors (namely early relapse followed by a plateau, whereas ER-positive tumors follow a slower rate of relapse) [9]. These observations speak in favor of centrally performed HR tests in general, but this cannot be applied to each individual.
Other studies have used RT-PCR as an additional method for determining HR status in addition to local and central IHC, but these assays are neither free from reproducibility issues themselves nor have been shown to correlate more closely to response to hormonal therapy [10].
Fortunately, concordance between local and retested HR results was high for both ER (99.0 %) and PR (94.1 %) in this current study. Remarkably, irreproducible test results obtained for ER were only rarely due to errors in the IHC procedure, whereas the ratio of IHC procedure error to observer error was more balanced in the PR-tested group. This might be due to the quality of the antibodies, as traditionally more emphasis has been placed on ER testing quality.
A 2010 report by an ASCO/CAP panel suggested lowering the threshold of positivity from 10 % HR-positive cells to 1 %. These guidelines were established using a methodology similar to that of an earlier report concerning HER2 testing, which recommended increasing the positivity threshold to 30 % positive cells [11]. The ER/PR guideline adjustments were not designed to improve testing accuracy, but were based on the observation that even patients with a low percentage of HR-positive cells (1–10 %) still respond to tamoxifen. This is despite the observation that most tumors with 1–10 % HR-positive cells share more biologic features with ER-negative tumors [12]. Regardless, this change might also have consequences for HR testing reproducibility in this rare [13] group of tumors, which was investigated in this study. We found that a substantial number of the cases that were discordant between local and TMA testing were concordant when following the 2010 ASCO/CAP guidelines, suggesting that adherence to the 2010 guidelines improves the reproducibility of HR testing results.
Central assessment of ER and PR status of tumors that were included in the Breast International Group (BIG) 1–98 trial showed that locally tested ER-negative tumors tend to show ER positivity in a relatively high number of cases (69.5 %) [9]. Discordance was even more pronounced for PR testing [9]. Retesting of HR-tested tumors included in the Eastern Cooperative Oncology Group (ECOG) study E2197 showed a concordance of 90 and 84 %, respectively, between locally tested and centrally tested ER and PR results [10]. Central review of local HR testing performed in the Adjuvant Lapatinib and/or Trastuzumab Treatment Optimisation (ALTTO) trial showed that local ER-positive results could not be reproduced for 4.3 % of cases. Even more worrisome was the poor reproducibility of ER-negative results, 21.6 % of which displayed positive staining when the original result was retested [14]. All of these studies indicate (i) a relatively poor reproducibility of ER-negative test results, (ii) an average reproducibility of ER testing below 95 %, and (iii) an even lower reproducibility for PR testing. A 2014 report by Viale et al. described the concordance between locally performed ER and PR testing and central IHC retesting for the first 800 participants of the MINDACT trial [15]. Concordance for ER and PR IHC tests was determined as 97.6 and 89.6 %, respectively. These last results and ours indicate an improving trend in ER and PR testing reproducibility. The relatively high reproducibility in our study might be explained by the routine use of autostainers among all participating laboratories. Also, the participating centers in this study were all accredited laboratories in the Netherlands, leaving open the question of whether these results apply to all individual centers.
Continuous improvement and validation of local IHC methods are essential to provide and maintain optimal care for breast cancer patients. Participation in quality control schemes should be considered mandatory for every individual HR testing laboratory. The tissue microarray approach described in this study can provide important feedback regarding testing reproducibility.
Acknowledgments
We would like to acknowledge the NKI-AVL Core Facility Molecular Pathology & Biobanking (CFMPB) for supplying NKI-AVL Biobank material.
Conflict of interest
E. Schuuring has received funding from Hoffman-La Roche. M.J. van de Vijver has an advisory role for Hoffman-La Roche and Genomic Health. The other authors declare no conflicts of interest.
Funding
This research has in part been funded by Hoffman-La Roche.
Contributor Information
T. J. A. Dekker, Email: T.J.A.Dekker@lumc.nl
M. J. van de Vijver, Email: m.j.vandevijver@amc.uva.nl
References
- 1. Early Breast Cancer Trialists’ Collaborative Group. Effects of chemotherapy and hormonal therapy for early breast cancer on recurrence and 15-year survival: an overview of the randomised trials. Lancet. 2005;365:1687–1717. doi: 10.1016/S0140-6736(05)66544-0.
- 2. Harvey JM, Clark GM, Osborne CK, Allred DC. Estrogen receptor status by immunohistochemistry is superior to the ligand-binding assay for predicting response to adjuvant endocrine therapy in breast cancer. J Clin Oncol. 1999;17:1474–1481. doi: 10.1200/JCO.1999.17.5.1474.
- 3. Reiner A, Neumeister B, Spona J, Reiner G, Schemper M, Jakesz R. Immunocytochemical localization of estrogen and progesterone receptor and prognosis in human primary breast cancer. Cancer Res. 1990;50:7057–7061.
- 4. Bardou VJ, Arpino G, Elledge RM, Osborne CK, Clark GM. Progesterone receptor status significantly improves outcome prediction over estrogen receptor status alone for adjuvant endocrine therapy in two large breast cancer databases. J Clin Oncol. 2003;21:1973–1979. doi: 10.1200/JCO.2003.09.099.
- 5. Davies C, Godwin J, Gray R, Clarke M, Cutter D, Darby S, McGale P, Pan HC, Taylor C, Wang YC, Dowsett M, Ingle J, Peto R. Relevance of breast cancer hormone receptors and other factors to the efficacy of adjuvant tamoxifen: patient-level meta-analysis of randomised trials. Lancet. 2011;378:771–784. doi: 10.1016/S0140-6736(11)60993-8.
- 6. Rhodes A, Jasani B, Balaton AJ, Barnes DM, Anderson E, Bobrow LG, Miller KD. Study of interlaboratory reliability and reproducibility of estrogen and progesterone receptor assays in Europe. Documentation of poor reliability and identification of insufficient microwave antigen retrieval time as a major contributory element of unreliable assays. Am J Clin Pathol. 2001;115:44–58. doi: 10.1309/H905-HYC1-6UQQ-981P.
- 7. Hammond ME, Hayes DF, Dowsett M, Allred DC, Hagerty KL, Badve S, Fitzgibbons PL, Francis G, Goldstein NS, Hayes M, Hicks DG, Lester S, Love R, Mangu PB, McShane L, Miller K, Osborne CK, Paik S, Perlmutter J, Rhodes A, Sasano H, Schwartz JN, Sweep FC, Taube S, Torlakovic EE, Valenstein P, Viale G, Visscher D, Wheeler T, Williams RB, Wittliff JL, Wolff AC. American Society of Clinical Oncology/College Of American Pathologists guideline recommendations for immunohistochemical testing of estrogen and progesterone receptors in breast cancer. J Clin Oncol. 2010;28:2784–2795. doi: 10.1200/JCO.2009.25.6529.
- 8. Dekker TJ, Borg ST, Hooijer GK, Meijer SL, Wesseling J, Boers JE, Schuuring E, Bart J, van GJ, Mesker WE, Kroep JR, Smit VT, van de Vijver MJ. Determining sensitivity and specificity of HER2 testing in breast cancer using a tissue micro-array approach. Breast Cancer Res. 2012;14:R93. doi: 10.1186/bcr3208.
- 9. Viale G, Regan MM, Maiorano E, Mastropasqua MG, Dell’Orto P, Rasmussen BB, Raffoul J, Neven P, Orosz Z, Braye S, Ohlschlegel C, Thurlimann B, Gelber RD, Castiglione-Gertsch M, Price KN, Goldhirsch A, Gusterson BA, Coates AS. Prognostic and predictive value of centrally reviewed expression of estrogen and progesterone receptors in a randomized trial comparing letrozole and tamoxifen adjuvant therapy for postmenopausal early breast cancer: BIG 1-98. J Clin Oncol. 2007;25:3846–3852. doi: 10.1200/JCO.2007.11.9453.
- 10. Badve SS, Baehner FL, Gray RP, Childs BH, Maddala T, Liu ML, Rowley SC, Shak S, Perez EA, Shulman LJ, Martino S, Davidson NE, Sledge GW, Goldstein LJ, Sparano JA. Estrogen- and progesterone-receptor status in ECOG 2197: comparison of immunohistochemistry by local and central laboratories and quantitative reverse transcription polymerase chain reaction by central laboratory. J Clin Oncol. 2008;26:2473–2481. doi: 10.1200/JCO.2007.13.6424.
- 11. Wolff AC, Hammond ME, Hicks DG, Dowsett M, McShane LM, Allison KH, Allred DC, Bartlett JM, Bilous M, Fitzgibbons P, Hanna W, Jenkins RB, Mangu PB, Paik S, Perez EA, Press MF, Spears PA, Vance GH, Viale G, Hayes DF. Recommendations for human epidermal growth factor receptor 2 testing in breast cancer: American Society of Clinical Oncology/College of American Pathologists clinical practice guideline update. J Clin Oncol. 2013;31:3997–4013. doi: 10.1200/JCO.2013.50.9984.
- 12. Iwamoto T, Booser D, Valero V, Murray JL, Koenig K, Esteva FJ, Ueno NT, Zhang J, Shi W, Qi Y, Matsuoka J, Yang EJ, Hortobagyi GN, Hatzis C, Symmans WF, Pusztai L. Estrogen receptor (ER) mRNA and ER-related gene expression in breast cancers that are 1% to 10% ER-positive by immunohistochemistry. J Clin Oncol. 2012;30:729–734. doi: 10.1200/JCO.2011.36.2574.
- 13. Collins LC, Botero ML, Schnitt SJ. Bimodal frequency distribution of estrogen receptor immunohistochemical staining results in breast cancer: an analysis of 825 cases. Am J Clin Pathol. 2005;123:16–20. doi: 10.1309/HCF035N9WK40ETJ0.
- 14. Gelber RD, Gelber S. Facilitating consensus by examining patterns of treatment effects. Breast. 2009;18(Suppl 3):S2–S8. doi: 10.1016/S0960-9776(09)70265-6.
- 15. Viale G, Slaets L, Bogaerts J, Rutgers E, van’t Veer L, Piccart-Gebhart MJ, de Snoo FA, Stork-Sloots L, Russo L, Dell’Orto P, van den Akker J, Glas A, Cardoso F. High concordance of protein (by IHC), gene (by FISH; HER2 only), and microarray readout (by TargetPrint) of ER, PgR, and HER2: results from the EORTC 10041/BIG 03-04 MINDACT trial. Ann Oncol. 2014;25:816–823. doi: 10.1093/annonc/mdu026.