NPJ Breast Cancer
Editorial
2023 Jan 31;9:5. doi: 10.1038/s41523-023-00507-4

Artificial intelligence in breast pathology – dawn of a new era

Sunil S Badve 1,2
PMCID: PMC9889344  PMID: 36720886

Abstract

Artificial intelligence methods are increasingly being used for the analysis of pathology slides. In this issue of the Journal, Sandbank et al. describe the validation and utility of a robust second-read system that can distinguish in situ and invasive carcinomas from non-neoplastic lesions of the breast.

Subject terms: Breast cancer


The field of pathology is challenging, with discordance noted even amongst expert pathologists. Although subjectivity and disagreement amongst experts are inherent in medicine, discordance amongst pathologists has traditionally been viewed as a matter of grave concern. This is in part because pathologic analysis forms the foundation for disease management. A diagnosis of cancer versus a benign proliferation, or the presence versus absence of a predictive biomarker, may dramatically change therapeutic options, particularly when there is equipoise between regimens. Thus, the need for objectivity in pathologic analysis has been clearly voiced by oncologists and pathologists alike.

The search for objectivity has led to the development and popularity of gene expression signatures, which, although in some cases no better than histologic grade, provide objective numerical values for the risk of recurrence. The subjectivity of grade was, in turn, highlighted to promote the objectivity of molecular assays. This has recently come full circle with the adoption of multi-parametric scores such as RSClin, which calls for the incorporation of two subjective parameters (tumor size and grade) with the 21-gene recurrence score in prognostic determination1. It goes without saying that greater objectivity will promote better prognostication.

Artificial intelligence (AI) in pathology (also called pathomics) has blossomed into a strong discipline wherein objectivity can be achieved. Whole slide images (WSI) can be generated with relative ease and made available to data scientists, who can extract thousands of features from these images. These features are correlated with biologic phenotype to create algorithms that enable recognition of phenotype, akin to that in genomics. In the early days, a variety of machine learning methods such as support vector machines and random forests were deployed; however, convolutional neural networks (CNNs) have become the workhorse of pathomic analysis. CNNs are designed to use multi-level image structure, where basic image features such as contours are defined by changes in neighborhood pixel intensities and larger patterns are effectively successive combinations of smaller ones2. CNNs make predictions directly from images without relying on manually engineered intermediate steps; the image is gradually transformed into a set of features that can be used for algorithm development. CNN-based algorithms have been successfully used for tumor detection, classification and prognostication, as well as for predicting response to therapy2–4.
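The neighborhood-intensity idea can be made concrete with a minimal sketch in plain NumPy (a toy illustration, not a trained CNN): a single hand-coded convolution kernel responds to a contour, i.e., a change in neighboring pixel intensities, which is exactly the kind of primitive a CNN's first layer learns from data rather than having hand-engineered.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2D convolution: each output pixel is a weighted sum
    of the input pixels in its local neighborhood."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# A toy 8x8 "patch": dark left half, bright right half (a vertical contour).
patch = np.zeros((8, 8))
patch[:, 4:] = 1.0

# Sobel-style kernel: responds to left-to-right intensity changes.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

response = conv2d(patch, sobel_x)
# The response is strong only where the neighborhood intensity changes
# (the contour) and zero in flat regions; a CNN stacks many learned
# kernels like this so that deeper layers combine contours into larger patterns.
```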

Although theoretically simple, AI-based analyses are complicated by the fact that these algorithms detect everything on the slides, including scratches, ink dots, dust marks and fingerprints. The analysis is also dependent on a number of pre-analytic and analytic factors, including section thickness, the tinctorial characteristics of the H&E (hematoxylin and eosin) stain, and the scanners used. Therefore, although the literature is full of examples of successful algorithms for tumor classification and prognostication, many tend to do poorly when applied to external cohorts. This “domain shift” needs to be mitigated before an algorithm can be clinically successful.

As of June 2022, a wide range of Artificial Intelligence as a Medical Device (AIaMD) products have received regulatory clearance internationally, with at least 343 devices cleared by the US Food and Drug Administration (FDA)5. In view of the rapid development of a large number of AIaMDs, the FDA, Health Canada, and the United Kingdom’s Medicines and Healthcare products Regulatory Agency (MHRA) have jointly identified 10 guiding principles that can inform the development of Good Machine Learning Practice (GMLP)6. These guiding principles will help promote safe, effective, and high-quality medical devices that use artificial intelligence and machine learning (AI/ML). There are major concerns regarding the presence of systemic, statistical, computational and human biases in AI7. In addition, there is a major movement in the field toward the development of ethical AI8. This requires assessment of algorithms not only through the lens of performance but also through the various actors, processes, and objectives that drive the development and eventual deployment of the algorithm8.

Whole slide-based AI algorithms are often considered black boxes, as it is far from clear which features they recognize. Explainability is often restricted to a few features that by themselves would not explain the success of the algorithm. It has been argued that explainable AI will engender trust with the health-care workforce, provide transparency into the AI decision-making process, and potentially mitigate various kinds of bias9,10. However, Ghassemi et al.11 suggest that this represents a false hope. They argue that rigorous internal and external validation of AI models could be a more direct means of achieving the goals often associated with explainability, and they caution against making explainability a requirement for clinically deployed models. In light of these comments, the work by Sandbank et al.12 provides a route to explainability by training the algorithm on histological features. Their CNN-based algorithm was developed to detect 51 different features associated with breast cancer, including cytological and morphological features of tumor cells in addition to other features such as inflammation, microcalcifications and adenosis.

Sandbank et al.12 sought to develop and validate an assay for the detection of invasive and in situ breast carcinomas in a large series of cases. The initial work involved labor-intensive expert annotation and labeling of thousands of areas from 2000 slides by a team of 18 pathologists. These 2000 slides were selected from a series of 115,000 slides to ensure representation of rare and unusual morphologies. Furthermore, to overcome the impact of domain shift, the cases were obtained from 9 different laboratories, each with its own pre-analytical variables. Initial training on a large number of cases with additional cross-laboratory training adds to the robustness of AI analysis; failure to do so often leads to the failure of AI algorithms during external validation.

The need for a large number of cases and the associated manual annotation has been identified as a major bottleneck for AI analysis13. Newer methods are being developed that can circumvent these needs. Ren et al.14 have proposed that unsupervised domain adaptation could be performed using color normalization and/or adversarial training techniques. Unsupervised methods can be used to structure extremely large datasets. Similarly, self-supervised learning can help models learn the morphological, geometrical and contextual content of images using unlabeled data. Lastly, generative adversarial networks (GANs) can be trained on real images to synthesize realistic synthetic data; this can augment datasets and increase the performance of models with limited training15. Conditional GANs have been used for color normalization16. Janowczyk et al.17 have developed an open-source quality control tool (HistoQC) for digital pathology slides to recognize and address issues related to H&E quality.
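As an illustration of the color-normalization idea, here is a minimal Reinhard-style sketch that matches per-channel image statistics; it is a simplification under stated assumptions (the function name is this sketch's own, and it works directly in RGB, whereas practical stain-normalization pipelines typically operate in LAB or stain-deconvolved space):

```python
import numpy as np

def reinhard_normalize(source, target):
    """Shift and scale each channel of `source` so that its mean and
    standard deviation match those of `target` — a simple way to reduce
    stain-color differences between laboratories (Reinhard-style)."""
    src = source.astype(float)
    tgt = target.astype(float)
    out = np.empty_like(src)
    for c in range(src.shape[-1]):
        s_mu, s_sd = src[..., c].mean(), src[..., c].std()
        t_mu, t_sd = tgt[..., c].mean(), tgt[..., c].std()
        # Standardize the source channel, then rescale to the target statistics.
        out[..., c] = (src[..., c] - s_mu) / (s_sd + 1e-8) * t_sd + t_mu
    return np.clip(out, 0, 255)
```

After normalization the source image shares the target's per-channel color statistics, so a model trained on slides from one laboratory sees inputs closer to its training distribution.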

Another important parameter for evaluation is the generalizability of the algorithm. Sandbank et al.12 validated their algorithm by obtaining slides/cases from two different institutions, stained with local methods (H&E and HES) and scanned using 2 FDA-approved scanners. Furthermore, they used a large number of clinical cases to compare the diagnosis with expert pathologists. The analysis was performed on 5954 cases (12,031 slides), with alerts for invasive and in situ carcinoma. An invasive alert was raised for 363 (4.2%) of the slides, of which 272 cases had been diagnosed as benign. Similarly, an in situ alert was raised for 333 slides (3.8%; 237 cases). A review of these cases/slides showed that 75% of the alerts were for necrosis, fibroadenomatous changes, hyperplasia or other features, while 25% required additional workup to confirm or refute a malignant diagnosis; 2% of these called for an additional second opinion. Overall, the study showed that the algorithm could achieve a high AUC (0.99 for invasive cancer and 0.97 for in situ disease). This study design and output support the notion that the algorithm could be generalizable.
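The reported AUC values have a simple rank-based interpretation: the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one (equivalent to the Mann–Whitney U statistic). A minimal sketch of that definition, with made-up toy numbers rather than the authors' evaluation code:

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank interpretation: the fraction of
    (positive, negative) pairs in which the positive case scores higher,
    counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: one malignant slide outranks two of three benign slides
# and ties with the third.
print(auc([1, 0, 0, 0], [0.8, 0.3, 0.8, 0.1]))  # → 0.8333...
```

An AUC of 0.99 thus means the algorithm ranks a random cancer-containing slide above a random benign slide 99% of the time.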

The authors also took the opportunity to study concordance between the study pathologists and the original pathology report12. This analysis highlighted 11 cases with discrepant calls between pathologists; seven had been called DCIS/ADH, while four had been called benign. The review led to the issuance of amended reports on these cases. From the patient-management point of view, this indicates that the pathology labs misdiagnosed only four of 5954 cases for invasive cancer (an error rate of ~0.00067) and 14 cases for in situ disease (an error rate of ~0.0023), a remarkable performance.

The limitations of the study12 include the fact that the work was performed on biopsies and not excision specimens. The latter tend to be enriched for variants of benign lobules showing varying degrees of atrophy, in addition to other benign proliferations. However, the authors state that they plan to extend the work to address these and other issues related to grading and assessment of margins. AI algorithms can be impacted by patient populations and healthcare disparities; furthermore, they can systematically misrepresent and exacerbate health problems in minority populations18,19. Although the racial distribution of the patient population is not provided, the current study involved assessment of the algorithm in a large metropolitan area, which is likely to have a multi-ethnic patient population. Furthermore, it is unlikely that patient ethnicity and health inequities will affect the performance of an algorithm developed for the histological diagnosis of cancer.

Overall, this work offers an excellent blueprint for the development and validation of algorithms in digital pathology. The main question before us now is what degree of validation is necessary prior to clinical deployment of the algorithm as a second-read system. Is development and validation in 7485 cases (15,124 slides) from at least nine different institutions sufficient? Is an error rate of a few percentage points good enough? I, for one, would gladly accept such a tool to prevent the rare errors, well under 1% of cases, that pathologists make. The question, however, ultimately boils down to the cost of performing the second reads and what patients and payers are ready to accept as human error.

Acknowledgements

S.S.B. is supported by funds from the Department of Pathology, Emory University, and from a Susan G. Komen leadership grant.

Author contributions

The author (S.S.B.) has written and approved this manuscript.

Competing interests

The author declares no competing interests. S.S.B. is an Associate Editor of NPJ Breast Cancer.

References

1. Sparano JA, et al. Development and validation of a tool integrating the 21-gene recurrence score and clinical-pathological features to individualize prognosis and prediction of chemotherapy benefit in early breast cancer. J. Clin. Oncol. 2021;39:557–564. doi: 10.1200/JCO.20.03007.
2. Shmatko A, Ghaffari Laleh N, Gerstung M, Kather JN. Artificial intelligence in histopathology: enhancing cancer research and clinical oncology. Nat. Cancer. 2022;3:1026–1038. doi: 10.1038/s43018-022-00436-4.
3. Bera K, Schalper KA, Rimm DL, Velcheti V, Madabhushi A. Artificial intelligence in digital pathology - new tools for diagnosis and precision oncology. Nat. Rev. Clin. Oncol. 2019;16:703–715. doi: 10.1038/s41571-019-0252-y.
4. Baxi V, Edwards R, Montalto M, Saha S. Digital pathology and artificial intelligence in translational medicine and clinical practice. Mod. Pathol. 2022;35:23–32. doi: 10.1038/s41379-021-00919-2.
5. Ganapathi S, et al. Tackling bias in AI health datasets through the STANDING Together initiative. Nat. Med. 2022;28:2232–2233. doi: 10.1038/s41591-022-01987-w.
6. Good Machine Learning Practice for Medical Device Development: Guiding Principles. https://www.gov.uk/government/publications/good-machine-learning-practice-for-medical-device-development-guiding-principles (2021).
7. Schwartz R, et al. Towards a Standard for Identifying and Managing Bias in Artificial Intelligence. NIST Special Publication 1270. doi: 10.6028/NIST.SP.1270 (2022).
8. Ng MY, Kapur S, Blizinsky KD, Hernandez-Boussard T. The AI life cycle: a holistic approach to creating ethical AI for health decisions. Nat. Med. 2022;28:2247–2249. doi: 10.1038/s41591-022-01993-y.
9. Gastounioti A, Kontos D. Is it time to get rid of black boxes and cultivate trust in AI? Radiol. Artif. Intell. 2020;2:e200088. doi: 10.1148/ryai.2020200088.
10. Reyes M, et al. On the interpretability of artificial intelligence in radiology: challenges and opportunities. Radiol. Artif. Intell. 2020;2:e190043. doi: 10.1148/ryai.2020190043.
11. Ghassemi M, Oakden-Rayner L, Beam AL. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit. Health. 2021;3:e745–e750. doi: 10.1016/S2589-7500(21)00208-9.
12. Sandbank J, et al. Validation and real-world clinical application of an artificial intelligence algorithm for breast cancer detection in biopsies. NPJ Breast Cancer. 2022;8:129. doi: 10.1038/s41523-022-00496-w.
13. Srinidhi CL, Ciga O, Martel AL. Deep neural network models for computational histopathology: a survey. Med. Image Anal. 2021;67:101813. doi: 10.1016/j.media.2020.101813.
14. Ren J, Hacihaliloglu I, Singer EA, Foran DJ, Qi X. Unsupervised domain adaptation for classification of histopathology whole-slide images. Front. Bioeng. Biotechnol. 2019;7:102. doi: 10.3389/fbioe.2019.00102.
15. Jose L, Liu S, Russo C, Nadort A, Di Ieva A. Generative adversarial networks in digital pathology and histopathological image processing: a review. J. Pathol. Inform. 2021;12:43. doi: 10.4103/jpi.jpi_103_20.
16. Gadermayr M, et al. Generative adversarial networks for facilitating stain-independent supervised and unsupervised segmentation: a study on kidney histology. IEEE Trans. Med. Imaging. 2019;38:2293–2302. doi: 10.1109/TMI.2019.2899364.
17. Janowczyk A, Zuo R, Gilmore H, Feldman M, Madabhushi A. HistoQC: an open-source quality control tool for digital pathology slides. JCO Clin. Cancer Inform. 2019;3:1–7. doi: 10.1200/CCI.18.00157.
18. Seyyed-Kalantari L, Zhang H, McDermott MBA, Chen IY, Ghassemi M. Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations. Nat. Med. 2021;27:2176–2182. doi: 10.1038/s41591-021-01595-0.
19. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366:447–453. doi: 10.1126/science.aax2342.
