. 2024 May 27;48(7):846–854. doi: 10.1097/PAS.0000000000002248

Artificial Intelligence Helps Pathologists Increase Diagnostic Accuracy and Efficiency in the Detection of Breast Cancer Lymph Node Metastases

Juan Antonio Retamero *, Emre Gulturk *, Alican Bozkurt *, Sandy Liu , Maria Gorgan , Luis Moral , Margaret Horton *, Andrea Parke *, Kasper Malfroid *, Jill Sue *, Brandon Rothrock *, Gerard Oakley *, George DeMuth , Ewan Millar *,§, Thomas J Fuchs *,∥,, David S Klimstra *
PMCID: PMC11191045  PMID: 38809272

Abstract

The detection of lymph node metastases is essential for breast cancer staging, although it is a tedious and time-consuming task where the sensitivity of pathologists is suboptimal. Artificial intelligence (AI) can help pathologists detect lymph node metastases, which could help alleviate workload issues. We studied how pathologists’ performance varied when aided by AI. An AI algorithm was trained using more than 32 000 breast sentinel lymph node whole slide images (WSIs) matched with their corresponding pathology reports from more than 8000 patients. The algorithm highlighted areas suspicious for harboring metastasis. Three pathologists were asked to review a dataset comprising 167 breast sentinel lymph node WSIs, of which 69 harbored cancer metastases of different sizes, enriched for challenging cases. Ninety-eight slides were benign. The pathologists read the dataset twice, both digitally, with and without AI assistance, randomized for slide and reading orders to reduce bias, separated by a 3-week washout period. Their slide-level diagnosis was recorded, and they were timed during their reads. The average reading time per slide was 129 seconds during the unassisted phase versus 58 seconds during the AI-assisted phase, resulting in an overall efficiency gain of 55% (P<0.001). These efficiency gains applied to both benign and malignant WSIs. Two of the 3 reading pathologists experienced significant sensitivity improvements, from 74.5% to 93.5% (P≤0.006). This study highlights that AI can help pathologists shorten their reading times by more than half and also improve their metastasis detection rate.

Key Words: Artificial intelligence, breast cancer, lymph node metastases, digital pathology


Breast cancer is the most common nonskin cancer in the United States and affected an estimated 300 000 women in 2023.1 It represents one-third of all female cancer cases, and about 44 000 women will die of the disease annually. With a median age at presentation of 62, the overall lifetime risk of an American woman developing breast cancer is about 1 in 8, or 13%.1 It has long been recognized that the main prognostic indicator in breast cancer patients is whether the disease is confined to the breast or has spread to other sites, in particular axillary lymph nodes, which serves as a guide to selecting the most suitable treatment.2,3 Breast cancer lymph node examination is tedious and time-consuming, and the literature shows that the diagnostic performance of pathologists could be improved.4 One study showed that when subspecialized breast pathologists reviewed the lymph nodes diagnosed by nonspecialized pathologists, 24% of patients were assigned a different, usually higher, nodal stage.5

Artificial intelligence (AI) systems can accurately detect breast cancer metastases in digital whole slide images (WSIs) of breast lymph nodes (BLN) stained with hematoxylin and eosin (H&E).6 This technology could assist pathologists in the detection of BLN metastases. The CAMELYON16 grand challenge evaluated AI algorithms for the automated detection of metastases in H&E-stained WSIs of lymph node sections. Seven of 32 evaluated AI algorithms showed greater discrimination than a panel of 11 pathologists who operated in a simulated, time-constrained environment.6 In this challenge, even the best-performing pathologist was outperformed by the best-performing algorithm on the study set. The literature also shows that pathologists assessing BLN aided by AI improve their diagnostic efficiency.7,8

We conducted a study to assess how Paige breast lymph node (Paige BLN), an AI-based tumor detection system, influenced pathologists during the assessment of BLN. We measured diagnostic accuracy and efficiency to see whether pathologists’ use of Paige BLN would improve these metrics versus unaided BLN assessment.

MATERIALS AND METHODS

Case Enrollment and Validation

Archival BLN material from Memorial Sloan Kettering Cancer Center (MSKCC, New York, NY) was obtained. The WSIs corresponded to routine diagnostic materials and were retrieved from the digital archive and deidentified. These WSIs had been digitized at 40× (0.25 microns per pixel) on Leica Aperio GT450 and AT2 scanners. The WSIs in the dataset met routine quality standards at the diagnosing institution, and no additional curation was performed to remove slides due to artifacts. A total of 167 WSIs from 148 patients were recruited: 98 were benign and 69 contained metastases of various sizes (Table 1). Of the 69 slides showing metastases, 13 contained isolated tumor cells (ITCs, defined as tumor deposits measuring ≤0.2 mm or containing fewer than 200 tumor cells), 20 contained “small” micrometastases (defined as metastatic deposits measuring >0.2 to 0.5 mm), 17 contained “large” micrometastases (defined as measuring >0.5 to 2 mm), and 19 harbored macrometastases (measuring over 2 mm). Memorial Sloan Kettering synoptic reports were taken as the first step for ground truth determination. These reports were generated at the part level rather than the slide level; therefore, each selected slide was reviewed by a subspecialist breast pathologist, and slide-level labels were collected. For slides reaching a consensus between the synoptic report and the subspecialist review, the ground truth was established. For a small subset of slides where the subspecialist review did not match the report (usually because the selected slide did not exhibit the findings of other slides in the part that were the basis for the reported diagnosis), a second subspecialist reviewed these slides and determined the ground truth. There were no slides for which the synoptic report and the two subspecialist reviewers could not reach a ground truth consensus. Paige BLN was not used as an aid in establishing the ground truth diagnosis.

TABLE 1.

Case Enrollment and Ground Truth Diagnosis for Each WSI

Characteristic Suspicious for cancer (N=69) Not suspicious for cancer (N=98)
Benign N/A 98/98
ITC (≤0.2 mm) 13/69 N/A
Micrometastasis (0.2 mm–2 mm) 37/69 N/A
 Small (0.2–0.5 mm) 20/37 N/A
 Large (0.5–2 mm) 17/37 N/A
Macrometastasis (>2 mm) 19/69 N/A
Treatment status
 Treated 1/69 10/98
 Not treated 68/69 88/98
Scanner
 Leica AT2 59/69 54/98
 Leica GT450 10/69 44/98
Origin
 Memorial Sloan Kettering 40/69 91/98
 Other institutions 29/69 7/98

A total of 167 WSIs were recruited.

ITC indicates isolated tumor cells; WSIs, whole slide images.

Of the 167 cases recruited, 13 corresponded to lobular carcinoma (about 8%), of which 2 had positive lymph nodes; the rest were ductal carcinoma, NOS. In addition, 11 cases (7%) had received neoadjuvant treatment, of which 1 was positive.

Reader Characteristics

Three US pathologists board-certified in anatomic pathology participated in this study. They ranged in experience from 21 to 32 years, and none of them had subspecialized breast pathology training or had previously used digital pathology in routine clinical practice. Before initiation of the study, the pathologists routinely signed out breast lymph nodes as part of their clinical practice. The participating pathologists were compensated for their time participating in this study.

Reader Training

Before study initiation, participants were trained on the use of Paige’s FDA-cleared pathology viewer, FullFocus; Paige BLN, and the data capture tool via a presentation during a live video conference session. Participants had to demonstrate competency at the end of training.

Diagnostic Category Classification

The pathologists were instructed to review all lymph nodes present within a WSI and to classify each WSI as negative or as containing ITC, micrometastasis (small or large), or macrometastasis, depending on the largest tumor deposit present at a slide level.

Paige BLN

Paige BLN is based on the weakly supervised deep learning algorithm described by Campanella et al,9 which we briefly summarize here. Paige BLN was trained on more than 32 000 WSIs from over 8000 patients using multiple instance learning, in which each WSI is coupled with its corresponding pathology report, so no pixel-level annotations are required. Paige BLN breaks each WSI into 224 × 224-pixel tiles, with all tiles identified as background (i.e., not containing tissue) removed from the analysis. During prediction, a ResNet-34 convolutional neural network outputs the probability of cancer for each nonbackground tile. Next, a 512-dimensional feature vector (embedding) is extracted for the tiles with the largest probabilities, and these are passed into a recurrent neural network that aggregates information across tiles to make the final prediction.
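The tile-based inference pipeline described above can be sketched in code. This is a minimal illustration under stated assumptions, not the production system: tile_model, embed, and aggregator are simplified stand-ins for the trained ResNet-34 and the recurrent aggregator, and the background brightness threshold and top-K value are assumed here for illustration only.

```python
import numpy as np

TILE = 224   # tile edge in pixels, as described in the text
TOP_K = 5    # assumed number of highest-probability tiles passed to the aggregator

def tile_image(wsi: np.ndarray) -> list:
    """Split a WSI array of shape (H, W, 3) into non-overlapping 224x224 tiles."""
    h, w, _ = wsi.shape
    return [wsi[y:y + TILE, x:x + TILE]
            for y in range(0, h - TILE + 1, TILE)
            for x in range(0, w - TILE + 1, TILE)]

def is_background(tile: np.ndarray, thresh: float = 240.0) -> bool:
    """Treat nearly white tiles (no tissue) as background and drop them."""
    return tile.mean() > thresh

def embed(tile: np.ndarray) -> np.ndarray:
    """Stand-in for the 512-dimensional ResNet-34 embedding."""
    return np.full(512, tile.mean() / 255.0)

def predict_slide(wsi, tile_model, aggregator) -> float:
    # 1. Tile the slide and discard background tiles.
    tiles = [t for t in tile_image(wsi) if not is_background(t)]
    # 2. The tile-level model outputs a cancer probability per tile.
    probs = np.array([tile_model(t) for t in tiles])
    # 3. Keep embeddings of the top-K most suspicious tiles ...
    top = np.argsort(probs)[::-1][:TOP_K]
    embeddings = np.stack([embed(tiles[i]) for i in top])
    # 4. ... and aggregate them into a single slide-level prediction.
    return aggregator(embeddings)
```

A toy 448 × 448 image with a dark (tissue-like) upper half and a white lower half yields two nonbackground tiles, whose aggregated score is the slide-level output.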

Paige BLN offers a binary classification at the WSI level. In slides where Paige BLN does not detect cancer, the message “Suspicious tissue not detected” is displayed (Fig. 1). In those WSIs where cancer is detected, the system marks the area with the highest probability of harboring metastasis (Fig. 2). The pathologist has the option to toggle off the cancer indicator after it is displayed to better visualize the focus of interest. In addition, the viewer can optionally display a tissue map, showing areas with benign tissue at reduced contrast while the original contrast is maintained in the suspicious areas, thus highlighting the regions with suspicious findings (Fig. 3).

FIGURE 1.

FIGURE 1

Paige BLN offers a binary classification at the WSI level. In slides where Paige BLN does not detect cancer, the message “Suspicious tissue not detected” is displayed. BLN indicates breast lymph nodes, WSI, whole slide image.

FIGURE 2.

FIGURE 2

In WSIs where cancer is detected, the system marks the area with the highest probability to harbor metastasis with a crosshair mark. WSI indicates whole slide image.

FIGURE 3.

FIGURE 3

The viewer can optionally display a tissue map, showing areas with benign tissue at reduced contrast while the original contrast is maintained in the suspicious areas, thus highlighting the regions with suspicious findings.

In October 2023 Paige BLN was granted breakthrough designation by the US Food and Drug Administration (FDA). The Breakthrough Devices Program is intended for certain medical devices that provide for more effective treatment or diagnosis of serious medical conditions. Its purpose is to expedite the assessment and review for marketing authorization, as long as the device meets the FDA’s safety and effectiveness standards.10

Study Design

The study followed a multireader, multicase design with modality crossover and consisted of the evaluation of the challenging dataset described above and summarized in Table 1 by the same 3 participating pathologists, who read this dataset twice in 2 phases (phases I and II) separated by at least 3 weeks. The phases were randomized between readers to control for recall bias. Slide order was randomized between each phase and each reader to reduce reading order bias. Phase I involved reading the WSIs digitally without AI assistance, and phase II involved reading the same WSIs by the same pathologists but with Paige BLN assistance. In each phase, pathologists timed themselves while they read the WSIs using provided stopwatches, entering a unique code and time into the database to unlock the next slide image, and the reading times were recorded on a per-WSI basis.

Phases I and II were identical, except that in phase II each WSI was preanalyzed by Paige BLN, which was used to perform cancer detection. During phase II, the outputs of Paige BLN were only visible when the reading pathologists activated the outputs by clicking an AI icon on the screen.

Statistical Analysis

For sensitivity and specificity analyses, a U-statistics approach was used to obtain 95% confidence intervals (CIs) for both aided and unaided modalities, and the 2-sided 95% CIs for the difference and corresponding P value (iMRMC R package). For individual readers, 2-sided 95% exact binomial CIs were obtained for the aided and unaided rates. McNemar’s test was used to calculate the P value for the hypothesis that shifts in slide classifications were homogeneous with respect to the reading mode. The analysis of pathologist read time was assessed using a mixed model linear regression with mode, reader, and mode x reader interaction included as fixed effects, slide as a random effect, and assuming a common covariance matrix for pairs of read times per slide for each reader.
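The per-reader McNemar comparison can be illustrated with a short exact computation on the discordant pairs (slides classified differently in the two reading modes). The counts below are hypothetical, chosen only to show the mechanics; for instance, 8 discordant slides that all favor the aided read yield P≈0.008, the order of magnitude reported for the largest per-reader shifts in the tables.

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar P value from discordant-pair counts:
    b = slides positive in the aided read only,
    c = slides positive in the unaided read only.
    Under H0 the discordant pairs split 50/50 (binomial, n=b+c, p=0.5)."""
    n, k = b + c, min(b, c)
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical example: 8 slides detected only with AI, 0 only without.
print(round(mcnemar_exact(8, 0), 4))  # → 0.0078
```

With b = c the discordant pairs are balanced and the test returns 1.0, matching the many nonsignificant per-reader comparisons in the tables.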

RESULTS

Paige BLN Standalone Performance

The standalone evaluation of Paige BLN showed an overall sensitivity of 92.8%. For ITCs, the standalone sensitivity of Paige BLN was 78%. The standalone specificity of Paige BLN was 94.9%, whereas the positive predictive value (PPV) was 92.8%, and the negative predictive value (NPV) was 94.9%. Thirteen lobular carcinoma cases (of which 2 were positive) and the 11 cases that had received neoadjuvant treatment (of which 1 was positive) were correctly classified by Paige BLN.
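The reported standalone rates follow from a standard 2 × 2 confusion table. As a sketch, counts of 64 true positives, 5 false negatives, 93 true negatives, and 5 false positives (inferred here from the reported rates and the 69/98 slide split, not taken from the study data) reproduce the reported values:

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard slide-level diagnostic metrics from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Counts consistent with the reported standalone rates
# (69 cancer slides, 98 benign slides):
m = diagnostic_metrics(tp=64, fn=5, tn=93, fp=5)
print({k: round(v * 100, 1) for k, v in m.items()})
# → {'sensitivity': 92.8, 'specificity': 94.9, 'ppv': 92.8, 'npv': 94.9}
```

Note that PPV equals sensitivity and NPV equals specificity here only because the false-positive and false-negative counts happen to coincide in this inferred table.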

Pathologist Diagnostic Accuracy Performance

On average, the sensitivity of pathologists without AI assistance diagnosing this challenging dataset was 81.2% (95% CI, 67.6%–94.7%). Of the 69 slides with metastases, on average 13 were given an erroneous benign diagnosis by the pathologists. The greatest number of false negatives corresponded to ITCs, for which pathologists had an average sensitivity of 46.2% (95% CI, 15.2%–77.1%), followed by small micrometastases (average sensitivity 73.3% [95% CI, 46.1%–100.0%]) and large micrometastases (average sensitivity 96.1% [95% CI, 88.4%–100.0%]). The specificity of the pathologists without AI assistance was 96.3% (95% CI, 91.5%–100.0%).

During the assisted phase, the sensitivity of pathologists to detect metastases was on average 93.2%, an increase of 12.1 percentage points over the unassisted phase (sensitivity of 81.2%). When examined as a group, this result did not reach statistical significance (P=0.09). Of the 3 participating pathologists, 2 experienced statistically significant increases in tumor detection sensitivity (from 72.5% and 78.3% to 94.2% and 92.8%, respectively [P≤0.006]; see Table 2 and Fig. 4). The third pathologist did not experience a sensitivity improvement, but this reader spent the longest time reviewing the dataset in the unassisted phase (see Table 8 and the discussion below). The change in sensitivity was greatest for ITCs. Table 2 shows the change in sensitivity by pathologist, and Tables 3 to 7 show the diagnostic sensitivity by metastasis size for cancer slides and the specificity for benign slides. Two of the 3 pathologists showed a statistically significant increase in sensitivity in the detection of ITCs (Table 3, P=0.031), and 1 pathologist also experienced a statistically significant increase in sensitivity in the detection of small micrometastases (Table 4, P=0.008). There was no difference in specificity between the two phases (Table 7).

TABLE 2.

Pathologist Sensitivity on Cancer Slides

Aided Unaided Aided-unaided
Assessment %* 95% CI %* 95% CI %* 95% CI P §
Overall 93.2 87.7%–98.8% 81.2 67.6%–94.7% 12.1 −2.0% to 26.1% 0.092
Pathologist 1 92.8 83.9%–97.6% 92.8 83.9%–97.6% 0.0 NA 1.000
Pathologist 2 94.2 85.8%–98.4% 72.5 60.4%–82.5% 21.7 NA <0.001
Pathologist 3 92.8 83.9%–97.6% 78.3 66.7%–87.3% 14.5 NA 0.006
* Based on n of 69 slides.
† Confidence interval and P values obtained from iMRMC software, U-statistics analysis. Exact binomial CIs for individual readers.
‡ For overall assessment, null hypothesis that the difference is equal to 0.0.
§ For individual readers, P value from McNemar’s test that off-diagonal counts are evenly distributed.
Overall results obtained based on average of 3 reader pathologists.

FIGURE 4.

FIGURE 4

Change in sensitivity with AI assistance. There was an improvement in sensitivity on average, but this was only statistically significant for some of the readers. Reader 1 was the most accurate, but also took the longest time to read. AI indicates artificial intelligence.

TABLE 8.

Pathologist Read Timing on All Slides

Reader time analysis (N=167)
Least squared means (SE) Aided-unaided difference ANOVA tests*
Measure Assessment Aided Unaided Estimate 95% CI P Factor P
Time (s) Overall 58.3 (3.50) 128.5 (5.69) −70.2 −78.9 to −61.5 <0.001 Mode <0.001
Pathologist 1 71.3 (4.03) 212.0 (8.75) −140.8 −155.8 to −125.7 <0.001 Reader <0.001
Pathologist 2 38.2 (4.03) 76.3 (8.75) −38.0 −53.1 to −23.0 <0.001 Mode-Reader <0.001
Pathologist 3 65.3 (4.03) 97.2 (8.75) −31.9 −47.0 to −16.8 <0.001
* Read time model as function of fixed effects of mode, reader, and mode x reader interaction with slide as random effect.
† P value for the 2-sided null hypothesis that the difference is equal to 0.0.
‡ Type III tests for null hypothesis that levels of factor are homogeneous.

TABLE 3.

Pathologist Sensitivity for ITCs (<0.2 mm)

Aided Unaided Aided-unaided
Assessment %* 95% CI %* 95% CI % 95% CI P §
Overall 79.5 59.9%–99.1% 46.2 15.2%–77.1% 33.3 −0.6% to 67.2% 0.054
Pathologist 1 76.9 46.2%–95.0% 69.2 38.6%–90.9% 7.7 NA 1.000
Pathologist 2 84.6 54.6%–98.1% 38.5 13.9%–68.4% 46.2 NA 0.031
Pathologist 3 76.9 46.2%–95.0% 30.8 9.1%–61.4% 46.2 NA 0.031
* Based on n of 13 slides.
† Confidence interval and P values obtained from iMRMC software, U-statistics analysis. Exact binomial CIs for individual readers.
‡ For overall assessment, null hypothesis that the difference is equal to 0.0.
§ For individual readers, P value from McNemar’s test that off-diagonal counts are evenly distributed.
Overall results obtained based on average of 3 reader pathologists.

TABLE 7.

Pathologist Specificity on Benign Slides

Aided Unaided Aided-unaided
Assessment %* 95% CI %* 95% CI % 95% CI P
Overall 96.9 93.9%–99.9% 96.3 91.5%–100.0% 0.7 −4.1%–5.5% 0.782
Pathologist 1 96.9 91.3%–99.4% 91.8 84.5%–96.4% 5.1 NA 0.125
Pathologist 2 98.0 92.8%–99.8% 99.0 94.4%–100.0% −1.0 NA 1.000
Pathologist 3 95.9 89.9%–98.9% 98.0 92.8%–99.8% −2.0 NA 0.625
* Based on n of 98 slides.
† Confidence interval and P values obtained from iMRMC software, U-statistics analysis. Exact binomial CIs for individual readers.
‡ For overall assessment, null hypothesis that the difference is equal to 0.0.

TABLE 4.

Pathologist Sensitivity for Small Micrometastases (0.2–0.5 mm) Slides

Aided Unaided Aided-unaided
Assessment %* 95% CI %* 95% CI % 95% CI P §
Overall 90.0 76.5%–100.0% 73.3 46.1%–100.0% 16.7 −12.2% to 45.5% 0.258
Pathologist 1 90.0 68.3%–98.8% 95.0 75.1%–99.9% −5.0 NA 1.000
Pathologist 2 90.0 68.3%–98.8% 50.0 27.2%–72.8% 40.0 NA 0.008
Pathologist 3 90.0 68.3%–98.8% 75.0 50.9%–91.3% 15.0 NA 0.375
* Based on n of 20 slides.
† Confidence interval and P values obtained from iMRMC software, U-statistics analysis. Exact binomial CIs for individual readers.
‡ For overall assessment, null hypothesis that the difference is equal to 0.0.
§ For individual readers, P value from McNemar’s test that off-diagonal counts are evenly distributed.
Overall results obtained based on average of 3 reader pathologists.

TABLE 5.

Pathologist Sensitivity for Large Micrometastases (0.5–2.0  mm)

Aided Unaided Aided-unaided
Assessment %* 95% CI %* 95% CI % 95% CI P §
Overall 100.0 NA–NA 96.1 88.4%–100.0% 3.9 −3.8% to 11.6% 0.318
Pathologist 1 100.0 80.5%–100.0% 100.0 80.5%–100.0% 0.0 NA 1.000
Pathologist 2 100.0 80.5%–100.0% 94.1 71.3%–99.9% 5.9 NA 1.000
Pathologist 3 100.0 80.5%–100.0% 94.1 71.3%–99.9% 5.9 NA 1.000
* Based on n of 17 slides.
† Confidence interval and P values obtained from iMRMC software, U-statistics analysis. Exact binomial CIs for individual readers.
‡ For overall assessment, null hypothesis that the difference is equal to 0.0.
§ For individual readers, P value from McNemar’s test that off-diagonal counts are evenly distributed.
Overall results obtained based on average of 3 reader pathologists.

TABLE 6.

Pathologist Sensitivity for Macrometastases (>2.0 mm) Slides

Aided Unaided Aided-unaided
Assessment %* 95% CI %* 95% CI % 95% CI P §
Overall 100.0 NA–NA 100.0% NA–NA 0.0 0.0%–0.0% NA
Pathologist 1 100.0 82.4%–100.0% 100.0% 82.4%–100.0% 0.0 NA 1.000
Pathologist 2 100.0 82.4%–100.0% 100.0% 82.4%–100.0% 0.0 NA 1.000
Pathologist 3 100.0 82.4%–100.0% 100.0% 82.4%–100.0% 0.0 NA 1.000
* Based on n of 19 slides.
† Confidence interval and P values obtained from iMRMC software, U-statistics analysis. Exact binomial CIs for individual readers.
‡ For overall assessment, null hypothesis that the difference is equal to 0.0.
§ For individual readers, P value from McNemar’s test that off-diagonal counts are evenly distributed.
Overall results obtained based on average of 3 reader pathologists.

Pathologist Reading Times

The average reading time of pathologists during the unassisted phase was 128.5 seconds per WSI. The unassisted reading times varied considerably among the 3 pathologists (from 76.3 to 212.0 seconds per slide). As noted above, the pathologist with the longest average reading time (pathologist 1) also had a significantly greater accuracy in the unassisted phase (Tables 2 and 8). When not aided by AI, reading times were longer for benign WSIs (160.4 s average) than for malignant WSIs (83.1 s average). For the malignant WSIs, the average reading time was the longest for small micrometastases (128.3 s), followed by ITCs (116.7 s), large micrometastases (79.2 s), and macrometastases (16.1 s).

During the assisted phase, all reading times were significantly shorter. The average slide was read in 58.3 s (compared with 128.5 s [P<0.001]) (Table 8). Reading time for suspicious slides was reduced by 69.1% on average, from 83.1 s to 25.7 s (P<0.001). These efficiency benefits were present across all metastasis sizes, although Paige BLN had the greatest impact when reading large micrometastases (79.2 s down to 19.5 s, a 75.4% time reduction, P=0.003), followed by small micrometastases (128.3 s down to 35.1 s, a 72.6% time reduction, P<0.001), ITCs (116.7 s down to 44.6 s, a 61.8% time reduction, P<0.001), and macrometastases (16.1 s down to 8.2 s, a 49.1% time reduction, P=0.041). Reading benign WSIs with AI assistance also demonstrated a significant efficiency gain, from 160.4 s during the unassisted phase to 81.2 s with Paige BLN assistance, a 49.4% gain (P<0.001).

Table 8 and Figure 5 show the reading time changes by pathologist. The average reading time in seconds per slide during the unassisted phase varied between 212 s and 76.3 s, suggesting different reading behaviors among the participants. Table 9 shows the change in reading times between malignant and benign WSIs, and Table 10 shows the change in reading times for malignant slides, broken down by metastasis size.

FIGURE 5.

FIGURE 5

AI assistance helped pathologists become faster. There was a statistically significant 55% reduction in reading times for all readers, from 128 seconds without AI assistance to 58 seconds with AI assistance to read the average slide.

TABLE 9.

Pathologist Read Time (sec) by Slide Type

Reader time analysis
Least squared means (SE) Aided-unaided difference ANOVA tests*
Slide type Assessment Aided Unaided Estimate 95% CI P Factor P
Suspicious (n=69) Overall 25.7 (3.33) 83.1 (7.85) −57.5 −71.6 to −43.4 <0.001 Mode <0.001
Pathologist 1 27.2 (4.00) 122.1 (12.94) −94.9 −119.3 to −70.5 <0.001 Reader <0.001
Pathologist 2 15.0 (4.00) 51.9 (12.94) −37.0 −61.4 to −12.6 0.003 Mode-Reader 0.001
Pathologist 3 34.8 (4.00) 75.4 (12.94) −40.6 −65.0 to −16.1 0.001
Non-Suspicious (n=98) Overall 81.2 (4.24) 160.4 (6.64) −79.2 −89.5 to −68.9 <0.001 Mode <0.001
Pathologist 1 102.3 (5.10) 275.3 (10.22) −173.1 −190.9 to −155.2 <0.001 Reader <0.001
Pathologist 2 54.6 (5.10) 93.4 (10.22) −38.8 −56.6 to −20.9 <0.001 Mode-Reader <0.001
Pathologist 3 86.7 (5.10) 112.6 (10.22) −25.8 −43.7 to −7.9 0.005
* Read time model as function of fixed effects of mode, reader, and mode x reader interaction with slide as random effect.
† P value for 2-sided null hypothesis that the difference is equal to 0.0.
‡ Type III tests for null hypothesis that levels of factor are homogeneous.

TABLE 10.

Pathologist Read Timing by Slide Characteristic for Positive Slides

Reader time analysis
Least squared means (SE) Aided-unaided difference ANOVA tests*
Characteristic Assessment Aided Unaided Estimate 95% CI P Factor P
ITC (n=13) Overall 44.6 (9.21) 116.7 (14.48) −72.1 −100.5 to −43.6 <0.001 Mode <0.001
Pathologist 1 48.6 (10.80) 173.2 (22.18) −124.5 −173.8 to −75.3 <0.001 Reader <0.001
Pathologist 2 23.7 (10.80) 80.9 (22.18) −57.2 −106.4 to −8.0 0.023 Mode-Reader 0.033
Pathologist 3 61.6 (10.80) 96.0 (22.18) −34.4 −83.6 to 14.8 0.167
Small Micromets (n=20) Overall 35.1 (7.67) 128.3 (15.49) −93.1 −119.1 to −67.2 <0.001 Mode <0.001
Pathologist 1 39.3 (9.23) 199.9 (25.08) −160.6 −205.6 to −115.6 <0.001 Reader <0.001
Pathologist 2 19.7 (9.23) 72.1 (25.08) −52.3 −97.3 to −7.3 0.023 Mode-Reader 0.002
Pathologist 3 46.3 (9.23) 112.8 (25.08) −66.5 −111.5 to −21.5 0.004
Large Micromets (n=17) Overall 19.5 (3.47) 79.2 (18.80) −59.8 −98.1 to −21.5 0.003 Mode 0.003
Pathologist 1 20.2 (6.01) 112.9 (32.56) −92.7 −159.1 to −26.3 0.007 Reader 0.299
Pathologist 2 12.6 (6.01) 48.1 (32.56) −35.5 −101.8 to 30.9 0.291 Mode-Reader 0.459
Pathologist 3 25.5 (6.01) 76.7 (32.56) −51.2 −117.5 to 15.2 0.129
Macromets (n=19) Overall 8.2 (1.08) 16.1 (3.88) −7.9 −15.5 to −0.3 0.041 Mode 0.041
Pathologist 1 5.9 (1.46) 13.3 (6.62) −7.4 −20.5 to 5.8 0.268 Reader 0.266
Pathologist 2 6.1 (1.46) 14.4 (6.62) −8.4 −21.5 to 4.8 0.209 Mode-Reader 0.994
Pathologist 3 12.7 (1.46) 20.7 (6.62) −8.0 −21.1 to 5.1 0.229
* Read time model as function of fixed effects of mode, reader, and mode x reader interaction with slide as random effect.
† P value for 2-sided null hypothesis that the difference is equal to 0.0.
‡ Type III tests for null hypothesis that levels of factor are homogeneous.

DISCUSSION

Previous studies have reported the challenges faced by pathologists when assessing BLN specimens, a task commonly performed by generalists under time pressure. Vestjens et al5 revealed that around one-quarter of all BLN cases diagnosed by general pathologists had their N status changed (usually to a higher stage) when these specimens were reviewed by specialists. A higher N status may result in upstaging of the entire tumor stage, which has different prognostic implications, can change the selection of therapy, and thus may impact the final clinical outcome.5 Furthermore, the distinction between ITCs and micrometastases can be challenging, with low interobserver agreement rates; yet this distinction is clinically important, affecting final N staging, subsequent treatment, and clinical outcome.11

AI tools have been developed to help pathologists improve their diagnostic accuracy and efficiency. These novel tools show significant promise, as demonstrated in the CAMELYON challenges. In these challenges, AI tools demonstrated diagnostic accuracy comparable to that of an expert breast pathologist practicing without time constraints, and superior to that of general pathologists under time pressure.6 However, one of the tasks in the CAMELYON challenges was the detection of ITCs, for which reported sensitivity rates were below 40%.12 Paige BLN, in contrast, showed a standalone sensitivity of over 78% in the detection of ITCs, superior to the rate typically reported in the literature.

In our study, although the pathologists were timed, they did not have any time constraints. Pathologists using Paige BLN had marked, statistically significant efficiency gains, with significant reductions in average times to read a WSI, whether benign or malignant. In this study, the average reading time was 128.5 s when the pathologists were unaided; when aided by Paige BLN, the time to read a WSI was reduced by 55%, to an average of 58.3 s. Benign slides in particular saw a decrease in average reading times from 160.4 s down to 81.2 s. Given that in clinical practice most BLN tend to be benign, an efficiency gain measured in seconds per slide in our study could aggregate with case volume to save hours for pathologists in a busy practice, time that can be devoted to better attention to challenging cases or other value-added clinical activities. In this study, the negative predictive value of Paige BLN was 94.9%, demonstrating that a high level of trust can be placed in a benign prediction by Paige BLN in practical, everyday use. Although sensitivity is lower for ITCs, the presence of ITCs does not affect the nodal status for TNM staging, which justifies the cutoff point selected for Paige BLN in the tradeoff between sensitivity and specificity. Therefore, pathologists can review BLN WSIs with greater efficiency when a WSI presents no obvious regions of concern and the output of Paige BLN is negative, knowing that the probability of missing a metastasis of any size is minimal.

Steinert et al8 described an increase in diagnostic sensitivity when pathologists were aided by AI in the detection of micrometastases. In our study, 2 of the 3 pathologists experienced an increase in sensitivity from the low-to-mid 70% range to the low 90% range in the detection of metastases of any size (P≤0.006). However, one of the readers did not experience an improvement in sensitivity. This reader also spent the longest time reviewing the average slide (212 s unassisted, compared with 97.2 s and 76.3 s for the other 2 pathologists). This highlights the fact that the time spent reading a slide can affect the detection rate of metastases, which was one of the key points of the CAMELYON16 grand challenge.6 In that challenge, pathologists with time constraints performed much worse than pathologists without time constraints. The pathologists in our exercise were timed, but they did not have predefined time limits. However, the demands of daily clinical practice impose real-life time constraints that are likely to affect the diagnostic sensitivity of busy pathologists. This issue is particularly relevant given the significant (17.5%) decrease in practicing pathologists in recent years.13 It is in this context that AI assistance can help pathologists not only increase their detection rate but also sign out their busy caseloads with greater celerity and accuracy.

As previously reported by our group on the use of AI in pathology,14 the intended use of Paige BLN is as an adjunct to the pathologist; it does not replace pathological assessment. Paige BLN is best deployed like other ancillary techniques used today, whose outputs are contextualized and interpreted by the pathologist, much as immunohistochemistry results are interpreted to rule out benign mimickers of metastasis (such as Mullerian inclusions, breast epithelial inclusions, and nevi). In the interplay between the human and the machine, as indicated in our study, Paige BLN helps pathologists improve their diagnostic performance. Thus, the combined system created by the pathologist and diagnostic AI achieves greater diagnostic efficiency and accuracy than either alone. For that reason, the critical question to ask when evaluating AI tools in pathology is not the standalone performance of the AI, but rather how it interacts with pathologists to improve overall, combined clinical performance.

Our study has several limitations. One is that only 3 reading pathologists participated in the study. To better understand how AI may influence the performance of pathologists, similar studies should be conducted with a larger number of readers. In addition, most of the cases in this study corresponded to treatment-naïve ductal carcinomas (NOS); only a few cases harbored metastases from lobular carcinoma or after neoadjuvant treatment. Although Paige BLN had optimal sensitivity in the detection of metastases in these situations in our study, the numbers were too low, and additional work is needed to better understand how the AI will cope in these scenarios. Our study is centered on the detection of breast cancer metastases to BLN, which is the intended use of Paige BLN. The performance of Paige BLN in the detection of other pathologies, such as lymphoproliferative or granulomatous diseases, although clinically relevant, was beyond the scope of this study. Similarly, the AI performance in the classification of benign mimickers, such as benign epithelial inclusions, capsular nevi, or foreign material reactions, remains to be determined.

In summary, our study improves on and adds context to prior studies in the field,7,8 confirming that BLN review by pathologists can be enhanced and that AI assistance is a valuable tool for this improvement. The use of Paige BLN produces a significant reduction in slide review times for both benign and malignant cases and can help pathologists detect challenging metastases. Paige BLN may be a useful complement to pathologists in the clinically necessary, but arduous, review of BLN. Further studies with Paige BLN are warranted to determine whether the accuracy gains we observed improve clinical staging and therapy selection, and thus potentially clinical outcomes.

Footnotes

J.A.R., E.G., A.B., M.H., A.P., K.M., J.S., B.R., G.O., T.J.F., and D.S.K. are employees and shareholders of Paige. This work was funded by Paige, and received no specific grant from any funding agency in the public or not-for-profit sectors.

Conflicts of Interest and Source of Funding: The authors have disclosed that they have no significant relationships with, or financial interest in, any commercial companies pertaining to this article.

Contributor Information

Juan Antonio Retamero, Email: juan.retamero@paige.ai.

Emre Gulturk, Email: emre.gulturk@paige.ai.

Alican Bozkurt, Email: alican.bozkurt@paige.ai.

Sandy Liu, Email: sandy.liu@trinityhealthofne.org.

Maria Gorgan, Email: maria.gorgan@trinityhealthofne.org.

Luis Moral, Email: luis.moral@trinityhealthofne.org.

Margaret Horton, Email: margaret.horton@paige.ai.

Andrea Parke, Email: andrea.parke@paige.ai.

Kasper Malfroid, Email: kasper.malfroid@paige.ai.

Jill Sue, Email: jillian.sue@paige.ai.

Brandon Rothrock, Email: brandon.rothrock@paige.ai.

Gerard Oakley, Email: joe.oakley@paige.ai.

George DeMuth, Email: gdemuth@statonellc.com.

Ewan Millar, Email: ewan.millar@paige.ai.

Thomas J. Fuchs, Email: thomas.fuchs@paige.ai.

David S. Klimstra, Email: david.klimstra@paige.ai.

REFERENCES

  • 1. American Cancer Society. Cancer Facts & Statistics. http://cancerstatisticscenter.cancer.org/. Accessed March 30, 2023.
  • 2. Amin MB, Edge SB, Greene FL, et al. AJCC Cancer Staging Manual. 8th ed. Springer; 2017.
  • 3. Alderson MR, Hamlin I, Staunton MD. The relative significance of prognostic factors in breast carcinoma. Br J Cancer. 1971;25:646–656.
  • 4. van Diest PJ, van Deurzen CHM, Cserni G. Pathology issues related to SN procedures and increased detection of micrometastases and isolated tumor cells. Breast Dis. 2010;31:65–81.
  • 5. Vestjens JHMJ, Pepels MJ, de Boer M, et al. Relevant impact of central pathology review on nodal classification in individual breast cancer patients. Ann Oncol. 2012;23:2561–2566.
  • 6. Ehteshami Bejnordi B, Veta M, Johannes van Diest P, et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA. 2017;318:2199.
  • 7. Challa B, Tahir M, Hu Y, et al. Artificial intelligence-aided diagnosis of breast cancer lymph node metastasis on histologic slides in a digital workflow. Mod Pathol. 2023;36:100216.
  • 8. Steiner DF, MacDonald R, Liu Y, et al. Impact of deep learning assistance on the histopathologic review of lymph nodes for metastatic breast cancer. Am J Surg Pathol. 2018;42:1636.
  • 9. Campanella G, Hanna MG, Geneslaw L, et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med. 2019;25:1301–1309.
  • 10. US Food and Drug Administration. Breakthrough Devices Program. FDA; 2023. Accessed January 31, 2024. https://www.fda.gov/medical-devices/how-study-and-market-your-device/breakthrough-devices-program
  • 11. de Mascarel I, MacGrogan G, Debled M, et al. Distinction between isolated tumor cells and micrometastases in breast cancer: is it reliable and useful? Cancer. 2008;112:1672–1678.
  • 12. Bandi P, Geessink O, Manson Q, et al. From detection of individual metastases to classification of lymph node status at the patient level: the CAMELYON17 challenge. IEEE Trans Med Imaging. 2019;38:550–560.
  • 13. Metter DM, Colgan TJ, Leung ST, et al. Trends in the US and Canadian pathologist workforces from 2007 to 2017. JAMA Netw Open. 2019;2:e194337.
  • 14. Raciti P, Sue J, Retamero JA, et al. Clinical validation of artificial intelligence–augmented pathology diagnosis demonstrates significant gains in diagnostic accuracy in prostate cancer detection. Arch Pathol Lab Med. 2023;147:1178–1185.

Articles from The American Journal of Surgical Pathology are provided here courtesy of Wolters Kluwer Health