Abstract
Background
Artificial intelligence (AI) has achieved good performance in image recognition, including identifying cancer cells in pathology images. However, the best mode of AI assistance in diagnostic pathology remains to be explored.
Methods
We compared the influence of two deep-learning assistance modes on pathologists in cancer identification. Ten board-certified pathologists classified 60 cases of nasopharyngeal biopsy as carcinoma or benign with or without AI assistance, which was either a heatmap of cancer probability accompanying a whole-slide image (AI-heatmap mode) or ten high-power field images with the highest cancer probability (AI-HPF mode).
Results
Both assisting modes significantly accelerated the diagnostic process, lowered the subjective difficulty, and maintained high accuracy compared to the unassisted mode. Notably, the acceleration of diagnosis was more significant in AI-HPF mode than in AI-heatmap mode (time reduction: 35.1% vs. 28.1%; P = 0.040), especially for benign cases (time reduction: 49.4% vs. 32.9%; P = 0.0000072). For benign cases, an increased area proportion of false-positive AI prediction slowed down the diagnostic process in AI-heatmap mode (P = 0.00000084) but not in AI-HPF mode (P = 0.62).
Conclusions
We show for the first time that an AI-HPF assistance mode was superior to the commonly used AI-heatmap mode in accelerating cancer identification by pathologists. In our scenario, the AI-HPF mode maintained high diagnostic accuracy and was robust to the influence of false-positive AI prediction. The potential risk caused by AI assistance is also discussed.
Keywords: Deep learning, Artificial intelligence, Assistance, Pathologic diagnosis, Cancer
Background
Integration of artificial intelligence (AI) and digital pathology is revolutionizing the field of pathology and transforming the diagnostic workflow. Modern digital pathology systems provide high-resolution images of glass slides (digital slides) for primary diagnosis, with high diagnostic concordance between digital and glass slides [1, 2]. Digitization brings multiple advantages, such as remote access to cases, easy data archiving and retrieval, and online consultation with experts beyond geographic boundaries. Most importantly, the feasibility of computer analysis of digital images enables the integration of artificial intelligence into pathology systems. Recently, deep learning models have achieved good performance in image analysis, such as cancer identification and subtyping [3–8], cell counting [9–11], cell morphometry [12], and tumor microenvironment analysis [13, 14]. However, most of these studies evaluated the models in a standalone diagnostic role without integrating them into the clinical workflow.
One prominent trend in the integration of AI and digital pathology is the concept of utilizing AI as a “copilot” [15, 16]. Instead of training an AI model to perform a perfect diagnosis, an AI model integrated into a viewing platform can assist pathologists by providing useful suggestions, such as the probability of malignancy, highlights of suspicious areas, and quantitative measurements. Such assistance can augment pathologists’ expertise and reduce potential human error. Most previous studies integrated AI models into pathology systems by providing an AI-predicted heatmap alongside the whole-slide image (AI-heatmap mode). Such heatmaps can guide pathologists to focus on the most suspicious areas of a whole-slide image, thus increasing diagnostic efficiency and accuracy. Previous studies have shown that pathologists can identify nodal metastases of breast cancer [17] or gastric cancer [18] with higher sensitivity and shorter review times when AI-predicted heatmaps are provided.
A strategy of selecting diagnostic regions of interest (ROIs) for review has previously been employed in cytopathologic diagnosis to improve diagnostic efficiency without compromising accuracy [19, 20]. However, the impact of this approach on histopathologic diagnosis remains largely unexplored. Adopting a similar concept, we used an AI model to select ten non-overlapping high-power fields (HPFs) with the highest cancer probability, so that a pathologist can make a diagnosis based on these ten HPFs instead of a whole-slide image (AI-HPF mode). Theoretically, the diagnostic process might be accelerated by reducing the total area of slide review, but the exact impact on diagnostic accuracy and efficiency remains to be explored.
In this study, ten board-certified pathologists reviewed 60 cases of nasopharyngeal biopsy (30 carcinoma cases and 30 benign cases) using three different modes, including the unassisted mode, AI-heatmap mode, and AI-HPF mode. The diagnostic accuracy, efficiency, and subjective difficulty among different review modes were analyzed.
Methods
Case selection
A total of 70 cases of nasopharyngeal biopsy, including 35 cases of nasopharyngeal carcinoma and 35 cases of benign nasopharyngeal tissue, were retrieved from the archives of three branches (Linkou, Kaohsiung, and Chiayi) of Chang Gung Memorial Hospital, Taiwan. For each case, one hematoxylin and eosin (H&E)-stained slide was used. For all cases, the diagnosis was confirmed by two senior pathologists (W.-Y.C. and C.H.), who were excluded from participating in the subsequent diagnostic experiments. Whole-slide high-resolution digital images were produced using a NanoZoomer S360 digital slide scanner (Hamamatsu Photonics, Hamamatsu, Japan) in 40x objective mode. The average size ± standard deviation of a whole-slide image was 44.8 ± 6.3 mm in width and 19.0 ± 4.0 mm in height. The average tissue area ± standard deviation was 37.8 ± 23.0 mm² for benign cases and 41.1 ± 27.3 mm² for carcinoma cases. This study was approved by the Institutional Review Board of Chang Gung Medical Foundation (IRB No. 202000956B0).
Image preparation for diagnostic experiments
There were three review modes: the unassisted mode, AI-heatmap mode, and AI-HPF mode. For the unassisted mode, only the original whole-slide image of each case was needed. For the AI-heatmap mode, a colored cancer probability map of each case was generated by a well-trained nasopharyngeal carcinoma-detecting AI model from our previous study [5]. The model was developed on an independent dataset of 726 cases using ResNeXt and achieved a patch-level area under the receiver operating characteristic curve (AUC) of 0.99 in the testing set [5]. The patch size was 256 × 256 pixels. For the AI-HPF mode, ten non-overlapping HPFs with the highest probability of cancer cells were sequentially selected by our AI model [5] and cropped from each whole-slide image. Each HPF image had the size of a 40x objective lens view of a standard microscope (0.238 mm²). Of note, neighboring areas of tissue tend to have similar morphologic features, resulting in similar predicted cancer probabilities. To avoid clustering of the selected HPFs, a Gaussian kernel with four times the radius of an HPF was applied to reweight the probability map after each selection.
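The selection procedure described above can be sketched as a greedy loop over a patch-level probability grid. This is a simplified illustration, not the authors' released code: the grid representation, the multiplicative (1 − Gaussian) suppression, and all names and parameters are assumptions for demonstration.

```python
import math

def select_hpfs(prob_map, n_fields=10, hpf_radius=1, suppress_factor=4):
    """Greedily pick the n_fields highest-probability patch positions.

    After each pick, nearby probabilities are down-weighted with a
    Gaussian suppression kernel (sigma = suppress_factor * hpf_radius),
    so subsequent picks do not cluster around the same region.
    """
    probs = [row[:] for row in prob_map]  # work on a copy
    h, w = len(probs), len(probs[0])
    sigma = suppress_factor * hpf_radius
    picks = []
    for _ in range(n_fields):
        # Position of the current maximum cancer probability.
        r, c = max(((i, j) for i in range(h) for j in range(w)),
                   key=lambda rc: probs[rc[0]][rc[1]])
        picks.append((r, c))
        # Reweight the map: multiply each patch by (1 - Gaussian bump).
        for i in range(h):
            for j in range(w):
                d2 = (i - r) ** 2 + (j - c) ** 2
                probs[i][j] *= 1.0 - math.exp(-d2 / (2.0 * sigma ** 2))
    return picks
```

The picked patch itself is multiplied by zero, so it can never be selected twice, while patches far from any pick are essentially unaffected.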
Participants in diagnostic experiments
Ten board-certified pathologists from four branches (Linkou, Kaohsiung, Chiayi, and Keelung) of Chang Gung Memorial Hospital participated in the diagnostic experiments. Among them, six practiced in a medical center (Linkou or Kaohsiung), and the other four in a regional hospital (Chiayi or Keelung). None of the ten pathologists performed the original diagnosis or case selection in this study. The experience of practice after board certification ranged from 2 to 29 years, with a median of 15.5 years. None of these participants used a digital pathology platform for routine primary diagnosis before this study.
Design and procedure of diagnostic experiments
A randomized complete block design was adopted (Fig. 1). Each participant underwent three separate reading sessions, each including a training phase (10 cases) and a main phase (60 cases). During the training phase, participants could operate the platform freely to familiarize themselves with it. In the main phase, the 60 cases were divided into three blocks (20 cases each), one per review mode: unassisted, AI-heatmap, and AI-HPF. Participants were instructed to take a short break between blocks, and the order of review modes was randomized to avoid potential biases. To minimize memorization of cases, a washout period of at least two weeks of full-time clinical practice was implemented between sessions.
Fig. 1.
Design and procedure of the diagnostic experiments. Each participant was assigned a randomized sequence of review modes and underwent three sessions. Each session started with a training phase followed by three review-mode blocks, and the case order was randomized. All 60 cases were reviewed under all three review modes over the three sessions
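One session of this block design might be generated as follows. The function and variable names are illustrative assumptions, not part of the study software; the sketch only shows the core idea of shuffling both case order and mode order per participant.

```python
import random

MODES = ["unassisted", "AI-heatmap", "AI-HPF"]

def make_session_plan(case_ids, seed=None):
    """Split 60 cases into three 20-case blocks and pair each block
    with one review mode, with both case order and mode order shuffled."""
    rng = random.Random(seed)
    cases = list(case_ids)
    rng.shuffle(cases)          # randomized case order within the session
    modes = MODES[:]
    rng.shuffle(modes)          # randomized sequence of review modes
    blocks = [cases[i * 20:(i + 1) * 20] for i in range(3)]
    return list(zip(modes, blocks))
```

Over the three sessions, rotating which block meets which mode ensures that every case is eventually reviewed under all three modes.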
All images were displayed using aetherSlide (aetherAI, Taipei, Taiwan), a web-based digital pathology image viewer. Examples of the three review modes are shown in Fig. 2. In the unassisted mode, only the original whole-slide image (H&E stain) was provided for diagnosis. The AI-heatmap mode overlaid a switchable cancer probability heatmap on the whole-slide image (Fig. 2A), guiding the pathologist to focus on suspicious areas (Fig. 2B). In the unassisted and AI-heatmap modes, participants were free to select a diagnosis (carcinoma or benign) whenever they felt confident. The AI-HPF mode required participants to sequentially view the ten HPFs with the highest cancer probability (without viewing the whole-slide image). Hotkeys allowed switching to the next or previous HPF an unlimited number of times, and participants could select a diagnosis only after viewing the last HPF. In all review modes, a subjective difficulty score ranging from 1 (easiest) to 10 (hardest) had to be selected after choosing a diagnosis. The review time was calculated from the completion of image loading to the confirmation of a diagnosis by pressing a button.
Fig. 2.
Examples of three review modes. An example case of unassisted mode (A; left panel) and AI-heatmap mode (A; right panel), with a dotted-line rectangle encompassing suspicious areas. A pathologist can focus on suspicious areas (B) to look for carcinoma cells. The ten most suspicious high-power field (HPF) areas of a case were sequentially examined by a pathologist in AI-HPF mode (C)
Statistical analysis
The review time, diagnostic accuracy, and subjective difficulty were measured across the three review modes and analyzed using mixed-effect models. Generalized linear mixed-effect models (GLME) were employed, with participants and cases as random effects and review mode (unassisted, AI-heatmap, and AI-HPF) and case type (benign and carcinoma) as fixed effects. To evaluate the interaction between the fixed effects, a full model and an additive model were fitted independently and compared using the ANOVA function in R. Multiple comparisons were conducted using either simple main effects or Tukey’s HSD test [21], depending on the significance of the interaction between the fixed effects. All statistical analyses were performed in R (version 4.1.3) (http://www.r-project.org/). Specifically, the GLME models were fitted using the ‘lme4’ and ‘lmerTest’ packages, and multiple comparisons were conducted using the ‘multcomp’ package.
Results
Diagnostic accuracy
The average diagnostic accuracy in unassisted, AI-heatmap, and AI-HPF modes was 97.6%, 97.4%, and 98.0%, respectively. The details of diagnostic accuracy in different review modes and case types are listed in Table 1. To evaluate the impact of review mode and case type on diagnostic accuracy, we fitted both full and additive models to examine the interaction between the variables. Neither the review mode nor the case type had a significant influence on diagnostic accuracy, and no significant interaction was found between them. Post hoc analysis also showed no significant difference in diagnostic accuracy between review modes or case types (Table 2; Fig. 3A). Fig. 4 shows the diagnostic accuracy of all participating pathologists across the review modes; no significant difference was found among participants.
Table 1.
Results of review time, diagnostic accuracy, and subjective difficulty in different review modes
| Review Mode | Case Type | Review Time (seconds) | Diagnostic Accuracy | Subjective Difficulty |
|---|---|---|---|---|
| Unassisted | Benign (n = 30) | 93.5 ± 54.6 | 99.0% ± 1.7% | 3.99 ± 0.83 |
| | Carcinoma (n = 30) | 60.1 ± 23.6 | 96.3% ± 6.6% | 3.79 ± 1.35 |
| | All (n = 60) | 76.6 ± 37.2 | 97.6% ± 3.2% | 3.89 ± 1.03 |
| AI-heatmap | Benign (n = 30) | 61.5 ± 26.3 | 98.3% ± 2.9% | 3.43 ± 0.82 |
| | Carcinoma (n = 30) | 48.1 ± 17.8 | 96.6% ± 4.9% | 3.37 ± 1.52 |
| | All (n = 60) | 54.7 ± 17.8 | 97.4% ± 3.8% | 3.40 ± 1.13 |
| AI-HPF | Benign (n = 30) | 46.7 ± 30.6 | 98.3% ± 3.2% | 3.31 ± 1.43 |
| | Carcinoma (n = 30) | 52.9 ± 40.0 | 97.6% ± 2.3% | 3.24 ± 1.21 |
| | All (n = 60) | 49.8 ± 35.0 | 98.0% ± 1.3% | 3.27 ± 1.18 |
The numbers are mean ± standard deviation
Table 2.
Post hoc comparison of diagnostic accuracy
| Comparison | Mean Difference ± Standard Deviation | P-value |
|---|---|---|
| Review Mode (n = 60) | | |
| AI-heatmap - Unassisted | -8.6% ± 39.4% | 1.0 |
| AI-HPF - Unassisted | 12.3% ± 41.5% | 1.0 |
| AI-HPF - AI-heatmap | 20.9% ± 40.9% | 1.0 |
| Case Type (n = 60) | | |
| Carcinoma - Benign | -87.4% ± 52.4% | 0.095 |
Fig. 3.
Comparison of diagnostic accuracy (A), review time (B), and subjective difficulty (C) between review modes and case types. 95% CI, 95% confidence interval; *P < 0.05; **P < 0.005; ***P < 0.0005
Fig. 4.
Diagnostic accuracy of each participant using each review mode
Diagnostic efficiency
The average review time per case in unassisted, AI-heatmap, and AI-HPF modes was 76.6, 54.7, and 49.8 s, respectively. The details of review time in different review modes and case types are listed in Table 1. The ANOVA result showed that review mode (unassisted, AI-heatmap, and AI-HPF; P = 0.00042) and case type (benign and carcinoma; P < 2.2 × 10⁻¹⁶) had significant impacts on review time, with a significant interaction between these two variables (P = 2.3 × 10⁻¹²). Post hoc analysis (Table 3) showed that both assisting modes were significantly faster than the unassisted mode (21.5 s faster, P = 8.9 × 10⁻¹⁶ for AI-heatmap mode; 26.9 s faster, P < 2.0 × 10⁻¹⁶ for AI-HPF mode). Notably, the acceleration of diagnosis was greater in AI-HPF mode than in AI-heatmap mode (time reduction: 35.1% vs. 28.1%; P = 0.040).
Table 3.
Post hoc comparison of review time, including subgroup analyses
| Comparison | Mean Difference ± Standard Deviation (seconds) | P-value |
|---|---|---|
| Review Mode | | |
| AI-heatmap - Unassisted | -21.5 ± 2.6 | 8.9 × 10⁻¹⁶ *** |
| AI-HPF - Unassisted | -26.9 ± 2.6 | < 2.0 × 10⁻¹⁶ *** |
| AI-HPF - AI-heatmap | -5.4 ± 2.6 | 0.040 * |
| Case Type | | |
| Carcinoma - Benign | -13.4 ± 3.6 | 0.00020 *** |
| For benign cases | | |
| AI-heatmap - Unassisted | -30.8 ± 3.4 | < 2.0 × 10⁻¹⁶ *** |
| AI-HPF - Unassisted | -46.2 ± 3.4 | < 2.0 × 10⁻¹⁶ *** |
| AI-HPF - AI-heatmap | -15.4 ± 3.4 | 7.2 × 10⁻⁶ *** |
| For carcinoma cases | | |
| AI-heatmap - Unassisted | -12.2 ± 3.7 | 0.0034 ** |
| AI-HPF - Unassisted | -7.6 ± 3.8 | 0.080 |
| AI-HPF - AI-heatmap | 4.5 ± 3.4 | 0.23 |
| For unassisted mode | | |
| Carcinoma - Benign | -32.5 ± 6.0 | 6.8 × 10⁻⁸ *** |
| For AI-heatmap mode | | |
| Carcinoma - Benign | -13.7 ± 5.4 | 0.010 * |
| For AI-HPF mode | | |
| Carcinoma - Benign | 6.0 ± 3.1 | 0.050 |
*P < 0.05; **P < 0.005; ***P < 0.0005
Due to the significant interaction between the review mode and the case type, a post hoc analysis based on simple main effects was conducted to partition the two variables into subgroups for further analysis (Table 3; Fig. 3B). For benign cases, either assisting mode was drastically faster (30.8 s faster for AI-heatmap mode; 46.2 s faster for AI-HPF mode) than the unassisted mode (P < 2.0 × 10⁻¹⁶ for either assisting mode). The acceleration of diagnosis for benign cases was much greater in AI-HPF mode than in AI-heatmap mode (time reduction: 49.4% vs. 32.9%; P = 7.2 × 10⁻⁶). For carcinoma cases, the AI-heatmap mode had a significantly shorter review time (12.2 s shorter) than the unassisted mode (P = 0.0034), whereas the AI-HPF mode showed a trend toward shorter review time (7.6 s shorter) compared to the unassisted mode (P = 0.080). There was no significant difference in review time for carcinoma cases between the two assisting modes (P = 0.23). Regarding case type, carcinoma cases were diagnosed significantly faster than benign cases in the unassisted mode and the AI-heatmap mode (32.5 s faster, P = 6.8 × 10⁻⁸ for unassisted mode; 13.7 s faster, P = 0.010 for AI-heatmap mode). In contrast, there was a borderline trend toward longer review time for carcinoma cases than for benign cases in the AI-HPF mode (6.0 s longer, P = 0.050).
Influence of model prediction on diagnostic efficiency
Although AI predictions are useful for guiding a pathologist toward suspicious areas, false-positive predictions might confuse pathologists and undermine their performance. To evaluate the influence of model prediction on diagnostic efficiency, the area ratio of AI-predicted cancer to total tissue was calculated for each case. For benign cases, this ratio reflects the extent of false-positive prediction. The relationship between review time and the model-estimated cancer/tissue ratio is shown in Fig. 5. For benign cases, a higher degree of false-positive prediction significantly correlated with longer review time in AI-heatmap mode (P = 8.4 × 10⁻⁷) but not in AI-HPF mode (P = 0.62) or unassisted mode (P = 0.94) (Fig. 5; left panel). For carcinoma cases, a higher model-estimated cancer/tissue ratio (indicating higher tumor purity) significantly correlated with shorter review time in AI-heatmap mode (P = 0.024) but not in AI-HPF mode (P = 0.82) or unassisted mode (P = 0.35) (Fig. 5; right panel).
Fig. 5.
Influence of AI model-estimated cancer/tissue ratio on median review time in benign (left panel) and carcinoma cases (right panel). The scattered points represent the raw data. The lines were regressed by general linear models with gray areas representing their 95% confidence interval. *P < 0.05; ***P < 0.0005
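The cancer/tissue area ratio used above can be approximated from patch-level model outputs. The sketch below is an assumption for illustration, not the study's exact computation; in particular, the 0.5 cutoff and the helper's name are hypothetical.

```python
def cancer_tissue_ratio(patch_probs, tissue_mask, threshold=0.5):
    """Fraction of tissue-containing patches the model labels as cancer.

    patch_probs: per-patch cancer probabilities from the AI model
    tissue_mask: parallel flags marking patches that contain tissue
    threshold:   probability cutoff for calling a patch cancer-positive
    """
    tissue = [p for p, on in zip(patch_probs, tissue_mask) if on]
    if not tissue:
        return 0.0
    positive = sum(1 for p in tissue if p >= threshold)
    return positive / len(tissue)
```

For a benign slide, this ratio is exactly the model's false-positive area fraction, which is the quantity Fig. 5 relates to review time.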
Subjective difficulty
The average subjective difficulty in unassisted, AI-heatmap, and AI-HPF modes was 3.89, 3.40, and 3.27, respectively. The details of subjective difficulty in different review modes and case types are listed in Table 1. Fig. 3C shows that the subjective difficulty varied significantly across the three review modes (P = 1.3 × 10⁻¹⁰), but no significant difference was found between case types (P = 0.58). There was no significant interaction between the review mode and the case type (P = 0.62). Table 4 shows that the subjective difficulty was significantly lower in both assisting modes (P = 1.3 × 10⁻⁶ for AI-heatmap mode; P = 3.7 × 10⁻⁹ for AI-HPF mode) compared to the unassisted mode. No significant difference in subjective difficulty was observed between the two assisting modes (P = 0.27) or between benign and carcinoma cases (P = 0.58).
Table 4.
Post hoc comparison of subjective difficulty
| Comparison | Mean Difference ± Standard Deviation | P-value |
|---|---|---|
| Review Mode (n = 60) | | |
| AI-heatmap - Unassisted | -0.49 ± 0.10 | 1.3 × 10⁻⁶ *** |
| AI-HPF - Unassisted | -0.60 ± 0.10 | 3.7 × 10⁻⁹ *** |
| AI-HPF - AI-heatmap | -0.11 ± 0.10 | 0.27 |
| Case Type (n = 60) | | |
| Carcinoma - Benign | -0.11 ± 0.19 | 0.58 |
***P < 0.0005
Correlation between subjective difficulty and review time
The correlation between subjective difficulty and review time is shown in Fig. 6. In all three review modes, higher subjective difficulty significantly correlated with longer review time (P < 2.0 × 10⁻¹⁶ for unassisted and AI-heatmap modes; P = 0.00033 for AI-HPF mode).
Fig. 6.
Influence of subjective difficulty on review time. The lines were regressed by general linear models with gray areas representing their 95% confidence interval. ***P < 0.0005
Discussion
The diagnostic accuracy of pathologists could be lowered by increased workload. As the volume of cases rises, the time and attention a pathologist can dedicate to individual samples diminish, leading to higher cognitive load and fatigue. This situation exacerbates the risk of diagnostic errors, including misidentification of pathologic features and overlooking subtle changes. The pressure to maintain high throughput in diagnostic laboratories often forces pathologists to work under stringent time constraints, which could further impair their ability to perform detailed and careful evaluations. The repetitive nature of work may also contribute to mental exhaustion and decreased vigilance, compromising the diagnostic quality. The integration of AI assistance holds promise for mitigating these issues by improving diagnostic efficiency. AI models can rapidly analyze large volumes of data, highlight areas of concern, and assist in identifying subtle changes that might be missed by overburdened pathologists. Such support could reduce workload, enhance diagnostic accuracy, and allow pathologists to focus more on complex cases, thereby improving overall diagnostic quality.
One critical challenge in integrating AI into pathology practice is the appropriate presentation of AI-predicted results to pathologists. While AI models hold significant potential to enhance diagnostic accuracy and efficiency, direct presentation of all AI-predicted results, including false predictions, without appropriate processing could have detrimental effects. A previous study used an AI model to classify liver tumors as hepatocellular carcinoma or cholangiocarcinoma, and the predicted result, together with a heatmap of diagnostic probability, was presented to the pathologists [8]. Although accurate AI predictions increased the likelihood of a correct diagnosis by 4.3 times, inaccurate predictions were associated with a threefold increase in incorrect diagnoses [8]. Similarly, decreased diagnostic accuracy and efficiency were observed in identifying gastric lymph node metastasis when AI provided incorrect suggestions [18], indicating that excessive noise can confuse pathologists.
Limiting viewing areas on whole-slide images by selecting specific ROIs might be a valuable approach to enhance diagnostic accuracy. A previous study examined diagnostic agreement in breast pathology by having pathologists select one or more diagnostic ROIs for each case [22]. The study found that higher diagnostic agreement with experts occurred when there was a greater overlap between the pathologists’ ROIs and the consensus ROIs selected by experts [22]. Limiting the viewing areas to most diagnostic ROIs could also improve diagnostic efficiency and reduce the effect of excessive noise that might confuse a pathologist.
In our AI-HPF mode, a well-trained cancer-detecting AI model selected the ten most diagnostic high-power field images from each whole-slide image. The effect of this assisting mode was compared to the traditional AI-heatmap mode and the unassisted mode. Both assisting modes significantly shortened the review time, reduced subjective difficulty, and maintained high diagnostic accuracy compared to the unassisted mode. Of note, our AI-HPF mode shortened the review time by 35.1%, a significantly greater reduction than that of the traditional AI-heatmap mode (28.1%; P = 0.040). This advantage of AI-HPF mode over AI-heatmap mode in accelerating pathologic diagnosis was even more striking in benign cases (time reduction: 49.4% vs. 32.9%; P = 7.2 × 10⁻⁶), whereas no significant difference in the review time of carcinoma cases was observed between the two AI assistance modes (P = 0.23). In our scenario, limiting the viewing areas to the most diagnostic ROIs improved diagnostic efficiency more than providing a guiding heatmap.
We used the area ratio of AI-predicted cancer to total tissue to objectively evaluate the extent of false-positive prediction in benign cases. Interestingly, we found that a higher degree of false-positive AI prediction significantly slowed the diagnostic process of benign cases in AI-heatmap mode (Fig. 5; P = 8.4 × 10⁻⁷) but not in AI-HPF mode (P = 0.62). Unlike the traditional AI-heatmap mode, our AI-HPF mode was robust to the detrimental effect of false-positive prediction on the diagnostic efficiency of benign cases.
For carcinoma cases, a higher model-estimated cancer/tissue ratio significantly accelerated the diagnostic process in AI-heatmap mode (Fig. 5; P = 0.024) but not in AI-HPF mode (P = 0.82). Cancer detection was faster in cases with more highlighted areas in the heatmap, indicating higher tumor purity. This effect of the AI-heatmap mode could offset the advantage of the AI-HPF mode in diagnostic efficiency, resulting in no significant difference in the review time of carcinoma cases between the two assistance modes.
The baseline diagnostic accuracy in the unassisted mode was 97.6%. Given this high baseline performance, our study was statistically underpowered to detect meaningful improvements or decrements in diagnostic accuracy. Future studies focusing on diagnostic tasks with lower baseline accuracy are therefore needed to more robustly assess the potential impact of AI-assisted modes on diagnostic accuracy.
Precautions should be taken when using AI assistance as a diagnostic tool. AI models with suboptimal performance could fail to highlight cancer areas in a heatmap for the AI-heatmap mode. Such models could also fail to select the correct ROIs for the AI-HPF mode. Therefore, AI models with suboptimal performance are unsuitable for diagnostic assistance. Tumor types not represented in the training set, such as malignant lymphoma or sarcoma, may be overlooked by the AI model. Moreover, limiting the review to isolated high-power fields may be inadequate for more complex diagnostic tasks, such as those requiring architectural context.
Conclusions
Here we show for the first time that an AI-HPF assisting mode was superior to the traditional AI-heatmap mode in accelerating cancer detection by pathologists. Our AI-HPF mode maintained high diagnostic accuracy and was robust to the influence of false-positive AI prediction. The limitations of AI assistance must be considered before clinical application.
Authors’ contributions
Conceptualization, W.-Y.C., W.-H.Y. and C.-Y.Y.; methodology, W.-Y.C., W.-H.Y. and C.-Y.Y; software, W.-H.Y. and C.-Y.Y.; slide review, W.-Y.C. and C.H.; diagnostic experiment, Y.-J.L., C.-C.Huang, K.-F.L., C.-C.Hwang, L.-C.C., S.-H.U., C.-J.Y., H.-C.C., J.L. and H.-S.H.; data curation, W.-Y.C., W.-H.Y., and C.-Y.Y; formal analysis, W.-Y.C. and W.-H.Y.; resources, W.-Y.C., S.-H.C., T.-H.W., T.-C.L., C.-T.W., J.-S.Y. and C.-F.K.; writing - original draft, W.-Y.C.; writing - review and editing, W.-H.Y. and C.-Y.Y.
Funding
This study was partly supported by grants from the National Science and Technology Council, Taiwan (NSTC112-2320-B-182-054; to W.-Y. C.) and the Chang Gung Medical Foundation (CMRPG3M2051; to W.-Y. C.).
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Declarations
Ethics approval and consent to participate
The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Chang Gung Medical Foundation (IRB No. 202000956B0), with a waiver of informed consent.
Consent for publication
Not applicable.
Competing interests
Chao-Yuan Yeh is the Chief Executive Officer and a co-founder of aetherAI. Wei-Hsiang Yu is a data scientist of aetherAI. aetherAI provided the web-based digital pathology image viewer software.
References
- 1. Schuffler PJ, Geneslaw L, Yarlagadda DVK, Hanna MG, Samboy J, Stamelos E, Vanderbilt C, Philip J, Jean MH, Corsale L, Manzo A, Paramasivam NHG, Ziegler JS, Gao J, Perin JC, Kim YS, Bhanot UK, Roehrl MHA, Ardon O, Chiang S, Giri DD, Sigel CS, Tan LK, Murray M, Virgo C, England C, Yagi Y, Sirintrapun SJ, Klimstra D, Hameed M, Reuter VE, Fuchs TJ. Integrated digital pathology at scale: a solution for clinical diagnostics and cancer research at a large academic medical center. J Am Med Inform Assoc. 2021;28:1874–84.
- 2. Hanna MG, Reuter VE, Ardon O, Kim D, Sirintrapun SJ, Schuffler PJ, Busam KJ, Sauter JL, Brogi E, Tan LK, Xu B, Bale T, Agaram NP, Tang LH, Ellenson LH, Philip J, Corsale L, Stamelos E, Friedlander MA, Ntiamoah P, Labasin M, England C, Klimstra DS, Hameed M. Validation of a digital pathology system including remote review during the COVID-19 pandemic. Mod Pathol. 2020;33:2115–27.
- 3. Bejnordi BE, Veta M, van Diest PJ, van Ginneken B, Karssemeijer N, Litjens G, van der Laak JAWM, and the CAMELYON16 Consortium. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA. 2017;318:2199–210.
- 4. Campanella G, Hanna MG, Geneslaw L, Miraflor A, Werneck Krauss Silva V, Busam KJ, Brogi E, Reuter VE, Klimstra DS, Fuchs TJ. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med. 2019;25:1301–9.
- 5. Chuang WY, Chang SH, Yu WH, Yang CK, Yeh CJ, Ueng SH, Liu YJ, Chen TD, Chen KH, Hsieh YY, Hsia Y, Wang TH, Hsueh C, Kuo CF, Yeh CY. Successful identification of nasopharyngeal carcinoma in nasopharyngeal biopsies using deep learning. Cancers. 2020;12:507.
- 6. Chuang WY, Chen CC, Yu WH, Yeh CJ, Chang SH, Ueng SH, Wang TH, Hsueh C, Kuo CF, Yeh CY. Identification of nodal micrometastasis in colorectal cancer using deep learning on annotation-free whole-slide images. Mod Pathol. 2021;34:1901–11.
- 7. Chen CL, Chen CC, Yu WH, Chen SH, Chang YC, Hsu TI, Hsiao M, Yeh CY, Chen CY. An annotation-free whole-slide training approach to pathological classification of lung cancer types using deep learning. Nat Commun. 2021;12:1193.
- 8. Kiani A, Uyumazturk B, Rajpurkar P, Wang A, Gao R, Jones E, Yu Y, Langlotz CP, Ball RL, Montine TJ, Martin BA, Berry GJ, Ozawa MG, Hazard FK, Brown RA, Chen SB, Wood M, Allard LS, Ylagan L, Ng AY, Shen J. Impact of a deep learning assistant on the histopathologic classification of liver cancer. NPJ Digit Med. 2020;3:23.
- 9. Nateghi R, Danyali H, Helfroush MS. A deep learning approach for mitosis detection: application in tumor proliferation prediction from whole slide images. Artif Intell Med. 2021;114:102048.
- 10. Fulawka L, Blaszczyk J, Tabakov M, Halon A. Assessment of Ki-67 proliferation index with deep learning in DCIS (ductal carcinoma in situ). Sci Rep. 2022;12:3166.
- 11. Naik N, Madani A, Esteva A, Keskar NS, Press MF, Ruderman D, Agus DB, Socher R. Deep learning-enabled breast cancer hormonal receptor status determination from base-level H&E stains. Nat Commun. 2020;11:5727.
- 12. Chuang WY, Yu WH, Lee YC, Zhang QY, Chang H, Shih LY, Yeh CJ, Lin SM, Chang SH, Ueng SH, Wang TH, Hsueh C, Kuo CF, Chuang SS, Yeh CY. Deep learning-based nuclear morphometry reveals an independent prognostic factor in mantle cell lymphoma. Am J Pathol. 2022;192:1763–78.
- 13. Diao JA, Wang JK, Chui WF, Mountain V, Gullapally SC, Srinivasan R, Mitchell RN, Glass B, Hoffman S, Rao SK, Maheshwari C, Lahiri A, Prakash A, McLoughlin R, Kerner JK, Resnick MB, Montalto MC, Khosla A, Wapinski IN, Beck AH, Elliott HL, Taylor-Weiner A. Human-interpretable image features derived from densely mapped cancer pathology slides predict diverse molecular phenotypes. Nat Commun. 2021;12:1613.
- 14. Jiao Y, Li J, Qian C, Fei S. Deep learning-based tumor microenvironment analysis in colon adenocarcinoma histopathological whole-slide images. Comput Methods Programs Biomed. 2021;204:106047.
- 15. Zhao WX, Zhou K, Li J, Tang T, Wang X, Hou Y, Min Y, Zhang B, Zhang J, Dong Z. A survey of large language models. arXiv. 2023. doi:10.48550/arXiv.2303.18223.
- 16. OpenAI, Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, et al. GPT-4 technical report. arXiv. 2023. doi:10.48550/arXiv.2303.08774.
- 17. Steiner DF, MacDonald R, Liu Y, Truszkowski P, Hipp JD, Gammage C, Thng F, Peng L, Stumpe MC. Impact of deep learning assistance on the histopathologic review of lymph nodes for metastatic breast cancer. Am J Surg Pathol. 2018;42:1636–46.
- 18. Huang SC, Chen CC, Lan J, Hsieh TY, Chuang HC, Chien MY, Ou TS, Chen KH, Wu RC, Liu YJ, Cheng CT, Huang YJ, Tao LW, Hwu AF, Lin IC, Hung SH, Yeh CY, Chen TC. Deep neural network trained on gigapixel images improves lymph node metastasis detection in clinical settings. Nat Commun. 2022;13:3347.
- 19. Murphy KM, Weatherhead K, Chenault C, Nguyen C, Sefcik K, Harrington S, Johnson K, Lemeshev Y. Impact of the Genius Digital Diagnostics System on workflow and accuracy compared with the ThinPrep imaging system for review of ThinPrep Papanicolaou tests. Am J Clin Pathol. 2025;164:746–51.
- 20. Dov D, Kovalsky SZ, Feng Q, Assaad S, Cohen J, Bell J, Henao R, Carin L, Range DE. Use of machine learning-based software for the screening of thyroid cytopathology whole slide images. Arch Pathol Lab Med. 2022;146:872–8.
- 21. Haynes W. Tukey's test. In: Dubitzky W, Wolkenhauer O, Cho KH, Yokota H, editors. Encyclopedia of systems biology. New York, NY: Springer; 2013. p. 2303–4.
- 22. Nagarkar DB, Mercan E, Weaver DL, Brunye TT, Carney PA, Rendi MH, Beck AH, Frederick PD, Shapiro LG, Elmore JG. Region of interest identification and diagnostic agreement in breast pathology. Mod Pathol. 2016;29:1004–11.