J Dent Res. 2025 May 4;104(11):1192–1201. doi: 10.1177/00220345251329042

Preclinical Evaluation of an Interactive Image Search System of Oral Pathology

RR Herdiantoputri (1,2), D Komura (2), M Ochi (2), Y Fukawa (1), K Oba (3), M Tsuchiya (4), Y Kikuchi (4), Y Matsuyama (3), T Ushiku (5), T Ikeda (1), S Ishikawa (2,6)
PMCID: PMC12426326  PMID: 40320652

Abstract

The limited number of specialists and diseases’ long-tail distribution create challenges in diagnosing oral tumors. Health care facilities with sole practicing pathologists face difficulties when encountering the rare cases. Such specialists may lack prior exposure to uncommon presentations, needing external reference materials to formulate accurate diagnoses. An image search or content-based image retrieval (CBIR) system may help diagnose rare tumors by providing histologically similar reference images, thus reducing the pathologists’ workload. However, the effectiveness of CBIR systems in aiding pathologists’ diagnoses through interactive use has not been evaluated. We conducted a remote evaluation in a near-clinical environment using Luigi-Oral, an interactive patch-based CBIR system that uses deep learning to diagnose oral tumors. The database comprised 54,676 image patches at multiple magnifications from 603 cases across 85 oral tumor categories. We recruited 15 general pathologists and 13 oral pathologists with varied experience to evaluate 10 retrospective test cases from 2 institutions using this dedicated system. At top-1 and top-3 differential diagnoses, the overall diagnostic accuracy among the 2 groups was significantly higher with Luigi-Oral than without (12.05% and 21.61% increase, P = 0.002 and P < 0.001, respectively). Improvements were more evident for tumor cases in which the category was underrepresented in the database, benefiting novice and experienced pathologists. Misdiagnoses using Luigi-Oral could be due to inappropriate query input, poor retrieval performance in cases with a rare morphologic type, the difficulty of diagnosis without elaborate clinical information, or the system’s inability to retrieve accurate categories with convincing images. 
This study proves the clinical usability of an interactive CBIR system and highlights areas for improvement to ensure adequate assistance for pathologists, which potentially reduces pathologists’ workload and provides accessible specialist-level histopathology diagnosis.

Keywords: machine learning, digital pathology, computer-assisted diagnosis, rare tumor, oral tumor, content-based image retrieval

Introduction

Many specific oral tumor types meet the World Health Organization’s (WHO’s) rare cancer definition of an incidence of <6 per 100,000 population per year (Hendra et al. 2020; Peraza et al. 2020; Bokhari and Greene 2025). Because of this inherent rarity, the diversity of categories and histologic variants (Appendix Table 1; Appendix Fig. 1), and multifactorial considerations such as tumor location (Wilson et al. 2018), diagnosing oral tumors poses significant challenges, particularly in regions with few specialists. Practice is further hindered by limited access to databases and by evolving diagnostic criteria and testing methods, which increase pathologists’ workload and risk fatigue, delays, and potentially fatal outcomes (Fitzpatrick and Migliorati 2025). An artificial intelligence (AI)–based tool could aid diagnosis and reduce pathologists’ workloads (Retamero et al. 2024). Several tools have been proposed to aid oral pathology diagnosis (Giraldo-Roldan et al. 2023; Cai et al. 2024; Zayed et al. 2024). However, these tools support only a limited number of oral tumor categories.

One possible application for multiple tumor categories is an interactive image search system, technically termed content-based image retrieval (CBIR). In pathology, 2 types of CBIR systems are used: whole-slide image (WSI) based and patch based. WSI-based systems use entire slides as queries, providing contextual slide-level information for image retrieval, while patch-based systems allow selection of specific regions of interest. In morphologically variable areas such as the head and neck, patch-based systems offer advantages by focusing on pertinent areas and excluding irrelevant regions compared with end-to-end WSI-level CBIR approaches (Chen et al. 2022; Li et al. 2023; Shafique et al. 2024; Alfasly et al. 2025). Moreover, pathologists can capture query images using digital microscopes or smartphones with the patch-based system, eliminating the need for expensive WSI scanners, thereby enhancing accessibility to advanced diagnostic tools even in small hospitals and health care facilities in developing countries (Kiehl 2022). By democratizing access to advanced diagnostic capabilities, patch-based CBIR offers a cost-effective solution for improving health care worldwide.

Implementing patch-based CBIR systems involves human intervention: selecting appropriate query images and reviewing the retrieved results. A pathologist’s judgment that a retrieved image strongly supports a category may not align with the highest calculated similarity. Therefore, clinical usefulness cannot be judged solely by retrieval accuracy with preselected query patches. Although studies have reported pathologists’ positive experiences with CBIR algorithms regarding histological similarity (Schaer et al. 2019; Chen et al. 2022; Hashimoto et al. 2023), practical usability, potential pitfalls, and areas for system improvement in preclinical settings remain underexplored. By presenting relevant image references, CBIR potentially enhances the diagnostic workflow, mitigates fatigue due to excessive workload, and reduces misdiagnosis. Therefore, our study aimed to evaluate the CBIR system’s practical usability in terms of diagnostic accuracy and diagnosis time and to determine factors affecting pathologists’ decisions when using the system.

Here, we introduce Luigi-Oral, a web-based CBIR system for oral tumor histopathology, covering >80% of oral tumor categories in the WHO classification (El-Naggar et al. 2017). This broad coverage includes most cases encountered in oral pathology, including rare types, such as odontogenic and salivary gland tumors (Mendenhall et al. 2011; Marin et al. 2021; Alsanie et al. 2022; Gazendam et al. 2023). Using Luigi-Oral allowed us to explore the CBIR system’s effectiveness in enhancing the diagnostic workflow and addressing the challenges associated with rare and diverse oral tumors.

Methods

Diagnosed histopathology slides of oral tumors were collected from Tokyo Medical and Dental University Hospital between January 2001 and September 2022 (approval No. D2019-087). The slides were scanned, and tumor areas were annotated by a pathology resident and verified by board-certified oral pathologists. Square image patches were extracted from the tumor areas, resulting in a database of 54,676 patches from 603 cases across 85 oral tumor categories. The Luigi-Oral model was trained on a subset of the database images using TiCo (Zhu et al. 2022), a noncontrastive self-supervised learning method, with ResNet-18 as the backbone. Image representations computed by the trained model were stored as a retrievable database. To retrieve similar images, query images were encoded with the same model, and a nearest-neighbor search based on cosine similarity identified the top-k most similar database images. A web-based user interface (https://luigi-pathology.com) was developed for uploading query images and viewing the results. The training process and retrieval performance of the Luigi-Oral feature extractor model are described in a prior study (Herdiantoputri et al. 2024).
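The retrieval step described above (encode the query, then rank database embeddings by cosine similarity) can be sketched as follows. This is a minimal illustration, not the Luigi-Oral implementation: random vectors stand in for encoder outputs, and the 512-dimensional embedding size (the width of ResNet-18’s pooled features), the category ids, and the function names are assumptions.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so that dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def top_k_cosine(query: np.ndarray, database: np.ndarray, k: int = 10):
    """Return (indices, similarities) of the k database rows most similar to `query`.

    query:    (d,) embedding of one query patch (hypothetical encoder output)
    database: (n, d) embeddings of all database patches
    """
    q = l2_normalize(query)
    db = l2_normalize(database)
    sims = db @ q                      # cosine similarity to every database patch
    k = min(k, sims.shape[0])
    # argpartition gathers the k largest similarities without a full sort ...
    idx = np.argpartition(-sims, k - 1)[:k]
    # ... and only those k are then sorted from most to least similar
    idx = idx[np.argsort(-sims[idx])]
    return idx, sims[idx]


# Usage with random stand-in embeddings (hypothetical 512-d features).
rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 512))
labels = rng.integers(0, 85, size=1000)          # hypothetical category ids
query = database[42] + 0.01 * rng.normal(size=512)
idx, sims = top_k_cosine(query, database, k=10)
retrieved_categories = labels[idx]
```

Normalizing first turns cosine similarity into a plain dot product, and `np.argpartition` avoids sorting every score when only the top k are needed. At the scale of tens of thousands of patches an exhaustive search like this is inexpensive; much larger databases would typically call for an approximate nearest-neighbor index instead.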

A remote multicase–multireader evaluation was conducted with 10 test cases from different categories, collected from the 2018–2023 repository of Teikyo University Hospital (approval No. 23-054). Thirty pathologists who completed usage training were recruited, and 28 completed the evaluation: 15 general (7 junior and 8 senior) and 13 oral (4 junior and 9 senior) pathologists from Japan, Indonesia, India, and Rwanda. Each evaluator completed practice tests (Appendix Figs. 2 and 3) before receiving online WSIs and basic patient information (sex, age, and tumor location) for the test cases, each of which was to be diagnosed within 5 min. Five cases were diagnosed without Luigi-Oral, with the WHO book allowed as a reference if needed. The remaining 5 cases could be diagnosed with Luigi-Oral’s assistance, the WHO book, or both. Case order was randomized for each participant using stratified permuted block randomization with a block size of 2.
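To make the assignment scheme concrete, the sketch below shows permuted block randomization with a block size of 2 for a single participant: the 10 cases are shuffled and taken in pairs, and within each pair one case is randomly assigned to each condition, so every participant ends with 5 cases per condition. This is a hypothetical illustration; the stratification factor used in the study is not specified here, so it is omitted, and the function name, case labels, and seed are assumptions.

```python
import random

ARMS = ("with Luigi-Oral", "without Luigi-Oral")


def permuted_block_plan(case_ids, block_size=2, seed=None):
    """Shuffle the case order, then assign arms in permuted blocks.

    Every block contains each arm block_size // len(ARMS) times in random
    order, so the two conditions stay balanced after each completed block.
    """
    if block_size % len(ARMS) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    order = list(case_ids)
    rng.shuffle(order)                         # random presentation order
    assignment = {}
    for start in range(0, len(order), block_size):
        block_arms = list(ARMS) * (block_size // len(ARMS))
        rng.shuffle(block_arms)                # permute arms within the block
        for case, arm in zip(order[start:start + block_size], block_arms):
            assignment[case] = arm
    return order, assignment


# One hypothetical participant with 10 test cases.
cases = [f"case{i:02d}" for i in range(1, 11)]
order, assignment = permuted_block_plan(cases, seed=7)
for case in order:
    print(case, assignment[case])
```

A block size of 2 is the smallest that keeps both conditions exactly balanced after every block; the trade-off is that, once the first case of a pair is seen, the condition of the second is predictable.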

Detailed methodology, including development, evaluation, statistical analysis, and code availability, can be found in the appendix. The methodology summary is presented in Figure 1.

Figure 1.

Summary of the study method. The workflow consists of training the feature encoder, constructing the database, collecting query cases, and evaluation by pathologists. In the evaluation, each case was presented to the evaluators through an online whole-slide image (WSI) viewer and assigned either to without Luigi-Oral (no Luigi-Oral assistance) or to with Luigi-Oral (Luigi-Oral usage allowed). Optionally, the evaluators could consult textbook references before giving up to 3 differential diagnoses. Image created with BioRender.

Results

Evaluators and Cases

From September 12, 2023, to January 26, 2024, 246 valid answers were collected: 123 when Luigi-Oral was allowed and 123 when it was not. In 9 (7.3%) of the answers for which Luigi-Oral usage was prohibited, the evaluators used Luigi-Oral by mistake; these answers were excluded from the analyses. Finally, 123 answers with Luigi-Oral and 114 answers without Luigi-Oral were included in the analyses. The distribution of evaluators across cases is presented in Appendix Table 3.

Diagnostic Accuracy

In an intention-to-treat analysis adapted for a diagnostic assistive tool, Luigi-Oral usage significantly improved evaluator accuracy for top-1 and top-3 differential diagnoses. For top-1 diagnoses, we observed a 12.05% increase in accuracy (P = 0.002), whereas for top-3 diagnoses, the improvement was 21.61% (P < 0.001) (Table 1). The distribution of accuracy among pathologist groups revealed specific patterns. For top-1 accuracy, significant differences were found between senior and junior general pathologists (P = 0.042) and between senior oral pathologists and junior general pathologists (P = 0.017). For top-3 accuracy, we found no significant differences between junior general pathologists and the other groups.

Table 1.

Diagnostic Accuracy with and without Luigi-Oral at Top-1 and Top-3 Differential Diagnoses and the Respective Odds Ratios.a

| Accuracy | With Luigi-Oral, n/N (%) | Without Luigi-Oral, n/N (%) | Odds Ratio (95% CI) |
|---|---|---|---|
| Top-1** | 72/123 (58.5) | 53/114 (46.5) | 2.44 (1.36–4.35) |
| Top-3*** | 87/123 (70.7) | 56/114 (49.1) | 4.72 (2.47–9.04) |

a The Wald test was used to assess statistical differences in accuracy. Sample sizes were n = 123 (with Luigi-Oral) and n = 114 (without Luigi-Oral) independent patient samples for each variable. P values were 2 sided. Odds ratios and confidence intervals (CIs) were calculated using logistic regression with generalized estimating equations, adjusting for confounding factors, such as profession, experience level, and case numbers in the database.

**P < 0.01; ***P < 0.001.
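As a check on the arithmetic, the percentage-point differences reported in the Abstract follow directly from the counts in Table 1. The sketch also computes an unadjusted odds ratio with a Wald confidence interval from the same 2 × 2 counts; this crude estimate (about 1.62 for top-1) is not expected to match the table, whose odds ratios come from generalized estimating equations with adjustment for confounders. The function names are illustrative.

```python
import math


def pct_point_difference(x1, n1, x0, n0):
    """Accuracy with minus accuracy without Luigi-Oral, in percentage points."""
    return 100 * (x1 / n1 - x0 / n0)


def crude_odds_ratio(x1, n1, x0, n0, z=1.96):
    """Unadjusted odds ratio with a Wald 95% CI computed on the log scale."""
    a, b = x1, n1 - x1          # correct / incorrect with Luigi-Oral
    c, d = x0, n0 - x0          # correct / incorrect without Luigi-Oral
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


top1_diff = pct_point_difference(72, 123, 53, 114)   # about 12.05
top3_diff = pct_point_difference(87, 123, 56, 114)   # about 21.61
top1_or, top1_lo, top1_hi = crude_odds_ratio(72, 123, 53, 114)
print(f"top-1: +{top1_diff:.2f} points, crude OR {top1_or:.2f} "
      f"({top1_lo:.2f}-{top1_hi:.2f})")
```

The gap between the crude and adjusted estimates is expected: the GEE model accounts for repeated answers from the same evaluators and for covariates, which a single pooled 2 × 2 table ignores.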

At the case level, Luigi-Oral improved evaluator accuracy in 8 of 10 cases (Table 2). Categories underrepresented in the database showed significant improvements (Appendix Table 1). Accuracy for the oncocytoma case increased by 33.34% (P = 0.041) for top-1 and by 33.33% (P = 0.014) for top-3 diagnosis. Accuracy for the chondroblastic osteosarcoma case increased even more substantially, by 45.45% (P = 0.009) for top-1 and by 63.63% (P < 0.001) for top-3 diagnosis. Significant top-3 improvements were also observed for the secretory carcinoma case (34.62% increase; P = 0.049) and the adenomatoid odontogenic tumor case (41.66% increase; P = 0.020). Pathologists have likely encountered these categories less often in clinical practice. In contrast, categories represented by >10 cases in the database, which are likely encountered more commonly, were already diagnosed with relatively high accuracy without Luigi-Oral.

Table 2.

Case-Level Diagnostic Accuracy with and without Luigi-Oral at Top-1 and Top-3 Differential Diagnoses and the Respective Odds Ratios.a

| Case | Differential Diagnoses | With Luigi-Oral, n/N (%) | Without Luigi-Oral, n/N (%) | Odds Ratio (95% CI) |
|---|---|---|---|---|
| Odontogenic myxoma/myxofibroma | Top-1 | 9/9 (100) | 10/11 (90.9) | 1.09 (0.92–1.30) |
| | Top-3 | 9/9 (100) | 10/11 (90.9) | 1.09 (0.92–1.30) |
| Basal cell adenoma | Top-1 | 8/12 (66.7) | 4/9 (44.4) | 1.25 (0.82–1.90) |
| | Top-3 | 9/12 (75.0) | 4/9 (44.4) | 1.36 (0.90–2.04) |
| Lateral periodontal cyst | Top-1 | 0/14 (0.0) | 0/10 (0.0) | |
| | Top-3 | 2/14 (14.3) | 0/10 (0.0) | 1.15 (0.96–1.39) |
| Oncocytoma | Top-1* | 11/12 (91.7) | 7/12 (58.3) | 1.40 (1.01–1.92) |
| | Top-3* | 12/12 (100) | 8/12 (66.7) | 1.40 (1.07–1.82) |
| Fibrous dysplasia | Top-1 | 6/12 (50.0) | 10/15 (66.7) | 0.85 (0.58–1.23) |
| | Top-3 | 8/12 (66.7) | 11/15 (73.3) | 0.94 (0.66–1.32) |
| Odontogenic keratocyst | Top-1 | 11/15 (73.3) | 7/9 (77.8) | 0.96 (0.67–1.36) |
| | Top-3 | 12/15 (80.0) | 7/9 (77.8) | 1.02 (0.73–1.43) |
| Chondroblastic osteosarcoma | Top-1** | 6/11 (54.5) | 1/11 (9.1) | 1.57 (1.12–2.21) |
| | Top-3*** | 8/11 (72.7) | 1/11 (9.1) | 1.89 (1.38–2.58) |
| Secretory carcinoma | Top-1 | 4/12 (33.3) | 2/13 (15.4) | 1.20 (0.86–1.66) |
| | Top-3* | 6/12 (50.0) | 2/13 (15.4) | 1.41 (1.00–2.00) |
| Adenoid cystic carcinoma | Top-1 | 13/14 (92.9) | 11/12 (91.7) | 1.01 (0.82–1.25) |
| | Top-3 | 14/14 (100) | 11/12 (91.7) | 1.09 (0.93–1.28) |
| Adenomatoid odontogenic tumor | Top-1 | 4/12 (33.3) | 1/12 (8.3) | 1.28 (0.94–1.75) |
| | Top-3* | 7/12 (58.3) | 2/12 (16.7) | 1.52 (1.07–2.15) |

a The Wald test was used to assess statistical differences in accuracy. Sample sizes were n = 123 (with Luigi-Oral) and n = 114 (without Luigi-Oral) independent patient samples for each variable. P values were 2 sided. Odds ratios and confidence intervals (CIs) were calculated using logistic regression with generalized estimating equations, without considering confounding factors.

*P < 0.05; **P < 0.01; ***P < 0.001.

Luigi-Oral usage also significantly improved evaluator accuracy for top-1 and top-3 differential diagnoses across all cases in a per-protocol analysis (Appendix Table 4). At the case level, significant improvements were observed for the same set of cases, except for the secretory carcinoma case at top-3.

At the evaluators’ category level, evaluator accuracy of junior general pathologists was significantly higher with Luigi-Oral at top-3 differential diagnoses (27.64% increase; P < 0.001). The same was observed for the accuracy of junior oral pathologists at the top-3 differential diagnoses (33.75% increase; P < 0.001). The accuracy of senior oral pathologists was significantly higher with Luigi-Oral at the top-1 (21.79% increase; P = 0.001) and top-3 (22.67% increase; P = 0.007) differential diagnoses (Fig. 2A). Different categories of evaluators showed higher accuracy without Luigi-Oral on different categories of tumors, likely related to their clinical experiences (Fig. 2B).

Figure 2.

Breakdown of the evaluators’ category-level accuracy. (A) Diagnostic accuracy at the top-1 and top-3 differential diagnoses for each evaluator’s category. (B) Detailed diagnostic accuracy for each evaluation case at the top-1 and top-3 differential diagnoses. (C) Confusion matrix of the evaluators’ answers at the top-1 differential diagnoses. Color represents the difference in how often each category was selected as the top-1 differential diagnosis when Luigi-Oral was allowed versus when it was not. Red indicates categories more frequently selected when Luigi-Oral was allowed (w/), whereas blue indicates categories more frequently selected when it was not (w/o). The Wald test was used to compare the statistical differences in accuracy. Sample sizes were n = 123 (w/ Luigi-Oral) and n = 114 (w/o Luigi-Oral) independent patient samples for each variable. P values were 2 sided. Odds ratios and confidence intervals were calculated using logistic regression with generalized estimating equations, adjusting for confounding factors, such as case numbers in the database. w/, with; w/o, without; 95% CI, 95% confidence intervals; DDx, differential diagnoses. **P < 0.01; ***P < 0.001.

The frequency of the evaluators’ answer choices at the top-1 is plotted in Figure 2C and Appendix Figure 4A and B. Several cases were confused with other tumor categories even with Luigi-Oral. For example, the lateral periodontal cyst case was most often confused with glandular odontogenic cyst, among other odontogenic cysts and tumors, and the fibrous dysplasia case was mostly confused with other fibro-osseous diseases. With Luigi-Oral, the secretory carcinoma case was confused with cystadenoma, salivary duct carcinoma, and metastasizing ameloblastoma, which may have similar morphology. Without Luigi-Oral, however, it was confused with cemento-ossifying fibroma, which bears no morphological resemblance (Appendix Fig. 4C).

The composition of the retrieved results differed across cases. For the fibrous dysplasia case, the top-10 accuracy, majority-10, and top-10 %query rates were 44%, 30%, and 90%, respectively, still higher than those of the categories it was confused with (10%, 20%, and 50% for ossifying fibroma and 12%, 10%, and 80% for cemento-osseous dysplasia). Queries for the chondroblastic osteosarcoma case yielded a majority-10 of 0% because the database contained only a single case of this category, but they achieved a high top-10 %query rate of 69.23%. Top-1 diagnostic accuracy correlated moderately with majority-10 (r = 0.332), top-10 retrieval accuracy (r = 0.437), and top-10 %query (r = 0.346) but only weakly with the total number of cases in the database (r = 0.125) (Fig. 3A–D).
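The three retrieval metrics can be computed per query and averaged over a case’s queries, as sketched below. Only %query is defined in the source (Fig. 3 caption: the percentage of queries with at least 1 image of the accurate diagnosis). Reading top-10 accuracy as the fraction of the 10 retrieved images with the correct category, and majority-10 as the fraction of queries whose most frequent retrieved category is correct, are assumptions, as are the category abbreviations in the example.

```python
from collections import Counter


def query_metrics(retrieved, true_label, k=10):
    """Per-query metrics for one ranked list of retrieved category labels."""
    top = list(retrieved)[:k]
    hits = sum(label == true_label for label in top)
    fraction_correct = hits / len(top)                 # assumed top-10 accuracy
    # most_common breaks ties by first appearance, i.e., by retrieval rank
    majority_label = Counter(top).most_common(1)[0][0]
    majority_correct = majority_label == true_label    # assumed majority-10
    any_correct = hits > 0                             # %query (Fig. 3 caption)
    return fraction_correct, majority_correct, any_correct


def case_metrics(queries, true_label, k=10):
    """Average the per-query metrics over all queries for one case, in percent."""
    rows = [query_metrics(q, true_label, k) for q in queries]
    n = len(rows)
    return {
        "top10_accuracy": 100 * sum(r[0] for r in rows) / n,
        "majority10": 100 * sum(r[1] for r in rows) / n,
        "top10_pct_query": 100 * sum(r[2] for r in rows) / n,
    }


# Two hypothetical queries for a fibrous dysplasia (FD) case; OF = ossifying
# fibroma, COD = cemento-osseous dysplasia. Labels are illustrative only.
queries = [
    ["FD"] * 6 + ["OF"] * 4,
    ["OF"] * 5 + ["COD"] * 3 + ["FD"] * 2,
]
print(case_metrics(queries, "FD"))
```

The example also shows why the metrics can diverge: the second query still contains correct images (counting toward %query) even though its majority category is wrong.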

Figure 3.

Correlations between diagnostic accuracy, retrieval accuracy, and misdiagnosis. (A–D) The correlations between retrieved result metrics and evaluator accuracy were calculated with Pearson correlations and were all positive (n = 10). (E) Possible causes of misdiagnosis in each case (n = 31). (F) Two sets of queries uploaded by the evaluators for the odontogenic keratocyst case did not include the clear epithelial lining of the cyst, unlike the database images representing this tumor category. (G) The time to diagnosis for all answers compared with the time needed to diagnose accurately. (H) Time to accurate diagnosis of each evaluator category. The lower and upper hinges correspond to the 25th and 75th percentiles, respectively; the upper whisker extends from the hinge to the largest value no further than 1.5× the interquartile range (IQR) from the hinge. The lower whisker extends from the hinge to the smallest value at most 1.5× the IQR from the hinge. (I) Comparison of query image sets from 2 cases that include mineralized tissue with the top-3 images retrieved by Luigi-Oral. %query, percentage of queries with at least 1 retrieved image belonging to the accurate diagnosis; r, Pearson’s correlation coefficient; 95% CI, 95% confidence interval.
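The hinge and whisker rule in the caption can be expressed in a few lines. This sketch assumes linear-interpolation quartiles (NumPy’s default, which corresponds to R’s default quantile type 7); the plotting software actually used is not stated here, and the example times are invented for illustration, not study data.

```python
import numpy as np


def box_stats(values):
    """Hinges, whiskers, and outliers under the 1.5 x IQR rule."""
    v = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(v, [25, 50, 75])   # linear interpolation
    iqr = q3 - q1
    # Whiskers stop at the most extreme observations within 1.5 x IQR of a hinge.
    upper = v[v <= q3 + 1.5 * iqr].max()
    lower = v[v >= q1 - 1.5 * iqr].min()
    outliers = v[(v < lower) | (v > upper)]
    return {"q1": q1, "median": median, "q3": q3,
            "lower_whisker": lower, "upper_whisker": upper,
            "outliers": outliers.tolist()}


# Invented diagnosis times in seconds, for illustration only.
times = [120, 150, 160, 180, 190, 200, 210, 230, 290]
stats = box_stats(times)
```

Note that a whisker ends at an observed value, not at the fence itself; here the fence is 285 s, so the upper whisker stops at 230 s and 290 s is drawn as an outlier.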

When allowed, evaluators consulted Luigi-Oral for 97 (78.9%) answers. The remaining 26 answers comprised 14 by senior oral pathologists, 5 by junior oral pathologists, 5 by junior general pathologists, and 2 by senior general pathologists (Appendix Fig. 5A). That is, although usage was allowed, Luigi-Oral was not used in 35.90% of diagnoses by senior oral pathologists, 29.41% by junior oral pathologists, 15.15% by junior general pathologists, and 5.88% by senior general pathologists, suggesting that senior oral pathologists rely less on Luigi-Oral for their diagnoses. Even without it, some senior oral pathologists accurately diagnosed the odontogenic keratocyst and adenoid cystic carcinoma cases (Appendix Fig. 5B). The comparison of diagnostic accuracy using the per-protocol approach is shown in Appendix Table 4.

Misdiagnosis

In 31 answers, evaluators consulted Luigi-Oral but did not include the accurate diagnosis among the 3 differential diagnoses (11 by junior general pathologists, 9 by senior general pathologists, 4 by junior oral pathologists, and 7 by senior oral pathologists). Potential causes of these misdiagnoses were (1) unsuitable query image(s), resulting in low retrieval performance; (2) low retrieval performance despite appropriate query image(s); (3) Luigi-Oral results that did not convince the evaluators of the accurate diagnosis, reflecting mistrust; or (4) indeterminate reasons (Fig. 3E).

Of the 31 misdiagnoses made with Luigi-Oral, 6 involved unsuitable query images (3 on different cases; Fig. 3F). Most unsuitable queries included nontissue areas or were cropped at an exceedingly low magnification. We also found 2 misdiagnoses, both on the odontogenic keratocyst case, in which the query images showed nonrepresentative tumor areas, and 10 misdiagnoses in which retrieval performance was poor. Potential mistrust occurred in the adenomatoid odontogenic tumor, fibrous dysplasia, and secretory carcinoma cases. For the remaining misdiagnoses (11/31), the reasons were unclear. In most of these (8/11), the best retrieval rank of the correct category was lower (i.e., farther down the result list) than that of the category submitted as the top-1 differential diagnosis (Appendix Table 5).
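The rank comparison used for the unexplained misdiagnoses can be illustrated as follows. This is a hypothetical sketch: it assumes each answer is associated with one ranked list of retrieved category labels per uploaded query and takes the best (smallest) rank of a category across those lists. The function names and the example labels are illustrative, not taken from the study data.

```python
from typing import Optional, Sequence


def best_rank(result_lists: Sequence[Sequence[str]], category: str) -> Optional[int]:
    """Smallest 1-based rank at which `category` appears in any result list."""
    ranks = [pos
             for results in result_lists
             for pos, label in enumerate(results, start=1)
             if label == category]
    return min(ranks) if ranks else None


def correct_ranked_below_answer(result_lists, correct, answered) -> bool:
    """True if the submitted top-1 category was retrieved at a better rank than
    the correct category (including when the correct one was never retrieved)."""
    r_answer = best_rank(result_lists, answered)
    r_correct = best_rank(result_lists, correct)
    if r_answer is None:
        return False   # assumption: pattern undefined if the answer was never retrieved
    return r_correct is None or r_correct > r_answer


# Hypothetical results for two uploaded queries: LPC = lateral periodontal
# cyst (correct), GOC = glandular odontogenic cyst (submitted answer).
results = [["GOC", "GOC", "OKC", "LPC"], ["OKC", "GOC", "LPC", "LPC"]]
flag = correct_ranked_below_answer(results, correct="LPC", answered="GOC")
```

Taking the minimum rank across all queries reflects the best evidence the evaluator could have seen for each category during the session.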

Diagnosis Time

Twenty-six answers were assisted by both Luigi-Oral and the WHO book, 71 by Luigi-Oral only, 49 by the WHO book only, and 91 by no reference. The median answering time was shorter with Luigi-Oral only (196 s) than with the WHO book (265.5 s). Considering only correct top-1 differential diagnoses, 15 answers were assisted by both, 39 by Luigi-Oral only, 24 by the WHO book only, and 47 by no reference. The median time to an accurate answer was also shorter with Luigi-Oral assistance (187 s) than with the WHO book (227 s) (Fig. 3G). Evaluators uploaded an average of 2.19 images to Luigi-Oral per case. Experience level and familiarity with oral tumor cases may influence diagnosis time: the junior general pathologist group consistently had longer median times than senior general pathologists and junior oral pathologists across methods, and when Luigi-Oral was used, senior oral pathologists had longer median times than the other groups (Fig. 3H).

Discussion

Many AI systems, including CBIR, are designed for collaborative use with health care professionals, making it crucial to assess their performance in real-world collaborative scenarios (Sakamoto et al. 2022; Wataya et al. 2022; Raciti et al. 2023; Singh et al. 2023; Yasaka et al. 2023). Using oral tumors as the test domain, our study is, to our knowledge, the first to explore an interactive patch-based CBIR system as an assistant for diagnosing rare tumors, the setting in which CBIR is most applicable. CBIR is more appropriate than traditional supervised classification when training samples are limited, because supervised classifiers risk overfitting. CBIR also supports critical human decision-making through interpretable visual similarity matches. Because no similar AI tools exist for this application, the study compared diagnostic workflows with and without Luigi-Oral rather than comparing Luigi-Oral with another system. By testing its use among pathologists with varying levels of expertise and experience, we provide insights into both the tool’s effectiveness and its potential impact on decision-making across diverse professional backgrounds.

Studying rare tumors poses challenges; therefore, we used a carefully curated set of representative cases. Although the sample size was limited owing to the rarity of these conditions, we selected cases from diverse parent categories to ensure a comprehensive evaluation. Because general pathologists may have limited exposure to such rare cases, these cases are likely to leave lasting impressions that a washout period might not erase. Therefore, instead of the sequential diagnosis with a washout period that is common in computer-aided diagnosis research (Steiner et al. 2020; Ba et al. 2022; Wu et al. 2024), we used random assignments for each pathologist.

We allowed pathologists full discretion in their system usage when it was made accessible. This simulates realistic clinical settings in which familiarity with certain cases may negate the need for assistance. Our approach aimed to understand how often the system would be used voluntarily, which cases would prompt its use, and how evaluators would interact with it in practice.

Our results revealed that the interactive CBIR system improved diagnostic accuracy within a limited time frame, especially in narrowing the differential diagnoses down to morphologically similar categories. The system requires human intervention, such as selecting diagnostically relevant areas and making final diagnoses based on the presented similar cases. Therefore, CBIR retrieval performance and pathologists’ knowledge and experience markedly influence diagnostic accuracy. Despite this complexity, a notable improvement in diagnostic accuracy was observed across pathologists with varying experience and specialties, including senior oral pathologists and junior general pathologists. Senior oral pathologists likely benefited from combining their nuanced histopathological knowledge with the image references from Luigi-Oral when making final differential diagnosis decisions. Consistent with findings that AI tools benefit less experienced medical professionals (Wataya et al. 2022; Yasaka et al. 2023), junior general pathologists in our study were more receptive to Luigi-Oral’s suggestions and used them to select their answers.

Despite the limited training time on the system, we observed a lower median time to diagnosis when using Luigi-Oral’s assistance compared with using the WHO book alone, even after excluding misdiagnosed cases. This indicates the potential of the CBIR system to become an efficient reference tool by serving as a preliminary search method before conducting a more thorough analysis using textbooks and expert consultations. In clinical practice, pathologists may initially use CBIR for preliminary diagnosis and subsequently confirm findings with the WHO book to increase certainty. The recent introduction of a web-based version of the WHO book suggests the potential for complete digitalization, such as linking CBIR results directly to relevant sections in digital reference books to improve diagnostic efficiency.

Misdiagnosis with the CBIR system may stem from mistrust of retrieved cases, inappropriate query inputs, the CBIR algorithm’s low retrieval accuracy, or indeterminate reasons. Based on these findings, several critical factors must be addressed when developing an enhanced CBIR system to maximize benefits and minimize misdiagnoses. Retrieved images alone may not give pathologists the confidence required for final diagnoses. In diagnostic pathology, pathologists frequently rely on comprehensive clinical, diagnostic, and radiologic data as well as histology (Van Der Laak et al. 2021). To support this process, the CBIR system could incorporate additional information about retrieved cases, such as histologic features or tumor categories, functioning akin to a comprehensive digital atlas. Integrating links to references and recommending ancillary tests would further aid investigations, providing interpretable results and valuable diagnostic guidance. Following this evaluation, suggestions and clinical guidance for follow-up tests have been made available for research purposes on the Luigi-Oral website.

To effectively employ the CBIR system, users should receive comprehensive training on its interface, behavior, and database characteristics. This knowledge is essential for effectively selecting regions of interest, interpreting results, and extracting meaningful information. As users gain proficiency with the system, they can operate it more efficiently, thereby enhancing overall diagnostic accuracy.

The CBIR model’s performance requires rigorous testing to ensure that it accurately captures complex histological features (Dehkharghanian et al. 2023; Tommasino et al. 2023; Shang et al. 2024). Although retrieval performance correlated with diagnostic accuracy, decisions were not based solely on the retrieved cases. Even with low top-10 accuracy, a single accurate image substantially improved diagnostic accuracy, as observed in the chondroblastic osteosarcoma case. Nonetheless, challenges persist for lateral periodontal cysts and secretory carcinomas, for which retrieval performance remains poor because of their similarity to other types or their rare variants. This underscores the importance of a database with diverse morphological representations rather than just numerous similar cases of the same tumor type.

The system encountered difficulties differentiating mineralized tissue components, likely because the feature extractor prioritized low-level image similarity over semantic similarity (Fig. 3I). Addressing these limitations requires continuous improvement of the feature extraction model (Hameed et al. 2021). State-of-the-art models trained on histopathological images, along with advanced vision–language foundation models for histology (Kang et al. 2023; Chen et al. 2024), show promise in enhancing retrieval accuracy and semantic understanding, potentially improving diagnostic precision and efficiency (Ignatov et al. 2024; Lu et al. 2024).

One limitation of this study is potential evaluator bias regarding AI-based diagnosis systems. In addition, the small sample size and single-read design differ from standard multireader, multicase studies, potentially limiting generalizability. However, as a pioneering pilot study, our findings offer valuable insights, and the statistically significant results in some cases indicate a strong effect that warrants further investigation. Larger clinical trials involving more cases and diverse tumor types across varied medical institutions and geographical regions are necessary. Although patch-based CBIR systems are advantageous where WSI scanners are unavailable or processing large image data remains impractical, this study used WSI screenshots for remote evaluation. Smartphone images uploaded by users enable quick uploads but require robust models to handle lower quality and color variations. Therefore, this input method should be explored in future work for resource-limited settings.

Overall, this study proves that an interactive CBIR system is useful for improving the diagnostic performance of pathologists with varied backgrounds and experience levels. Some areas should be improved to ensure safety, as this system is not intended to replace qualified pathologists. With improved model performance, supporting information, and adequate user training, this system could reduce pathologists’ workload and help provide accessible specialist-level histopathology diagnosis.

Author Contributions

R.R. Herdiantoputri, contributed to conception, design, data acquisition, analysis, and interpretation, drafted and critically revised the manuscript; D. Komura, S. Ishikawa, contributed to conception, design, data analysis, and interpretation, drafted and critically revised the manuscript; M. Ochi, contributed to conception, design, drafted and critically revised the manuscript; Y. Fukawa, M. Tsuchiya, Y. Kikuchi, T. Ushiku, contributed to data acquisition, critically revised the manuscript; K. Oba, Y. Matsuyama, contributed to data analysis, critically revised the manuscript; T. Ikeda, contributed to data acquisition, analysis, and interpretation, critically revised the manuscript. All authors gave final approval and agree to be accountable for all aspects of the work.

Supplemental Material

sj-docx-1-jdr-10.1177_00220345251329042 – Supplemental material for Preclinical Evaluation of an Interactive Image Search System of Oral Pathology

Supplemental material, sj-docx-1-jdr-10.1177_00220345251329042 for Preclinical Evaluation of an Interactive Image Search System of Oral Pathology by R.R. Herdiantoputri, D. Komura, M. Ochi, Y. Fukawa, K. Oba, M. Tsuchiya, Y. Kikuchi, Y. Matsuyama, T. Ushiku, T. Ikeda and S. Ishikawa in Journal of Dental Research

Acknowledgments

We acknowledge and thank our evaluators and colleagues who helped during the recruitment: Drs. or Profs. Ayataka Ishikawa, Felix Manirakiza, Agoeng Tjahajani, Pretty Trisfilha, Janti Sudiono, Tetsuya Kitamura, Athira KP, Katsutoshi Hirose, Shingo Sakashita, Yuko Kinowaki, Kaori Shima, Mana Ideyama, Naomi Yada, Yuki Kato, Shrija G., Ryohei Kuroda, Piyush Asnani, Yu Usami, and other contributors who chose to remain anonymous.

In memory of Dr. Kou Kayamori, whose contributions and dedication greatly advanced this research. He will be deeply missed.

We thank Enago (www.enago.jp) for the English language review.

Footnotes

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the AMED Practical Research for Innovative Cancer Control under grant numbers JP24ck0106873 and JP24ck0106904 to S.I., JSPS KAKENHI Grant-in-Aid for Scientific Research (S) under grant number 2H04990 to S.I., and JSPS KAKENHI Grant-in-Aid for Scientific Research (B) under grant number 21H03836 to D.K.

Data Availability: The raw data used in this study are not publicly available to preserve participant privacy. The data generated and analyzed during this study are available from the corresponding author upon reasonable request.

A supplemental appendix to this article is available online.

References

1. Alfasly S, Alabtah G, Hemati S, Kalari KR, Garcia JJ, Tizhoosh HR. 2025. Validation of histopathology foundation models through whole slide image retrieval. Sci Rep. 15(1):3990. doi: 10.1038/s41598-025-88545-9
2. Alsanie I, Rajab S, Cottom H, Adegun O, Agarwal R, Jay A, Graham L, James J, Barrett AW, Van Heerden W, et al. 2022. Distribution and frequency of salivary gland tumours: an international multicenter study. Head Neck Pathol. 16(4):1043–1054.
3. Ba W, Wang S, Shang M, Zhang Z, Wu H, Yu C, Xing R, Wang W, Wang L, Liu C, et al. 2022. Assessment of deep learning assistance for the pathological diagnosis of gastric cancer. Mod Pathol. 35(9):1262–1268.
4. Bokhari MR, Greene J. 2025. Pleomorphic Adenoma. In: StatPearls. Treasure Island (FL): StatPearls Publishing; [updated 2023 Jul 4; accessed 2025 Feb 14]. http://www.ncbi.nlm.nih.gov/books/NBK430829/.
5. Cai X, Zhang H, Wang Y, Zhang J, Li T. 2024. Digital pathology-based artificial intelligence models for differential diagnosis and prognosis of sporadic odontogenic keratocysts. Int J Oral Sci. 16(1):16. doi: 10.1038/s41368-024-00287-y
6. Chen C, Lu MY, Williamson DFK, Chen TY, Schaumberg AJ, Mahmood F. 2022. Fast and scalable search of whole-slide images via self-supervised deep learning. Nat Biomed Eng. 6(12):1420–1434.
7. Chen RJ, Ding T, Lu MY, Williamson DFK, Jaume G, Song AH, Chen B, Zhang A, Shao D, Shaban M, et al. 2024. Towards a general-purpose foundation model for computational pathology. Nat Med. 30(3):850–862.
8. Dehkharghanian T, Bidgoli AA, Riasatian A, Mazaheri P, Campbell CJV, Pantanowitz L, Tizhoosh HR, Rahnamayan S. 2023. Biased data, biased AI: deep networks predict the acquisition site of TCGA images. Diagn Pathol. 18(1):67. doi: 10.1186/s13000-023-01355-3
9. El-Naggar AK, Chan JKC, Grandis JR, Takata T, Slootweg PJ, editors. 2017. WHO classification of head and neck tumours. 4th ed. Lyon (France): International Agency for Research on Cancer; Geneva (Switzerland): World Health Organization.
10. Fitzpatrick SG, Migliorati CA. 2025. The challenge of evidence-based practice in oral diagnostic sciences. Oral Surg Oral Med Oral Pathol Oral Radiol. 139(1):1–4.
11. Gazendam A, Popovic S, Parasu N, Ghert M. 2023. Chondrosarcoma: a clinical review. J Clin Med. 12(7):2506. doi: 10.3390/jcm12072506
12. Giraldo-Roldan D, Ribeiro ECC, Araújo ALD, Penafort PVM, Silva VMD, Câmara J, Pontes HAR, Martins MD, Oliveira MC, Santos-Silva AR, et al. 2023. Deep learning applied to the histopathological diagnosis of ameloblastomas and ameloblastic carcinomas. J Oral Pathol Med. 52(10):988–995.
13. Hameed IM, Abdulhussain SH, Mahmmod BM. 2021. Content-based image retrieval: a review of recent trends. Cogent Eng. 8(1):1927469. doi: 10.1080/23311916.2021.1927469
14. Hashimoto N, Takagi Y, Masuda H, Miyoshi H, Kohno K, Nagaishi M, Sato K, Takeuchi M, Furuta T, Kawamoto K, et al. 2023. Case-based similar image retrieval for weakly annotated large histopathological images of malignant lymphoma using deep metric learning. Med Image Anal. 85:102752. doi: 10.1016/j.media.2023.102752
15. Hendra FN, Van Cann EM, Helder MN, Ruslin M, De Visscher JG, Forouzanfar T, De Vet HCW. 2020. Global incidence and profile of ameloblastoma: a systematic review and meta-analysis. Oral Dis. 26(1):12–21.
16. Herdiantoputri RR, Komura D, Ochi M, Fukawa Y, Kayamori K, Tsuchiya M, Kikuchi Y, Ushiku T, Ikeda T, Ishikawa S. 2024. Benchmarking deep learning-based image retrieval of oral tumor histology. Cureus. 16(6):e62264. doi: 10.7759/cureus.62264
17. Ignatov A, Yates J, Boeva V. 2024. Histopathological image classification with cell morphology aware deep neural networks. In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); June 17–18, 2024; Seattle, WA. New York (NY): Institute of Electrical and Electronics Engineers (IEEE). p 6913–6925.
18. Kang M, Song H, Park S, Yoo D, Pereira S. 2023. Benchmarking self-supervised learning on diverse pathology datasets. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); June 17–24, 2023; Vancouver, BC, Canada. New York (NY): Institute of Electrical and Electronics Engineers (IEEE); [accessed 2023 Oct 25]. p 3344–3354. https://ieeexplore.ieee.org/document/10204656/.
19. Kiehl T-R. 2022. Digital and computational pathology: a specialty reimagined. In: Ehsani S, Glauner P, Plugmann P, Thieringer FM, editors. The future circle of healthcare. Cham (Switzerland): Springer International Publishing; [accessed 2023 Oct 2]. p 227–250. https://link.springer.com/10.1007/978-3-030-99838-7_12.
20. Li S, Zhao Y, Zhang J, Yu T, Zhang Ji, Gao Y. 2023. High-order correlation-guided slide-level histology retrieval with self-supervised hashing. IEEE Trans Pattern Anal Mach Intell. 45(9):11008–11023.
21. Lu MY, Chen B, Williamson DFK, Chen RJ, Liang I, Ding T, Jaume G, Odintsov I, Le LP, Gerber G, et al. 2024. A visual-language foundation model for computational pathology. Nat Med. 30(3):863–874.
22. Marin C, Dave M, Hunter KD. 2021. Malignant odontogenic tumours: a systematic review of cases reported in literature. Front Oral Health. 2:775707. doi: 10.3389/froh.2021.775707
23. Mendenhall WM, Fernandes R, Werning JW, Vaysberg M, Malyapa RS, Mendenhall NP. 2011. Head and neck osteosarcoma. Am J Otolaryngol. 32(6):597–600.
24. Peraza A, Gómez R, Beltran J, Amarista FJ. 2020. Mucoepidermoid carcinoma. An update and review of the literature. J Stomatol Oral Maxillofac Surg. 121(6):713–720.
25. Raciti P, Sue J, Retamero JA, Ceballos R, Godrich R, Kunz JD, Casson A, Thiagarajan D, Ebrahimzadeh Z, Viret J, et al. 2023. Clinical validation of artificial intelligence–augmented pathology diagnosis demonstrates significant gains in diagnostic accuracy in prostate cancer detection. Arch Pathol Lab Med. 147(10):1178–1185.
26. Retamero JA, Gulturk E, Bozkurt A, Liu S, Gorgan M, Moral L, Horton M, Parke A, Malfroid K, Sue J, et al. 2024. Artificial intelligence helps pathologists increase diagnostic accuracy and efficiency in the detection of breast cancer lymph node metastases. Am J Surg Pathol. 48(7):846–854.
27. Sakamoto T, Furukawa T, Pham HHN, Kuroda K, Tabata K, Kashima Y, Okoshi EN, Morimoto S, Bychkov A, Fukuoka J. 2022. A collaborative workflow between pathologists and deep learning for the evaluation of tumour cellularity in lung adenocarcinoma. Histopathology. 81(6):758–769.
28. Schaer R, Otálora S, Jimenez-del-Toro O, Atzori M, Müller H. 2019. Deep learning-based retrieval system for gigapixel histopathology cases and the open access literature. J Pathol Inform. 10:19. doi: 10.4103/jpi.jpi_88_18
29. Shafique A, Gonzalez R, Pantanowitz L, Tan PH, Machado A, Cree IA, Tizhoosh HR. 2024. A preliminary investigation into search and matching for tumor discrimination in World Health Organization breast taxonomy using deep networks. Mod Pathol. 37(2):100381. doi: 10.1016/j.modpat.2023.100381
30. Shang HH, Nasr MS, Veerla JP, Saurav JR, Hajighasemi A, Malidarreh P, Huber M, Moleta C, Makker J, Luber JM. 2024. Histopathology slide indexing and search—are we there yet? N Engl J Med AI. 1(5):AIcs2300019. doi: 10.1056/AIcs2300019
31. Singh A, Randive S, Breggia A, Ahmad B, Christman R, Amal S. 2023. Enhancing prostate cancer diagnosis with a novel artificial intelligence-based web application: synergizing deep learning models, multimodal data, and insights from usability study with pathologists. Cancers (Basel). 15(23):5659. doi: 10.3390/cancers15235659
32. Steiner DF, Nagpal K, Sayres R, Foote DJ, Wedin BD, Pearce A, Cai CJ, Winter SR, Symonds M, Yatziv L, et al. 2020. Evaluation of the use of combined artificial intelligence and pathologist assessment to review and grade prostate biopsies. JAMA Netw Open. 3(11):e2023267. doi: 10.1001/jamanetworkopen.2020.23267
33. Tommasino C, Merolla F, Russo C, Staibano S, Rinaldi AM. 2023. Histopathological image deep feature representation for CBIR in smart PACS. J Digit Imaging. 36(5):2194–2209.
34. Van Der Laak J, Litjens G, Ciompi F. 2021. Deep learning in histopathology: the path to the clinic. Nat Med. 27(5):775–784.
35. Wataya T, Yanagawa M, Tsubamoto M, Sato T, Nishigaki D, Kita K, Yamagata K, Suzuki Y, Hata A, Kido S, et al. 2022. Radiologists with and without deep learning–based computer-aided diagnosis: comparison of performance and interobserver agreement for characterizing and diagnosing pulmonary nodules/masses. Eur Radiol. 33(1):348–359.
36. Wilson ML, Fleming KA, Kuti MA, Looi LM, Lago N, Ru K. 2018. Access to pathology and laboratory medicine services: a crucial gap. Lancet. 391(10133):1927–1938.
37. Wu S, Wang Y, Hong G, Luo Y, Lin Z, Shen R, Zeng H, Xu A, Wu P, Xiao M, et al. 2024. An artificial intelligence model for detecting pathological lymph node metastasis in prostate cancer using whole slide images: a retrospective, multicentre, diagnostic study. EClinicalMedicine. 71:102580. doi: 10.1016/j.eclinm.2024.102580
38. Yasaka K, Hatano S, Mizuki M, Okimoto N, Kubo T, Shibata E, Watadani T, Abe O. 2023. Effects of deep learning on radiologists’ and radiology residents’ performance in identifying esophageal cancer on CT. Br J Radiol. 96(1150):20220685. doi: 10.1259/bjr.20220685
39. Zayed SO, Abd-Rabou RYM, Abdelhameed GM, Abdelhamid Y, Khairy K, Abulnoor BA, Ibrahim SH, Khaled H. 2024. The innovation of AI-based software in oral diseases: clinical-histopathological correlation diagnostic accuracy primary study. BMC Oral Health. 24(1):598. doi: 10.1186/s12903-024-04347-x
40. Zhu J, Moraes RM, Karakulak S, Sobol V, Canziani A, LeCun Y. 2022. TiCo: transformation invariance and covariance contrast for self-supervised visual representation learning. arXiv [Preprint]. doi: 10.48550/arXiv.2206.10698


Articles from Journal of Dental Research are provided here courtesy of SAGE Publications
