International Journal of Surgery. 2024 Apr 23;110(8):5034–5046. doi: 10.1097/JS9.0000000000001469

Diagnostic accuracy of artificial intelligence assisted clinical imaging in the detection of oral potentially malignant disorders and oral cancer: a systematic review and meta-analysis

JingWen Li a, Wai Ying Kot b, Colman Patrick McGrath c, Bik Wan Amy Chan e, Joshua Wing Kei Ho d,f, Li Wu Zheng a,*
PMCID: PMC11325952  PMID: 38652301

Abstract

Background:

The objective of this study is to examine the application of artificial intelligence (AI) algorithms in detecting oral potentially malignant disorders (OPMD) and oral cancerous lesions, and to evaluate the accuracy variations among different imaging tools employed in these diagnostic processes.

Materials and methods:

A systematic search was conducted in four databases: Embase, Web of Science, PubMed, and Scopus. Eligible studies used machine learning algorithms to provide diagnostic information on specific oral lesions, had a prospective or retrospective design, included OPMD, and reported sensitivity and specificity analyses. Forest plots were generated to display the overall diagnostic odds ratio (DOR), sensitivity, specificity, negative predictive value, and summary receiver operating characteristic (SROC) curves. Meta-regression analysis was conducted to examine potential differences among imaging tools.

Results:

The overall DOR for AI-based screening of OPMD and oral mucosal cancerous lesions from normal mucosa was 68.438 (95% CI: 39.484–118.623; I²=86%). The area under the SROC curve was 0.938, indicating excellent diagnostic performance. AI-assisted screening showed a sensitivity of 89.9% (95% CI: 0.866–0.925; I²=81%), specificity of 89.2% (95% CI: 0.851–0.922; I²=79%), and a high negative predictive value of 89.5% (95% CI: 0.851–0.927; I²=96%). Meta-regression analysis revealed no significant difference among the three imaging tools. After generating a Graphic Display of Study Heterogeneity (GOSH) plot and excluding influential outliers, the DOR was calculated to be 49.30, and the area under the SROC curve was 0.877. Additionally, sensitivity, specificity, and negative predictive value were 90.5% (95% CI: 0.873–0.929; I²=4%), 87.0% (95% CI: 0.813–0.912; I²=49%), and 90.1% (95% CI: 0.860–0.931; I²=57%), respectively. Subgroup analysis showed that clinical photography had the highest diagnostic accuracy.

Conclusions:

AI-based detection using clinical photography shows a high DOR and is easily accessible in the current era with billions of phone subscribers globally. This indicates that there is significant potential for AI to enhance the diagnostic capabilities of general practitioners to the level of specialists by utilizing clinical photographs, without the need for expensive specialized imaging equipment.

Keywords: artificial intelligence, diagnostic test, oral cancer, oral potentially malignant disorders

Introduction

Highlights

  • The accurate detection of oral potentially malignant disorders and oral cancerous lesions is crucial for early diagnosis and effective treatment.

  • Artificial intelligence (AI)-based detection using clinical photography exhibits a high diagnostic odds ratio; this is particularly significant for low- and middle-income countries that face challenges in accessing specialist care and adequate healthcare services.

  • AI algorithms have demonstrated their capability to enhance the early detection of oral lesions, underscoring the significance of further exploration and integration of AI technologies in the realm of oral healthcare.

Oral cancer is a common malignancy worldwide, with over 377 000 cases diagnosed and 177 000 deaths per year [1]. According to Surveillance, Epidemiology, and End Results (SEER) Program data, the 5-year survival rate of oral cancer decreases as the disease progresses, from 86.3% in the localized stage to 39.3% in the distant stage [2]. The diagnosis of oral cancer is sometimes preceded by oral potentially malignant disorders (OPMD), a significant group of disorders affecting the oral mucosa that carry an increased risk of developing malignancy [3]. Timely diagnosis of OPMD enables clinicians to closely monitor disease progression and facilitates early detection of, and intervention in, malignant transformation [4]. However, since many OPMD and early malignancies are asymptomatic and subtle, the diagnosis of oral cancer is often delayed, resulting in late presentation and poor prognosis.

The conventional screening method for OPMD and oral cancer is visual examination under direct light and palpation of the oral cavity, followed by a biopsy and histopathological examination for definitive diagnosis [5]. Although the conventional oral examination has been shown to be effective in limited-resource settings [6], visual examination and palpation of the oral cavity are subjective, depending on the judgment and experience of the clinician [7,8]. Meanwhile, biopsy and histopathological examination are invasive and technique-sensitive, requiring extra laboratory work and fees for an accurate diagnosis. Several noninvasive (NI) imaging techniques have recently been developed and used as adjuncts in the screening of OPMD and oral cancer, including autofluorescence, optical coherence tomography (OCT), and clinical photography. However, large-scale studies demonstrating their accuracy in clinical practice are still lacking [9]. Thus, an accurate, objective, and NI method is clearly needed for the screening and early diagnosis of OPMD and oral cancer.

Over the past decade, artificial intelligence (AI), including the recently launched ChatGPT, has sparked significant anticipation regarding its potential value within the health sciences. This emerging technology has been increasingly used in the medical field and has demonstrated clinically feasible and accurate performance in predicting, screening, and diagnosing cancerous and precancerous lesions in various organs [10–14]. The growing use of AI in oncology suggests its potential for screening OPMD and oral cancer. Using various algorithms, AI can perform image recognition, data mining, and deep learning, which can effectively address the big-data processing challenges in the medical field [13]. For the detection and diagnosis of OPMD and oral cancer, AI has demonstrated unique advantages in processing oral images acquired by various types of devices and, unlike a clinician's diagnosis, can be independent of clinician subjectivity [15,16]. The use of AI in OPMD and oral cancer may assist clinicians in clinical judgment and decision making, reduce diagnostic errors, and ultimately improve overall treatment outcomes. Current studies have demonstrated the use of AI in the detection and classification of oral cancer [17–19], the prediction of malignant transformation, nodal metastasis, and prognosis [20–22], and the anticipation of its recurrence rate [23]. However, owing to the numerous types of oral mucosal lesions, screening and diagnostic methods, and corresponding detection devices, large-scale research on the AI diagnosis of OPMD and oral cancer is still limited.

The objective of this study is to provide a comprehensive overview of the existing knowledge regarding the application of AI algorithms in the detection of OPMD and oral cancerous lesions. Furthermore, the study aims to assess whether there are variations in accuracy among different imaging tools utilized in these diagnostic processes.

Materials and methods

The systematic review and meta-analysis were performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA, Supplemental Digital Content 1, http://links.lww.com/JS9/C446, Supplemental Digital Content 2, http://links.lww.com/JS9/C447) [24], A Measurement Tool to Assess Systematic Reviews 2 (AMSTAR 2, Supplemental Digital Content 3, http://links.lww.com/JS9/C448) [25], and Meta-analysis of Observational Studies in Epidemiology (MOOSE) guidelines. The study was registered in the 'International Prospective Register of Systematic Reviews' (PROSPERO), and the detailed prespecified protocol is available upon request. The present study was also registered with the Research Registry (UIN: reviewregistry1809; https://www.researchregistry.com/browse-the-registry#registryofsystematicreviewsmeta-analyses/).

Search strategy and study selection

PubMed, Embase, Web of Science, and Scopus were searched to identify eligible studies. The titles, abstracts, and keywords of studies were searched using free terms combined with database thesaurus terms, including Medical Subject Headings (MeSH) and Embase Subject Headings (Emtree), to maximize sensitivity. Table 1 presents the keywords and search strategies used for each database. Two reviewers independently reviewed the titles and abstracts of eligible studies, and the selected papers were re-evaluated in full text.

Table 1.

Systematic review search strategy for PubMed, Embase, Web of Science, and Scopus.

Web of Science
(ALL=(artificial intelligence) OR ALL=(machine learning) OR ALL=(deep learning) OR ALL=(neural network) OR ALL=(artificial neural network) OR ALL=(convolutional neural network) OR ALL=(generative adversarial network) OR ALL=(transfer learning)) AND (ALL=(oral potentially malignant disorder) OR ALL=(OPMD) OR ALL=(oral precancerous) OR ALL=(oral premalignant))
PubMed
((artificial intelligence) OR (machine learning) OR (deep learning) OR (neural network) OR (artificial neural network) OR (convolutional neural network) OR (generative adversarial network) OR (transfer learning)) AND ((oral potentially malignant disorder) OR (OPMD) OR (oral precancerous) OR (oral premalignant))
Scopus TITLE-ABS-KEY((( ‘artificial intelligence’ OR ‘machine learning’ OR ‘deep learning’ OR ‘neural network’ OR ‘artificial neural network’ OR ‘convolutional neural network’ OR ‘generative adversarial network’ OR ‘transfer learning’)) AND ((‘oral potentially malignant disorder’ OR OPMD OR ‘oral precancerous’ OR ‘oral premalignant’)))
Embase (‘artificial intelligence’/exp OR ‘artificial intelligence’ OR ‘machine learning’/exp OR ‘machine learning’ OR ‘deep learning’/exp OR ‘deep learning’ OR ‘neural network’/exp OR ‘neural network’ OR ‘artificial neural network’/exp OR ‘artificial neural network’ OR ‘convolutional neural network’/exp OR ‘convolutional neural network’ OR ‘generative adversarial network’/exp OR ‘generative adversarial network’ OR ‘transfer learning’/exp OR ‘transfer learning’) AND (‘oral potentially malignant disorder’/exp OR ‘oral potentially malignant disorder’ OR ‘opmd’ OR ‘oral precancerous’ OR ‘oral premalignant’)

Eligibility criteria

Inclusion criteria: (1) full-length, peer-reviewed original research papers published in English, with no time restrictions; (2) studies utilizing any class of machine learning algorithm to provide diagnostic information on specific oral lesions of interest; (3) prospective or retrospective design; (4) oral lesions including OPMD; (5) availability of sensitivity and specificity analyses.

Exclusion criteria: (1) letters to the editor, case reports, reviews, book chapters, and any study in a language other than English; (2) insufficient availability of diagnostic AI data.

Data extraction and management

The following data were extracted from each included study: (1) year and region of the study, study type, number of individuals, and sample size; (2) lesion type and diagnostic standard, image type, AI strategy, and the best-performing algorithm; (3) true-positive, false-positive, true-negative, and false-negative data, or sensitivity, specificity, and accuracy information. Disagreements between reviewers were resolved either through consensus or through the final determination of a third, independent reviewer.

Risk of bias assessment

The risk of bias of the included studies was evaluated using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [26]. Four domains are assessed through this scoring system: patient selection; index test; reference standard; and flow and timing. According to the scoring in these domains, the risk of bias is judged as 'low', 'high', or 'unclear'. Any differences of opinion were resolved through discussion and mutual agreement between the authors.

Data analysis

Forest plots were generated to display the sensitivity, specificity, and negative predictive values, as well as the summary receiver operating characteristic (SROC) curves. Additionally, a meta-regression analysis was conducted to assess the potential impact of different imaging tools on the diagnostic accuracy of machine learning for oral mucosal precancerous and cancerous lesions. To further explore the heterogeneity, Graphic Display of Study Heterogeneity (GOSH) plots were generated for the sensitivity and specificity analyses, respectively. A maximum of 1×10⁶ randomly fitted models was used to accommodate computational demands [27]. Unsupervised clustering techniques, including k-means clustering, density-based spatial clustering of applications with noise (DBSCAN), and Gaussian mixture models, were applied to the GOSH plot data. Cook's distance was employed to assess the impact of a study on heterogeneity within a specific cluster; this measure quantifies the influence that a particular study exerts on the overall heterogeneity observed in the cluster. After excluding potentially influential studies, a sensitivity analysis was conducted to examine the robustness of the findings. The results of both the primary analysis and the sensitivity analysis are reported herein to provide a comprehensive assessment of the research outcomes. The above analyses were conducted using the meta, metafor, and mada packages in R statistical software (R Foundation for Statistical Computing). A significance level of P<0.05, unless otherwise specified, was considered statistically significant.
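To make this pipeline concrete, the R sketch below reproduces its main steps with the packages named above. The data frame dat and its 2×2 counts are hypothetical placeholders, not the study's actual extraction sheet.

library(mada)     # DOR pooling, bivariate Reitsma model, SROC/AUC
library(metafor)  # random-effects model and GOSH plot

# Hypothetical extraction sheet: one row per dataset, 2x2 counts
dat <- data.frame(
  study = c("A", "B", "C", "D", "E", "F"),
  TP = c(45, 60, 38, 52, 70, 41), FN = c(5, 8, 6, 4, 9, 7),
  FP = c(7, 10, 5, 9, 12, 6),     TN = c(43, 55, 40, 50, 66, 39)
)

# Univariate pooled diagnostic odds ratio (DerSimonian-Laird)
dor_fit <- madauni(dat, type = "DOR", method = "DSL")
summary(dor_fit)

# Bivariate Reitsma model, SROC curve, and its AUC
biv_fit <- reitsma(dat)
AUC(biv_fit)
plot(biv_fit, sroclwd = 2)

# Random-effects model on the log DOR, then a GOSH plot fitted to
# (up to) 1e6 randomly sampled study subsets
es  <- escalc(measure = "OR", ai = TP, bi = FN, ci = FP, di = TN, data = dat)
rem <- rma(yi, vi, data = es, method = "DL")
gosh_res <- gosh(rem, subsets = 1e6)
plot(gosh_res)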

Results

Result of database search

A comprehensive search yielded a total of 385 articles without any time restrictions. Following the removal of duplicate records, 223 papers underwent evaluation based on their titles and abstracts. Subsequently, 51 publications were subjected to full-text review, resulting in the exclusion of 14 studies that did not meet the predefined inclusion criteria. Ultimately, a total of 17 studies were deemed eligible for inclusion in the final meta-analysis. The study selection process, as recommended by the PRISMA statement, is depicted in Figure 1. Detailed baseline data and characteristics of the included studies can be found in the Supplementary Material (Supplemental Digital Content 4, http://links.lww.com/JS9/C449). Definitions of key terms and performance metrics relevant to this systematic review are provided in Table 2.

Figure 1.

A flow diagram of the literature search and study selection process.

Table 3.

Traffic-light plot of the QUADAS-2 risk-of-bias ratings for the 17 included studies (Nayak 2006 [28]; Francisco 2014 [29]; Huang 2017 [30]; Heidari 2018 [31]; Song 2018 [32]; Jeng 2020 [33]; Duran-Sierra 2021 [34]; James 2021 [35]; Jubair 2022 [17]; Lin 2021 [36]; Song 2021 [37]; Tanriver 2021 [38]; Yang 2022 [43]; Alshawwa 2022 [39]; Figueroa 2022 [40]; Fu 2020 [41]; Warin 2022 [42]). Each study is rated as low (☺), high (☹), or unclear (?) risk across four risk-of-bias domains (patient selection, index test, reference standard, and flow and timing) and three applicability-concern domains (patient selection, index test, and reference standard).

Study characteristics and risk of bias assessment

The current meta-analysis included 17 publications, comprising 12 cross-sectional [28,30,32–35,37–40,42,43], three retrospective [17,36,41], and two prospective [29,31] designs. Thirteen publications provided information on the number of patients, totaling 7020, and the combined sample size across the included studies amounted to 24 430 (reported in 15 of the 17 publications). Among the included studies, 70.59% (12 of 17) used biopsy and histopathology as the definitive diagnostic method; in the remaining five publications, the clinical diagnosis provided by an expert or specialist was considered the gold standard. In addition, it is worth noting that only one publication (5.9%) performed both internal and external validation. In terms of regional distribution, the majority of the included studies (15 of 17) were conducted in Asia, with India contributing the most. The remaining two studies originated from North America and Latin America, respectively.

The results of the QUADAS-2 assessment are provided in Table 3 and Figure 2. In domain 1, 52.9% of the studies demonstrated a low risk of bias, while 47.1% had an unclear risk of bias; regarding applicability concerns, 29.4% of the studies were classified as having high concern, 64.7% low concern, and 5.9% unclear concern. Within domain 2, only one study (5.9%) presented a high risk of bias and the others (94.1%) a low risk of bias; all studies presented low applicability concerns. Within domain 3, 88.2% of studies were found to have a low risk of bias, 5.9% a high risk, and 5.9% an unclear risk; all studies presented low applicability concerns. In domain 4, all studies had a low risk of bias.

Table 2.

Definitions of key terms and performance metrics relevant to this systematic review.

General definitions

  • Artificial intelligence: the domain of computer science concerned with the development of computer systems able to perform tasks usually requiring human intelligence.

  • Machine learning: the ability of a machine to learn information and draw inferences from patterns within data without explicitly programmed instruction.

  • Deep learning: a sub-field of machine learning involving the use of complex neural networks with multiple layers (>3) to allow automatic feature selection from unstructured input data.

  • Neural network: AI architectures comprising multiple algorithms in interconnected layers, inspired by their biological counterparts, that allow complex feature selection and pattern recognition.

Performance metrics

  • Diagnostic odds ratio (DOR) = (TP × TN) / (FP × FN): the ratio of the odds of the test being positive if the subject has the disease relative to the odds of the test being positive if the subject does not have the disease.

  • Sensitivity = TP / (TP + FN): the ability of a test to correctly identify subjects with a disease; also known more generally as recall.

  • Specificity = TN / (FP + TN): the ability of a test to correctly identify subjects without a disease.

  • Negative predictive value (NPV) = TN / (TN + FN) = (Specificity × (1 − Prevalence)) / (Specificity × (1 − Prevalence) + Prevalence × (1 − Sensitivity)): the proportion of negative test results that are true negatives.

FN, false negative; FP, false positive; TN, true negative; TP, true positive.
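As a worked illustration of these formulas, the R snippet below computes each metric from a single hypothetical 2×2 table (the counts are illustrative only) and checks the prevalence-based identity for NPV given in the table.

# Hypothetical 2x2 confusion table (illustrative counts only)
TP <- 90; FN <- 10; FP <- 12; TN <- 88

sens <- TP / (TP + FN)           # 0.900
spec <- TN / (FP + TN)           # 0.880
npv  <- TN / (TN + FN)           # 0.898
dor  <- (TP * TN) / (FP * FN)    # 66.0

# NPV re-expressed through prevalence, as defined in Table 2
prev <- (TP + FN) / (TP + FN + FP + TN)
npv_prev <- (spec * (1 - prev)) / (spec * (1 - prev) + prev * (1 - sens))
stopifnot(isTRUE(all.equal(npv, npv_prev)))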

Figure 2.

Figure 2

Risk of bias assessment.

Diagnostic accuracy of AI-assisted screening of OPMD and oral mucosal cancerous lesions

The overall diagnostic odds ratio (DOR) of AI-based screening of OPMD and oral mucosal cancerous lesions from normal mucosa was 68.438 (95% CI: 39.484–118.623; I²=86%; Fig. 3A). The area under the summary receiver operating characteristic curve was 0.938, indicating excellent diagnostic accuracy (Fig. 4A). AI-assisted screening exhibited strong performance, with a sensitivity of 89.9% (95% CI: 0.866–0.925; I²=81%), specificity of 89.2% (95% CI: 0.851–0.922; I²=79%), and a high negative predictive value of 89.5% (95% CI: 0.851–0.927; I²=96%; Fig. 3B–D). Subgroup analyses were conducted to compare the discriminating power of AI-assisted imaging tools in detecting OPMD and oral mucosal cancerous lesions, with the aim of identifying which imaging tool performed best in distinguishing these lesions. Subgroup analysis revealed that OCT (OR=154.064, 95% CI: 38.826–611.342) exhibited much higher diagnostic accuracy than autofluorescence (OR=27.089, 95% CI: 17.850–41.110) and clinical photography (OR=61.834, 95% CI: 33.423–114.394) in AI-based screening for OPMD and oral cancerous lesions from normal mucosa (Fig. 3A). However, meta-regression analysis demonstrated no significant difference among the three subgroups (P=0.096).
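The subgroup P-value above comes from a meta-regression with imaging tool as a moderator. A minimal sketch, assuming the log-DOR effect sizes es from the Methods sketch plus hypothetical tool labels per dataset:

# Mixed-effects meta-regression on log DOR with imaging tool as a
# moderator; the 'tool' assignments are hypothetical placeholders.
library(metafor)
es$tool <- factor(c("autofluorescence", "OCT", "photograph",
                    "OCT", "photograph", "autofluorescence"))
mreg <- rma(yi, vi, mods = ~ tool, data = es, method = "DL")
mreg$QMp  # omnibus P-value for the between-subgroup difference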

Figure 3.

Forest plots of (A) diagnostic odds ratio, (B) sensitivity, (C) specificity, and (D) negative predictive values for screening OPMD and oral mucosal cancerous lesions.

Figure 4.

(A) Area under the summary receiver operating characteristic (SROC) curve for screening OPMD and oral mucosal cancerous lesions. (B) SROC curve for screening OPMD and oral mucosal cancerous lesions with influential outliers excluded.

Diagnostic accuracy of AI-assisted screening of OPMD and oral mucosal cancerous lesions after excluding heterogeneous datasets

To investigate the sources of heterogeneity in the extracted data, GOSH plots were utilized (Fig. 5). Unsupervised clustering algorithms were applied to identify influential outliers, enabling a deeper exploration of the underlying causes of heterogeneity. A total of 24 datasets were obtained from the 17 studies. Of these, six datasets were identified as contributors to between-study heterogeneity in terms of sensitivity, while seven datasets were deemed potentially influential in relation to specificity. After excluding these datasets, sensitivity (N=18) and specificity (N=17) were recalculated, reducing Higgins' I² from 81.4% (95% CI: 73.2–87.1; τ²=0.4805, Q(17)=123.50, P<0.0001) to 36.7% (95% CI: 0.0–64.0; τ²=0.1302, Q(16)=26.85, P=0.0603) for sensitivity, and from 78.9% (95% CI: 69.2–85.5; τ²=0.6327, Q(17)=109.00, P<0.0001) to 51.9% (95% CI: 16.2–72.3; τ²=0.3229, Q(16)=26.85, P=0.0069) for specificity. Further analysis was thus performed with influential outliers excluded. The DOR was calculated to be 49.30 (95% CI: 31.23–77.82; I²=54%; Fig. 6A), and the area under the SROC curve was 0.877, indicating satisfactory diagnostic accuracy (Fig. 4B). Additionally, sensitivity, specificity, and NPV were 90.5% (95% CI: 0.873–0.929; I²=4%), 87.0% (95% CI: 0.813–0.912; I²=49%), and 90.1% (95% CI: 0.860–0.931; I²=57%), respectively (Fig. 6B–D). Regarding the subgroup analysis, interestingly, clinical photography (OR=77.772, 95% CI: 25.832–234.152) demonstrated the highest diagnostic accuracy, followed by OCT (OR=56.997, 95% CI: 24.540–132.383) and the autofluorescence tool (OR=33.908, 95% CI: 16.323–70.440), with no statistically significant difference observed among the three groups (P=0.637, Fig. 6A).
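The authors do not name their clustering implementation. One plausible route, offered here only as an assumption, is the gosh.diagnostics() helper in the dmetar companion package, which applies the same three algorithms (k-means, DBSCAN, Gaussian mixture) to a metafor GOSH object:

# Hedged sketch of GOSH-based outlier detection, assuming the dmetar
# package and the 'rem'/'gosh_res'/'es' objects from the Methods sketch.
library(dmetar)   # devtools::install_github("MathiasHarrer/dmetar")
library(metafor)

diag <- gosh.diagnostics(gosh_res, km = TRUE, db = TRUE, gmm = TRUE)
diag  # prints datasets flagged as influential by each algorithm

# Refit without the flagged datasets and compare heterogeneity
flagged  <- c(1, 3)                       # placeholder indices
rem_trim <- rma(yi, vi, data = es[-flagged, ], method = "DL")
c(before = rem$I2, after = rem_trim$I2)   # Higgins' I-squared drop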

Figure 5.

GOSH plots of sensitivity and specificity data with the outcomes of k-means clustering, density-based spatial clustering of applications with noise (DBSCAN), and Gaussian mixture model unsupervised algorithms. Potentially influential studies are identified as those with a leverage more than three times the mean in any generated cluster.

Figure 6.

Forest plots of (A) diagnostic odds ratio, (B) sensitivity, (C) specificity, and (D) negative predictive values for screening OPMD and oral mucosal cancerous lesions with influential outliers excluded.

Discussion

As a global health concern, it is well known that early detection of oral cancer is paramount to improving survival rates and minimizing morbidity. AI-based detection offers a novel approach for screening OPMD and facilitating early diagnosis of oral cancer, demonstrating notable advantages in terms of fast response and high efficiency. This approach primarily involves autofluorescence, OCT, and clinical photography techniques [44,45]. In the current systematic review, we conducted a comprehensive analysis, in particular excluding influential outliers. The overall DOR of AI-based screening for OPMD and oral mucosal cancerous lesions was determined to be 49.30 (AUC=0.877). Moreover, the diagnostic sensitivity reached a high value of 0.905, suggesting a satisfactory level of diagnostic accuracy. In subgroup analysis, no statistical difference was observed in diagnostic capacity among the imaging tools. Of note, the DOR and sensitivity of clinical photography for all OPMD and cancerous lesions were remarkably high, reaching 77.772 and 0.939, respectively. In comparison, OCT demonstrated a DOR of 56.997 and a sensitivity of 0.883, while the autofluorescence tool had a DOR of 33.908 and a sensitivity of 0.889. The findings of our study differ from those of Kim et al. [15] in a similar research area, who found that AI-assisted OCT analysis outperformed other methods in screening for oral precancerous and cancerous lesions against normal mucosa. Differences in search criteria, in the studies included in the meta-analysis, and in our implementation of the GOSH plot to reduce heterogeneity may contribute to the discrepancy between the two studies.

Autofluorescence imaging utilizes a safe blue light source that, when directed at normal tissue, reflects a uniform green fluorescence, defined as fluorescence visualization retained. However, when suspicious lesions are illuminated, the fluorescence is absorbed, resulting in darkening or even blackening, known as fluorescence visualization loss. Based on these characteristics, autofluorescence technology can detect mucosal lesions that might be difficult to identify by visual observation; it does not require reagents and provides real-time examination results [30]. Based on our analysis, the DOR of AI using autofluorescence images was 27.089 before and 33.908 after exclusion of influential outliers, the lowest accuracy among the imaging tools evaluated in our study and in line with the results reported in other meta-analyses [15]. The relatively lower diagnostic accuracy of autofluorescence imaging may be attributed to the following reasons: (1) autofluorescence technology may have inherent technical limitations, such as the characteristics of the light source and fluorescence signal capture, which can impact its diagnostic performance [46]; (2) oral cancer encompasses various subtypes, and autofluorescence may exhibit lower sensitivity or specificity towards certain types of oral cancer lesions [47]; (3) interfering factors in the oral environment, such as saliva, food debris, or other oral substances, can disrupt the accuracy of autofluorescence imaging and subsequently reduce its diagnostic precision; and (4) autofluorescence imaging is susceptible to false-positive results, which can bias diagnostic accuracy, attributable to the fact that different types of lesions with diverse underlying causes exhibit distinct autofluorescent properties [48]. Furthermore, the acquisition of autofluorescence images using different devices can introduce biases: variances in hardware components such as light sources, detectors, and filters among different facilities can affect diagnostic accuracy, leading to variations in brightness, contrast, and color shift in images obtained from different devices. Consequently, AI algorithms that utilize autofluorescence imaging for diagnostic purposes should place greater emphasis on minimizing device-related differences in the future [49].

The OCT technique harnesses the low-coherence nature of a broadband light source to capture real-time, NI, high-resolution two- or three-dimensional cross-sectional images of the internal structure of a sample. It has been in development for over 30 years and has been successfully applied across multiple disciplines, including oral medicine [50,51]. Polarization-sensitive OCT (PSOCT), which incorporates polarization information, enables the detection of tissue abnormalities by probing the birefringence and depolarization characteristics of the tissue [52]. Chen et al. utilized PSOCT for in vivo and in vitro imaging of mouse tongue tissues, analyzing the changes in the oral mucosal matrix from normal, hyperplasia, and dysplasia to early-stage cancer. The sensitivity, specificity, positive predictive value, and negative predictive value of PSOCT for detecting dysplasia and early-stage cancer were reported as 100%, 95%, 93.75%, and 100%, respectively, providing preliminary validation of the accuracy of PSOCT in detecting early-stage malignancy in the oral cavity [53]. Yang et al. utilized AI for the automated classification and detection of OCT images and reported a sensitivity and specificity of up to 98% for the diagnosis of OPMD and oral cancer [43]. In our study, OCT was found to be a satisfactory diagnostic imaging tool, with a DOR of 56.997 in AI diagnosis. In future research, it is essential to consider the accessibility of equipment or systems that can be utilized in outpatient settings.

Clinical photographic imaging primarily involves capturing intraoral images with a digital camera or smartphone, followed by computer processing for automatic identification of lesion types. Current work can be broadly categorized into two main approaches. One approach focuses on evaluating the performance of existing convolutional neural networks (CNNs) in diagnosing oral mucosal lesions to validate their effectiveness. Fu et al. [41] developed a deep learning algorithm based on clinical visual features for oral cancer auto-detection, which rapidly identifies lesion areas from clinical intraoral photographs with high accuracy. Warin et al. investigated the performance of CNNs in binary classification and object detection tasks for oral squamous cell carcinoma (OSCC) and normal tissue; they collected 350 OSCC images and 350 normal oral mucosa samples and conducted preprocessing after extracting the region of interest, and the DenseNet121 model achieved a classification sensitivity of 98.75% and a specificity of 100% [42]. The other approach involves adopting new algorithms or adjusting existing models to improve auto-detection performance, comprising the use of classification networks, transfer learning, multi-model ensembles, and Bayesian deep neural networks [54]. Jubair et al. utilized a lightweight CNN for binary classification of oral mucosal lesions, categorizing them as benign or 'suspicious' (i.e. cancer or OPMD); they employed an ImageNet-pretrained EfficientNet-B0 model, which outperformed the commonly used VGG19 and ResNet101 networks on validation performance metrics (sensitivity: 86.7%, specificity: 84.5%) [17]. In the binary classification of benign and precancerous lesions, an integrated learning model combining ResNet-50 and VGG-16 CNNs achieved slightly superior identification accuracy compared with the individual models, with a sensitivity of 98.14% and specificity of 94.23% [55]. Clinical photographic imaging showed the highest DOR (77.772) among all imaging tools in our meta-analysis.
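To illustrate the transfer-learning recipe these photographic studies share, here is a minimal sketch in R's keras interface, assuming an ImageNet-pretrained DenseNet121 backbone; the image size, dropout rate, and learning rate are illustrative assumptions, not any included study's actual configuration:

# Minimal transfer-learning sketch: pretrained backbone + binary head
library(keras)

base <- application_densenet121(weights = "imagenet",
                                include_top = FALSE,
                                input_shape = c(224, 224, 3))
freeze_weights(base)  # keep the pretrained features fixed at first

inputs  <- layer_input(shape = c(224, 224, 3))
outputs <- inputs %>%
  base() %>%
  layer_global_average_pooling_2d() %>%
  layer_dropout(rate = 0.2) %>%
  layer_dense(units = 1, activation = "sigmoid")  # benign vs. suspicious

model <- keras_model(inputs, outputs)
model %>% compile(optimizer = optimizer_adam(learning_rate = 1e-4),
                  loss = "binary_crossentropy",
                  metrics = list("accuracy", metric_auc()))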

Several potential reasons may account for this discrepancy. AI algorithms rely on effective feature representation to discriminate between different types of lesions; clinical photography provides high-resolution, detailed visual information, including texture, shape, and color, which can aid accurate lesion discrimination. In contrast, OCT and autofluorescence imaging may have limited feature representation capabilities, potentially impeding their ability to provide sufficient information for accurate identification. Besides, OCT and autofluorescence imaging might be susceptible to equipment-related factors, such as the intensity and stability of the light source, limitations in imaging depth, and tissue transparency; these factors can decrease image quality and readability, thereby impacting the accurate identification of lesions. Even though no significant difference was found, this result highlights the potentially superior performance of clinical photography in auto-detecting OPMD and oral cancerous lesions. Given its noninvasive, real-time, user-friendly, and cost-effective nature, further research should establish large-scale, standardized datasets of oral mucosal lesion images to ensure accuracy and generalizability.

Nevertheless, compared to AI research in other medical domains, studies pertaining to AI diagnosis of oral mucosal lesions remain relatively limited, calling for further in-depth investigation and exploration. Under these circumstances, the clinical recommendations and considerations are as follows:

  1. The lack of large-scale, standardized datasets of oral mucosal lesion images poses a challenge in current research. Conducting comparative studies among various AI algorithms becomes difficult, impeding comprehensive validation of their accuracy and generalizability.

  2. The collection of samples often excludes complicated cases, and only a few studies take into account the general condition of the patients. Thus, the diagnostic performance reported in the training and validation phases may not fully represent the effectiveness when applied in clinical use.

  3. Uncertainty quantification for the intelligent diagnosis of oral mucosal lesions has not been extensively studied. Commonly used machine learning algorithms can render erroneous judgments even at high confidence levels; in other words, AI-based diagnosis does not provide explicit indications of its uncertainty or lack of knowledge to individuals.

  4. It is imperative to focus on enhancing the interpretability of AI models and adopting clinically applicable performance metrics. This approach will facilitate the translation of AI models into clinical settings. Developers should possess a comprehensive understanding of the training data and be mindful of potential unintended algorithmic biases. Furthermore, it is crucial to conduct external validation to ensure their generalizability across diverse individuals. In the articles we included, it was found that only a single publication (5.9%) conducted both internal and external validation, highlighting a significant gap in the validation procedures employed across the studies. By addressing these aspects, researchers can enhance the trustworthiness and utility of AI models in clinical settings, promoting their effective integration and facilitating improved patient care.

  5. When considering the implementation of image tools in actual clinical practice, it is essential to recognize that each tool has its own advantages and limitations to ensure optimal application. Moreover, the accessibility of equipment or systems for use in outpatient settings becomes a significant consideration. The availability and feasibility of employing such tools on patients during routine clinical care play a vital role in their practical integration.

To the best of our knowledge, this is the most comprehensive systematic review and meta-analysis focusing on diagnostic accuracy with subgroup analysis based on different imaging tools. Specifically, we employed the GOSH plot, a method for visualizing between-study heterogeneity, to reduce the heterogeneity among the included studies, an approach unprecedented in the existing literature and aimed at ensuring the accuracy and reliability of the obtained results. Based on our analysis, the acquisition of large-scale image datasets holds substantial clinical significance as a pivotal factor in advancing the accuracy of AI analysis.

Our study has several potential limitations that should be acknowledged. Firstly, we limited our inclusion criteria to studies published in the English language, which may introduce language bias and potentially exclude relevant studies published in other languages. Secondly, the inclusion of data from multiple imaging tools in our analysis introduced heterogeneity in the results. To address this issue, we conducted GOSH analysis and separate sensitivity analyses for each imaging tool to assess their individual performance. Nonetheless, the overall heterogeneity across the included studies should be considered when interpreting the results. In addition, it is important to acknowledge that even when using the same imaging tool, variations in device quality and differences in techniques across studies can impact the accuracy of diagnosis. Factors such as variations in image acquisition protocols, calibration methods, and technical specifications of the devices used can contribute to differences in diagnostic accuracy. These variations should be taken into consideration when interpreting the results and generalizing the findings.

Conclusion

Among the various imaging tools, AI-based detection using clinical photography exhibits a high DOR and is easily accessible in the present era, with billions of phone subscribers globally. This is particularly significant for low- and middle-income countries that face challenges in accessing specialist care and adequate healthcare services. With the ongoing evolution of image acquisition devices and the integration of diverse AI algorithms, significant advances in diagnostic accuracy are expected. It is anticipated that the use of AI to diagnose a vast number of these images will enable accurate and efficient screening, thereby enhancing healthcare outcomes.

Ethical approval

Not applicable.

Consent

Not applicable.

Sources of funding

This study is supported in part by AIR@InnoHK, administered by the Innovation and Technology Commission of Hong Kong.

Author contribution

J.L.: data curation, formal analysis, methodology, writing – original draft, and writing – review and editing; W.Y.K.: data curation, formal analysis, and methodology; C.P.M.: conceptualization, supervision, and writing – review and editing; B.W.A.C.: software, supervision, and writing – review and editing; J.W.K.H.: software and supervision; L.W.Z.: conceptualization, methodology, supervision, writing – original draft, and writing – review and editing.

Conflicts of interest disclosure

JWKH is a co-founder, shareholder, and director of Vitome Limited. This company played no role in the study or in the decision to publish this manuscript.

Research registration unique identifying number (UIN)

The study was registered in the 'International Prospective Register of Systematic Reviews' (PROSPERO; registration number: CRD42023480955; status: 'Review completed, not published').

Guarantor

Liwu Zheng.

Data availability statement

None required for this article.

Provenance and peer review

Not commissioned, externally peer-reviewed.

Supplementary Material

js9-110-5034-s001.pdf (131.5KB, pdf)
js9-110-5034-s002.docx (55.4KB, docx)
js9-110-5034-s003.pdf (319.2KB, pdf)
js9-110-5034-s004.pdf (4.8MB, pdf)

Footnotes

Sponsorships or competing interests that may be relevant to content are disclosed at the end of this article.

Supplemental Digital Content is available for this article. Direct URL citations are provided in the HTML and PDF versions of this article on the journal’s website, www.lww.com/international-journal-of-surgery.

Published online 23 April 2024

Contributor Information

JingWen Li, Email: jingwen7883@gmail.com.

Wai Ying Kot, Email: kamechan@connect.hku.hk.

Colman Patrick McGrath, Email: mcgrathc@hku.hk.

Bik Wan Amy Chan, Email: abwchan@cuhk.edu.hk.

Joshua Wing Kei Ho, Email: jwkho@hku.hk.

Li Wu Zheng, Email: lwzheng@hku.hk.

Reference

  • 1. World Health Organization. Global oral health status report: towards universal health coverage for oral health by 2030. 2022.
  • 2. Surveillance, Epidemiology, and End Results (SEER) Program, National Cancer Institute. Cancer Stat Facts: Oral Cavity and Pharynx Cancer.
  • 3. Warnakulasuriya S, Kujan O, Aguirre-Urizar JM, et al. Oral potentially malignant disorders: a consensus report from an international seminar on nomenclature and classification, convened by the WHO Collaborating Centre for Oral Cancer. Oral Dis 2021;27:1862–1880. [DOI] [PubMed] [Google Scholar]
  • 4. Mehanna HM, Rattay T, Smith J, et al. Treatment and follow-up of oral dysplasia - a systematic review and meta-analysis. Head Neck 2009;31:1600–1609. [DOI] [PubMed] [Google Scholar]
  • 5. Parak U, Lopes Carvalho A, Roitberg F, et al. Effectiveness of screening for oral cancer and oral potentially malignant disorders (OPMD): a systematic review. Prev Med Rep 2022;30:101987. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Sulaiman D, Lohiya A, Rizwan SA, et al. Diagnostic accuracy of screening of lip and oral cavity cancers or potentially malignant disorders (PMD) by frontline workers: a systematic review and meta-analysis. Asian Pac J Cancer Prev 2022;23:3983–3991. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Essat M, Cooper K, Bessey A, et al. Diagnostic accuracy of conventional oral examination for detecting oral cavity cancer and potentially malignant disorders in patients with clinically evident oral lesions: systematic review and meta-analysis. Head Neck 2022;44:998–1013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Walsh T, Warnakulasuriya S, Lingen MW, et al. Clinical assessment for the detection of oral cavity cancer and potentially malignant disorders in apparently healthy adults. Cochrane Database Syst Rev 2021;12:CD010173. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Mendonca P, Sunny SP, Mohan U, et al. Non-invasive imaging of oral potentially malignant and malignant lesions: a systematic review and meta-analysis. Oral Oncol 2022;130:105877. [DOI] [PubMed] [Google Scholar]
  • 10. Choudhary OP, Priyanka. ChatGPT in travel medicine: a friend or foe? Travel Med Infect Dis 2023;54:102615. [DOI] [PubMed] [Google Scholar]
  • 11. Choudhary OP, Saini J, Challana A, et al. ChatGPT for veterinary anatomy education: an overview of the prospects and drawbacks. Int J Morphol 2023;41:1198–1202. [Google Scholar]
  • 12. Allahqoli L, Laganà AS, Mazidimoradi A, et al. Diagnosis of cervical cancer and pre-cancerous lesions by artificial intelligence: a systematic review. Diagnostics (Basel) 2022;12:2771. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Hou X, Shen G, Zhou L, et al. Artificial intelligence in cervical cancer screening and diagnosis. Front Oncol 2022;12:851367. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Popa SL, Grad S, Chiarioni G, et al. Applications of artificial intelligence in the automatic diagnosis of focal liver lesions: a systematic review. J Gastrointestin Liver Dis 2023;32:77–85. [DOI] [PubMed] [Google Scholar]
  • 15. Kim JS, Kim BG, Hwang SH. Efficacy of artificial intelligence-assisted discrimination of oral cancerous lesions from normal mucosa based on the oral mucosal image: a systematic review and meta-analysis. Cancers (Basel) 2022;14:3499. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Ferro A, Kotecha S, Fan K. Machine learning in point-of-care automated classification of oral potentially malignant and malignant disorders: a systematic review and meta-analysis. Sci Rep 2022;12:13797. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Jubair F, Al-Karadsheh O, Malamos D, et al. A novel lightweight deep convolutional neural network for early detection of oral cancer. Oral Dis 2022;28:1123–1130. [DOI] [PubMed] [Google Scholar]
  • 18. Sharma N, Om H. Usage of probabilistic and general regression neural network for early detection and prevention of oral cancer. Scient World J 2015;2015:234191. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Al-Ma’aitah M, AlZubi AA. Enhanced computational model for gravitational search optimized echo state neural networks based oral cancer detection. J Med Syst 2018;42:205. [DOI] [PubMed] [Google Scholar]
  • 20. Adeoye J, Tan JY, Choi SW, et al. Prediction models applying machine learning to oral cavity cancer outcomes: a systematic review. Int J Med Inform 2021;154:104557. [DOI] [PubMed] [Google Scholar]
  • 21. Mermod M, Jourdan EF, Gupta R, et al. Development and validation of a multivariable prediction model for the identification of occult lymph node metastasis in oral squamous cell carcinoma. Head Neck 2020;42:1811–1820. [DOI] [PubMed] [Google Scholar]
  • 22. Bur AM, Holcomb A, Goodwin S, et al. Machine learning to predict occult nodal metastasis in early oral squamous cell carcinoma. Oral Oncol 2019;92:20–25. [DOI] [PubMed] [Google Scholar]
  • 23. Exarchos KP, Goletsis Y, Fotiadis DI. Multiparametric decision support system for the prediction of oral cancer reoccurrence. IEEE Trans Inf Technol Biomed 2012;16:1127–1134. [DOI] [PubMed] [Google Scholar]
  • 24. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Int J Surg 2021;88:105906. [DOI] [PubMed] [Google Scholar]
  • 25. Shea BJ, Reeves BC, Wells G, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ 2017;358:j4008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Whiting PF, Rutjes AW, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011;155:529–536. [DOI] [PubMed] [Google Scholar]
  • 27. Olkin I, Dahabreh IJ, Trikalinos TA. GOSH - a graphical display of study heterogeneity. Res Synth Methods 2012;3:214–223. [DOI] [PubMed] [Google Scholar]
  • 28. Nayak GS, Kamath S, Pai KM, et al. Principal component analysis and artificial neural network analysis of oral tissue fluorescence spectra: classification of normal premalignant and malignant pathological conditions. Biopolymers 2006;82:152–166. [DOI] [PubMed] [Google Scholar]
  • 29. Francisco AL, Correr WR, Azevedo LH, et al. Fluorescence spectroscopy for the detection of potentially malignant disorders and squamous cell carcinoma of the oral cavity. Photodiagnosis Photodyn Ther 2014;11:82–90. [DOI] [PubMed] [Google Scholar]
  • 30. Huang TT, Huang JS, Wang YY, et al. Novel quantitative analysis of autofluorescence images for oral cancer screening. Oral Oncol 2017;68:20–26. [DOI] [PubMed] [Google Scholar]
  • 31. Heidari AE, Sunny SP, James BL, et al. Optical coherence tomography as an oral cancer screening adjunct in a low resource settings. IEEE Journal of Selected Topics in Quantum Electronics 2018;25:1–8. [Google Scholar]
  • 32. Song B, Sunny S, Uthoff RD, et al. Automatic classification of dual-modalilty, smartphone-based oral dysplasia and malignancy images using deep learning. Biomed Opt Express 2018;9:5318–5329. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Jeng MJ, Sharma M, Chao TY, et al. Multiclass classification of autofluorescence images of oral cavity lesions based on quantitative analysis. PLoS One 2020;15:e0228132. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Duran-Sierra E, Cheng S, Cuenca R, et al. Machine-learning assisted discrimination of precancerous and cancerous from healthy oral tissue based on multispectral autofluorescence lifetime imaging endoscopy. Cancers (Basel) 2021;13:4751. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. James BL, Sunny SP, Heidari AE, et al. Validation of a point-of-care optical coherence tomography device with machine learning algorithm for detection of oral potentially malignant and malignant lesions. Cancers (Basel) 2021;13:3583. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Lin H, Chen H, Weng L, et al. Automatic detection of oral cancer in smartphone-based images using deep learning for early diagnosis. J Biomed Opt. 2021;26:086007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Song B, Sunny S, Li S, et al. Mobile-based oral cancer classification for point-of-care screening. J Biomed Opt 2021;26:065003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Tanriver G, Soluk Tekkesin M, Ergen O. Automated detection and classification of oral lesions using deep learning to detect oral potentially malignant disorders. Cancers (Basel) 2021;13:2766. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Alshawwa SZ, Saleh A, Hasan M, et al. Segmentation of oral leukoplakia (OL) and proliferative verrucous leukoplakia (PVL) using artificial intelligence techniques. Biomed Res Int 2022;2022:2363410. [DOI] [PMC free article] [PubMed] [Google Scholar] [Retracted]
  • 40. Figueroa KC, Song B, Sunny S, et al. Interpretable deep learning approach for oral cancer classification using guided attention inference network. J Biomed Opt 2022;27:015001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Fu Q, Chen Y, Li Z, et al. A deep learning algorithm for detection of oral cavity squamous cell carcinoma from photographic images: a retrospective study. EClinicalMedicine 2020;27:100558. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Warin K, Limprasert W, Suebnukarn S, et al. AI-based analysis of oral lesions using novel deep convolutional neural networks for early detection of oral cancer. PLoS One 2022;17:e0273508. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Yang Z, Shang J, Liu C, et al. Identification of oral precancerous and cancerous tissue by swept source optical coherence tomography. Lasers Surg Med 2022;54:320–328. [DOI] [PubMed] [Google Scholar]
  • 44. Camalan S, Mahmood H, Binol H, et al. Convolutional neural network-based clinical predictors of oral dysplasia: class activation map analysis of deep learning results. Cancers (Basel) 2021;13:1291. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Mintz Y, Brodie R. Introduction to artificial intelligence in medicine. Minim Invasive Ther Allied Technol 2019;28:73–81. [DOI] [PubMed] [Google Scholar]
  • 46. Mat Lazim N, Kandhro AH, Menegaldo A, et al. Autofluorescence image-guided endoscopy in the management of upper aerodigestive tract tumors. Int J Environ Res Public Health 2022;20:159. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Awan KH, Morgan PR, Warnakulasuriya S. Evaluation of an autofluorescence based imaging system (VELscope) in the detection of oral potentially malignant disorders and benign keratoses. Oral Oncol 2011;47:274–277. [DOI] [PubMed] [Google Scholar]
  • 48. Yang EC, Tan MT, Schwarz RA, et al. Noninvasive diagnostic adjuncts for the evaluation of potentially premalignant oral epithelial lesions: current limitations and future directions. Oral Surg Oral Med Oral Pathol Oral Radiol 2018;125:670–681. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Frampton GK, Kalita N, Payne L, et al. Accuracy of fundus autofluorescence imaging for the diagnosis and monitoring of retinal conditions: a systematic review. Health Technol Assess 2016;20:1–108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Schmitt JM. Optical coherence tomography (OCT): a review. IEEE Journal of selected topics in quantum electronics 1999;5:1205–1215. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Huang D, Swanson EA, Lin CP, et al. Optical coherence tomography. Science 1991;254:1178–1181. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. de Boer JF, Milner TE, van Gemert MJ, et al. Two-dimensional birefringence imaging in biological tissue by polarization-sensitive optical coherence tomography. Opt Lett 1997;22:934–936. [DOI] [PubMed] [Google Scholar]
  • 53. Chen PH, Lee HY, Chen YF, et al. Detection of oral dysplastic and early cancerous lesions by polarization-sensitive optical coherence tomography. Cancers (Basel) 2020;12:2376. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Warin K, Limprasert W, Suebnukarn S, et al. Automatic classification and detection of oral cancer in photographic images using deep learning algorithms. J Oral Pathol Med 2021;50:911–918. [DOI] [PubMed] [Google Scholar]
  • 55. Nanditha B, Geetha A, Chandrashekar H, et al. An ensemble deep neural network approach for oral cancer screening. 2021.
