Abstract
Aim
Detecting oral cancer (OC) at the earliest stage significantly increases survival rates. Recently, there has been growing interest in the use of artificial intelligence (AI) technologies in diagnostic medicine. This study aimed to critically analyse the available evidence concerning the utility of AI in the diagnosis of OC. Special consideration was given to the diagnostic accuracy of AI and its ability to identify the early stages of OC.
Materials and methods
Four databases (PubMed, Scopus, EBSCO, and OVID) were searched from inception to December 2021. Three independent authors selected studies on the basis of strict inclusion criteria. The risk of bias and applicability were assessed using the prediction model risk of bias assessment tool (PROBAST). Of the 606 initial records, 17 studies with a total of 7245 patients and 69,425 images were included. Ten statistical methods were used to assess AI performance in the included studies. Six studies used supervised machine learning, whilst 11 used deep learning.
Results
Deep learning achieved an accuracy of 81% to 99.7%, sensitivity of 79% to 98.75%, specificity of 82% to 100%, and area under the curve (AUC) of 79% to 99.5%. Supervised machine learning achieved an accuracy of 43.5% to 100%, sensitivity of 94% to 100%, specificity of 16% to 100%, and an AUC of 93%.
Conclusions
There is no clear consensus regarding the best AI method for OC detection. AI is a valuable diagnostic tool that represents a large evolutionary leap in the detection of OC in its early stages. Based on the evidence, deep learning, such as a deep convolutional neural network, is more accurate in the early detection of OC compared to supervised machine learning.
Key words: Oral cancer, Artificial intelligence, Neural network, Machine learning, Diagnosis
Introduction
According to the Global Cancer Statistics of 2018, oral cancer (OC) (International Classification of Diseases, 10th revision [ICD-10]: C00–06) is the 11th most frequently reported cancer worldwide, with over 640,000 new cases reported annually.1 Despite major improvements in cancer diagnosis and treatment modalities, the morbidity and mortality rates of OC remain high, particularly in advanced stages (T3 and T4).2, 3, 4, 5 Although histologic evaluation of biopsies by an oral pathologist remains the gold standard for diagnosing OC, it is liable to subjective judgement owing to discrepancies in interpretation and variability of results.6 Therefore, alternative methods are needed that can provide a more accurate, fast, and standardised diagnosis and improve the survival rates of patients with OC.
Artificial intelligence (AI) is an area of computer science that can be defined as a machine's capacity to emulate a human's cognitive capacity. The term “artificial intelligence” refers to a wide range of methodologies. For instance, deep learning is a potentially revolutionary technology that attempts to model high-level abstractions in medical imagery to derive diagnostic meanings.
It is vital to remember that AI is a broad term that encompasses 2 distinct branches: traditional machine learning and deep learning. Traditional machine learning uses algorithms and computational processes to recognise patterns in input data and then offers a quantified judgement as a diagnostic result regarding the nature and behaviour of the lesion.3 Traditional machine-learning approaches are further divided into supervised and unsupervised methods. The supervised technique relies on training the model with labelled inputs and outputs that serve as the ground truth against which new diagnostic inputs are tested.7 In contrast, unsupervised techniques are machine learning models that are not built upon preordained values; hence, they use extraction and mining methods to explore common hidden features in the input data or specimen.8 Deep learning, or neural networks, regarded as a subset of machine learning, is a computational technique based on nonlinear processing units arranged in multiple hidden layers that learn to associate input with output. Unlike classical machine learning, deep learning can process large-scale data, given the intricacy and abstraction of the data, and explore complex relations between the input and output.9,10
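To make the distinction concrete, the following sketch contrasts the two branches on toy data: a supervised classifier (a support vector machine) that consumes hand-crafted lesion features, and a small convolutional network that consumes raw images. All shapes, features, and labels are hypothetical placeholders, not the pipeline of any study reviewed here.

```python
import numpy as np
from sklearn.svm import SVC

# Traditional supervised machine learning: precomputed descriptors in, label out.
rng = np.random.default_rng(0)
features = rng.normal(size=(40, 5))      # e.g. texture/shape/colour descriptors
labels = rng.integers(0, 2, size=40)     # 0 = benign, 1 = malignant (toy labels)
svm = SVC(kernel="linear").fit(features, labels)
print("SVM predictions:", svm.predict(features[:3]))

import torch
import torch.nn as nn

# Deep learning: raw image in, label out; the stacked nonlinear layers
# learn their own feature hierarchy instead of relying on hand-crafted inputs.
cnn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 2),          # 2 outputs: benign vs malignant
)
images = torch.randn(4, 3, 64, 64)       # batch of 4 toy RGB "photographs"
print("CNN logits shape:", cnn(images).shape)  # torch.Size([4, 2])
```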
Recently, there has been a significant surge in research on AI-based technologies for medical imaging and diagnosis.11 The reason for implementing AI in the field of oncology is its potential to improve the accuracy and efficacy of cancer screening.6 AI technologies are effective in identifying breast, lung, and oral cancers.12, 13, 14 These techniques are currently being evaluated for inclusion in diagnostic systems, particularly for disease screening in resource-constrained situations, where trained doctors and experts are in short supply.15, 16, 17
Because AI is under constant investigation and development, many reviews have been conducted over the last decade; however, they place little emphasis on the accuracy or sensitivity of these methods in the early detection of OC.
The use of AI can reduce the effort required for screening and analysing large data sets during the detection of malignant lesions.6 However, more research on the use of AI in the diagnosis of OC is required. Primarily, the accuracy and efficiency of AI in recognising OC must be evaluated against that of a trained clinician, along with its ability to detect OC at an early stage.
This systematic review was conducted to critically evaluate the available evidence concerning the accuracy and efficiency of utilizing AI in diagnosing OC and whether AI can detect OC lesions in their early stages as precisely as a clinician can.
Methodology
Protocol
This systematic review adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement for reporting systematic reviews.18 The systematic review protocol was registered on the PROSPERO platform (CRD42021288107).
Focused question
Is AI effective in providing an accurate diagnosis for the early detection of OC?
The question for the current systematic review was adopted to follow the PICO criteria:
P: Oral squamous cell carcinoma (OSCC) cases
I: AI (machine and deep learning)
C: Cancerous vs noncancerous images
O: Accuracy of AI in the early detection of OC
Literature search
From inception to November 30, 2021, the University of Sharjah Library was used to conduct the search, which included access to 4 databases: PubMed, Scopus, EBSCO, and OVID. The publications collected were published between 2000 and 2021, ensuring that the literature gathered provided a comprehensive picture of AI advancement in the field of OC detection and diagnosis. A set of keyword combinations (“oral cancer” [MeSH term] AND (“machine learning” [MeSH term] OR “deep learning” [MeSH term] OR “neural network” [MeSH term])) was used to search all 4 databases to ensure that all relevant articles were screened.
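As an illustration of how such a Boolean MeSH query can be executed programmatically, the sketch below runs an approximation of this strategy against PubMed through NCBI E-utilities via Biopython. The query string and contact address are stand-ins, not the authors' exact search, and running it requires network access.

```python
from Bio import Entrez

Entrez.email = "reviewer@example.org"  # NCBI requires a contact address

# Approximation of the Boolean strategy described above, using the
# closest controlled MeSH headings.
query = (
    '"mouth neoplasms"[MeSH Terms] AND '
    '("machine learning"[MeSH Terms] OR "deep learning"[MeSH Terms] '
    'OR "neural networks, computer"[MeSH Terms])'
)
handle = Entrez.esearch(db="pubmed", term=query, retmax=200)
record = Entrez.read(handle)
handle.close()
print(record["Count"], "records; first IDs:", record["IdList"][:5])
```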
A manual search of the following journals was also performed: Journal of Oncology; Journal of Oral Diseases; Journal of Oral Pathology & Medicine; Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology; International Journal of Oral and Maxillofacial Surgery; European Journal of Craniomaxillofacial Surgery; British Journal of Oral and Maxillofacial Surgery; and Journal of Craniofacial Surgery.
Additional searches were conducted through the reference lists of the identified studies and of pertinent reviews on the topic. Furthermore, ClinicalTrials.gov, www.centerwatch.com/clinical trials, and www.clinicalconnection.com were searched for information on ongoing clinical studies.
Inclusion and exclusion criteria
The inclusion criteria were as follows:
1. Human experimental or observational studies that employed AI technology to identify OC.
2. Research comparing physicians' diagnostic outcomes against AI for OC.
3. Samples presented in the form of histologic or photographic images.
4. Full-text, English-language studies that reported accuracy, sensitivity, specificity, and/or area under the curve (AUC).
The exclusion criteria were as follows:
1. Studies with fewer than 10 patients.
2. Studies including individuals with recurrent OC.
3. Animal studies.
4. Literature reviews, case reports, short communications, non-English studies, personal viewpoints, letters to editors, and conference abstracts.
Study selection and data extraction
The titles, abstracts, and full texts of the relevant studies were examined separately by 3 reviewers, and any disagreements were resolved by consensus. The reviewers retrieved the required information from eligible studies. The following data were collected for each study (when available): author, year, country, sample type, sample size, learning machine and training set/cycle, statistical findings (accuracy, sensitivity, specificity, and AUC), and the main outcomes (Table 1).
Table 1.

| No. | Author, year, country | Sample number | Sample type | Learning machine/training cycle and sets | Statistical findings (AUC, sensitivity, specificity, etc) | Main outcome |
|---|---|---|---|---|---|---|
| 1 | Welikala et al,7 India | No. of patients = 1085; no. of images = 2155; training images = 1744; validation images = 207 | Photographic images | (1) Image classification: ResNet-101 neural network; (2) object detection: region proposal network (RPN) and detection network | Image classification (images containing a lesion): P = 84.77%, R = 89.51%, F1 = 87.07%; object detection: P = 46.61%, R = 37.16%, F1 = 41.35% | Initial results demonstrate the effectiveness of deep learning and are encouraging considering the scale of the problem. |
| 2 | Majumder et al,8 India | No. of patients = 114; HG-OSCC = 45 patients (225 tissue sites); LG-OSCC = 23 patients (83 tissue sites); leukoplakia = 6 patients (40 tissue sites); normal = 30 patients (225 tissue sites) | Oral tissue biopsies | Total principal component analysis regression (TPCR)-based direct multiclass discrimination algorithm; 4 training sets and 4 validation sets | TPCR accuracy with 4 classes. Training data: HG-OSCC = 94%, LG-OSCC = 100%, leukoplakia = 100%, normal = 100%; cross-validation data: HG-OSCC = 90%, LG-OSCC = 90%, leukoplakia = 85%, normal = 88% | TPCR provided satisfactory performance in classifying tissue sites into 4 classes: high-grade SCC, low-grade SCC, leukoplakia, and normal squamous tissue. |
| 3 | Das et al,20 India | No. of patients = 43; total no. of images = 126 (3 per slide; slides: normal = 2, LG-OSCC = 25, HG-OSCC = 15) | Histologic slide images | DCNN; 20 epochs | Epithelial segmentation: AC = 98.42%, SN = 97.76%; keratin pearl detection: AC = 96.88% | The proposed CNN achieved higher accuracy and better performance in tissue-layer segmentation and keratin pearl detection in histologic OSCC images than the existing state of the art for epithelial layer segmentation. |
| 4 | Uthoff et al,21 India | No. of patients = 190; no. of images = 170 image pairs (normal = 86; suspected OSCC = 84) | Autofluorescence and white-light images | CNN; 80 epochs | On-site specialist: AUC = 0.908, SN = 0.8500, SP = 0.8875, PPV = 0.8767, NPV = 0.8549; remote specialist: SN = 0.9259, SP = 0.8667, PPV = 0.9494, NPV = 0.8125 | With suspect areas outlined, the combination of WLI and AFI provides the most information about the type of lesion and the size of the affected area. With the help of the proposed device, the remote specialist was able to diagnose patients correctly with high performance compared with the on-site specialist. |
| 5 | Shahul Hameed et al,22 India | No. of patients = 12; no. of images = 35 | p53-immunostained tissue sections | Support vector machine; training cycle and set not mentioned | Blue component: AC = 98.01%, SN = 98.86%, SP = 94.74% | The blue component of the automatic technique performed well in classifying and detecting immunopositivity in tissue images; the immunopositive ratio values of the manual and automatic techniques were equal. |
| 6 | Song et al,23 India | 2350 cheek mucosa images | Intraoral data set of cheek mucosa images | Bayesian deep network; 300 epochs | AC = 90% | Performance can be further improved by referring more patients; the experiments show that the model can identify difficult cases needing further inspection. |
| 7 | Jeyaraj et al,24 India | BioGPS data = 100 images (tumour = 65, normal = 35); TCIA archive = 500 images (tumour = 450, normal = 50); GDC data set = 700 images (tumour = 625, normal = 75) | Multidimensional hyperspectral images | Partitioned DCNN; training cycle and set not mentioned | DCNN algorithm (100-image set): AC = 91.4%, SP = 91%, SN = 94%, AUC = 0.94; proposed partitioned CNN algorithm (500-image set): AC = 94.5%, SP = 98%, SN = 94%, AUC = 0.965 | The proposed partitioned CNN achieved higher accuracy than the SVM and DBN classifiers, and accuracy increased by 4.5% when a larger number of patient data sets were used in the training phase. |
| 8 | Rahman et al,25 India | Total no. of slides = 42 (normal = 13, OSCC lesion = 29); total no. of nuclei images acquired from slides = 720 (normal = 237, malignant = 483) | Histopathologic slides | (1) Tree-based classification; (2) logistic regression; (3) K-nearest neighbour; (4) SVM; (5) linear discriminant analysis; 5 cycles, 4 training sets, 1 testing set | For texture, shape, and colour features: (1) SN = 99.2%, SP = 99.8%, AC = 99.4%; (2) SN = 100%, SP = 100%, AC = 100%; (3) SN = 99.2%, SP = 16.1%, AC = 43.5%; (4) SN = 100%, SP = 100%, AC = 100%; (5) SN = 99.6%, SP = 100%, AC = 99.9% | Accurate classification results were achieved for colour, shape, and texture features; in-depth analysis showed that the SVM and linear discriminant classifiers gave the best results for texture and colour features. |
| 9 | Rahman et al,26 India | No. of patients = 40; 27 slides; 118 normal cells; 334 malignant cells; total of 452 extracted morphologic features | Histologic images | (1) Decision tree; (2) SVM; (3) K-nearest neighbour; (4) discriminant analysis; (5) logistic regression; training cycle and set not mentioned | Accuracy: decision tree = 99.78%; linear discriminant = 93.6%; logistic regression = 62.9%; SVM = 93.6%; K-nearest neighbour = 54.3% | The decision tree yielded the highest accuracy. |
| 10 | Duran-Sierra et al,27 USA | 57 patients; tissue biopsies of suspicious oral epithelial precancerous or cancerous lesions | Multispectral autofluorescence lifetime imaging | Linear discriminant analysis, quadratic discriminant analysis (QDA), SVM, and logistic regression (LOGREG) | SN = 94%, SP = 74%, F1 score = 0.85 | SVM and LOGREG were the best-performing classification models using spectral-only features, whilst QDA was the best-performing model using time-resolved-only features. |
| 11 | Schwarz et al,28 USA | Patients with oral lesions = 60 (154 sites); normal volunteers = 64 (270 sites) | Spectroscopy probe, biopsy | SVM; linear discriminant analysis; training cycle and set not mentioned | SN = 82%, SP = 87%, AUC = 0.93 | Differences in oral spectra were observed in (1) neoplastic vs nonneoplastic sites, (2) keratinised vs nonkeratinised tissue, and (3) shallow vs deep depths within oral tissue. Algorithms based on spectra from 310 nonkeratinised anatomic sites (buccal, tongue, floor of mouth, and lip) yielded an AUC of 0.96 in the training set and 0.93 in the validation set. |
| 12 | Song et al,29 USA | 6211 pairs of intraoral images from 5025 patients | Intraoral images | Dual-modality mobile-based classification using the deep learning model MobileNet; 300 epochs | AC = 81%, SN = 79%, SP = 82% | The proposed method achieved 81% accuracy for distinguishing normal/benign lesions from clinically suspicious lesions. |
| 13 | Fu et al,30 China | No. of images: initial data set = 44,409; algorithm development = 5575; IVD = 401; secondary analysis = 170; EVD = 420; CVD = 666 | Photographic images | DCNN; training cycle and set not mentioned | IVD: AUC = 0.983, SN = 94.9%, SP = 88.7%, AC = 91.5%; secondary analysis on IVD: AUC = 0.995, SN = 97.4%, SP = 93.5%, AC = 95.3%; EVD: AUC = 0.935, SN = 89.6%, SP = 80.6%, AC = 84.1%; CVD: AUC = 0.97, SN = 91.0%, SP = 93.5%, AC = 92.3%; overall accuracy = 92.3% | This deep neural network is helpful in identifying very small OSCC lesions in high-risk individuals, achieving a promising result (AUC = 0.995) in the secondary analysis on the internal validation data set, which is comparable to a human specialist. |
| 14 | Lin et al,31 China | Oral lesion images = 688; normal mucosa images = 760 | Photographic images | Smartphone-based image diagnosis with the deep learning network HRNet; 15, 30, and 45 epochs | SN = 83%, SP = 96.6%, P = 84.3%, F1 = 83.6% | The HRNet model performed slightly better than VGG16, ResNet50, and DenseNet169; the F1 score was 8% higher when a centre-positioning method was used. |
| 15 | Aubreville et al,32 Germany | No. of patients = 12; total no. of images = 7894 (normal alveolar ridge = 1951, normal inner labium = 1317, normal hard palate = 811, OSCC lesion = 3815) | Confocal laser endomicroscopy images | DCNN; 60 epochs | Proposed CNN: AC = 88.3%, SN = 86.6%, SP = 90.0%, AUC = 0.96 | The present CNN approach using the ppf method significantly outperforms the conventional approach (textural-feature-based machine learning) for CLE image recognition. |
| 16 | Warin et al,33 Thailand | 700 clinical oral photographs | Oral photographs | DenseNet121 and Faster R-CNN; training not mentioned | DenseNet121: P = 100%, R = 99%, F1 = 99%, SN = 98.75%, SP = 100%, AUC = 0.99; Faster R-CNN: P = 76.67%, R = 82.14%, F1 = 79.31%, AUC = 0.79 | The DenseNet121 and Faster R-CNN algorithms were shown to offer acceptable potential for the classification and detection of cancerous lesions in oral photographic images. |
| 17 | Jubair et al,34 Jordan | Total patients = 543; total images = 716; suspicious images (OC and oral dysplasia) = 236; benign lesions = 480 | Photographic images (tongue) | CNN (EfficientNet-B0); 5 epochs, bootstrapping = 120 repetitions | SP = 84.5%, SN = 86.7%, AC = 85.0%, AUC = 0.911 | A deep CNN using the EfficientNet-B0 transfer model can detect cancerous or potentially malignant oral lesions with high accuracy, sensitivity, and specificity. |

AC, accuracy; AFI, autofluorescence imaging; AUC, area under the curve; CLE, confocal laser endomicroscopy; CNN, convolutional neural network; CVD, clinical validation data set; DBN, deep belief network; DCNN, deep convolutional neural network; EVD, external validation data set; GDC, Genomic Data Commons; HG-OSCC, high-grade oral squamous cell carcinoma; IVD, internal validation data set; LG-OSCC, low-grade oral squamous cell carcinoma; LOGREG, logistic regression; NPV, negative predictive value; OC, oral cancer; OSCC, oral squamous cell carcinoma; P, precision; ppf, patch probability fusion; PPV, positive predictive value; QDA, quadratic discriminant analysis; R, recall; SCC, squamous cell carcinoma; SN, sensitivity; SP, specificity; SVM, support vector machine; TCIA, The Cancer Imaging Archive; TPCR, total principal component analysis regression; WLI, white-light imaging.
Risk of bias and quality of the studies assessment
The Prediction model Risk Of Bias ASsessment Tool (PROBAST) for nonrandomised studies was used to assess the risk of bias and applicability of the studies19 (Table 2). PROBAST comprises 20 questions across 4 domains (participants, predictors, outcomes, and analysis), each answered as yes, probably yes, probably no, no, or no information. A domain was considered at low risk only if all of its questions were answered yes or probably yes. If at least one question in a domain was answered no or probably no, the study was classified as having a high risk of bias, unless the assessors judged the overall risk to be low or unclear on the basis of the remaining indicators. Similarly, a study was rated as having an unclear risk when at least one domain was rated unclear and the remaining domains were rated as having a low risk of bias.
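The domain-aggregation rule just described is essentially a small decision procedure; the sketch below is a minimal illustration of it, assuming each of the 4 PROBAST domains has already been rated (it simplifies by omitting the assessors' discretionary override mentioned above).

```python
def overall_probast_rating(domains: dict) -> str:
    """domains maps each PROBAST domain to 'low', 'high', or 'unclear'."""
    ratings = domains.values()
    if any(r == "high" for r in ratings):
        return "high risk of bias"       # any domain rated high dominates
    if any(r == "unclear" for r in ratings):
        return "unclear risk of bias"    # no high, but at least one unclear
    return "low risk of bias"            # all 4 domains rated low

# Hypothetical study with one unclear domain:
study = {"participants": "low", "predictors": "low",
         "outcome": "unclear", "analysis": "low"}
print(overall_probast_rating(study))     # -> unclear risk of bias
```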
Table 2.

| Author | Type of study | Risk of bias: participant selection | Risk of bias: predictors | Risk of bias: outcome | Risk of bias: analysis | Applicability: participant selection | Applicability: predictors | Applicability: outcome | Overall risk of bias | Overall applicability |
|---|---|---|---|---|---|---|---|---|---|---|
| Welikala et al,7 India | Development and validation | – | + | + | + | – | + | + | – | – |
| Majumder et al,8 India | Development and validation | – | + | – | – | – | + | + | – | – |
| Das et al,20 India | Development and validation | + | + | + | + | + | + | + | + | + |
| Uthoff et al,21 India | Development and validation | + | + | + | + | + | + | + | + | + |
| Shahul Hameed et al,22 India | Development | + | + | + | + | + | + | + | + | + |
| Song et al,23 India | Validation | + | + | + | + | + | + | + | + | + |
| Jeyaraj et al,24 India | Development and validation | + | + | + | + | + | + | + | + | + |
| Rahman et al,25 India | Development | + | + | + | + | + | + | + | + | + |
| Rahman et al,26 India | Development and validation | + | + | + | + | + | + | + | + | + |
| Duran-Sierra et al,27 USA | Validation | – | + | ? | + | – | – | + | – | – |
| Schwarz et al,28 USA | Development and validation | + | + | + | + | + | + | + | + | + |
| Song et al,29 USA | Development and validation | + | + | + | + | + | + | + | + | + |
| Fu et al,30 China | Development and validation | + | + | – | – | + | + | – | – | + |
| Lin et al,31 China | Development | + | + | + | + | + | + | + | + | + |
| Aubreville et al,32 Germany | Development and validation | + | + | + | + | + | + | + | + | + |
| Warin et al,33 Thailand | Development and validation | + | + | + | + | + | + | + | + | + |
| Jubair et al,34 Jordan | Development and validation | + | + | ? | + | + | + | + | + | + |
+, low risk of bias/low concerns regarding applicability; −, high risk of bias/high concerns regarding applicability; ?, unclear risk of bias/unclear concerns regarding applicability.
Data synthesis
The collected data and main findings are presented in the form of narrative synthesis. Due to the heterogeneity amongst the selected studies, formal quantitative syntheses were not conducted.
Results
Literature search
The kappa value was 0.85, indicating almost perfect agreement amongst the 3 investigators. Through electronic and manual searches, 606 articles were identified (PubMed, 90; Scopus, 192; EBSCO, 181; OVID, 138; and manual search, 5) (Figure 1). After duplicate removal, 328 articles remained, and their titles and abstracts were examined on the basis of the predefined eligibility criteria. Consequently, 296 articles were excluded because they were off-topic. The full texts of the remaining 32 articles were carefully read by 2 reviewers for potential inclusion, and 17 articles were selected for the systematic review. The other 15 articles were excluded because their AI model was utilised for purposes other than OC diagnosis, AI was not utilised for early OC detection, or the samples were not presented as histologic or photographic images. The process of study selection is documented in the PRISMA flowchart in Figure 1.
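For readers unfamiliar with the agreement statistic reported above, the sketch below computes Cohen's kappa for two raters' include/exclude decisions on toy data (not the review's actual screening records); agreement amongst 3 raters would typically use a multi-rater extension such as Fleiss' kappa.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical screening decisions: 1 = include, 0 = exclude.
reviewer_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
reviewer_b = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

# Kappa corrects raw percentage agreement for chance agreement.
print(round(cohen_kappa_score(reviewer_a, reviewer_b), 2))
```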
Study quality assessment
Using the PROBAST checklist, 13 studies were assessed as having a low risk of bias, and 4 studies were rated as having a high risk of bias. In terms of applicability, 14 studies raised low concerns (Table 2).
Study characteristics
Demographic characteristics
The total number of patients in the included studies was 7245, and the total number of images analysed was 69,425. The 17 studies came from various countries, with India accounting for 9 of them.7,8,20, 21, 22, 23, 24, 25, 26 Three studies27, 28, 29 were conducted in the United States, 2 were performed in China,30,31 and the remaining studies were carried out in Germany,32 Thailand,33 and Jordan.34
The sample size was calculated on the basis of the number of patients recruited; 4 studies8,21,25,27 had fewer than 100 patients. The smallest number of patients was 12,22,32 and the largest was 5025.29 In terms of image count, the smallest number of histologic images was 35,22 whilst the largest was 44,409.30
Study designs
All of the selected studies were clinical studies: 9 were case-control studies,7,8,20,21,23,28,30,33,34 7 were comparative studies,22,24, 25, 26, 27,29,31 and one was a retrospective study,30 with several of them employing various statistical procedures for a range of AI technologies.
The studies employed 7 forms of AI, including several types of supervised classical machine learning models and deep learning. Deep learning was used to detect OC in most investigations: deep learning (convolutional neural networks [CNNs]) was used in 11 studies,7,20,21,23,24,29, 30, 31, 32, 33, 34 whilst 6 studies used supervised machine learning.8,22,25, 26, 27, 28 The most frequently used subtype of supervised machine learning was the support vector machine, used in 4 studies.25, 26, 27, 28 Three studies used smartphone applications,14,21,31 all of which relied on deep learning techniques. Figure 2 compares the AI models used and their frequencies amongst the 17 studies.
Study comparator
Uthoff et al sorted samples into the suspicious and nonsuspicious categories.21 Other studies8,20,23,24,27, 28, 29,31 offered an AI model that could categorise lesions as normal, precancerous, or cancerous, with or without additional categorisation of the samples into various stages of OC. Five studies25,26,30,32,33 presented AI methods to categorise samples using binary classification as normal or malignant. Jubair et al34 divided the samples into benign or suspicious (malignant or premalignant). Furthermore, Schwarz et al presented an AI that can categorise samples into a range of normal to mild dysplasia (negative) vs moderate dysplasia to cancer (positive).28
Welikala et al divided the samples into 5 categories: no lesion; no referral needed; refer for other reasons; refer: low-risk oral potentially malignant disorder (OPMD); and refer: cancer/high-risk OPMD.7 Other studies22,27 categorised samples as positive or negative based on staining intensity.
Study outcome
Table 1 summarises the findings of the included studies. Various statistical tests were used to test and verify the efficacy of machine learning in OC diagnosis. Accuracy, sensitivity, specificity, and AUC were employed in most of the investigations. Eleven studies utilised accuracy to assess the efficacy of AI technology.
The overall accuracy rate ranged from 43.5%25 to 100%.8 Eight of the 11 articles had an accuracy of at least 90%.8,20,22, 23, 24, 25, 26,30 Three investigations had an accuracy rating of less than 90%.29,32,34 Deep learning yielded an accuracy range between 81%29 and 96.88%.20 However, the range of values for supervised machine learning ranges from 43.5%25 to 100%.8
Thirteen studies examined the effectiveness of AI in diagnosing OC in terms of its sensitivity. Seven studies20,22,24,25,27,30,33 reported a sensitivity of 90% or more. Moreover, 6 studies21,28,29,31,32,34 reported a sensitivity of less than 90%. The sensitivity of deep learning ranged from 79%29 to 98.75%.33 However, supervised machine learning ranged between 94%27 and 100%.25
Specificity was assessed in 12 studies to measure AI efficiency. Six studies reported a value equal to or greater than 90%,22,24,25,31, 32, 33 whereas 6 reported a specificity of less than 90%.21,27, 28, 29, 30,34 For deep learning, specificity ranged between 80.6%30 and 100%,33 whereas supervised machine learning scored between 16% and 100%.25 Seven of the 17 studies employed the AUC to assess the efficiency of the AI model, and all 7 reported AUC values greater than 0.9.21,24,28,30,32, 33, 34
Some studies utilised different statistical methods to assess AI performance, such as the F1 score,7,27,31,33 recall,7,33 precision,7,31,33 positive predictive value, and negative predictive value.21
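Because most of these metrics derive from the same binary confusion matrix, a short worked sketch may clarify how they relate; the counts and scores below are hypothetical, not drawn from any included study.

```python
from sklearn.metrics import roc_auc_score

tp, fn, tn, fp = 90, 10, 85, 15           # hypothetical test-set counts

accuracy    = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)              # recall on the cancer class
specificity = tn / (tn + fp)
precision   = tp / (tp + fp)              # positive predictive value
f1          = 2 * precision * sensitivity / (precision + sensitivity)
print(f"AC={accuracy:.2f} SN={sensitivity:.2f} SP={specificity:.2f} F1={f1:.2f}")

# The AUC needs the model's continuous scores, not just hard labels:
y_true  = [0, 0, 1, 1, 1]                 # 1 = cancer
y_score = [0.1, 0.4, 0.35, 0.8, 0.9]      # predicted probability of cancer
print("AUC =", roc_auc_score(y_true, y_score))
```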
Discussion
The main goal of this systematic review was to evaluate the effectiveness of AI in detecting and screening for OC using photographic and histologic images. Most of the studies included in this systematic review showed that machine learning models can detect OC with excellent accuracy, sensitivity, and specificity. Current advancements in machine learning algorithms allow the detection of OC using an efficient and noninvasive technique with a performance comparable to that of human specialists.30 Although the oral cavity is accessible during a normal checkup, many cancers are not discovered until they are advanced.7 Experts can detect OCs through visual inspection based on the clinical appearance of the lesion. Using AI as a more accurate and quick method for diagnosing OC in its early stages may be one of the most effective ways to decrease death rates. Currently, there is growing interest in using AI in oncology to improve the accuracy and efficacy of screening suspected lesions.
Machine learning vs deep learning methods
All of the selected studies utilised supervised machine learning or deep learning models, with 6 studies using supervised machine learning and 11 using deep learning (Figure 2). Studies that used deep learning had an accuracy range of 72% to 99.2%, whereas machine learning had a range of 43.5% to 100%.7,8,20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34 Modalities employing deep learning thus show consistent results within a narrow accuracy range, whereas machine learning shows a wide spread, making its performance somewhat unpredictable.
Overall performance
Regarding the overall performance of deep learning, the highest results were reported in 4 studies. Uthoff et al, who used a deep learning approach with smartphone data transmission to discriminate between suspicious and nonsuspicious lesions, obtained a minimum risk of bias on the PROBAST scoring system with an AUC of 0.908.21 The Gabor texture descriptor was employed by Das et al to identify keratin pearl regions from non-pearl regions.20 They discovered that the colours of the 3 primary constituent layers (epithelium, subepithelium, and keratin areas) could be discriminated.20 Fu et al analysed 44,409 images and achieved high accuracy even though a large sample was utilised.30 Their detection network takes an oral photograph as input and creates a single bounding box that indicates the probable lesion; the lesion region is then trimmed as a candidate patch based on the detection results, and the candidate patch is provided to a classification network, which produces 2 confidence scores in the range of 0 to 1 for OSCC and control.30 Because the photographs used to train the deep neural networks may not accurately reflect the diversity and heterogeneity of oral disease lesions, the algorithm cannot make reliable predictions for other oral lesions. Seven studies used the AUC to evaluate the proposed machine learning methods; the highest AUC, 0.995, was obtained by a deep CNN using photographic images in the secondary analysis of the internal validation data set.30 Rahman et al scored the highest values for accuracy, sensitivity, and specificity using a support vector machine classifier and logistic regression.25 In contrast, the K-nearest neighbour classifier scored the lowest for accuracy and specificity.25,26
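The two-stage design described for Fu et al can be sketched schematically as follows. The box coordinates, the untrained stand-in classifier, and all tensors are hypothetical placeholders; this is not the authors' code, and stage 1 (the detection network) is mocked with a fixed box.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

photo = torch.rand(3, 480, 640)                 # one oral photograph (toy data)

# Stage 1 (detection): in the described pipeline a detection network
# proposes this bounding box; here it is a fixed placeholder.
x1, y1, x2, y2 = 120, 80, 360, 300

patch = photo[:, y1:y2, x1:x2].unsqueeze(0)     # crop the candidate lesion patch
patch = F.interpolate(patch, size=(224, 224))   # resize for the classifier

# Stage 2 (classification): an untrained stand-in network that outputs
# 2 confidence scores in [0, 1] for OSCC vs control.
classifier = resnet18(weights=None, num_classes=2).eval()
with torch.no_grad():
    scores = classifier(patch).softmax(dim=1)
print("OSCC vs control scores:", scores.squeeze().tolist())
```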
AI accuracy for histopathologic images
Histopathologic analysis is the gold standard for the detection and diagnosis of OC. However, this method relies on subjective analysis, which makes the screening accuracy clinician-dependent.6 When histopathologic samples are examined for OC, certain features and characteristics allow the pathologist to determine whether a patient presents with malignancy and to identify the stage. Because the manual evaluation of samples for diagnostic features requires quantification, there is a chance of error, which inevitably leads to inaccurate results.6 Consequently, AI has reduced such errors and improved the efficiency and accuracy of detecting the cytologic and histologic features of OC. Moreover, AI technology can process large sample sizes to detect OC. Two types of samples were used in the selected studies: biopsy or histologic samples and photographic images. Six studies used biopsy and histologic samples.8,20,22,25, 26, 27 Some of the studies examining cellular changes used abnormal cell nuclei as a marker to differentiate malignant samples from normal ones.22,25,26 Das et al inspected epithelial changes by detecting keratin pearls in the oral mucosa of patients with OC using their proposed segmentation method.20 They quantified the keratinisation layer, which was successful with their proposed CNN because this parameter is significant in determining the stage of OC.20
Future perspectives, translational value, and limitations
Researchers have found that deep learning aids pathologists in the effective multiclass classification of cancer. This enables the oncology team to deliver an effective treatment plan whilst minimising the overall workload. Additionally, deep learning models can categorise patients into high- or low-risk categories, thus aiding oncologists in deciding between a radical and a conservative treatment approach; this could spare patients in low-risk categories the harmful effects of the radical approach.35,36 Although these factors strongly favour the translation of AI-based research into clinical oncology practice, there are a few limitations. Privacy and confidentiality of patient data remain major hurdles in the clinical application of AI in oncology.37 There is also the question of who owns the responsibility (the doctor or the software) in case of an error in AI-based analysis. Apart from these factors, the patient's autonomy and relationship with the treating clinician are affected by the introduction of AI into oncology practice.37
Conclusions
This systematic review supports the view that machine learning yields accurate results for detecting OC, which can greatly assist pathologists in improving their diagnostic results and minimising the chance of error. Furthermore, the studies ranked strongest on the basis of their evidence applied deep learning (neural networks), indicating high performance and greater accuracy.
Author contributions
Al-Rawi NH: Conceptualisation (lead); supervision (lead); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); writing original draft (equal).
Sultan A: Validation (equal); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); writing original draft (equal).
Rajai B: Validation (equal); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); writing original draft (equal).
Shuaeeb H: Validation (equal); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); writing original draft (equal).
Alnajjar M: Validation (equal); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); writing original draft (equal).
Alketbi M: Validation (equal); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); writing original draft (equal).
Mohammad Y: Validation (equal); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); writing original draft (equal).
Shetty SR: Validation (lead); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); writing original draft (equal).
Mashrah MA: Conceptualisation (lead); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); writing original draft (equal).
Conflict of interest
None disclosed.
Editor: Professor L. Samaranayake
References
1. Llewellyn CD, Johnson NW, Warnakulasuriya KA. Risk factors for squamous cell carcinoma of the oral cavity in young people—a comprehensive literature review. Oral Oncol. 2001;37:401–418. doi:10.1016/s1368-8375(00)00135-4.
2. Ferlay J, Colombet M, Soerjomataram I, et al. Estimating the global cancer incidence and mortality in 2018: GLOBOCAN sources and methods. Int J Cancer. 2019;144:1941–1953. doi:10.1002/ijc.31937.
3. Ilhan B, Guneri P, Wilder-Smith P. The contribution of artificial intelligence to reducing the diagnostic delay in oral cancer. Oral Oncol. 2021;116. doi:10.1016/j.oraloncology.2021.105254.
4. Krishna AB, Tanveer A, Bhagirath PV, et al. Role of artificial intelligence in diagnostic oral pathology—a modern approach. J Oral Maxillofac Pathol. 2020;24:152–156. doi:10.4103/jomfp.JOMFP_215_19.
5. Lokesh K, Kannabiran J, Rao MD. Salivary lactate dehydrogenase (LDH)—a novel technique in oral cancer detection and diagnosis. J Clin Diagn Res. 2016;10:ZC34–ZC37. doi:10.7860/JCDR/2016/16243.7223.
6. Ilhan B, Lin K, Guneri P, et al. Improving oral cancer outcomes with imaging and artificial intelligence. J Dent Res. 2020;99:241–248. doi:10.1177/0022034520902128.
7. Welikala RA, Remagnino P, Lim JH, et al. Automated detection and classification of oral lesions using deep learning for early detection of oral cancer. IEEE Access. 2020;8:132677–132693.
8. Majumder SK, Gupta A, Gupta S, et al. Multi-class classification algorithm for optical diagnosis of oral cancer. J Photochem Photobiol B. 2006;85:109–117. doi:10.1016/j.jphotobiol.2006.05.004.
9. Chan CH, Huang TT, Chen CY, et al. Texture-map-based branch-collaborative network for oral cancer detection. IEEE Trans Biomed Circuits Syst. 2019;13:766–780. doi:10.1109/TBCAS.2019.2918244.
10. Lu J, Sladoje N, Stark CR, et al. A deep learning based pipeline for efficient oral cancer screening on whole slide images. arXiv. 2020;1910:1054.
11. De Fauw J, Ledsam JR, Romera-Paredes B, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med. 2018;24:1342–1350. doi:10.1038/s41591-018-0107-6.
12. Ehteshami Bejnordi B, Veta M, Johannes van Diest P, et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA. 2017;318(22):2199–2210. doi:10.1001/jama.2017.14585.
13. Coudray N, Ocampo PS, Sakellaropoulos T, et al. Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nat Med. 2018;24:1559–1567. doi:10.1038/s41591-018-0177-5.
14. Song B, Sunny S, Uthoff RD, et al. Automatic classification of dual-modality, smartphone-based oral dysplasia and malignancy images using deep learning. Biomed Opt Express. 2018;9:5318–5329. doi:10.1364/BOE.9.005318.
15. Wang F, Casalino LP, Khullar D, et al. Deep learning in medicine—promise, progress, and challenges. JAMA Intern Med. 2019;179:293–294. doi:10.1001/jamainternmed.2018.7117.
16. Hu L, Bell D, Antani S. An observational study of deep learning and automated evaluation of cervical images for cancer screening. J Natl Cancer Inst. 2019;111:923–932. doi:10.1093/jnci/djy225.
17. de Haan K, Koydemir HC, Rivenson Y, et al. Automated screening of sickle cells using a smartphone-based microscope and deep learning. npj Digit Med. 2020;3:76. doi:10.1038/s41746-020-0282-y.
18. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. doi:10.1136/bmj.n71.
19. Moons KM, Wolff RF, Riley RD, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med. 2019;170:W1–W33. doi:10.7326/M18-1377.
20. Das DK, Bose S, Maiti AK, et al. Automatic identification of clinically relevant regions from oral tissue histological images for oral squamous cell carcinoma diagnosis. Tissue Cell. 2018;53:111–119. doi:10.1016/j.tice.2018.06.004.
21. Uthoff RD, Song B, Sunny S, et al. Point-of-care, smartphone-based, dual-modality, dual-view, oral cancer screening device with neural network classification for low-resource communities. PLoS ONE. 2018;13(12). doi:10.1371/journal.pone.0207493.
22. Shahul Hameed KA, Shaheer Abubacker KA, Banumathi A, et al. Immunohistochemical analysis of oral cancer tissue images using support vector machine. Measurement. 2020;173.
23. Song B, Sunny S, Li S, et al. Bayesian deep learning for reliable oral cancer image classification. Biomed Opt Express. 2021;12:6422–6430. doi:10.1364/BOE.432365.
24. Jeyaraj PR, Nadar ES. Computer-assisted medical image classification for early diagnosis of oral cancer employing deep learning algorithm. J Cancer Res Clin Oncol. 2019;145:829–837. doi:10.1007/s00432-018-02834-7.
25. Rahman TY, Mahanta LB, Das AK, et al. Automated oral squamous cell carcinoma identification using shape, texture and color features of whole image strips. Tissue Cell. 2020;63. doi:10.1016/j.tice.2019.101322.
26. Rahman TY, Mahanta LB, Choudhury H, et al. Study of morphological and textural features for classification of oral squamous cell carcinoma by traditional machine learning techniques. Cancer Rep. 2020;3:e1293. doi:10.1002/cnr2.1293.
27. Duran-Sierra E, Cheng S, Cuenca R, et al. Machine-learning assisted discrimination of precancerous and cancerous from healthy oral tissue based on multispectral autofluorescence lifetime imaging endoscopy. Cancers (Basel). 2021;13:4751. doi:10.3390/cancers13194751.
28. Schwarz RA, Gao W, Redden Weber C, et al. Noninvasive evaluation of oral lesions using depth-sensitive optical spectroscopy. Cancer. 2009;115(8):1669–1679. doi:10.1002/cncr.24177.
29. Song B, Sunny S, Li S, et al. Mobile-based oral cancer classification for point-of-care screening. J Biomed Opt. 2021;26. doi:10.1117/1.JBO.26.6.065003.
30. Fu Q, Chen Y, Li Z, et al. A deep learning algorithm for detection of oral cavity squamous cell carcinoma from photographic images: a retrospective study. EClinicalMedicine. 2020;27. doi:10.1016/j.eclinm.2020.100558.
31. Lin H, Chen H, Weng L, et al. Automatic detection of oral cancer in smartphone-based images using deep learning for early diagnosis. J Biomed Opt. 2021;26. doi:10.1117/1.JBO.26.8.086007.
32. Aubreville M, Knipfer C, Oetter N, et al. Automatic classification of cancerous tissue in laserendomicroscopy images of the oral cavity using deep learning. Sci Rep. 2017;7:11979. doi:10.1038/s41598-017-12320-8.
33. Warin K, Limprasert W, Suebnukarn S, et al. Automatic classification and detection of oral cancer in photographic images using deep learning algorithms. J Oral Pathol Med. 2021;50:911–918. doi:10.1111/jop.13227.
34. Jubair F, Al-Karadsheh O, Malamos D, et al. A novel lightweight deep convolutional neural network for early detection of oral cancer. Oral Dis. 2021. doi:10.1111/odi.13825. Online ahead of print.
35. Das N, Hussain E, Mahanta LB. Automated classification of cells into multiple classes in epithelial tissue of oral squamous cell carcinoma using transfer learning and convolutional neural network. Neural Netw. 2020;128:47–60. doi:10.1016/j.neunet.2020.05.003.
36. Alabi RO, Bello IO, Youssef O, et al. Utilizing deep machine learning for prognostication of oral squamous cell carcinoma—a systematic review. Front Oral Health. 2021;2. doi:10.3389/froh.2021.686863.
37. Alabi RO, Tero V, Mohammed E. Machine learning for prognosis of oral cancer: what are the ethical challenges? CEUR Workshop Proceedings. 2020;2373:1–22.