PLOS Digital Health. 2024 Dec 13;3(12):e0000692. doi: 10.1371/journal.pdig.0000692

Classification of periodontitis stage and grade using natural language processing techniques

Nazila Ameli 1,*, Tahereh Firoozi 1, Monica Gibson 2, Hollis Lai 1
Editor: Wisit Cheungpasitporn 3
PMCID: PMC11642968  PMID: 39671337

Abstract

Periodontitis is a complex, microbiome-related inflammatory condition affecting the tissues that support the teeth. Emphasizing the potential of Clinical Decision Support Systems (CDSS), this study aims to facilitate early diagnosis of periodontitis by extracting patient information recorded in dental charts and notes. We developed a CDSS to predict the stage and grade of periodontitis using natural language processing (NLP) techniques, including Bidirectional Encoder Representations from Transformers (BERT), and compared the performance of BERT with that of a baseline feature-engineered model. A secondary data analysis was conducted using 309 anonymized patient periodontal charts and the corresponding clinicians’ notes obtained from the university periodontal clinic. After data preprocessing, we added a classification layer on top of the pre-trained BERT model to classify the clinical notes into their corresponding stages and grades, and then fine-tuned the model on 70% of our data. The performance of the model was evaluated on the clinical notes of 32 new, unseen patients. The results were compared with the output of a baseline feature-engineered algorithm coupled with a multilayer perceptron (MLP) to classify the stage and grade of periodontitis. Our proposed BERT model predicted the patients’ stage and grade with 77% and 75% accuracy, respectively. On the set of 32 new, unseen patients, the MLP model classified the stage and grade of periodontitis with 59.4% and 62.5% accuracy, respectively, whereas the BERT model predicted stage and grade on the same set with higher accuracy (66% and 72%, respectively). The use of BERT in this context represents a novel application in dentistry, particularly in CDSS. Our BERT model outperformed the baseline model, even with reduced information, promising efficient review of patient notes. This integration of advanced NLP techniques with CDSS frameworks holds potential for timely interventions, preventing complications and reducing healthcare costs.

Author summary

In our study, we aimed to enhance the early diagnosis of periodontitis, a complex inflammatory dental condition, by developing a Clinical Decision Support System (CDSS) through the application of natural language processing (NLP) techniques. Specifically, we developed a CDSS using Bidirectional Encoder Representations from Transformers (BERT) to predict the stage and grade of periodontitis by analyzing patients’ dental charts and notes. We conducted a secondary data analysis using anonymized patient records from a university periodontal clinic, comparing the performance of our BERT model with that of a baseline feature-engineered model. Our BERT model achieved higher accuracy than the baseline model in predicting both the stage and grade of periodontitis: it predicted the stage and grade with 77% and 75% accuracy, respectively, compared with the baseline model’s 59.4% and 62.5%. This novel application of BERT in dentistry, particularly in CDSS, holds promise for a more efficient review of patient notes, enabling timely interventions to prevent complications and reduce healthcare costs.

Introduction

Periodontitis is a multifactorial, microbiome-associated inflammatory disease of the dental supporting tissues [1,2]. Progression of the disease can adversely affect oral and systemic health, resulting in tooth loss and reduced masticatory performance [3], and is associated with diabetes [4] and rheumatoid arthritis [5]. Thus, periodontitis and its complications impose substantial negative effects on oral health-related quality of life (OHRQoL), while successful and timely diagnosis and management of the disease may improve patients’ OHRQoL [3,6]. Moreover, early detection and diagnosis of periodontitis can help prevent the costly and invasive dental treatment that would otherwise follow [2].

According to the 2017 World Workshop on the Classification of Periodontal and Peri-implant Diseases and Conditions [1], the current method for classifying periodontitis is based on staging and grading. Stage is determined by the severity of the disease and the complexity of its management, while grade indicates the rate of periodontitis progression, assessed according to the history and the presence of risk factors for the disease. Following the criteria defined for the stage and grade of periodontal disease [1], clinicians traditionally analyze patients’ systemic, clinical, and radiographic data collected in periodontal charts over time to determine the stage and grade of periodontitis. In recent decades, electronic dental records (EDR) have been introduced for collecting patient data; they are superior to paper charts in storage capacity, time efficiency, ease of information retrieval, and accessibility [7]. In addition, computerized techniques known as artificial intelligence (AI) and its subsets, such as machine learning (ML), provide opportunities to extract valuable information from complex data and analyze various relationships to benefit patients [8]. AI is an evolving field that seeks to automate tasks that would require intelligence if performed by humans. ML also has the potential to monitor and detect patterns in patient presentations and risk factors [9,10].

Recently, neural network (NN) models, a powerful tool in ML, have been gaining traction and are used extensively for diagnosis [11], prognosis [12], classification [13], and prediction [14,15] because of their ability to model non-linear relationships among hidden variables in different data formats, including images and text. Moreover, advances in text mining and natural language processing (NLP), also a branch of AI, such as Bidirectional Encoder Representations from Transformers (BERT), have significantly improved the accuracy of text classification systems based on deep learning (DL) models [16]. NLP methods and text mining approaches have been widely used to extract information from patients’ electronic records in medicine and dentistry [17–20]. Chen et al. applied NLP techniques, implementing Sentence2vec and Word2vec approaches to learn sentence and word vectors, to extract information from Chinese EDR. They reported that their NLP workflow can efficiently structure narrative text from EDR [21]. In a recent study, Patel et al. developed and applied two automated NLP algorithms (an approximate string-matching function and a Levenshtein distance function) to extract information from clinical notes and track periodontal disease change over time using longitudinal EDR. They concluded that using longitudinal EDR data to track disease changes over 15 years was a feasible method that could be applied to studying clinical courses with AI and ML methods [22].

BERT is a language model that uses transformers for text representation and supports pretraining and fine-tuning on textual data. The application of BERT and related architectures has resulted in considerable improvements in multiple medical applications, including processing of electronic health records [23], outcome prediction [24], identification of medical terms and concepts [25], and others. Despite its success in the medical field, no studies have assessed the application of BERT to predicting the stage and grade of periodontal disease from patients’ textual notes. Regular expressions (RegEx) are another powerful tool for pattern matching and text analysis. By defining a sequence of characters using literal text or special characters with specific meanings, a regular expression can search, match, and manipulate text data.

It has been shown that AI techniques such as text mining and DL algorithms can be successfully applied for extracting semantic information from narrative patients’ charts in dentistry, classifying the periodontal disease and predicting the factors influencing the periodontitis occurrence [8,26]. Compared to the rule-based approaches using NLP tools, DL models do not depend on the grammatical accuracy of sentences, so can extract implicit aspects following their identification [27]. However, the use of AI and DL techniques to review narrative patients’ charts and determine the contribution of its information to the stage and grade of periodontitis in addition to clinical and radiographic findings has not yet been studied to the authors’ knowledge. Thus, the aim of the present study is to introduce and compare automated methods to:

  1. Extract crucial latent information from patients’ unstructured notes and chart data, employing a baseline rule-based model (RegEx),

  2. Employ the extracted information to classify the stage and grade of periodontitis using an MLP algorithm,

  3. Automate the classification of patients with periodontitis based on unstructured patient notes, utilizing BERT and classification models,

  4. Compare and contrast the results of pretrained models, including BERT, with those of the rule-based RegEx model.

This research aims to pioneer innovative approaches within dentistry, particularly in leveraging CDSS, to enhance early diagnosis and treatment of periodontitis.

Methods

To demonstrate the applicability of the proposed method for accurate and timely classification of the stage and grade of periodontitis using the possible contributing factors mentioned in clinicians’ notes and periodontal charts, we conducted a secondary data analysis of the intake charts and textual notes of patients referred to a University Periodontal Graduate Clinic from 2017 to 2021. Because our research relied predominantly on secondary data analysis, obtaining patient consent was not relevant. The data were anonymized, and the University Human Research Ethics Board approved this study (Pro00107743). Three hundred and nine patient charts and the relevant notes, collected at the initial visit, were included.

The proposed method consists of two phases of AI algorithms to facilitate information extraction and classification. In the initial phase, the state-of-the-art BERT model was employed for comprehensive text analysis and classification of the periodontitis stage and grade. The application of transformers in NLP tasks accelerated with the advent of a specific transformer-based language model, Bidirectional Encoder Representations from Transformers (BERT) [28]. BERT is a transformer-based encoder model for language representation that uses a multi-head attention mechanism and a bidirectional approach to learn the contextual relations between words and sentences in a text, yielding an accurate representation of the entire text [29].

For training the BERT model, the clinical notes of 309 patients referred to the University Periodontal Graduate Clinic were selected from a pool of 1513 patient charts. These clinical notes met the inclusion criterion of having the stage and grade of the disease determined by a graduate student and confirmed by a periodontist. The prevalence of the periodontitis stages and grades in the collected notes was as follows: stage I = 6, stage II = 36, stage III = 206, stage IV = 61, and grade A = 17, grade B = 202, grade C = 87.

We first imported the 309 patients’ textual notes, together with their stage and grade labels, into Google Colaboratory and preprocessed the texts for computational analysis. Preprocessing included tokenization, the addition of special tokens such as [CLS] and [SEP], and padding and truncation to ensure a uniform input format [30]. The dataset was then split into training (80%) and testing (20%) subsets to evaluate model performance.
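The preprocessing steps above can be illustrated with a minimal sketch. It is not the study's code: whitespace splitting stands in for BERT's WordPiece tokenizer so that the example stays self-contained, and the function name `encode_note`, the `[PAD]` token, and the example note are assumptions made for illustration.

```python
def encode_note(text, max_len=12):
    """Toy encoder: whitespace tokens stand in for WordPiece sub-words."""
    tokens = text.lower().split()
    # Truncate, reserving two positions for the [CLS] and [SEP] special tokens.
    tokens = tokens[: max_len - 2]
    tokens = ["[CLS]"] + tokens + ["[SEP]"]
    # Attention mask: 1 for real tokens, 0 for padding positions.
    mask = [1] * len(tokens)
    n_pad = max_len - len(tokens)
    return tokens + ["[PAD]"] * n_pad, mask + [0] * n_pad


# Hypothetical note text, padded to a uniform length of 10.
tokens, mask = encode_note("Generalized bone loss with heavy calculus", max_len=10)
```

Every note ends up with the same length, and the mask tells the model which positions carry real content.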

After preprocessing the data, we imported the necessary libraries, including the “class_weight” module from scikit-learn [31], which helps address class imbalance in classification tasks. The predefined window length in our study was 100, a value determined from the distribution of note lengths in the dataset. The model architecture combines BERT and LSTM (Long Short-Term Memory) layers to address the classification task. BERT, a pre-trained language model, serves as the foundation for understanding contextual information within textual data [29]. Specifically, the "bert-base-uncased" model was used, which is known for its effectiveness in various NLP tasks.

The input sequences were passed through the BERT model, and a recurrent network was used to aggregate the multiple embeddings into a single embedding after the transformer window slides over the long text. An LSTM was used as the recurrent layer to aggregate the BERT output at each time step. This model combines the data into a sequence of vectors of the same length, ordered by their temporal position and temporal dependency with respect to the features in the notes [32]. LSTM was selected over other potential operations because LSTM layers tend to model deep connections between sequential features more accurately, which can improve prediction in text classification tasks [32]. By combining BERT’s contextual understanding with the LSTM’s ability to model sequential data, the model aimed to capture both local and global dependencies within the input data, enhancing its predictive capabilities.
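The windowing step can be sketched as follows. The window length of 100 comes from the text; whether consecutive windows overlap is not stated, so the default stride of 100 (no overlap) and the function name `split_into_windows` are assumptions. In the described model, each window would be embedded by BERT and the resulting sequence of embeddings aggregated by the LSTM.

```python
def split_into_windows(token_ids, window=100, stride=100):
    """Return consecutive windows of at most `window` token ids."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    windows = []
    for start in range(0, len(token_ids), stride):
        windows.append(token_ids[start : start + window])
        # Stop once the current window reaches the end of the note.
        if start + window >= len(token_ids):
            break
    return windows


note = list(range(250))  # stand-in for the 250 token ids of one long note
windows = split_into_windows(note)
```

A stride smaller than the window would give overlapping windows, which preserves context across window boundaries at the cost of more BERT passes per note.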

The output of the LSTM layer was subsequently passed through additional dense layers with ReLU activation functions and dropout layers. These layers enabled the model to learn higher-level features and, through the regularization introduced by dropout, helped prevent overfitting. The final layer was a fully connected layer with softmax activation, producing a probability for each class. For each input text, the predicted stage or grade was the class with the highest probability in the network’s output. The number of units in this layer corresponds to the number of unique classes present in the training data (4 stages and 3 grades).
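The final classification step, softmax followed by selecting the most probable class, can be written out in a few lines. This is a generic sketch rather than the study's implementation; the logit values are invented for illustration.

```python
import math

STAGES = ["I", "II", "III", "IV"]


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels):
    """Return the label with the highest softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical output-layer scores for one note.
label, probs = predict_label([0.2, 1.1, 2.7, 0.9], STAGES)
```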

For model compilation, the Adam optimizer was employed with a predefined learning rate of 5e-6. The choice of sparse categorical cross-entropy as the loss function was appropriate for multi-class classification tasks, facilitating the comparison between predicted and actual class labels. During model training, accuracy was used as the evaluation metric to provide insight into the model’s performance across different epochs.
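For reference, sparse categorical cross-entropy is the mean negative log-probability assigned to the true class, with labels given as integer indices rather than one-hot vectors. The sketch below is a plain-Python illustration with invented probabilities, not the framework implementation used in the study; the clipping constant `eps` is an assumption.

```python
import math


def sparse_categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -log(p[true class]) over a batch of integer labels."""
    losses = []
    for label, probs in zip(y_true, y_prob):
        p = min(max(probs[label], eps), 1.0 - eps)  # clip to avoid log(0)
        losses.append(-math.log(p))
    return sum(losses) / len(losses)


# Two hypothetical notes: predicted probabilities over the four stages.
batch_probs = [[0.1, 0.2, 0.6, 0.1], [0.25, 0.25, 0.25, 0.25]]
batch_labels = [2, 0]  # integer class indices (stage III, stage I)
loss = sparse_categorical_crossentropy(batch_labels, batch_probs)
```

The loss falls toward zero as the probability assigned to the true class approaches 1, which is what the optimizer drives during fine-tuning.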

To address potential class imbalance issues, class weights were computed and incorporated into the training process. These weights ensure that the model assigns appropriate importance to each class during optimization, thereby mitigating the impact of imbalanced data distributions. Additionally, early stopping was implemented as a regularization technique to prevent overfitting based on validation loss. Through monitoring the validation loss, the early stopping mechanism terminates training when the model’s performance on unseen data begins to deteriorate. By restoring the best weights observed during training, this approach helps to optimize the model’s generalization ability and improve its performance on unseen data.
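To make the weighting concrete, the sketch below applies the "balanced" heuristic, weight = n_samples / (n_classes × class count), to the stage counts reported for the full dataset. That the study used this particular heuristic is an assumption, and in practice the weights would be computed on the training split rather than on all 309 notes.

```python
def balanced_class_weights(counts):
    """Weight each class inversely to its frequency."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


# Stage prevalence reported for the 309 collected notes.
stage_counts = {"I": 6, "II": 36, "III": 206, "IV": 61}
weights = balanced_class_weights(stage_counts)
```

Under this scheme a misclassified stage I note contributes far more to the loss than a misclassified stage III note, which counteracts the dominance of stage III in the data.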

The model was trained for 40 epochs with a batch size of 20 for both stage and grade. Its generalizability was evaluated with the evaluation metrics on a set of 32 new, unseen patients’ charts (stage I = 1, stage II = 4, stage III = 21, and stage IV = 6; grade A = 2, grade B = 21, and grade C = 9). Table 1 presents the model architecture and the hyperparameters selected for training.

Table 1. Architecture and hyperparameters for the BERT model.

Layer | Parameter Name | Candidate Values | Selected Value
BERT | Number of parameters | 110M | 110M
BERT | Transformer blocks | 12 layers | 12 layers
BERT | Attention heads | 12 | 12
BERT | Hidden neurons | 768 | 768
Dropout | Dropout rate | 0.2–0.5 | 0.2
LSTM | Decay Rate | 0.96–0.97 | 0.97
LSTM | Activation Function | ReLU, sigmoid | ReLU
LSTM | Learning Rate | 0.1-10e-7 | 5e-6
LSTM | Momentum | 0.5–0.9 | 0.5
Dense | Neurons | 25–100 | 64
Model Compile | Epoch | 15–50 | 40
Model Compile | Batch Size | 12–36 | 20

In the second phase, we developed a baseline rule-based model by employing traditional text mining and pattern extraction techniques, followed by an MLP model for predictive classification. This phase involved mining the textual data and extracting pertinent patterns so that the MLP could discern and categorize the data accurately. Clinicians’ notes, along with periodontal charts containing essential clinical and radiographic findings, served as the primary data sources.

Text mining is a computerized technique for extracting key information from vast quantities of textual data. It can be used for information retrieval, information extraction, and text categorization as a powerful research tool [33]. In the field of dentistry, text mining has been shown to be a valuable method of extracting latent (unknown) patterns from patient charts [15]. To complete our text mining, patient clinical notes were first imported and compiled into Google Colaboratory as individual text files. Then, a data-frame was created containing two columns: patients’ chart numbers and clinicians’ notes as unstructured text. After an initial assessment of the documents, specifically of the structure of the note text, we employed regular expressions (RegEx) to extract the unformatted text of the dependent variables (stage/grade) and possible contributing factors (including the patients’ medical and dental history), and to identify and locate the pertinent patterns [34]. With a RegEx search, particular strings of characters can be found through pattern matching, in contrast to constructing multiple literal search queries [35]. To make the RegEx insensitive to lowercase and capital letters, we passed the flag “re.IGNORECASE” after the definition of the pattern for each variable.
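A single case-insensitive pattern can replace several literal queries, as the following sketch shows. The pattern, the helper name `find_stage`, and the example notes are hypothetical; the expressions actually used in the study are not given in the text.

```python
import re

# Hypothetical pattern: the word "stage", an optional separator,
# then a Roman numeral from I to IV. re.IGNORECASE covers "Stage",
# "STAGE", "stage iii", and so on with one expression.
STAGE_RE = re.compile(r"\bstage\s*[:\-]?\s*(IV|III|II|I)\b", re.IGNORECASE)


def find_stage(note):
    """Return the stage as an upper-case Roman numeral, or None."""
    match = STAGE_RE.search(note)
    return match.group(1).upper() if match else None


notes = [
    "Dx: Generalized periodontitis, Stage III Grade B.",
    "diagnosis - STAGE: iv, grade c",
    "No staging recorded at this visit.",
]
stages = [find_stage(n) for n in notes]
```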

After finding the patterns for all possible variables embedded in clinicians’ notes, we created a new data-frame with 16 columns: patient chart number, date, stage, grade, systolic blood pressure, diastolic blood pressure, heart rate, tooth stain, smoking history, plaque, calculus, bone loss, tooth mobility, allergies, history of previous periodontal surgery, and diabetes. One limitation we encountered during this phase was that not all records provided comprehensive information about patients’ clinical and dental history, including details such as history of previous periodontal surgery, smoking history, and amount of stain on the teeth. To address this, we designed the RegEx extractor to capture as much detail as possible. If certain structured patterns were missing, the RegEx would not generate an error but would instead extract the limited information available in the clinical note. For instance, if the clinical note did not mention previous periodontal surgery (GBR/GTR/FGG), the program would still produce a structured record, leaving the previous-surgery category empty.
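The behaviour described above, where a missing pattern yields an empty field rather than an error, might look like the sketch below. The field names, patterns, and example notes are hypothetical and cover only a few of the 16 columns.

```python
import re

# Hypothetical field patterns; each captures one value in group 1.
PATTERNS = {
    "systolic_bp": r"\bBP\s*[:=]?\s*(\d{2,3})\s*/\s*\d{2,3}",
    "diastolic_bp": r"\bBP\s*[:=]?\s*\d{2,3}\s*/\s*(\d{2,3})",
    "heart_rate": r"\b(?:HR|heart rate)\s*[:=]?\s*(\d{2,3})",
    "previous_surgery": r"\b(GBR|GTR|FGG)\b",
}


def extract_record(note):
    """Build one structured row; fields with no match are left as None."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, note, flags=re.IGNORECASE)
        record[field] = match.group(1) if match else None
    return record


record = extract_record("BP: 132/78, HR 72. Heavy calculus, smoker.")
```

Because every record has the same keys, rows with partial information can still be stacked into one data-frame and the gaps handled during cleaning.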

The output data-frame was cleaned before being exported as a CSV file to facilitate subsequent ML analysis. Table 2 shows the variables and their descriptive values (minimum and maximum, mean ± SD, or the codes used for categorical variables).

Table 2. Descriptives of extracted variables from the textual notes and periodontal charts.

Variable | Value (min-max, mean ± SD) / Recoded values
Stage | I = 1, II = 2, III = 3, IV = 4
Grade | A = 1, B = 2, C = 3
Heart rate | 44–118, 72.4 ± 10.45
Systolic blood pressure | 79–187, 129.3 ± 15.54
Diastolic blood pressure | 43–120, 78.1 ± 40.32
History of periodontal surgery (FGG a, GTR b, GBR c) | No = 0, Yes = 1
Tooth stain | No = 0, Light = 1, Medium = 2, Heavy = 3
Calculus | No = 0, Light = 1, Medium = 2, Heavy = 3
Smoking history | No = 0, Yes = 1
Bone loss | No = 0, Yes = 1
Tooth mobility | No = 0, Yes = 1
Diabetes | No = 0, Yes = 1
Allergies | No = 0, Yes = 1
Count of teeth with pockets | 8–32
Pocket score | Localized = 0, Generalized = 1
Count of teeth with CAL d | 8–32
CAL score | Localized = 0, Generalized = 1

a free gingival graft
b guided tissue regeneration
c guided bone regeneration
d clinical attachment loss
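The recoding listed in Table 2 amounts to a set of lookup tables. The sketch below applies them to a handful of fields; the function name `recode`, the dictionary keys, and the choice of fields are assumptions, and the handling of missing values is not shown because the text does not describe it.

```python
# Codes taken from Table 2.
SEVERITY = {"no": 0, "light": 1, "medium": 2, "heavy": 3}
YES_NO = {"no": 0, "yes": 1}
STAGE = {"I": 1, "II": 2, "III": 3, "IV": 4}
GRADE = {"A": 1, "B": 2, "C": 3}


def recode(row):
    """Map one extracted record (strings) to the numeric codes."""
    return {
        "stage": STAGE[row["stage"]],
        "grade": GRADE[row["grade"]],
        "stain": SEVERITY[row["stain"].lower()],
        "calculus": SEVERITY[row["calculus"].lower()],
        "smoking": YES_NO[row["smoking"].lower()],
        "diabetes": YES_NO[row["diabetes"].lower()],
    }


# Illustrative record with categorical values as they appear in the notes.
coded = recode({"stage": "IV", "grade": "C", "stain": "Heavy",
                "calculus": "Heavy", "smoking": "Yes", "diabetes": "Yes"})
```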

Finally, patients’ periodontal charts in CSV format, including probing pocket depth (PPD), clinical attachment loss (CAL), and the number of teeth with bleeding and/or plaque, were also imported into Google Colaboratory. Data cleaning and recoding were completed as explained above, and the cleaned and recoded periodontal charts were combined with the clinicians’ notes into one data-frame and exported as a CSV file (Table 3).

Table 3. Example of the collected CSV data file extracted from the patients’ charts and clinical notes.

Variable | Patient 1 | Patient 2 | Patient 3
Chart Number | 82 | 93 | 104
Date | 2019-10-18 | 2019-06-17 | 2019-06-17
Systolic blood pressure | 132 | 144 | 134
Diastolic blood pressure | 78 | 88 | 86
Stage | III | IV | III
Grade | A | C | B
Heart rate | 72 | 68 | 82
Stain | Medium | Heavy | Heavy
Plaque | Heavy | Heavy | Medium
Calculus | Heavy | Heavy | Medium
Bone loss | Generalized | Generalized | Generalized
Tooth mobility | Yes | Yes | No
Allergy | No | No | No
History of previous surgery | No | Yes | No
Diabetes | No | Yes | No
Smoking | No | Yes | No
Pocket score | Generalized | Generalized | Generalized
CAL score | Generalized | Generalized | Generalized
Count of teeth with bleeding | 12 | 23 | 14
Count of teeth with plaque | 22 | 17 | 19

*CAL: clinical attachment loss, FGG: free gingival graft, GTR: guided tissue regeneration, GBR: guided bone regeneration

To classify the stage and grade of periodontitis using the contributing variables (15 variables extracted from clinicians’ notes and 4 variables from periodontal charts), we first imported the CSV file created in the previous steps into Google Colaboratory. The dependent (stage or grade) and independent variables were defined as y and x, respectively. The dataset was then split into training (70%) and testing (30%) sets to train the network to classify the target variables. To design our MLP architecture, we defined a grid with two hidden layers and a varying number of neuron units in each hidden layer (ranging from 4 to 20 nodes). We found empirically that, for classifying both the grade and the stage of periodontal disease, increasing the number of hidden layers beyond two decreased accuracy. For this multi-class classification problem, a softmax function was applied in the output layer to yield the probability of each class at each unit of the output layer. The Rectified Linear Unit (ReLU) was used in all hidden layers because models with ReLUs are more easily optimized than networks with sigmoid or tanh units [36]. The grid search identified the optimal number of nodes in each hidden layer as follows: 15 and 13 nodes in the first and second hidden layers for classifying stage, and 4 and 4 nodes for classifying grade.
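The size of the split and of the search grid can be checked with a short sketch. The shuffling seed, the rounding of the test share (rounded up here), and the function name `train_test_indices` are assumptions; only the 70/30 proportion, the two hidden layers, and the 4 to 20 node range come from the text.

```python
import itertools
import math
import random


def train_test_indices(n, test_frac=0.30, seed=0):
    """Shuffle row indices and split them into train and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = math.ceil(n * test_frac)  # assumption: test share rounded up
    return idx[n_test:], idx[:n_test]


train_idx, test_idx = train_test_indices(309)

# Candidate (first layer, second layer) sizes: 4 to 20 units each.
grid = list(itertools.product(range(4, 21), repeat=2))
```

With 17 candidate sizes per layer, an exhaustive search over two hidden layers evaluates 289 architectures for each target variable.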

Several optimization algorithms are available for training an NN. Optimizers are algorithms or methods that adjust attributes of the NN, such as the weights and learning rate, in order to reduce the loss and increase the accuracy of the model [37]. One of the most popular is Adam, known for its efficiency and training speed [38]. To compile the model, we also used the “categorical_crossentropy” loss function. Finally, we fitted our model with a batch size of 16 and 30 epochs for both stage and grade outcomes. The evaluation metrics described in the next section were employed to present and interpret the results.

Evaluation of model performance

Evaluation metrics included accuracy, which is the ratio of correctly predicted observations to the total observations; recall, which is the ratio of true positives to the sum of true positives and false negatives; precision, which is the ratio of true positives to the sum of true positives and false positives; and the F1-score, which is the harmonic mean of precision and recall [39]. These metrics range from 0 to 1, with 1 indicating perfect performance [40].
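These definitions translate directly into code. The sketch below computes per-class precision, recall, and F1, plus overall accuracy, from lists of true and predicted labels. The labels are toy values, not study data, and returning 0 when a denominator is zero is a convention assumed here.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treated one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Share of observations whose predicted label is correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels for six notes (not study data).
y_true = ["III", "III", "III", "II", "IV", "III"]
y_pred = ["III", "III", "II", "III", "IV", "III"]
p, r, f1 = per_class_metrics(y_true, y_pred, "III")
acc = accuracy(y_true, y_pred)
```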

Results

Our proposed BERT model predicted the patients’ periodontitis stage with 77% accuracy and grade with 75% accuracy. Accuracy was therefore slightly higher for stage than for grade.

The model classified stage III and grade B fairly accurately, but did not perform well on the other stages and grades. The precision column indicates that 91% of the patients the model assigned to stage III and 65% of those it assigned to grade B truly belonged to those classes. The recall column shows that 75% of the patients who truly belonged to stage III were identified correctly by our model; the corresponding values were 47% and 35% for grades B and C, respectively. The F1-score column, which is commonly used to judge the overall performance of a model and is defined as the harmonic mean of the precision and recall values in the previous columns, shows that our proposed model performed better in predicting patients with stage III (82%) than in predicting the periodontitis grade (Table 4).
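As defined in the evaluation section, the F1-score is the harmonic mean of precision and recall, which differs slightly from their arithmetic mean. The short check below recomputes it from the precision and recall reported in Table 4 for stage III and grade B; the function name `f1_score` is chosen for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) as reported in Table 4.
reported = {"stage III": (0.91, 0.75), "grade B": (0.65, 0.47)}

harmonic = {k: round(f1_score(p, r), 2) for k, (p, r) in reported.items()}
arithmetic = {k: round((p + r) / 2, 2) for k, (p, r) in reported.items()}
```

The harmonic means (0.82 and 0.55) reproduce the F-scores in Table 4, whereas the arithmetic means would be 0.83 and 0.56.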

Table 4. Confusion matrix evaluating the performance of the BERT model in classifying the stage and grade of the periodontal disease.

Stages/grades | Precision | Recall | F-Score
I | 0.11 | 0.25 | 0.15
II | 0.25 | 0.30 | 0.27
III | 0.91 | 0.75 | 0.82
IV | 0.06 | 0.25 | 0.09
A | 0.05 | 0.00 | 0.00
B | 0.65 | 0.47 | 0.55
C | 0.16 | 0.35 | 0.22

We implemented model interpretability techniques such as Local Interpretable Model-Agnostic Explanations (LIME) to analyze the predictions made by the BERT model. LIME helps in identifying the key features that influence the model’s decision-making process for classifying periodontitis stages and grades. This interpretability approach allows us to better understand the reasoning behind correct and incorrect classifications, providing insights into the strengths and weaknesses of the model. However, it is important to note that the primary aim of our study was not to conduct a detailed analysis of the specific features driving the predictions but to demonstrate the overall classification accuracy and applicability of these models for periodontal staging and grading. Fig 1 (A and B) illustrates the most influential words and features contributing to the model’s predictions for stage III and grade B as an example.

Fig 1. LIME-based interpretation of BERT model predictions. (A) Key features for classifying stage III vs. non-stage III. (B) Significant words and features for distinguishing grade B vs. non-grade B.

Table 5 lists the features extracted from the notes and charts using RegEx. Using the variables extracted from patients’ notes and periodontal charts, the MLP classified the stage and grade of periodontitis with 69.9% and 69.8% accuracy, respectively. The highest accuracy was found for the classification of stage III and grade B periodontitis (74.7% and 71.1%, respectively).

Table 5. Example of the collected CSV data file extracted from the patients’ charts and clinical notes.

Variable | Patient 1 | Patient 2 | Patient 3
Chart Number | 82 | 93 | 104
Date | 2019-10-18 | 2019-06-17 | 2019-06-17
Systolic blood pressure | 132 | 144 | 134
Diastolic blood pressure | 78 | 88 | 86
Stage | III | IV | III
Grade | A | C | B
Heart rate | 72 | 68 | 82
Stain | Medium | Heavy | Heavy
Plaque | Heavy | Heavy | Medium
Calculus | Heavy | Heavy | Medium
Bone loss | Generalized | Generalized | Generalized
Tooth mobility | Yes | Yes | No
Allergy | No | No | No
History of previous surgery | No | Yes | No
Diabetes | No | Yes | No
Smoking | No | Yes | No
Pocket score | Generalized | Generalized | Generalized
CAL score | Generalized | Generalized | Generalized
Count of teeth with bleeding | 12 | 23 | 14
Count of teeth with plaque | 22 | 17 | 19

The performance of the model is shown in Table 6.

Table 6. Confusion matrix evaluating the performance of the NN model in classifying the stage and grade of the periodontal disease using the contributing factors extracted from patients’ charts by Regex and MLP.

Stages/grades | Precision | Recall | F-Score
I | 0.00 | 0.00 | 0.00
II | 0.00 | 0.00 | 0.00
III | 0.69 | 0.99 | 0.81
IV | 0.00 | 0.00 | 0.00
A | 0.00 | 0.00 | 0.00
B | 0.72 | 0.54 | 0.61
C | 0.21 | 0.41 | 0.28

Table 7 compares the predictive performance of the BERT and MLP models for periodontitis staging and grading on the set of 32 new patients (stage I = 1, stage II = 4, stage III = 21, and stage IV = 6; grade A = 2, grade B = 21, and grade C = 9). Our proposed BERT model predicted the new, unseen patients’ stage and grade with 66% and 72% accuracy, respectively. On these new patients, the BERT model predicted grade more accurately than stage, as did the MLP model (62.5% for grade vs. 59.4% for stage).

Table 7. Performance comparison between the BERT and MLP models on the new unseen dataset.

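Outcome | Class | Model | Precision | Recall | F1-Score
Stage | I | RegEx | 0 | 0 | 0
Stage | I | BERT | 1 | 1 | 1
Stage | II | RegEx | 0.22 | 0.5 | 0.30
Stage | II | BERT | 0.25 | 0.25 | 0.25
Stage | III | RegEx | 0.74 | 0.81 | 0.77
Stage | III | BERT | 0.78 | 0.67 | 0.72
Stage | IV | RegEx | 0 | 0 | 0
Stage | IV | BERT | 0.5 | 0.83 | 0.62
Grade | A | RegEx | 0 | 0 | 0
Grade | A | BERT | 0 | 0 | 0
Grade | B | RegEx | 0.75 | 0.86 | 0.8
Grade | B | BERT | 0.78 | 0.86 | 0.82
Grade | C | RegEx | 0.33 | 0.22 | 0.27
Grade | C | BERT | 0.62 | 0.56 | 0.59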
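The overall accuracies quoted for the 32 new patients can be recovered from Table 7, because the number of correct predictions in a class equals its recall multiplied by the number of true cases. The sketch below performs that arithmetic; the per-class counts are derived by rounding (recall is reported to two decimals) and are not stated directly in the table.

```python
# True cases per class in the 32-patient set.
SUPPORT_STAGE = {"I": 1, "II": 4, "III": 21, "IV": 6}
SUPPORT_GRADE = {"A": 2, "B": 21, "C": 9}

# Recall values copied from Table 7.
RECALL = {
    ("BERT", "stage"): {"I": 1.0, "II": 0.25, "III": 0.67, "IV": 0.83},
    ("RegEx/MLP", "stage"): {"I": 0.0, "II": 0.5, "III": 0.81, "IV": 0.0},
    ("BERT", "grade"): {"A": 0.0, "B": 0.86, "C": 0.56},
    ("RegEx/MLP", "grade"): {"A": 0.0, "B": 0.86, "C": 0.22},
}


def accuracy_from_recall(recall, support):
    """Sum the correct predictions per class, then divide by all cases."""
    correct = sum(round(recall[c] * support[c]) for c in support)
    return correct / sum(support.values())


acc = {
    key: accuracy_from_recall(
        rec, SUPPORT_STAGE if key[1] == "stage" else SUPPORT_GRADE
    )
    for key, rec in RECALL.items()
}
```

The results (21/32, 19/32, 23/32, and 20/32) agree with the reported 66%, 59.4%, 72%, and 62.5%.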

During the evaluation, we closely analyzed the misclassified cases to identify patterns or common features leading to incorrect predictions. A significant portion of the misclassified cases belonged to the less prevalent stages and grades, indicating potential data imbalance issues. This analysis highlights the need for better data augmentation techniques to represent the underrepresented classes in the dataset. According to Fig 2, the RegEx/MLP model performs better for stage III cases, correctly identifying 17 of 21 instances compared with BERT’s 14. However, the BERT model produces slightly fewer misclassifications overall, especially for stage IV cases, and handles stage I classification well, with minimal confusion, indicating higher reliability in identifying early-stage cases (Fig 2).

Fig 2. Confusion matrices for stage classification using BERT and RegEx/MLP models.


The comparison between the BERT and RegEx/MLP models in the context of grade classification on the new unseen data, as illustrated in Fig 3, shows that both models perform similarly in correctly identifying grade B cases, with each model correctly classifying 18 instances. However, the BERT model exhibits a slightly better performance in reducing misclassifications for grade C cases, where it correctly classifies 5 out of 9 instances, compared to the RegEx/MLP model’s 2. Both models show difficulty in correctly identifying grade A cases, with BERT misclassifying both instances as grade B and RegEx/MLP misclassifying them as grade C (Fig 3).

Fig 3. Confusion matrices for grade classification using BERT and RegEx/MLP models.


Discussion

The increasing popularity of computerized dental records provides the opportunity to use AI-based technologies such as ML and DL models to improve patient care; however, a recent review reported that the clinical integration of such techniques in dentistry has lagged [41]. To the authors’ knowledge, only a few studies have investigated the use of ML and DL techniques in the diagnosis of periodontitis. Most of these studies have focused on analysing panoramic radiographs and/or clinical examinations, or biomarkers and bacteria extracted from saliva [2], which is costly and time-consuming. Our study compared two approaches for automatically extracting patient data from textual notes and periodontal charts using ML techniques and demonstrated their ability to classify the stage and grade of periodontitis.

In the first phase, we applied the BERT model to predict the stage and grade of the periodontal disease using the patients’ textual notes. Our results demonstrated that the BERT model outperformed the other applied methods in terms of accuracy in classifying the correct stage and grade of the patients’ periodontal disease. To date, no previous studies have used patients’ dental records collected as text files for analysis via BERT algorithms. In medicine, Haulcy and Glass compared the performance of five classifiers, as well as convolutional neural networks and long short-term memory networks, on the classification of Alzheimer’s disease using audio features and text features [42]. They reported that the top-performing classification models were the support vector machine and random forest classifiers trained on BERT embeddings, which both achieved an accuracy of 85.4% on the test set. Recently, new domain-specific BERT models pretrained on large-scale biomedical corpora (BioBERT) and medical dictionaries (Med-BERT) have been introduced. These models have shown promising results in extracting valuable information from biomedical and medical literature, respectively, and outperformed the state-of-the-art methods on medical and biomedical tasks [43,44]. However, we used the base BERT architecture in this study, as dental patients’ notes mostly comprise detailed documentation of the dental history, physical examination, diagnosis, and treatment planning, which differs from biomedical or medical records. Moreover, as this study is the first application of a BERT model to patients’ dental notes, we decided to use a more generic architecture.

In the second phase, we utilized baseline feature engineering (RegEx) and MLP models to prepare the patient data and classify the stage and grade of the disease. Integrating these AI models into clinical workflows could enable real-time analysis of patient data directly from digital records, providing immediate diagnostic support for clinicians. This integration would streamline the diagnostic process, reduce the need for additional tests, and potentially lower costs. However, challenges such as data standardization, interoperability among EDR systems, and clinician training must be addressed to ensure seamless integration. Risks include the possibility of over-reliance on AI predictions, which might lead to misdiagnosis if the model’s limitations are not well understood or if data quality issues are not adequately managed.

Digital dental and medical records are gaining popularity for storing patient information. Data in electronic medical/dental records can be divided into three kinds: structured data, semi-structured data, and unstructured data. Unstructured text is defined as narrative data, consisting of clinical notes, discharge records, and radiology reports. Unstructured texts store a lot of valuable information; however, common structural frameworks are usually lacking, and many errors exist, including improper grammar, spelling errors, local dialects, and semantic ambiguities, which complicate data processing and analysis. Retrieval and application of such valuable information is possible through text mining methods [18]. We used an MLP to build our network, as it is easy to implement. MLP was also chosen as it is capable of providing high-quality models while keeping the training time relatively low [45]. The proposed ML method in phase 1 was able to extract relevant information and classify the stage and grade of the periodontitis with approximately 70% accuracy, which indicates that this model is well-suited to automatically classify the periodontitis stage and grade using patient chart and clinical data.

In comparison to traditional clinical methods and other published approaches for periodontal staging and grading, our BERT model demonstrated promising results using only clinical notes, without any imaging data. In contrast to Ertas et al. [39], our study focused on using textual data from patient notes, which is often more readily available in clinical settings than comprehensive radiographic data. Our BERT model achieved an accuracy of 72% for grading and 66% for staging, which aligns closely with the performance levels of periodontal specialists observed by Oh et al., who reported accuracies of 71.33% for staging and 64% for grading among clinicians with periodontal backgrounds [46]. This suggests that the use of NLP techniques like BERT can potentially bridge the gap between non-specialists and specialists in clinical diagnosis.

Additionally, in a recent study by Tastan Eroglu et al. [47], the ChatGPT model was used to classify periodontitis based on textual inputs, with staging and grading accuracies of 59.5% and 50.5%, respectively. These values are lower than our BERT model’s performance and, for grading, lower than that of the RegEx/MLP model, highlighting BERT’s superiority in understanding clinical text for periodontitis classification. This difference may be attributed to BERT’s fine-tuning capabilities on domain-specific data, which enable it to capture more nuanced clinical features than general-purpose models like ChatGPT.

These comparisons indicate that while the BERT model may not yet match the highest accuracies achieved with image-based methods for staging, its performance is superior for text-based classification tasks. This demonstrates its potential utility in settings where detailed radiographic data may not be available, making it a valuable tool for enhancing decision support in clinical practice. However, future studies should aim to integrate multimodal data—combining both textual and radiographic information—to leverage the strengths of each data type and further improve classification accuracy.

The study by Oh et al. [46] revealed that dental practitioners with periodontal backgrounds had higher accuracy in classifying periodontitis stages (71.33%) compared to grades (64%), with non-periodontal practitioners showing a similar trend but with lower accuracy (61.67% for stages and 49.33% for grades). In contrast, our BERT model demonstrated higher accuracy in predicting grades (72%) than stages (66%) for new patients, with these results being closer to the accuracy levels achieved by periodontal specialists. Additionally, the BERT model’s classification accuracy for both stage and grade surpasses that of non-periodontal practitioners, while the MLP model, with an accuracy of 62.5% for grades and 59.4% for stages, outperforms non-periodontal clinicians for grades but not for stages. These comparisons suggest that our BERT model aligns more closely with specialist performance and exceeds the accuracy of non-specialists for both stage and grade, while the MLP model does so for grade only, indicating their potential for clinical application in disease classification.

As the current study aimed to examine several different health-related medical and dental possible contributing factors using patient data and periodontal charts, ML methods were best suited to address the complexity and multi-dimensional nature of each due to their known ability to detect complex relationships and classify the outcome accurately [41]. ML models outperform the traditional statistical methods as they can consider a broader range of features without strict, predetermined predictor and outcome parameters and classify the output with impressive accuracy [48].

Text mining techniques, which allow for the extraction of high-quality information from large-scale unstructured text data, can uncover implicit knowledge hidden in unstructured text by extracting both predefined information and new knowledge [49]. In the first phase, we applied RegEx to complete the text mining procedure. Zhen et al. compared two text-mining methods (RegEx and a Naïve Bayes classifier) for analysing published full-text articles in terms of their adoption of standards in radiation therapy [35]; they found that the classifications and overall usage trends reported by the RegEx-based method were comparable to those of a domain expert.
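As a rough illustration of RegEx-based text mining on a clinical note, the sketch below pulls a few numeric and categorical features out of free text. The note text, the feature names, and the patterns are all hypothetical and are not the rules used in this study.

```python
import re

# Hypothetical patterns for illustration; the study's actual RegEx rules
# are not reproduced here.
PATTERNS = {
    "max_probing_depth_mm": re.compile(
        r"probing depths? (?:up to |of )?(\d+)\s*mm", re.I
    ),
    "bone_loss_percent": re.compile(
        r"(\d+)\s*%\s*(?:radiographic )?bone loss", re.I
    ),
    "smoker": re.compile(r"\b(non-?smoker|smoker)\b", re.I),
    "diabetes": re.compile(
        r"\b(no (?:history of )?diabetes|diabet\w+)\b", re.I
    ),
}


def extract_features(note: str) -> dict:
    """Return the first match of each pattern, or None when absent."""
    features = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(note)
        features[name] = match.group(1).lower() if match else None
    return features


# Invented example note, for illustration only.
note = (
    "Pt is a non-smoker, no diabetes. "
    "Probing depths up to 7 mm; 30% bone loss on radiographs."
)
print(extract_features(note))
```

Features extracted in this way could then be encoded as the input vector of a downstream classifier such as an MLP.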

Despite its promising results in medical research, utilization of DL and NN in dental research generally and in the periodontal field specifically is relatively scarce. A recent review by Ossowska et al. [50] has shown that new technologies are developing very quickly in the field of dentistry and AI is spreading into periodontology with the greatest focus on evaluating periodontal bone loss, peri-implant bone loss, and predicting the development of periodontitis due to the psychological features.

The current study found potential utility in applying ML and DL models to automatically classify the stage and grade of periodontitis using patients’ notes and clinical charts. The two methods presented, phase one relying on a combination of chart and text mining, and phase two relying on intelligent text mining only, both yielded promising results. However, these findings must be interpreted with caution given the preliminary nature of this research, as it is a novel method applied to unstructured patient notes and charts obtained from a single university periodontal clinic. Therefore, the text mining and DL techniques (MLP and BERT) used in the present study require validation to confirm their efficiency on different types of medical/dental records from other centers.

The model demonstrated higher accuracy in classifying stage III and grade B periodontitis, which can be attributed to the imbalanced nature of the input data, with a greater number of patients falling into these categories compared to stages I or II and grade A. This imbalance is likely due to the higher referral rates for patients with more severe periodontal conditions, and it inherently influenced the model’s performance. Despite the relatively small sample size of 309 patients, a recognized limitation for an ML study, our findings indicate a relatively high accuracy in classifying the stage and grade of periodontitis. The model’s ability to achieve higher accuracy in these classes, even with a limited sample size, underscores its robustness in handling imbalanced data. To address potential overfitting, we incorporated regularization techniques, including dropout layers and early stopping, in the model training process. Despite these measures, a slight tendency towards overfitting was observed, particularly in the later stages of training.

Future studies should also analyze larger and more balanced datasets using these ML/DL techniques to provide additional validation. Specifically, exploring techniques such as synthetic data generation and data augmentation could be valuable in overcoming the challenge of class imbalance and enhancing model performance. Moreover, evaluating the integration of other machine learning approaches, like ensemble methods, may further improve classification accuracy. However, the ethical concerns surrounding AI in medical diagnosis, particularly issues related to bias, transparency, and patient privacy, have led to restrictions on access to large, diverse datasets in healthcare [40,51]. This limited access exacerbates the challenge of developing unbiased and accurate AI models, as the scarcity of data hinders the ability to train AI systems effectively. Without comprehensive datasets, AI models risk perpetuating existing disparities in healthcare, as they may not adequately represent all patient populations [52]. Addressing these ethical challenges is essential to balance the need for robust AI development with the protection of patient rights and the promotion of fair, transparent medical practices.
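To make the class-rebalancing idea raised at the start of this paragraph concrete, the sketch below shows random oversampling, a much simpler technique than the synthetic data generation and augmentation suggested above and not one used in this study. The class counts and note identifiers are toy values chosen for illustration.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    matches the size of the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra copies (with replacement) to reach the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy data with an illustrative imbalance (not the study's class counts).
labels = ["A"] * 3 + ["B"] * 20 + ["C"] * 7
samples = [f"note_{i}" for i in range(len(labels))]
balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

Oversampling of this kind should be applied to the training split only; duplicating cases before the train/test split would leak copies of training notes into the evaluation set.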

Despite the limited amount of data available in our study and the novelty of our proposed method, our findings showed a relatively high accuracy in classifying the stage and grade of periodontitis, which suggests that the proposed method was successful in extracting the relevant information from patient notes.

One limitation of the current study is that we did not conduct a manual review of the 309 cases using both methods among the authors. The rationale for this was twofold. First, since each model in the present study used a randomly sampled 70% of cases for training, sampled separately for BERT and the feature-engineered method, it is likely that the models would perform well in such a comparison. However, this would introduce bias, as the validation data (30%) was not used in the training process. Second, the two models used different validation data, and there was very little overlap. Therefore, it is not straightforward to compare the outcomes directly.

Our proposed models are preliminary applications of NLP and DL techniques to assign the stage and grade of periodontal disease using patients’ notes, in contrast to previous models that used radiographic images and salivary samples. These models can identify the stage and grade of periodontitis without the need for a comprehensive clinical and radiographic examination, which is routinely required for disease diagnosis. Future research should focus on longitudinal studies to assess the predictive value of these models over time, along with a detailed comparison of different NLP and DL architectures to identify the most efficient approaches for clinical implementation. We hope this will increase enrollment in clinical trials of new therapies and improve patient outcomes by enabling periodontists to diagnose periodontal disease accurately and in a timely manner using the information collected in patients’ notes. However, these models are not intended to replace the clinical judgment of experienced practitioners but to serve as a valuable support tool, particularly for general practitioners or those with limited experience in diagnosing periodontitis. These techniques can also be applied by other health-care providers who need to use large amounts of unstructured patient text to classify stages of a disease. It is recommended that future studies compare various BERT models and other NLP techniques to identify the most effective ones for extracting information from EDR to improve classification accuracy and clinical applicability.

Data Availability

The data that support the findings of this study are available from the University of Alberta but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available at dentrsch@ualberta.ca upon reasonable request and with permission of University of Alberta.

Funding Statement

This work was supported by the Network for Canadian Oral Health Research (NCOHR) New Frontier Seed Grant Program (2020–2021), awarded to HL, NA, and MG, and a graduate student scholarship from Alberta Innovates (2023–2024), awarded to NA. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Tonetti MS, Greenwell H, Kornman KS. Staging and grading of periodontitis: Framework and proposal of a new classification and case definition. J Periodontol; 2018. (89 Suppl 1): S159–S172. doi: 10.1002/JPER.18-0006
  • 2. Kim EH, Kim S, Kim HJ, Jeong HO, Lee J, Jang J, et al. Prediction of chronic periodontitis severity using machine learning models based on salivary bacterial copy number. Front Cell Infect Microbiol; 2020. (10): 571515. doi: 10.3389/fcimb.2020.571515; PMCID: PMC7701273
  • 3. Borges T de F, Regalo SC, Taba M Jr, Siéssere S, Mestriner W Jr, Semprini M. Changes in masticatory performance and quality of life in individuals with chronic periodontitis. J Periodontol; 2013. (84): 325–331. doi: 10.1902/jop.2012.120069. Epub 2012 May 1.
  • 4. Preshaw P, Alba A, Herrera D, Jepsen S, Konstantinidis A, Makrilakis K, et al. Periodontitis and diabetes: a two-way relationship. Diabetologia; 2012. (55): 21–31. doi: 10.1007/s00125-011-2342-y
  • 5. Araújo VMA, Melo IM, Lima V. Relationship between periodontitis and rheumatoid arthritis: review of the literature. Mediators Inflamm; 2015; 2015:259074. doi: 10.1155/2015/259074
  • 6. Graziani F, Music L, Bozic D, Tsakos G. Is periodontitis and its treatment capable of changing the quality of life of a patient? Br Dent J; 2019. (227): 621–625. doi: 10.1038/s41415-019-0735-3
  • 7. Genco RJ, Borgnakke WS. Risk factors for periodontal disease. Periodontol 2000; 2013 (62): 59–94. doi: 10.1111/j.1600-0757.2012.00457.x
  • 8. Monsarrat P, Bernard D, Marty M, Cecchin-Albertoni C, Doumard E, Gez L, et al. Systemic periodontal risk score using an innovative machine learning strategy: An observational study. J Pers Med; 2022. 12(2): 217. doi: 10.3390/jpm12020217
  • 9. Bertoldi C, Forabosco A, Lalla M, Generali L, Zaffe D, Cortellini P. How intraday index changes influence periodontal assessment: A preliminary study. International Journal of Dentistry; 2017: 1–10. doi: 10.1155/2017/7912158
  • 10. Chang HJ, Lee SJ, Yong TH, Shin NY, Jang BG, Kim JE, et al. Deep learning hybrid method to automatically diagnose periodontal bone loss and stage periodontitis. Scientific Reports; 2020. (10): 7531. doi: 10.1038/s41598-020-64509-z
  • 11. Tran KA, Kondrashova O, Bradley A, Williams ED, Pearson JV, Waddell N. Deep learning in cancer diagnosis, prognosis and treatment selection. Genome Med; 2021. 13(1): 152. doi: 10.1186/s13073-021-00968-x
  • 12. Walczak S, Velanovich V. Improving prognosis and reducing decision regret for pancreatic cancer treatment using artificial neural networks. Decision Support Systems; 2018. (106): 110–118.
  • 13. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature; 2017. 542(7639): 115–118. doi: 10.1038/nature21056
  • 14. Fei Y, Li WQ. Improve artificial neural network for medical analysis, diagnosis and prediction. J Crit Care; 2017. (40): 293. doi: 10.1016/j.jcrc.2017.06.012
  • 15. Ameli N, Gibson MP, Khanna A, Howey M, Lai H. An application of machine learning techniques to analyze patient information to improve oral health outcomes. Front. Dent. Med; 2022. (3): 833191. doi: 10.3389/fdmed.2022.833191
  • 16. Mitchell JR, Szepietowski P, Howard R, Reisman P, Jones JD, Lewis P, et al. A question-and-answer system to extract data from free-text oncological pathology reports (CancerBERT Network): Development Study. J Med Internet Res; 2022. 24(3): e27210. doi: 10.2196/27210
  • 17. Patel JS, Su C, Tellez M, Albandar JM, Rao R, Iyer V, et al. Developing and testing a prediction model for periodontal disease using machine learning and big electronic dental record data. Front Artif Intell; 2022. (5): 979525. doi: 10.3389/frai.2022.979525
  • 18. Sun W, Cai Z, Li Y, Liu F, Fang S, Wang G. Data processing and text mining technologies on electronic medical records: A Review. J Healthc Eng; 2018; 2018: 4302425. doi: 10.1155/2018/4302425
  • 19. Pethani F, Dunn AG. Natural language processing for clinical notes in dentistry: A systematic review. Journal of Biomedical Informatics; 2023. (138): 104282. doi: 10.1016/j.jbi.2023.104282
  • 20. Benicio DHP, Xavier-Júnior JC, Paiva KRS, Camargo JDDAS. Applying text mining and natural language processing to electronic medical records for extracting and transforming texts into structured data. Available at doi: 10.2139/ssrn.3991515
  • 21. Chen Q, Zhou X, Wu J, Zhou Y. Structuring electronic dental records through deep learning for a clinical decision support system. Health Informatics J; 2021. 27(1):1460458220980036. doi: 10.1177/1460458220980036
  • 22. Patel JS, Kumar K, Zai A, Shin D, Willis L, Thyvalikakath TP. Developing automated computer algorithms to track periodontal disease change from longitudinal electronic dental records. Diagnostics (Basel); 2023. 13(6):1028. doi: 10.3390/diagnostics13061028
  • 23. Li F, Jin Y, Liu W, Rawat BP, Cai P, Yu H. Fine-tuning Bidirectional Encoder Representations from Transformers (BERT)-based models on large-scale electronic health record notes: an empirical study. JMIR Med Inform. 2019; 7(3): e14830. doi: 10.2196/14830
  • 24. Huang K, Altosaar J, Ranganath R. Clinical BERT: modeling clinical notes and predicting hospital readmission. ArXiv. Preprint posted online November 29, 2020. http://arxiv.org/abs/1904.05342
  • 25. Xu D, Gopale M, Zhang J, Brown K, Begoli E, Bethard S. Unified medical language system resources improve sieve-based generation and Bidirectional Encoder Representations from Transformers (BERT)-based ranking for concept normalization. J Am Med Inform Assoc; 2020. 27(10):1510–19. doi: 10.1093/jamia/ocaa080
  • 26. Kim J, Amar S. Periodontal disease and systemic conditions: A bidirectional relationship. Odontology; 2006. 94(1): 10–21. doi: 10.1007/s10266-006-0060-6
  • 27. Ray P, Chakrabarti A. A mixed approach of deep learning method and rule-based method to improve aspect level sentiment analysis. Applied Computing and Informatics; 2022. 18(1/2): 163–178. doi: 10.1016/j.aci.2019.02.002
  • 28. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv 2018:1810.04805.
  • 29. Wei Q, Ji Z, Si Y, Du J, Wang J, Tiryaki F, et al. Relation extraction from clinical narratives using pre-trained Language Models. AMIA Annu Symp Proc; 2020. (2019): 1236–1245.
  • 30. Lagouvardos S, Dolby J, Grech N, Antoniadis A, Smaragdakis Y. Static analysis of shape in TensorFlow programs, in 34th European Conference on Object-Oriented Programming (ECOOP 2020), LIPIcs, Vol. 166, pp. 15:1–15:29, 2020. doi: 10.4230/LIPIcs.ECOOP.2020.15
  • 31. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 2011; 12:2825–30
  • 32. Qin H. Comparison of deep learning models on time series forecasting: a case study of dissolved oxygen prediction. preprint [2019]. Available at: https://www.researchgate.net/publication/337386775
  • 33. Przybyła P, Shardlow M, Aubin S, Bossy R, Eckart de Castilho R, Piperidis S, et al. Text mining resources for the life sciences. Database (Oxford). 2016; 2016: baw145. doi: 10.1093/database/baw145
  • 34. Gargiulo F, Silvestri S, Ciampi M. A clustering-based methodology to support the translation of medical specifications to software models. Applied Soft Computing; 2018. (71): 199–212. doi: 10.1016/j.asoc.2018.03.057
  • 35. Zhen Y, Jiang Y, Yuan L, Kirkpartrick J, Wu J, Ge Y. Analyzing the usage of standards in radiation therapy clinical studies. IEEE EMBS Int Conf Biomed Health Inform. 2017; 2017: 349–352. doi: 10.1109/BHI.2017.7897277
  • 36. Ramachandran P, Zoph B, Le QV. Searching for activation functions. 2017. http://arxiv.org/abs/1710.05941
  • 37. Cortiñas-Lorenzo B, Pérez-González F. Adam and the ants: on the influence of the optimization algorithm on the detectability of DNN watermarks. Entropy (Basel); 2020. 22(12):1379. doi: 10.3390/e22121379
  • 38. Kingma DP, Ba J. Adam: A method for stochastic optimization; Proceedings of the 3rd International Conference on Learning Representations (ICLR ‘15); San Diego, CA, USA. 7–9 May 2015
  • 39. Ertaş K, Pence I, Cesmeli MS, Ay ZY. Determination of the stage and grade of periodontitis according to the current classification of periodontal and peri-implant diseases and conditions (2018) using machine learning algorithms. J Periodontal Implant Sci. 2023;53(1):38–53. doi: 10.5051/jpis.2201060053
  • 40. Hicks SA, Strümke I, Thambawita V, Hammou M, Riegler MA, Halvorsen P, et al. On evaluation metrics for medical applications of artificial intelligence. Sci Rep. 2022. Apr 8;12(1):5979. doi: 10.1038/s41598-022-09954-8; PMCID: PMC8993826.
  • 41. Schwendicke F, Samek W, Krois J. Artificial intelligence in dentistry: chances and challenges. J Dent Res; 2020. 9: 769–74. doi: 10.1177/0022034520915714. Epub 2020 Apr 21.
  • 42. Haulcy R, Glass J. Classifying Alzheimer’s disease using audio and text-based representations of speech. Front Psychol; 2021. (11): 624137. doi: 10.3389/fpsyg.2020.624137
  • 43. Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics; 2020. 36(4): 1234–1240. doi: 10.1093/bioinformatics/btz682
  • 44. Liu N, Hu Q, Xu H, Xu X, Chen M. Med-BERT: A pretraining framework for medical records named entity recognition. IEEE Transactions on Industrial Informatics; 2021. 18(8): 5600–5608.
  • 45. Car Z, Baressi Šegota S, Anđelić N, Lorencin I, Mrzljak V. Modeling the spread of COVID-19 infection using a multilayer perceptron. Computational and mathematical methods in medicine; 2020, 5714714. doi: 10.1155/2020/5714714
  • 46. Oh SL, Yang JS, Kim YJ. Discrepancies in periodontitis classification among dental practitioners with different educational backgrounds. BMC Oral Health 21, 39 (2021). doi: 10.1186/s12903-020-01371-5
  • 47. Tastan Eroglu Z, Babayigit O, Ozkan Sen D, Ucan Yarkac F. Performance of ChatGPT in classifying periodontitis according to the 2018 classification of periodontal diseases. Clin Oral Investig. 2024;28(7):407. doi: 10.1007/s00784-024-05799-9; PMCID: PMC11217036.
  • 48. Bichu YM, Hansa I, Bichu AY, Premjani P, Flores-Mir C, Vaid NR. Applications of artificial intelligence and machine learning in orthodontics: a scoping review. Prog Orthod; 2021. (22): 18. doi: 10.1186/s40510-021-00361-9
  • 49. Ningrum PK, Pansombut T, Ueranantasun A. Text mining of online job advertisements to identify direct discrimination during job hunting process: A case study in Indonesia. PLoS One; 2020. 15(6): e0233746. Published 2020 Jun 4. doi: 10.1371/journal.pone.0233746
  • 50. Ossowska A, Kusiak A, Świetlik D. Artificial intelligence in dentistry-narrative review. Int J Environ Res Public Health; 2022. 19(6):3449. Published 2022 Mar 15. doi: 10.3390/ijerph19063449
  • 51. Savulescu J, Giubilini A, Vandersluis R, Mishra A. Ethics of artificial intelligence in medicine. Singapore Med J. 2024. Mar 1;65(3):150–158. doi: 10.4103/singaporemedj.SMJ-2023-279. Epub 2024 Mar 26; PMCID: PMC7615805.
  • 52. Giovanola B, Tiribelli S. Beyond bias and discrimination: redefining the AI ethics principle of fairness in healthcare machine-learning algorithms. AI Soc. 2023;38(2):549–563. doi: 10.1007/s00146-022-01455-6. Epub 2022 May 21; PMCID: PMC9123626.
PLOS Digit Health. doi: 10.1371/journal.pdig.0000692.r001

Decision Letter 0

Akhilanand Chaurasia, Wisit Cheungpasitporn

23 Jul 2024

PDIG-D-24-00165

Classification of periodontitis stage and grade using natural language processing techniques

PLOS Digital Health

Dear Dr. Ameli,

Thank you for submitting your manuscript to PLOS Digital Health. After careful consideration, we feel that it has merit but does not fully meet PLOS Digital Health's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 60 days (by Sep 21 2024 11:59PM). If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at digitalhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pdig/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Akhilanand Chaurasia

Section Editor

PLOS Digital Health

Journal Requirements:

Additional Editor Comments (if provided):

The study on periodontitis classification using BERT and RegEx/MLP approaches has several limitations that impact its reliability and generalizability. The small sample size from a single clinic, data imbalance, and lack of comparison with other state-of-the-art methods raise concerns about the model's performance and applicability. Additionally, the paper fails to address important aspects such as model interpretability, overfitting prevention, and ethical considerations in medical AI applications.

The technical details provided in the paper are insufficient for a thorough evaluation. Key information about the RegEx implementation, BERT model specifications, and LSTM integration is missing. The study also lacks crucial data distributions and explanations for performance discrepancies across different stages and grades. Furthermore, the authors did not consider alternative approaches such as using specialized biomedical language models or exploring zero-shot learning with large language models, which could potentially yield better results.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Does this manuscript meet PLOS Digital Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

Reviewer #2: No

--------------------

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

--------------------

3. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

--------------------

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Digital Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

--------------------

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This study explores the use of NLP and machine learning techniques to classify the stage and grade of periodontitis using patient data from dental records. The researchers developed and compared two approaches: one using a BERT model, and another using a combination of RegEx for text mining and an MLP neural network. They analyzed 309 anonymized patient periodontal charts and corresponding clinician's notes from a university periodontal clinic. The BERT model achieved higher accuracy in predicting both the stage (77%) and grade (75%) of periodontitis compared to the RegEx and MLP approach, which achieved about 70% accuracy for both.

Comments

The study used only 309 patient records from a single university clinic. This small, potentially homogeneous sample may not be representative of the broader population, limiting generalizability.

The authors noted an imbalance in the input data, with more patients in stage III and grade B. This can lead to biased model performance.

While the study compared BERT with a RegEx/MLP approach, it didn't explore other state-of-the-art NLP or machine learning methods.

The study doesn't provide insight into which features or text patterns are most important for classification. Incorporate model interpretability techniques like SHAP values or LIME.

The paper doesn't provide a detailed analysis of misclassified cases.

While the models show good statistical performance, there's no indication of how well they align with expert clinical judgment.

The study doesn't mention techniques used to prevent overfitting, especially given the small dataset.

The paper doesn't address potential ethical concerns of using AI for medical diagnosis.

The paper lacks some important details about model architecture, hyperparameters, and training procedures.
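The imbalance concern above (more patients in stage III and grade B) is often handled by weighting classes inversely to their frequency during training. The following is a minimal sketch of that idea, not the authors' method: the helper name `inverse_frequency_weights` and the label counts are hypothetical and chosen only for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive weights above 1, common classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical stage labels for illustration only, NOT the study's data.
labels = ["III"] * 6 + ["II"] * 3 + ["I"] * 1
weights = inverse_frequency_weights(labels)

for label in sorted(weights):
    print(f"stage {label}: weight {weights[label]:.3f}")
```

Weights of this kind could be passed to a weighted loss function so that errors on rare stages count more. Whether this helps on a dataset of this size would need to be checked empirically.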

Reviewer #2: Ameli et al. developed a BERT model to automatically classify periodontitis grade and stage. The model was claimed to have 77% and 75% accuracy on classification, which is better than an MLP baseline. However, because detailed descriptions of both the model and the data are missing, it is very difficult to evaluate the validity of the claims in the paper. Please see the comments below:

How difficult is it to determine stage and grade manually? Do the EDR notes usually contain the stage and grade information already?

The technical details in the paper were missing, which makes it very difficult to evaluate the paper. For example, what is the exact regular expression used to extract information? Which BERT model was used as the base model for fine-tuning? How was the LSTM combined with BERT?

Why not use BioBERT or MedBERT as the base model?

Can RegEx accurately extract the information from notes and charts? It is not clear to me why regular expressions can be used to extract information from a chart. What is the main bottleneck of the RegEx-feature-based MLP: the RegEx part or the MLP part?

Can the authors show distributions of important statistics of the dataset? For example, how many data points are in stages I, II, and III, and in grades A, B, and C?

Is there a reason why only stage III and grade B have good performance?

Before branding the proposed model as a diagnostic tool, there are many issues that need to be solved. For instance, would the model be used only after periodontitis has been diagnosed manually? Was any negative data used in the study (e.g. data from other closely related diseases or from healthy patients)?

Did the authors consider using existing large language models (LLMs) for zero-shot or few-shot learning? LLMs are a good baseline to compare with the proposed model. For instance, Llama3 or GPT4 could be prompted with the diagnostic standard as the system prompt and the EDR notes as the user prompt. In addition, even for the task of extracting information from the EDR, an LLM should be better than RegEx, as RegEx does not understand semantic information.

Can the authors comment on how robust the model is? What would happen if the model were applied to notes written by another dentist?
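The dataset distribution requested above (how many records fall in each stage and each grade) amounts to a simple tabulation. The sketch below shows one way to produce it, assuming each record is a dictionary with `stage` and `grade` keys. The record layout, the helper name `label_distribution`, and the four example records are all hypothetical and do not reflect the study's data.

```python
from collections import Counter


def label_distribution(records, key):
    """Return {label: (count, proportion)} for the given key, sorted by label."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in sorted(counts.items())}


# Hypothetical records for illustration only, NOT the study's data.
records = [
    {"stage": "III", "grade": "B"},
    {"stage": "III", "grade": "C"},
    {"stage": "II", "grade": "B"},
    {"stage": "I", "grade": "A"},
]

stage_dist = label_distribution(records, "stage")
grade_dist = label_distribution(records, "grade")

for name, dist in (("stage", stage_dist), ("grade", grade_dist)):
    for label, (count, share) in dist.items():
        print(f"{name} {label}: n={count} ({share:.0%})")
```

Reporting both counts and proportions makes it easier to judge whether per-class performance differences track class size.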

--------------------

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

--------------------

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLOS Digit Health. doi: 10.1371/journal.pdig.0000692.r003

Decision Letter 1

Wisit Cheungpasitporn, Shalmali Joshi

1 Oct 2024

PDIG-D-24-00165R1

Classification of periodontitis stage and grade using natural language processing techniques

PLOS Digital Health

Dear Dr. Ameli,

Thank you for submitting your manuscript to PLOS Digital Health. After careful consideration, we feel that it has merit but does not fully meet PLOS Digital Health's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 60 days, by Nov 30 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at digitalhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pdig/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Wisit Cheungpasitporn, MD

Academic Editor

PLOS Digital Health

Additional Editor Comments (if provided):

The reviewers suggest several additional areas for improvement and further exploration. They recommend incorporating model interpretability techniques, providing a detailed analysis of misclassified cases, and addressing potential overfitting issues. Additionally, they advise exploring the alignment between model performance and expert clinical judgment, addressing ethical concerns related to AI in medical diagnosis, and including more details about model architecture, hyperparameters, and training procedures. Overall, while the study shows promise, the reviewers suggest that addressing these concerns would strengthen the research and its potential impact in the field.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

--------------------

2. Does this manuscript meet PLOS Digital Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

--------------------

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

--------------------

4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

--------------------

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Digital Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

--------------------

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Overall, the manuscript shows notable improvement in terms of technical detail and transparency. However, there are still some important considerations around clinical applicability and broader context that could be addressed to strengthen the paper further. The core contribution - applying BERT to periodontal staging and grading - remains novel and potentially impactful, but the limitations of the current study should be clearly communicated.

Major comments

While acknowledged as a limitation, the relatively small sample size (309 patients) is still a significant concern for a machine learning study, especially given the class imbalance.

While LIME analysis is mentioned, there could be more in-depth discussion of what features the models are using to make predictions and how this aligns with clinical understanding.

The discussion could be expanded to more clearly articulate how these models could be integrated into clinical workflows and what potential benefits/risks this might entail.

While the study compares BERT to RegEx/MLP, it would be valuable to see how these methods compare to current clinical practice or other published approaches for periodontal staging and grading.

The conclusion could be strengthened by more specific suggestions for future research to address the current limitations.

Reviewer #2: (No Response)

--------------------

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

--------------------

PLOS Digit Health. doi: 10.1371/journal.pdig.0000692.r005

Decision Letter 2

Wisit Cheungpasitporn, Shalmali Joshi

6 Nov 2024

Classification of periodontitis stage and grade using natural language processing techniques

PDIG-D-24-00165R2

Dear Dr. Ameli,

We are pleased to inform you that your manuscript 'Classification of periodontitis stage and grade using natural language processing techniques' has been provisionally accepted for publication in PLOS Digital Health.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email from a member of our team. 

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact digitalhealth@plos.org.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Digital Health.

Best regards,

Wisit Cheungpasitporn, MD

Academic Editor

PLOS Digital Health

***********************************************************

Additional Editor Comments (if provided):

It is evident that all concerns raised have been adequately addressed. The manuscript is well-written and demonstrates substantial improvement. I have no additional comments and recommend acceptance for publication.

Reviewer Comments (if any, and for reference):

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Does this manuscript meet PLOS Digital Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Digital Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: After careful consideration of the revised manuscript and the authors' point-by-point responses, I conclude that the raised issues have been adequately addressed. The revisions have sufficiently improved the manuscript, and I have no additional comments to put forward. I endorse the acceptance of this manuscript in its present state.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Rebuttal letter (PLOS Digital Health).docx

    pdig.0000692.s001.docx (24.9KB, docx)

    Attachment

    Submitted filename: 2nd Rebuttal letter.docx

    pdig.0000692.s002.docx (26.6KB, docx)

    Data Availability Statement

    The data that support the findings of this study are available from the University of Alberta, but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. Data are, however, available from dentrsch@ualberta.ca upon reasonable request and with the permission of the University of Alberta.


    Articles from PLOS Digital Health are provided here courtesy of PLOS