Author manuscript; available in PMC: 2022 Apr 15.
Published in final edited form as: Clin Image Based Proced Distrib Collab Learn Artif Intell Combat COVID 19 Secur Priv Preserv Mach Learn (2021). 2021 Nov 14;12969:78–87. doi: 10.1007/978-3-030-90874-4_8

TMJOAI: An Artificial Web-Based Intelligence Tool for Early Diagnosis of the Temporomandibular Joint Osteoarthritis

Celia Le 1, Romain Deleat-Besson 1, Najla Al Turkestani 1, Lucia Cevidanes 1, Jonas Bianchi 3, Winston Zhang 1, Marcela Gurgel 1, Hina Shah 1, Juan Prieto 2, Tengfei Li 2
PMCID: PMC9012403  NIHMSID: NIHMS1793850  PMID: 35434730

Abstract

Osteoarthritis is a chronic disease that affects the temporomandibular joint (TMJ), causing chronic pain and disability. To diagnose patients suffering from this disease before advanced degradation of the bone, we developed a diagnostic tool called TMJOAI. This machine-learning-based algorithm is capable of classifying the TMJ health status of patients using 52 clinical, biological and jaw condyle radiomic markers. TMJOAI comprises three parts: feature preparation, feature selection and model evaluation. Feature preparation includes the choice of radiomic features (condylar trabecular bone or mandibular fossa), the histogram matching of the images prior to the extraction of the radiomic markers, and the generation of pairwise feature interactions; the feature selection is based on the p-values or AUCs of single features using the training data; the model evaluation compares multiple machine learning algorithms (e.g. regression-based, tree-based and boosting algorithms) using 10 times repeated 5-fold cross-validation. The best performance was achieved by averaging the predictions of XGBoost and LightGBM models, and the inclusion of 32 additional markers from the mandibular fossa of the joint improved the AUC from 0.83 to 0.88. After cross-validation and testing, the tools presented here were deployed on an open-source, web-based system, making them accessible to clinicians. TMJOAI allows users to add data and automatically train and update the machine learning models, and therefore improve their performance.

Keywords: Machine learning, Early diagnosis, Osteoarthritis

1. Introduction

Temporomandibular joints (TMJ) are small joints that connect the lower jaw (mandible) to the skull. They are susceptible to disorders causing recurrent or chronic pain and dysfunction, making them among the most common sources of facial pain [20]. Osteoarthritis (OA) is the most common form of arthritis, a condition affecting over 50 million US adults and leading to chronic disability and alteration of the structure of the joints [5]. An early diagnosis could help reduce the destruction of the bone by slowing the disease's progression, which is essential for patients since there is to this day no cure for OA other than managing its symptoms [19,23].

The integration of clinical, biological and radiomic markers has been shown in previous studies to contribute to a precise diagnosis of TMJ osteoarthritis (TMJOA) [2,6]. Biological markers are obtained by quantifying protein levels in synovial fluid, serum, and saliva, while imaging markers are obtained by acquiring cone-beam computed tomography (CBCT) images of the TMJ region [10,22].

To meet the need for a robust early diagnosis tool, we developed TMJOAI, an artificial intelligence-based tool, using machine learning algorithms such as regression trees, which have been shown to help find correlations between variables in order to predict OA [2,12,13,17] or other diseases [9]. We compared different algorithms to find the most efficient one at classifying patients' health status.

The dataset used for training those machine learning models is detailed in Sect. 2. The method and the different algorithms are described in Sect. 3. The results of the performance experiments are presented in Sect. 4, and the conclusions are drawn in Sect. 5.

2. Dataset

Our dataset consisted of 92 subjects: 46 suffering from TMJ OA and 46 healthy controls, making it a balanced dataset. Moreover, the OA and control groups were age and sex matched. The data acquisition protocol was the same for all subjects. We obtained the values of 52 markers: 2 demographic values, 13 protein level values from serum, 12 protein level values from saliva, 5 clinical features evaluating the pain and 20 imaging features representing the grey-level values of the region of interest. The imaging features originate from the lateral region of the trabecular bone of the condyle, the lower part of the TMJ. For more robust radiomic markers, we also tested 32 mandibular fossa radiomic features; the fossa forms the upper part of the TMJ. In total, our training dataset had 3828 features and interaction features.

Those values were merged and stored in a CSV file, which is given as input to our machine learning algorithms.

3. Proposed Methods

3.1. Feature Selection

Out of the 52 features composing our dataset, we computed the interaction between each pair of features by multiplying them, resulting in 1326 additional features, for a total of 1378 features. We then computed the Area Under the Receiver Operating Characteristic curve (AUC) of each feature to evaluate its relevance, and selected the features with higher AUC (Fig. 1), which perform better in the classification of the TMJ health status.
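The two steps above can be sketched as follows; this is a minimal sketch assuming NumPy arrays and scikit-learn, and the function names are ours, not the tool's:

```python
import numpy as np
from itertools import combinations
from sklearn.metrics import roc_auc_score

def build_interaction_features(X):
    """Append the pairwise product of every feature pair to X: for 52 base
    features this adds 52*51/2 = 1326 interactions (1378 columns in total)."""
    pairs = combinations(range(X.shape[1]), 2)
    inter = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    return np.hstack([X, inter])

def rank_features_by_auc(X, y):
    """Score each feature by the AUC of using it alone to rank the subjects.
    AUC is symmetric around 0.5, so max(auc, 1 - auc) keeps features that
    are predictive in either direction."""
    aucs = np.array([max(roc_auc_score(y, X[:, j]), 1 - roc_auc_score(y, X[:, j]))
                     for j in range(X.shape[1])])
    return np.argsort(aucs)[::-1], aucs
```

Features at the top of the returned ordering are the candidates retained for model training.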

Fig. 1. Circular plot of the AUC, −log(p-value) and −log(q-value) of the features (outside to inside)

In addition, we calculated the correlation between each pair of the selected features. For highly correlated pairs, we kept the feature with the highest AUC and excluded the others.

This correlation-based selection helps to reduce the complexity of the model and prevent overfitting.
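A greedy version of this correlation filter can be sketched as below; the 0.9 cutoff is an assumption for illustration, as the paper does not report the exact threshold used:

```python
import numpy as np

def drop_correlated(X, aucs, threshold=0.9):
    """Visit features from highest AUC downwards and keep a feature only if
    its absolute Pearson correlation with every feature already kept stays
    below `threshold` (0.9 is an assumed cutoff, not the paper's value)."""
    order = np.argsort(aucs)[::-1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in order:
        if all(corr[j, k] < threshold for k in kept):
            kept.append(int(j))
    return kept
```

Because features are visited in decreasing AUC order, whenever two features are near-duplicates the one with the higher AUC survives.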

3.2. Comparison of Multiple Machine Learning Algorithms

We divided our dataset into 5 folds to perform a stratified 5-fold cross-validation, keeping the balance between the OA and control groups in each fold. We repeated the operation 10 times, using a different seed to randomly create the folds each time, to reduce the sampling bias from any single train-test subdivision. The seeds were fixed to make the algorithm reproducible. This resulted in 50 models for each of the trained algorithms.
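This splitting scheme maps directly onto scikit-learn's repeated stratified K-fold; a minimal sketch (the seed value 42 is an assumption for illustration):

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

# 10 repeats of stratified 5-fold cross-validation with a fixed seed:
# 50 train/test splits in total, each test fold preserving the
# OA/control balance of the 92 subjects.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=42)

y = np.array([0] * 46 + [1] * 46)   # 46 controls, 46 TMJ OA subjects
X = np.zeros((92, 1))               # placeholder feature matrix
splits = list(cv.split(X, y))       # 50 (train_idx, test_idx) pairs
```

One model is trained per split, giving the 50 models per algorithm mentioned above.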

We compared 5 different algorithms to select which performed the best: Random Forest, XGBoost, LightGBM, Ridge and Logistic Regression. These algorithms are well suited to both regression and classification problems, such as the diagnosis of a disease as in the present case [15,16]. Regression models weight the prediction and provide outputs as a probability of disease.

Regression Analysis.

Regression analysis designates a collection of methods used in machine learning to predict a dependent variable (the health status of a patient) from multiple independent variables or predictors (our 52 features values and their associated interactions). These methods use different functions during the regression.

The ridge regression uses a linear regression to which a ridge (L2) penalty is added, solving:

L(w) = min_w ( Σ_i (y_i − w·x_i)² + λ Σ_j w_j² ) (1)

where λ ∈ [1, 10], and its optimal value is determined by a nested 5-fold cross-validation.
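A minimal sketch of this penalty search, assuming scikit-learn's RidgeClassifierCV (the paper does not name its implementation, and the 10-point grid over [1, 10] is an assumed discretization):

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV

# Search the ridge penalty lambda over [1, 10] with an inner 5-fold CV,
# as described in the text; each candidate is scored on held-out folds.
ridge = RidgeClassifierCV(alphas=np.linspace(1, 10, 10), cv=5)
```

After `ridge.fit(X_train, y_train)`, the selected penalty is available as `ridge.alpha_`.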

The logistic regression, by contrast, uses a sigmoid function:

f(x) = 1 / (1 + e^(−x)) (2)

Regression Trees.

Regression trees are an underlying branch of regression analysis. Like decision trees, they use binary recursive partitioning to split the data into 2 branches (diseased or healthy) according to the value of a feature drawn from a random subset of the features. The tree keeps growing on the branch that minimizes the mean squared error:

MSE = (1/n) Σ_i (y_i − prediction_i)² (3)

A Random Forest is composed of an ensemble of regression trees (500 in our case), where each tree is trained using only a subdivision of the features and a subdivision of the dataset. The final prediction is the average of the predictions from all the trees. It has shown its ability to solve both classification and regression problems [3].
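The Random Forest configuration described above can be sketched with scikit-learn; only the 500-tree count comes from the paper, the remaining settings are library defaults:

```python
from sklearn.ensemble import RandomForestClassifier

# 500 trees as in the paper; each tree is trained on a bootstrap sample of
# the subjects and considers a random subset of features at each split.
# The final probability is the average of the per-tree predictions.
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=0)
```

Calling `rf.predict_proba(X)` then returns, per subject, the averaged vote of the 500 trees as class probabilities.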

Gradient-Boosted Regression Trees.

eXtreme Gradient Boosting (XGBoost) [7] and Light Gradient Boosting Machine (LightGBM) [14] are gradient-boosted tree algorithms. Like Random Forest, they build ensembles of trees, but gradient boosting combines the predictions while building the trees instead of averaging them at the end of the process.

The two methods differ in that XGBoost grows its trees level-wise, whereas LightGBM grows them leaf-wise and uses Gradient-based One-Side Sampling (GOSS).

They also offer the possibility to tune more parameters and, when tuned properly, have demonstrated superior performance to Random Forest [1,8,11,18,21].

For the gradient-boosted algorithms, we performed grid search to determine the optimal value of the different hyperparameters:

  • Learning rate: Step size.

  • Subsample/Bagging fraction: The fraction of cases to be used in each tree, to prevent overfitting, applied once in every boosting iteration.

  • Colsample by tree/Feature fraction: The fraction of features to be used in each tree, to prevent overfitting, applied once every tree.

  • Minimum child weight: The minimum weight of a branch, under which the branch will stop growing to prevent the overfitting of the tree.

  • Max depth: Maximum depth to which the trees will grow, a low value is set to prevent overfitting.

  • Number of estimators: Number of trees in the forest, combined with a nested 5-folds cross validation to find the optimal number of trees.

The range of values used during the grid search and the values selected to train the models are reported in Table 1.

Table 1.

Range and values of hyperparameters.

Learning rate Subsample Colsample Min weight Max depth Estimators
Range [0.01, 0.001] 0.5, 0.7 0.5, 0.7 1, 2 [1, 10] [1000, 10000]
XGBoost 0.01 0.5 0.7 2 1 5000
LightGBM 0.005 0.5 0.7 2 1 1000
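The grid search over Table 1's ranges can be sketched as follows. This sketch uses scikit-learn's GradientBoostingClassifier as a stand-in, since the study used the XGBoost and LightGBM libraries, whose parameter names differ slightly (e.g. colsample_bytree / feature_fraction roughly correspond to max_features here):

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Hyperparameter grid mirroring the ranges reported in Table 1.
param_grid = {
    "learning_rate": [0.01, 0.005, 0.001],
    "subsample": [0.5, 0.7],        # bagging fraction per boosting iteration
    "max_features": [0.5, 0.7],     # feature fraction per tree
    "max_depth": [1, 2],            # shallow trees to limit overfitting
    "n_estimators": [1000, 5000],   # number of boosted trees
}
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, scoring="roc_auc", cv=5)
# search.fit(X_train, y_train) then selects the best combination by AUC.
```

Scoring by AUC during the search matches the evaluation metric used throughout the paper.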

3.3. Histogram Matching

The differences among the grey-levels of the trabecular imaging data were adjusted by histogram matching to control subjects. Each image was matched to 5 different references from the control subjects, and the mean value of each radiomic feature across the 5 matched versions was then used as training data.
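A NumPy-only sketch of the histogram-matching step is shown below. Note one simplification: the paper extracts the radiomic features from each matched image and averages the feature values, whereas the helper here averages the matched images directly for illustration:

```python
import numpy as np

def match_histogram(source, reference):
    """Remap the grey-levels of `source` so their empirical distribution
    matches that of `reference` (classic CDF-based histogram matching)."""
    _, s_idx, s_counts = np.unique(source.ravel(),
                                   return_inverse=True, return_counts=True)
    r_vals, r_counts = np.unique(reference.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size
    r_cdf = np.cumsum(r_counts) / reference.size
    matched_vals = np.interp(s_cdf, r_cdf, r_vals)  # map source CDF onto reference levels
    return matched_vals[s_idx].reshape(source.shape)

def match_to_references(source, references):
    """Match `source` against each control reference (5 in the paper) and
    average the results -- a simplification of averaging radiomic features."""
    return np.mean([match_histogram(source, ref) for ref in references], axis=0)
```

Libraries such as scikit-image provide an equivalent `match_histograms` routine; the explicit version above just makes the CDF mapping visible.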

4. Experimental Results

4.1. Experiments

We obtained a probability of TMJ OA for each patient from our 5-fold cross-validation. We applied a threshold of 0.5 to determine the model's final health-status prediction, and calculated the following metrics to evaluate its performance: accuracy, precision and recall for the OA group (1) and the control group (0), F1-score and AUC.
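The thresholding and metric computation can be sketched with scikit-learn; "1" denotes the OA group and "0" the control group, matching the table headers below:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def evaluate(y_true, proba, threshold=0.5):
    """Threshold the predicted OA probability at 0.5 and compute the
    metrics reported in the paper for both classes."""
    pred = (np.asarray(proba) >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, pred),
        "precision1": precision_score(y_true, pred, pos_label=1),
        "precision0": precision_score(y_true, pred, pos_label=0),
        "recall1": recall_score(y_true, pred, pos_label=1),
        "recall0": recall_score(y_true, pred, pos_label=0),
        "f1": f1_score(y_true, pred),
        "auc": roc_auc_score(y_true, proba),  # threshold-free, uses raw probabilities
    }
```

Note that the AUC is computed from the raw probabilities, while the other metrics depend on the 0.5 threshold.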

4.2. Algorithm Comparison Results

We averaged the metrics of the 50 models for each algorithm and reported the results in Table 2.

Table 2.

Comparison of metrics for the different algorithms.

Models Accuracy Precision1 Precision0 Recall1 Recall0 F1-Score AUC
RandomForest Std 0.705 ± 0.038 0.710 ± 0.039 0.704 ± 0.041 0.696 ± 0.069 0.715 ± 0.053 0.701 ± 0.048 0.763 ± 0.034
XGBoost Std 0.714 ± 0.029 0.717 ± 0.033 0.712 ± 0.030 0.709 ± 0.037 0.720 ± 0.039 0.712 ± 0.030 0.780 ± 0.038
LightGBM Std 0.738 ± 0.038 0.740 ± 0.037 0.738 ± 0.043 0.735 ± 0.053 0.741 ± 0.040 0.737 ± 0.041 0.802 ± 0.040
Ridge Std 0.670 ± 0.029 0.678 ± 0.029 0.663 ± 0.032 0.646 ± 0.057 0.693 ± 0.040 0.661 ± 0.038 0.704 ± 0.033
Logistic Std 0.715 ± 0.027 0.740 ± 0.027 0.695 ± 0.028 0.663 ± 0.039 0.767 ± 0.025 0.699 ± 0.031 0.775 ± 0.022

We have concluded from this experiment that LightGBM and XGBoost are the most efficient diagnostic methods since they had the highest AUCs and F1-scores. Consequently, we decided to combine the models by averaging the 50 XGBoost and the 50 LightGBM predictions to give the final classification.
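The final ensembling step, averaging the OA probabilities of the 50 XGBoost and 50 LightGBM models, reduces to a few lines; the array layout here (one row per model, one column per subject) is an assumption:

```python
import numpy as np

def ensemble_predict(proba_xgb, proba_lgbm):
    """Average the OA probabilities of the XGBoost models (shape
    [n_models, n_subjects]) and LightGBM models into one prediction
    per subject, as in the paper's combined model."""
    all_proba = np.vstack([proba_xgb, proba_lgbm])
    return all_proba.mean(axis=0)
```

The averaged probability is then thresholded at 0.5 to give the final classification.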

Due to the small size of the dataset, we did not hold out a separate testing set; instead, to evaluate performance, each subject was predicted using only those of the 100 models that were trained without that subject. Each subject was therefore classified by 20 models (10 XGBoost and 10 LightGBM), and the prediction was compared to the true health status.

As Table 3 shows, combining the XGBoost and LightGBM models yields a final model more efficient than either of them separately.

Table 3.

Comparison of metrics for the final model.

Models Accuracy Precision1 Precision0 Recall1 Recall0 F1-Score AUC
Mean XGB+LGBM 0.726 0.728 0.725 0.722 0.730 0.725 0.791
Models combination 0.761 0.761 0.761 0.761 0.761 0.761 0.831

4.3. Histogram Matching and Mandibular Fossa Features Results

Figure 2 shows the AUC of the interaction features before and after histogram matching. In Fig. 2b, the condylar trabecular features (in cyan) provided a high contrast in their interactions with each of the other features; some of them demonstrated a significantly higher AUC. After adding the mandibular fossa features, we observed that a substantial number of these features, particularly those combined with the clinical and condylar trabecular features, had an overall higher AUC than the other features.

Fig. 2. AUC of the interaction features.

When we compared the training results with and without histogram-matched features, and with the mandibular fossa features added (Table 4), using the same training parameters, the AUC was slightly higher with histogram-matched features and improved considerably when the mandibular fossa features were added.

Table 4.

Comparison of metrics for the final model with histogram matched imaging/trabecular features.

Models Accuracy Precision1 Precision0 Recall1 Recall0 F1-Score AUC
Original 0.761 0.761 0.761 0.761 0.761 0.761 0.831
Histogram Matched 0.750 0.767 0.735 0.717 0.783 0.742 0.835
HM+Fossa 0.794 0.814 0.776 0.761 0.826 0.787 0.882

The fossa features improved the performance of the final model, in particular the AUC by 0.05 and the F1-score by 0.025 (Table 4). These findings demonstrate the importance of including the mandibular fossa features as well as the trabecular ones.

4.4. Deployment

Web-Based System.

Our aim was to make this tool available to clinicians without the need for a developer to run it. We packaged the TMJOAI tool into a Docker image, running on a web-based system called the Data Storage for Computation and Integration (DSCI) [4].

The code is written in Python, which makes it easy to deploy, since Python environments are available in Docker and are highly maintainable.

Prediction.

The prediction algorithm takes the values of the clinical, biological and radiomic markers of one or multiple patients as an input in a single csv file. Then, it returns a csv file with the predicted diagnosis for each patient.

Since obtaining the values of all of the features can be difficult, the prediction remains usable with only part of the data. However, the absence of essential features can reduce the accuracy of the diagnosis (Fig. 3).

Fig. 3. Statistics of the top features
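The deployed CSV-in/CSV-out prediction step can be sketched as below; the column names and output layout are illustrative assumptions, not the tool's exact schema:

```python
import pandas as pd

def predict_csv(model, feature_names, in_path, out_path):
    """Read one row of marker values per patient from a CSV, apply a
    trained classifier, and write the predicted OA probability and a
    0/1 diagnosis (threshold 0.5, as in the paper) back to a CSV."""
    df = pd.read_csv(in_path)
    proba = model.predict_proba(df[feature_names])[:, 1]
    out = pd.DataFrame({"patient": df.index,
                        "oa_probability": proba,
                        "diagnosis": (proba >= 0.5).astype(int)})
    out.to_csv(out_path, index=False)
```

Any object exposing `predict_proba`, such as the averaged XGBoost/LightGBM ensemble, can be plugged in as `model`.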

Training.

Since our dataset only included 92 patients, our goal was to enable the addition of new cases to the training dataset, which will improve the TMJOAI tool by making the model more robust. We implemented the functionality to add data and automatically retrain the models, and therefore improve their performance. In that way, augmenting the dataset and improving the tool is fast (about 20 min), effortless, and does not require much developer supervision.

5. Conclusion

By combining two efficient regression tree algorithms, we have been able to develop a tool capable of predicting TMJ OA before the appearance of symptoms, by using clinical, biological and radiomic markers. This early diagnosis could help in applying earlier intervention methods that prevent progressive destruction of the TMJ bone.

These experiments have shown the efficiency of gradient-boosted methods and of combining multiple models, using regression to weight the predictions by outputting a probability of disease instead of a hard classification.

However, only 92 cases were used for this study; the tool can therefore still be improved by adding new patients to our dataset to make the model more robust, hence the need to easily add data and retrain the model quickly.

Finally, in this study, we successfully deployed the web-based artificial intelligence tool TMJOAI for early diagnosis of temporomandibular joint osteoarthritis.

Acknowledgments

Supported by NIDCR DE024550 and AAOF Dewel Biomedical research Award.

References

  • 1.Appel R, Fuchs T, Dollár P, Perona P: Quickly boosting decision trees-pruning underachieving features early. In: International Conference on Machine Learning, pp. 594–602. PMLR; (2013) [Google Scholar]
  • 2.Bianchi J, et al. : Osteoarthritis of the temporomandibular joint can be diagnosed earlier using biomarkers and machine learning. Sci. Rep 10(1), 1–14 (2020) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Breiman L: Random forests. Mach. Learn 45(1), 5–32 (2001) [Google Scholar]
  • 4.Brosset S, et al. : Web infrastructure for data management, storage and computation. In: Medical Imaging 2021: Biomedical Applications in Molecular, Structural, and Functional Imaging, vol. 11600, p. 116001N. International Society for Optics and Photonics; (2021) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Centers for Disease Control and Prevention: Data and statistics. https://www.cdc.gov/arthritis/datastatistics/index.htm. Accessed July 2021
  • 6.Cevidanes LH, et al. : 3D osteoarthritic changes in TMJ condylar morphology correlates with specific systemic and local biomarkers of disease. Osteoarthr. Cartil 22(10), 1657–1667 (2014) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Chen T, Guestrin C: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794. ACM, New York: (2016). 10(2939672.2939785) [Google Scholar]
  • 8.Chen T, Li H, Yang Q, Yu Y: General functional matrix factorization using gradient boosting. In: International Conference on Machine Learning, pp. 436–444. PMLR; (2013) [Google Scholar]
  • 9.Cosma G, Brown D, Archer M, Khan M, Pockley AG: A survey on computational intelligence approaches for predictive modeling in prostate cancer. Expert Syst. Appl 70, 1–19 (2017) [Google Scholar]
  • 10.Ebrahim FH, et al. : Accuracy of biomarkers obtained from cone beam computed tomography in assessing the internal trabecular structure of the mandibular condyle. Oral Surg. Oral Med. Oral Pathol. Oral Radiol 124(6), 588–599 (2017) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Friedman JH: Greedy function approximation: a gradient boosting machine. Ann.Stat 29, 1189–1232 (2001) [Google Scholar]
  • 12.Heard BJ, Rosvold JM, Fritzler MJ, El-Gabalawy H, Wiley JP, Krawetz RJ: A computational method to differentiate normal individuals, osteoarthritis and rheumatoid arthritis patients using serum biomarkers. J. R. Soc. Interface 11(97), 20140428 (2014) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Jamshidi A, Pelletier JP, Martel-Pelletier J: Machine-learning-based patient-specific prediction models for knee osteoarthritis. Nat. Rev. Rheumatol 15(1), 49–60 (2019) [DOI] [PubMed] [Google Scholar]
  • 14.Ke G, et al. : LightGBM: a highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst 30, 3146–3154 (2017) [Google Scholar]
  • 15.Kuo DE, et al. : Gradient boosted decision tree classification of endophthalmitis versus uveitis and lymphoma from aqueous and vitreous IL-6 and IL-10 levels. J. Ocul. Pharmacol. Ther 33(4), 319–324 (2017) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Kuo DE, et al. : Logistic regression classification of primary vitreoretinal lymphoma versus uveitis by interleukin 6 and interleukin 10 levels. Ophthalmology 127(7), 956–962 (2020) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Lazzarini N, et al. : A machine learning approach for the identification of new biomarkers for knee osteoarthritis development in overweight and obese women. Osteoarthr. Cartil 25(12), 2014–2021 (2017) [DOI] [PubMed] [Google Scholar]
  • 18.Li P, Wu Q, Burges C: McRank: learning to rank using multiple classification and gradient boosting. Adv. Neural Inf. Process. Syst 20, 897–904 (2007) [Google Scholar]
  • 19.Liu Y, et al. : Multiple treatment meta-analysis of intra-articular injection for temporomandibular osteoarthritis. J. Oral Maxillofac. Surg 78(3), 373–e1 (2020) [DOI] [PubMed] [Google Scholar]
  • 20.National Institute of Dental and Craniofacial Research: Facial pain. https://www.nidcr.nih.gov/research/data-statistics/facial-pain. Accessed July 2021
  • 21.Oguz BU, Shinohara RT, Yushkevich PA, Oguz I: Gradient boosted trees for corrective learning. In: Wang Q, Shi Y, Suk H-I, Suzuki K (eds.) MLMI 2017. LNCS, vol. 10541, pp. 203–211. Springer, Cham: (2017). 10.1007/978-3-319-67389-9_24 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Paniagua B, et al. : Validation of CBCT for the computation of textural biomarkers. In: Medical Imaging 2015: Biomedical Applications in Molecular, Structural, and Functional Imaging, vol. 9417, p. 94171B. International Society for Optics and Photonics; (2015) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Wang X, Zhang J, Gan Y, Zhou Y: Current understanding of pathogenesis and treatment of TMJ osteoarthritis. J. Dent. Res 94(5), 666–673 (2015) [DOI] [PubMed] [Google Scholar]
