Abstract
Osteoarthritis of the temporomandibular joint (TMJ OA) is the most common disorder of the TMJ. A clinical decision support (CDS) system designed to detect TMJ OA could function as a useful screening tool as part of regular check-ups to detect early onset. This study implements a CDS concept model based on Random Forest and dubbed RF+ to predict TMJ OA with the hypothesis that a model which leverages high-resolution radiological and biomarker data in training only can improve predictions compared with a baseline model which does not use privileged information. We found that the RF+ model can outperform the baseline model even when privileged features are not of gold standard quality. Additionally, we introduce a novel method for post-hoc feature analysis, finding shortRunHighGreyLevelEmphasis of the lateral condyles and joint distance to be the most important features from the privileged modalities for predicting TMJ OA.
Keywords: Temporomandibular joint, Privileged learning, Multimodal learning
1. Introduction
The temporomandibular joint (TMJ) plays an essential role in mouth movement and consists of a complex system of bone, cartilage and muscle. Osteoarthritis of the TMJ (TMJ OA), a degenerative disease which affects all structures therein, is the most common disorder of the TMJ [13]. Observations from radiological images show that TMJ OA is associated with flattening or deformation of the lateral condyles, reduction of joint space, and possible alterations to the articular fossa region [8, 12]. Although the prevalence of TMJ OA has been difficult to calculate, post-mortem analysis of modern bone collections has found a 30.2% prevalence among modern humans [8], with 40 to 75% of the population reporting at least one symptom of overall disorders of the TMJ (TMD) [12]. TMJ OA falls under the umbrella of osteoarthritis, which is the second most prevalent musculoskeletal disorder behind lower back pain, occurring with a global incidence of nearly 15,000 per year [16] (Fig. 1).
Fig. 1.

Workflow for the reported study. In this study, we utilize Leave-One-Out Cross Validation on a sample of 97 patients. For each fold, a feature selection process consisting of Logistic Regression is computed (A), and then a Random Forest+ model is constructed based on the selected features (B). After all folds have been calculated, a post-hoc analysis is conducted to determine the most important privileged and non-privileged features for tree-based transforms.
Recently, clinical decision support (CDS) models have made waves in the medical community, assisting in diagnosis of a wide range of conditions [1, 9, 14]. Although CDS models cannot replace the need for experienced dental experts, a CDS system designed to detect TMJ OA could function as a useful screening tool as part of regular check-ups, with the goal of detecting early TMJ OA and thus permitting dental experts to initiate treatment and preventive behavioral strategies to decelerate degradation of the TMJ at an early stage.
While clinical questionnaires designed to screen for TMD may help screen for TMJ OA, we hypothesize that including radiological imaging information from the TMJ site as well as protein biomarkers collected from serum/saliva could provide additional information which may be useful for discriminating TMJ OA patients from healthy patients. Studies analyzing protein biomarkers and radiological information in TMJ OA patients have already asserted the predictive utility of these features [3, 17].
However, although radiological imaging and protein biomarkers could be useful additions to a TMJ OA CDS model, it is not reasonable to expect most clinics to provide such data: high-resolution cone-beam computed tomography (CBCT) scans of the articular fossa and lateral condyle regions of the TMJ, as well as protein microarrays of human serum and saliva samples, are more common in research than in clinical practice. Since typical predictive models require all modalities to be present with no missing data, multimodal co-learning strategies must be explored.
One such strategy incorporating privileged information was developed as a part of a concept called “knowledge transfer” [15]. In knowledge transfer models, a “privileged” modality of data exists in the model solely as a “teacher”, providing information which assists the “student” model solely during the training phase, while disappearing in the test phase. With proper knowledge transfer, the final student model should perform more accurately with the assistance of the privileged information during training than without. In this study, we consider multimodal models which incorporate privileged information, where clinical features will be considered non-privileged information available in training and testing and radiological and biomarker features will be considered privileged information available in the training set only. This will allow the latter, rarer modalities to still assist the model while only requiring basic clinical questionnaire information at test-time, thus generalizing such a decision support model to a larger audience.
The most common privileged learning frameworks are based on artificial neural network (ANN) or support vector machine (SVM) frameworks [5, 6, 15, 17]. However, these models work best under very specific conditions. ANNs are primarily useful with large data samples and features, but considered largely inappropriate for smaller datasets due to the scale of trained parameters required. The well-known SVM+ model, a framework of SVM designed specifically to incorporate privileged information, can be problematic because the privileged modality functions as an error corrector in the model. This means that the privileged modality must provide discriminatory capabilities equivalent to a gold standard, or risk introduction of erroneous error corrections, thus reducing AUCs of the student model. Although some models such as [10] have attempted modifications of the SVM+ algorithm to improve upon this shortcoming, such models are not widely available and come with large computational overhead. In another model, [7] developed a Random Forest model which incorporates privileged information through the construction of “tree-based feature transforms”. The authors claimed that their model can perform at least as well as a non-privileged model, even in the case of substandard privileged information, because of the Random Forest’s unique ability to select best features from a given feature bag.
This study implements a CDS concept model based on the framework from [7] for predicting TMJ OA vs healthy controls, with the hypothesis that a model which leverages our available high-resolution radiological and biomarker data in training can improve predictions compared with a baseline model which requires only clinical features in testing. We further expand the work of [7] by introducing a novel method for post-hoc feature analysis, tracing back the most important features for prediction among both privileged and non-privileged feature sets.
2. Methods
2.1. Data Acquisition and Preparation
Our dataset consisted of 51 early-stage TMJ OA patients and 50 healthy controls recruited at the University of Michigan. All the diagnoses were confirmed by a TMD and orofacial pain specialist following the Diagnostic Criteria for Temporomandibular Disorders (DC/TMD) [11]. The clinical, biological and radiographic data described below were collected from TMJ OA and control subjects with informed consent and following the guidelines of the Institutional Review Board HUM00113199.
Details on the dataset can be found in [3]. Briefly, the clinical dataset was collected following DC/TMD criteria. The biological data comprised proteins that were previously correlated with arthritis initiation, progression and bone morphological alterations [4]. Using customized protein microarrays (Ray-Biotech, Inc. Norcross, GA), the expression level of 13 proteins was measured in the participants’ saliva and serum samples, respectively. The radiological data were collected from CBCT scans taken using a 3D Accuitomo machine (J. Morita MFG. CORP Tokyo, Japan). They consisted of 3D superior condylar-to-fossa joint space measurements and radiomic features. Using the BoneTexture module of the 3D-Slicer software (www.3Dslicer.org), 43 radiomic features were obtained following a standardized protocol reported by Bianchi et al. [2].
Of the 101 patients obtained, four were removed due to missing data, resulting in a final sample size of 97 patients. Features were split into “privileged” and “non-privileged” information based on their probable availability in a real-world clinical setting. Due to the greater difficulty of obtaining high-resolution CBCT scans and microarray biological samples in a clinical setting, we classified these modalities as privileged information while the clinical data were marked as non-privileged features. In total, 68 privileged features and six non-privileged features were included in the dataset.
2.2. Model Construction
The primary model utilized in this study, which here is dubbed “RF+”, is based on the tree-based feature transforms framework from [7] and is illustrated in Fig. 2. In our RF+ model, a Random Forest model called the “support forest”, consisting of decision trees, is first constructed based on both privileged and non-privileged features in the training set only (Fig 2D). After the support forest is constructed, a simple algorithm searches through all nodes of each tree to identify nodes of interest called “link nodes” (Fig 2E). In order to qualify as a link node, a node in a tree must satisfy at least one of the following criteria:
1. The node is a root node.
2. The node has a parent with a non-privileged node feature and itself has a privileged node feature.
3. The node has a parent with a privileged node feature and itself has a non-privileged node feature.
Fig. 2.

Workflow of the RF+ framework using tree-based feature transforms. The top bar of the figure indicates the feature space used.
For each link node, the observations at the left and right children of the node are annotated as “0” and “1”, respectively. Then, these labels are utilized to train a “scandent tree” for each link node, which attempts to replicate the discriminative power of the link node utilizing only non-privileged features (Fig 2F).
After all scandent trees are formulated, “tree-based feature transforms” are constructed for each data observation based on the label assigned by each scandent tree. Therefore, if k link nodes are discovered, then k scandent trees are formulated, resulting in k binary-labeled tree-based feature transforms (Fig 2J). Then, a final model is formulated based on the non-privileged features and tree-based features only. Since the scandent trees are also based only on non-privileged features, no privileged features are required in testing.
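The workflow of this section can be sketched in a few lines with scikit-learn. This is a minimal illustration under stated assumptions, not the implementation from [7] or the one used in this study: the helper names (`find_link_nodes`, `fit_scandent_trees`, `tree_based_transforms`), the column layout (non-privileged columns first), the scandent tree depth, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def find_link_nodes(tree, n_nonpriv):
    """Ids of internal nodes that are roots or switch feature set vs. their parent.

    Assumes columns [0, n_nonpriv) are non-privileged and the rest privileged.
    """
    t = tree.tree_
    parent = {}
    for node in range(t.node_count):
        for child in (t.children_left[node], t.children_right[node]):
            if child != -1:
                parent[child] = node
    links = []
    for node in range(t.node_count):
        if t.children_left[node] == -1:  # leaf: no split to replicate
            continue
        is_priv = t.feature[node] >= n_nonpriv
        if node == 0 or is_priv != (t.feature[parent[node]] >= n_nonpriv):
            links.append(node)
    return links


def fit_scandent_trees(support_forest, X_all, n_nonpriv, max_depth=3):
    """Train one scandent tree per link node using non-privileged columns only."""
    scandents = []
    for tree in support_forest.estimators_:
        t = tree.tree_
        paths = tree.decision_path(X_all).toarray().astype(bool)
        for node in find_link_nodes(tree, n_nonpriv):
            reached = paths[:, node]  # observations arriving at the link node
            # Label 1 if the observation goes to the right child, else 0.
            labels = paths[reached, t.children_right[node]].astype(int)
            if labels.min() == labels.max():
                continue  # nothing to learn if all observations go one way
            scandent = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
            scandent.fit(X_all[reached][:, :n_nonpriv], labels)
            scandents.append(scandent)
    return scandents


def tree_based_transforms(scandents, X_nonpriv):
    """One binary column per scandent tree; needs non-privileged features only."""
    return np.column_stack([s.predict(X_nonpriv) for s in scandents])


# Synthetic stand-in data (not the study data): 3 non-privileged + 4 privileged columns.
rng = np.random.default_rng(0)
n, n_nonpriv, n_priv = 120, 3, 4
X = rng.normal(size=(n, n_nonpriv + n_priv))
y = (X[:, 0] + X[:, 4] + 0.5 * rng.normal(size=n) > 0).astype(int)

support = RandomForestClassifier(n_estimators=5, max_depth=3, random_state=0).fit(X, y)
scandents = fit_scandent_trees(support, X, n_nonpriv)
Z = tree_based_transforms(scandents, X[:, :n_nonpriv])

# Final model: non-privileged features plus tree-based feature transforms.
final = RandomForestClassifier(n_estimators=100, max_depth=7, random_state=0)
final.fit(np.hstack([X[:, :n_nonpriv], Z]), y)
```

At test time, only `X_test[:, :n_nonpriv]` would be passed to `tree_based_transforms`, so the privileged columns never need to be available.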
2.3. Cross Validation and Evaluation
Two types of cross validation were utilized in this study. The first was Leave-One-Out Cross Validation (LOOCV), due to its ability to demonstrate fullest use of the training data in a single run. In order to provide a more robust study, we also incorporated a second validation method consisting of 400 times random bootstrapping of 15% of the dataset. Because this method is essentially “Out-of-Bag” sampling for Random Forest models, we denote this validation method with the acronym “OOB” from here onward.
For comparative analysis, four additional models were constructed: 1) one consisting of only privileged features, 2) one consisting of both non-privileged and privileged features, 3) one with only tree-based features, and 4) the Baseline model, consisting of only non-privileged features. All models were evaluated by Area Under the Receiver Operating Characteristic Curve (AUC) for both the LOO and OOB validation methods, and the standard error was calculated. For OOB, the mean AUC and mean standard error across repetitions were calculated.
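As a rough illustration of the evaluation, the sketch below pools the held-out scores from a leave-one-out loop and computes a single AUC with a standard error. The text does not state how the standard error was derived, so the Hanley-McNeil approximation used here is an assumption, as are the function names and the `fit_predict` callback interface.

```python
import math


def loocv_scores(X, y, fit_predict):
    """Hold out each observation once and collect its predicted score.

    fit_predict(X_train, y_train, x_test) is a hypothetical callback that
    trains a model and returns a score for the single held-out row.
    """
    scores = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        scores.append(fit_predict(X_train, y_train, X[i]))
    return scores


def auc_score(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for label, s in zip(y_true, scores) if label == 1]
    neg = [s for label, s in zip(y_true, scores) if label == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Hanley-McNeil approximation of the standard error of an AUC (assumed here)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(var)


# Toy example: the "model" simply returns the single feature as its score.
X_toy = [[0.1], [0.3], [0.35], [0.8]]
y_toy = [0, 1, 0, 1]
held_out = loocv_scores(X_toy, y_toy, lambda X_tr, y_tr, x: x[0])
toy_auc = auc_score(y_toy, held_out)
toy_se = hanley_mcneil_se(toy_auc, n_pos=2, n_neg=2)
```

For the repeated 15% hold-out scheme, the same two functions could be applied per repetition and the resulting AUCs and standard errors averaged.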
2.4. Post-hoc Feature Analysis
Finally, after all models were run, a post-hoc feature analysis was performed on the tree-based feature transforms. For each tree-based feature transform, we traced back the link node on which it was based and analyzed the node feature at that link node. We then totaled the frequency with which each feature appeared as a node feature for a link node. Based on the definition of a link node, we distinguished a feature at a link node as a “Root” feature if the node feature appeared at a link node defined by criterion 1 for identifying link nodes (see Sect. 2.2). This is because criteria 2 and 3 for defining link nodes are based on the node feature of a node given the node feature of its parent node. Thus, although our feature analysis identifies a specific feature at a link node, for non-“Root” features the scandent tree formed for that link node will try to replicate the discriminatory ability of the node feature at the link node given the settings of previous node features. Scandent trees from “Root” features, by contrast, will try to replicate the discriminatory ability of the node feature only.
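The counting step can be sketched as follows. The input format (one `(feature_name, is_root)` pair per retained tree-based feature transform), the function name, and the choice to track “Root” appearances under a separate key are assumptions made for illustration; the division by the total number of appearances follows the rescaling reported with the results. The feature names in the toy input are placeholders, not study data.

```python
from collections import Counter


def link_feature_scores(link_features):
    """Frequency of each link-node feature, rescaled to sum to 1.

    link_features: iterable of (feature_name, is_root) pairs, one per
    tree-based feature transform traced back to its link node.
    """
    counts = Counter(
        f"{name} (Root)" if is_root else name for name, is_root in link_features
    )
    total = sum(counts.values())
    # most_common() orders features from most to least frequent.
    return {feature: count / total for feature, count in counts.most_common()}


# Toy input with placeholder feature names.
toy_links = [
    ("feat_a", True),
    ("feat_a", True),
    ("feat_b", False),
    ("feat_a", False),
]
toy_scores = link_feature_scores(toy_links)
```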
2.5. Implementation
Due to the large number of privileged features, some of which may not be important, a univariate logistic regression to predict TMJ OA was run on the training set for each fold before initiating the RF+ model workflow. Only privileged features were analyzed by logistic regression, and only those meeting the inclusion criterion were carried into the RF+ model. Because there were only six variables in the non-privileged feature set, all non-privileged variables were included for all folds.
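A per-fold screening step of this kind might look like the sketch below. The inclusion criterion used in the study is not reproduced here: ranking by training-set AUC and the `min_auc` cutoff are hypothetical stand-ins, as are the function name and the synthetic data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def select_privileged(X_priv, y, min_auc=0.6):
    """Indices of privileged columns whose univariate model clears min_auc.

    min_auc is a hypothetical threshold; call this on the training fold only
    so that the held-out observation never influences the selection.
    """
    keep = []
    for j in range(X_priv.shape[1]):
        column = X_priv[:, [j]]  # keep 2-D shape for scikit-learn
        model = LogisticRegression().fit(column, y)
        auc = roc_auc_score(y, model.predict_proba(column)[:, 1])
        if auc > min_auc:
            keep.append(j)
    return keep


# Synthetic stand-in: column 0 is informative, column 1 is pure noise.
rng = np.random.default_rng(0)
y_demo = np.array([0, 1] * 100)
X_demo = np.column_stack([
    3.0 * y_demo + rng.normal(size=200),
    rng.normal(size=200),
])
selected = select_privileged(X_demo, y_demo, min_auc=0.8)
```

Running the selection inside each fold, rather than once on the full dataset, keeps the held-out patient from leaking into the choice of features.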
The model implementation from [7] was preserved in our work. Namely, the feature bagging size of [7] was used, and the entire training set was utilized in the construction of the scandent trees. In order to reduce the number of unimportant tree-based features, we ran a feature importance calculation on the training set immediately after constructing the features and eliminated features with an importance score of 0. Because the non-privileged features were heavily outnumbered by the tree-based feature transforms, we forced equal sampling of each feature set when constructing each tree in the final forest. We set the maximum depth to 7 and the number of trees to 100.
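One way to read “equal sampling of each feature set” is that each tree in the final forest draws the same number of candidate columns from the non-privileged features and from the tree-based feature transforms. The sketch below illustrates only that sampling step; the function name, the `per_set` parameter, and the column counts in the example are hypothetical and not taken from the study.

```python
import random


def sample_balanced_features(nonpriv_idx, transform_idx, per_set, rng):
    """Draw the same number of column indices from each feature set.

    per_set is capped by the size of the smaller set so the draw stays balanced.
    """
    k = min(per_set, len(nonpriv_idx), len(transform_idx))
    return sorted(rng.sample(nonpriv_idx, k) + rng.sample(transform_idx, k))


# Example: 6 non-privileged columns followed by 40 tree-based transform columns.
nonpriv_cols = list(range(6))
transform_cols = list(range(6, 46))
rng = random.Random(0)
tree_columns = [
    sample_balanced_features(nonpriv_cols, transform_cols, per_set=3, rng=rng)
    for _ in range(100)  # one column subset per tree in the final forest
]
```

Each entry of `tree_columns` could then be used to restrict the columns seen by one decision tree, so the six clinical features are not crowded out by the far larger set of transforms.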
3. Results and Discussion
3.1. Dataset Analysis
A patient demographic data table with sample sizes and summaries of the baseline data is shown in Table S1. As expected, clinical questions for TMD demonstrated discriminative ability to discern TMJ OA patients from healthy controls. Averages for privileged information were omitted to save space.
3.2. Feature Selection Analysis
Results from the feature selection using univariate Logistic Regression are shown in Table S2, ranked by the percentage of folds in which each feature was included as well as by the average AUC across all 97 folds. The table also reports the performance of the non-privileged variables.
3.3. Model Results
AUCs and their respective standard errors for each tested model are shown in Table 1 and Fig S1. The top box (top two models) consists of models in which privileged features are included in the test set, while the bottom box (bottom three models) consists of models in which only non-privileged features are included in the test set. The proposed RF+ model outperformed both the Baseline model as well as the model based on tree-based feature transforms alone. Interestingly, while the Baseline+Privileged model (which incorporates privileged features during testing) outperforms all other models as expected, the Privileged Only model performs lower than expected, even when a Logistic Regression is used for feature selection. This may indicate that although radiomic images are useful for detecting TMJ OA, the extracted features themselves may not be a better screen of TMJ OA compared to a simple clinical questionnaire, but when combined with the clinical questions, can provide some supplementary information.
Table 1.
Model comparison results
| Model | LOO AUC | LOO stderr | OOB AUC | OOB stderr |
|---|---|---|---|---|
| Privileged only | 0.6390 | 0.0252 | 0.6163 | 0.0513 |
| Baseline+Privileged | 0.7198 | 0.0267 | 0.7184 | 0.0530 |
| RF+ | 0.6798 | 0.0306 | 0.6974 | 0.0602 |
| Tree-based only | 0.6692 | 0.0477 | 0.6535 | 0.0939 |
| Baseline | 0.6518 | 0.0309 | 0.6940 | 0.0590 |
The improved performance of the Tree-Based Only model over the Baseline model under LOO validation (AUC 0.6692 vs. 0.6518) demonstrates the potential for tree-based feature transforms to mimic the predictive power of privileged features with only non-privileged features, although this advantage was not observed under OOB validation (0.6535 vs. 0.6940). It suggests that, even with only six non-privileged features, this model can still coax out interesting non-linear relationships between existing features that were not easily ascertained otherwise.
Lastly, the performance of the RF+ model is interesting in that it can improve the baseline model, even where privileged features are not a “gold standard” source of information, confirming the advantages of this model stated in [7]. In privileged learning models where privileged information is utilized as an “error corrector” [15], privileged features must be close to gold standard quality in order to prevent introduction of erroneous error corrections to a non-privileged model. However, with tree-based transforms, when privileged information is poor, a decision tree can choose tree-based transforms which originate from a non-privileged root node if it outperforms those originating from privileged root nodes. Thus, the RF+ can leverage the discriminative capabilities of privileged features, while downplaying weaknesses of the features.
3.4. Feature Importance Based on Tree-Based Feature Transforms
Feature importance for the top 20 features is shown in Fig. 3. Frequencies were rescaled into a score in the range [0, 1] by dividing all feature frequencies by the total number of feature appearances. The top feature was Vertical Range Unassisted w/o Pain, a clinical feature measured by asking a patient to open their mouth to the fullest range before pain is felt. The most important privileged features were shortRunHighGreyLevelEmphasis of the lateral condyle and 3D_JS_SI (joint distances). Of the top 10 unique privileged features which ranked highest using this method, eight also appeared in the top 10 most predictive privileged features from the Logistic Regression rankings in Table S2.
Fig. 3.

Top 20 features derived from tree-based feature transforms and their respective importance scores.
4. Conclusion
In this study we implemented an RF+ CDS concept model based on tree-based feature transforms to detect TMJ OA in 97 patients. We incorporated two modalities of privileged information, namely radiological imaging features and biomarker protein data, and one set of non-privileged information consisting of clinical questionnaire data. We demonstrated that our proposed RF+ model outperforms the baseline model, even though both models use only non-privileged information at test time. Furthermore, we expanded upon the RF+ model framework to incorporate our own feature importance scores, based on how often features appear at the link nodes underlying the tree-based features in the RF+ framework. We showed that tree-based feature transforms identify some of the most discriminative features of the dataset and sufficiently replicate their discriminatory capabilities with non-privileged clinical features alone. This work both demonstrates the usefulness of RF+ in predicting TMJ OA and elucidates the benefits of incorporating research-obtained information that is not normally obtained clinically as a means to improve upon CDS models.
Acknowledgements.
E.W. and A.R. are supported by NIH Grant R37-CA214955. E.W. was supported by T32GM070449 as well. This study was supported by NIDCR R01DE024450 and AAOF Graber Family Teaching and Research Award and by Research Enhancement Award Activity 141 from the University of the Pacific, School of Dentistry.
Footnotes
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-23223-7_7.
References
1. Ackermann K, Baker J, Green M, et al.: Computerized clinical decision support systems for the early detection of sepsis among adult inpatients: scoping review. J. Med. Internet Res. 24(2), e31083 (2022). doi:10.2196/31083
2. Bianchi J, Gonçalves JR, de Oliveira Ruellas AC, et al.: Software comparison to analyze bone radiomics from high resolution CBCT scans of mandibular condyles. Dentomaxillofacial Radiol. 48(6), 20190049 (2019). doi:10.1259/dmfr.20190049
3. Bianchi J, de Oliveira Ruellas AC, Gonçalves JR, et al.: Osteoarthritis of the temporomandibular joint can be diagnosed earlier using biomarkers and machine learning. Sci. Rep. 10(1) (2020). doi:10.1038/s41598-020-64942-0
4. Cevidanes L, Walker D, Schilling J, et al.: 3D osteoarthritic changes in TMJ condylar morphology correlates with specific systemic and local biomarkers of disease. Osteoarthritis Cartilage 22(10), 1657–1667 (2014). doi:10.1016/j.joca.2014.06.014
5. Chauhan G, Liao R, Wells W, et al.: Joint modeling of chest radiographs and radiology reports for pulmonary edema assessment (2020). doi:10.48550/ARXIV.2008.09884, https://arxiv.org/abs/2008.09884
6. Hu M, et al.: Knowledge distillation from multi-modal to mono-modal segmentation networks. In: Martel AL, et al. (eds.) MICCAI 2020. LNCS, vol. 12261, pp. 772–781. Springer, Cham (2020). doi:10.1007/978-3-030-59710-8_75
7. Moradi M, Syeda-Mahmood T, Hor S: Tree-based transforms for privileged learning. In: Wang L, Adeli E, Wang Q, Shi Y, Suk H-I (eds.) MLMI 2016. LNCS, vol. 10019, pp. 188–195. Springer, Cham (2016). doi:10.1007/978-3-319-47157-0_23
8. Rando C, Waldron T: TMJ osteoarthritis: a new approach to diagnosis. Am. J. Phys. Anthropol. 148(1), 45–53 (2012). doi:10.1002/ajpa.22039
9. Rao A, Palma J: Clinical decision support in the neonatal ICU. Seminars Fetal Neonatal Med. 101332 (2022). doi:10.1016/j.siny.2022.101332
10. Sabeti E, Drews J, Reamaroon N, et al.: Learning using partially available privileged information and label uncertainty: application in detection of acute respiratory distress syndrome. IEEE J. Biomed. Health Inform. 25(3), 784–796 (2021). doi:10.1109/jbhi.2020.3008601
11. Schiffman E, Ohrbach R, Truelove E, et al.: Diagnostic criteria for temporomandibular disorders (DC/TMD) for clinical and research applications: recommendations of the international RDC/TMD consortium network and orofacial pain special interest group. J. Oral Facial Pain Headache 28(1), 6–27 (2014). doi:10.11607/jop.1151
12. Scrivani SJ, Keith DA, Kaban LB: Temporomandibular disorders. New Engl. J. Med. 359(25), 2693–2705 (2008). doi:10.1056/nejmra0802472
13. Tanaka E, Detamore M, Mercuri L: Degenerative disorders of the temporomandibular joint: etiology, diagnosis, and treatment. J. Dental Res. 87(4), 296–307 (2008). doi:10.1177/154405910808700406
14. Tuppad A, Patil SD: Machine learning for diabetes clinical decision support: a review. Adv. Comput. Intell. 2(2), 1–24 (2022). doi:10.1007/s43674-022-00034-y
15. Vapnik V, Vashist A: A new learning paradigm: learning using privileged information. Neural Netw. 22(5–6), 544–557 (2009). doi:10.1016/j.neunet.2009.06.042
16. Vos T, Abajobir AA, Abate KH, et al.: Global, regional, and national incidence, prevalence, and years lived with disability for 328 diseases and injuries for 195 countries, 1990–2016: a systematic analysis for the global burden of disease study 2016. Lancet 390(10100), 1211–1259 (2017). doi:10.1016/s0140-6736(17)32154-2
17. Zhang W, Bianchi J, Turkestani NA, et al.: Temporomandibular joint osteoarthritis diagnosis using privileged learning of protein markers. In: 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE (2021). doi:10.1109/embc46164.2021.9629990