Application of machine learning in Chinese medicine differentiation of dampness-heat pattern in patients with type 2 diabetes mellitus

Xinyu Liu; Xiaoqiang Huang; Jindong Zhao; Yanjin Su; Lu Shen; Yuhong Duan; Jing Gong; Zhihai Zhang; Shenghua Piao; Qing Zhu; Xianglu Rong; Jiao Guo

doi:10.1016/j.heliyon.2023.e13289

2023 Feb 13;9(2):e13289. doi: 10.1016/j.heliyon.2023.e13289

PMCID: PMC9975099 PMID: 36873141

Abstract

Background

China has become the country with the largest number of people with type 2 diabetes mellitus (T2DM), and Chinese medicine (CM) has unique advantages in preventing and treating T2DM, while accurate pattern differentiation is the guarantee for proper treatment.

Objective

The establishment of the CM pattern differentiation model of T2DM is helpful to the pattern diagnosis of the disease. At present, there are few studies on dampness-heat pattern differentiation models of T2DM. Therefore, we establish a machine learning model, hoping to provide an efficient tool for the pattern diagnosis of CM for T2DM in the future.

Methods

A total of 1021 valid samples from T2DM patients at ten CM hospitals or clinics were collected with a questionnaire covering patients' demographic characteristics and dampness-heat-related symptoms and signs. All information was recorded, and the dampness-heat pattern was diagnosed, by experienced CM physicians at each visit. We applied six machine learning algorithms (Artificial Neural Network [ANN], K-Nearest Neighbor [KNN], Naïve Bayes [NB], Support Vector Machine [SVM], Extreme Gradient Boosting [XGBoost] and Random Forest [RF]) and compared their performance. We then used the Shapley additive explanation (SHAP) method to explain the best-performing model.

Results

The XGBoost model had the highest AUC (0.951, 95% CI 0.925–0.978) among the six models, with the best sensitivity, accuracy, F1 score, and negative predictive value, and with excellent specificity, precision, and positive predictive value. The SHAP method based on XGBoost showed that slimy yellow tongue fur was the most important sign in dampness-heat pattern diagnosis. Slippery pulse or rapid-slippery pulse and sticky stool with ungratifying defecation also played an important role in this diagnostic model. Furthermore, red tongue was an important tongue sign for the dampness-heat pattern.

Conclusion

This study constructed a dampness-heat pattern differentiation model of T2DM based on machine learning. The XGBoost model is a tool with the potential to help CM practitioners make quick diagnosis decisions and contribute to the standardization and international application of CM patterns.

Keywords: Dampness-heat pattern, Machine learning, Diagnostic model, Pattern differentiation

1. Introduction

Diabetes is a global epidemic whose prevalence is rising rapidly. The International Diabetes Federation (IDF) estimated that more than 537 million people have diabetes, and this number is expected to reach about 784 million by 2045 [[1]]. Since the outbreak of COVID-19, several studies have shown that patients with COVID-19 and comorbid diabetes mellitus (DM) have an increased risk of morbidity and mortality [[2,3]]. The prevalence of diabetes in China has risen substantially in recent years, reaching 12.8% in 2015–2017 [[4]], which makes China the country with the largest number of people with diabetes in the world [[1]]. Traditional Chinese medicine (TCM) has been used in China for thousands of years for disease treatment, disease prevention, and health care. During the pandemic, the clinical use of TCM against COVID-19 in China indicated that integrating TCM into clinical management planning was worth considering, as specialists recommended to the WHO [[5]]. TCM treatment of DM has made great progress in recent years, and its effect has been acknowledged [[6]].

Pattern identification as the basis for determining treatment is the core of TCM theory [[7]]. A TCM pattern, also known as ZHENG (证, zhèng) or syndrome, is distinguished by the symptoms and signs examined in an individual through four main diagnostic techniques: inspection, auscultation and smelling, palpation, and interrogation. It is a comprehensive summary of the cause, location, nature, and development tendency of an illness at a certain stage of its course, and it specifies the state of interaction between pathogenic factors and the corresponding reactions of the body [[8]]. The World Health Organization International Classification of Diseases (ICD-11) [[9]] has incorporated TCM patterns in a supplementary chapter, so TCM is likely to receive more attention in the future. TCM differs from the conventional diagnostic approach of Western medicine in that it establishes patterns using the four main diagnostic procedures. Figuratively speaking, the pattern acts as a bridge: it synthesizes the findings of the four diagnostic methods and then guides the choice of TCM therapy, such as acupuncture and herbal formulas, in accordance with TCM diagnosis and treatment theory. A correct diagnosis is an essential prerequisite for appropriate treatment [[10]]. With the release of ICD-11, there is an urgent need to standardize pattern diagnoses.

Nonetheless, each of these diagnostic methods requires considerable skill, and beginners need many years to understand the complicated relationships between symptoms and patterns, even when they learn from distinguished veteran CM doctors [[11]]. It is therefore worthwhile for TCM doctors and scholars to develop an objective and reliable aid for pattern diagnosis. Machine learning (ML) is a burgeoning field in medicine in which computer science and statistics are applied to solve medical problems [[12]]. Its adoption has been spurred by the modernization of TCM, which relies heavily on ML for diagnosing syndromes [[11]] and for research on Chinese herbal medicine [[13]]. Although there have been some exploratory studies [[14]] and an expert consensus [[15]] on the diagnosis of TCM patterns of type 2 diabetes mellitus (T2DM), few studies have addressed the diagnosis of a single pattern, i.e., the dampness-heat pattern of T2DM. TCM is effective for T2DM, but the syndrome is difficult to distinguish reliably in the clinic. Our team therefore collected multicenter data on dampness-heat-related symptoms/signs that had been selected by the Delphi method, and used six machine learning methods to explore a new diagnostic method for the dampness-heat pattern of T2DM. Finally, an efficient diagnostic model of the dampness-heat pattern of T2DM based on Extreme Gradient Boosting (XGBoost) was obtained, and the model was successfully interpreted by the Shapley additive explanation (SHAP) method.

2. Data and methods

2.1. Study design and population

The Institutional Ethics Committee (ICE) of the First Affiliated Hospital of Guangdong Pharmaceutical University approved all experimental protocols related to this study (ICE approval ID: 2019-ICE-109) and confirmed that informed consent was obtained. This prospective observational study was conducted at multiple centers. Participants with T2DM who visited one of the ten CM hospitals or clinics completed electronic questionnaires. In this research, we analyzed the same data from these ten sites as in our previous study and merged them without considering the original site. Patients were diagnosed with T2DM according to the diagnostic criteria established by the 2020 Chinese Medical Association Diabetes Branch [[16]]. Exclusion criteria were: (1) unwillingness to participate in the study; (2) age younger than 18 years; (3) diseases with severe respiratory symptoms, severe infectious diseases, severe heart diseases, severe liver diseases, or tumors; (4) presence of any complications of diabetes (such as diabetic kidney disease or diabetic coronary artery disease); or (5) pregnancy. A total of 1973 questionnaires were collected from Jun 18, 2021, to Aug 9, 2021, of which 1021 were included in the analysis. Using the Python package Scikit-learn, the included patients were randomly divided into two groups, a training set (n = 715) and a validation set (n = 306). For preprocessing optimization and hyperparameter tuning, five-fold cross-validation was performed on the training set [Fig. 1].

Fig. 1 — Flow diagram of patient selection and model building.

2.2. Patient questionnaire

Patients' demographic characteristics and dampness-heat-related symptoms/signs were recorded with an electronic questionnaire designed for use in patients with type 2 diabetes mellitus. All dampness-heat-related items were unanimously selected after a 2-round Delphi study by CM experts with 10–30 years of clinical experience in a previous study [[17]]. The questionnaire contained 14 dampness-heat-related symptoms/signs in three domains: TCM symptoms, pulse conditions, and tongue pictures. One veteran CM doctor assessed or inquired about the symptoms/signs and recorded their values. The symptoms and signs were heavy body, obesity, heavy sensation of the head, sticky and greasy in mouth, sticky stool with ungratifying defecation, bitter taste in the mouth, halitosis, dry mouth and thirst, deep-colored urine, constipation, slimy yellow tongue fur, thick tongue fur, red tongue, and slippery pulse or rapid-slippery pulse. The TCM symptom, tongue, and pulse items were assessed by inspection, listening and smelling, inquiry, and pulse-taking. The TCM symptoms were graded as “none” (0), “mild” (1), “moderate” (2), or “severe” (3); for the tongue and pulse characters, the response options were “present” (1) or “absent” (0). However, given the limitations of existing diagnostic tools, there is no standard case definition for the dampness-heat syndrome of T2DM. The diagnosis of the dampness-heat pattern was made by experienced CM physicians at each visit. The participants were informed that they could leave an interview at any time and that all recordings would be transcribed confidentially and analyzed anonymously. Because of a shortage of staff and funds, we were granted a waiver of written informed consent but obtained verbal informed consent from participants before their interview. STARD guidelines were followed in reporting our results.

To improve the reliability and response rate of the questionnaire, we revised it several times so that operators and respondents could understand it as easily as possible in the Chinese context. We also provided explanations of TCM terms to help operators and respondents understand them. Rather than using ordinary investigators, we invited only doctors with TCM qualifications to conduct the survey. Mainland China has a strict examination and training system for TCM qualification, which maximizes the credibility of the four-diagnosis information collected in this study.
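
The coding scheme described above (four-level severity for the ten symptom items, present/absent for the four tongue and pulse signs) could be turned into a 14-element feature vector roughly as follows. This is a minimal sketch: the item names, their order, and the example patient are hypothetical and are not taken from the study's questionnaire software.

```python
# Ordinal codes for symptom items and binary codes for tongue/pulse signs,
# following the scoring described in the questionnaire section.
SEVERITY = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}
PRESENCE = {"absent": 0, "present": 1}

# Hypothetical item identifiers (the study does not publish variable names).
SYMPTOM_ITEMS = [
    "heavy_body", "obesity", "heavy_sensation_of_head", "sticky_greasy_mouth",
    "sticky_stool", "bitter_taste", "halitosis", "dry_mouth_thirst",
    "deep_colored_urine", "constipation",
]
SIGN_ITEMS = ["slimy_yellow_fur", "thick_fur", "red_tongue", "slippery_pulse"]


def encode_response(response):
    """Convert one questionnaire response (item -> answer text) to 14 integers."""
    symptoms = [SEVERITY[response[item].lower()] for item in SYMPTOM_ITEMS]
    signs = [PRESENCE[response[item].lower()] for item in SIGN_ITEMS]
    return symptoms + signs


# Hypothetical patient: everything none/absent except four items.
example = {item: "none" for item in SYMPTOM_ITEMS}
example.update({item: "absent" for item in SIGN_ITEMS})
example.update({
    "bitter_taste": "mild",
    "sticky_stool": "moderate",
    "slimy_yellow_fur": "present",
    "slippery_pulse": "present",
})

features = encode_response(example)
print(features)
```

Keeping the severity grades as integers preserves their ordering, which tree-based models can exploit directly; distance-based models such as KNN would treat the same integers as equally spaced, which is an additional assumption.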

2.3. Model application

Due to the serious difficulty of TCM pattern diagnosis, experienced and high-level TCM doctors are often needed for clinical data collection. Therefore, it is difficult to form a large sample of TCM research due to the lack of personnel. Based on this, we believe that traditional machine learning seems to be able to get better results from small samples compared to deep learning [[18]]. At the same time, the “black box” problem of deep learning is more challenging than traditional machine learning [[18]]. Besides, the labels and results in this study are binary or multi-classification data, and it seems more appropriate to adopt the classification algorithm in the supervised mode [[19]]. For the above reasons, six machine learning models, including artificial neural network (ANN), K-nearest neighbor (KNN), naive Bayes (NB), support vector machine (SVM), extreme gradient boosting (XGBoost) and random forest (RF), were used to develop models that distinguished dampness-heat pattern as a binary outcome (presence and absence).

Interpretable solutions will be key to machine learning becoming part of routine clinical and healthcare practice [[20]], and the use of interpretable models can effectively reduce bias [[21]]. RF and XGBoost have useful explanatory properties; both are ensemble algorithms built on decision trees, which can improve on the accuracy of a single decision tree to a certain extent [[22]]. According to a previous study, the RF and XGBoost classification models performed best on clinical data [[23]]. In addition, unlike some other algorithms, multicollinearity among features does not affect the predictive ability of the decision-tree-based RF and XGBoost models [[24,25]].

SVM is used for binary classification problems in numerous fields, especially in medicine [[26]]. The core principle of SVM classification is to map the input vectors into a higher-dimensional space in which a maximum-margin hyperplane is constructed. Two parallel hyperplanes are placed on either side of the hyperplane that separates the data, and the separating hyperplane is chosen to maximize the distance between these two parallel hyperplanes [[27]]. The algorithm is data-driven and can perform fairly well when the sample size is small relative to the number of variables, which is why it is widely used in prognostic studies for tasks related to the automatic classification of diseases [[28]].
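
The maximum-margin idea can be written compactly. The following is the standard textbook hard-margin formulation, given here only as an illustration; the symbols (feature map \phi, weights w, offset b, labels y_i \in \{-1, +1\}) are notation introduced for this sketch and do not appear in the study.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad\text{subject to}\quad
y_i\bigl(w^{\top}\phi(x_i)+b\bigr)\ \ge\ 1,
\qquad i=1,\dots,n
```

The two parallel hyperplanes are w^{\top}\phi(x)+b=\pm 1, and the distance between them is 2/\lVert w\rVert, so minimizing \lVert w\rVert maximizes the margin. Practical implementations usually solve a soft-margin variant that tolerates some misclassified points.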

KNN, also called the Reference Sample Plot Method, is another classification technique. Its basic principle is to assign to an unclassified data point the label of the closest classified data points [[26]]. It is a simple classification algorithm with good performance in medical diagnosis [[29]].
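
The nearest-neighbor principle can be sketched in a few lines of plain Python. The toy feature vectors, the choice of Euclidean distance, and k = 3 are assumptions made for illustration; the study does not report these settings.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Pair every training point's distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among the k nearest points wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: three coded items per patient, label 1 = pattern present.
train_X = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (2, 1, 1), (3, 1, 1), (2, 0, 1)]
train_y = [0, 0, 0, 1, 1, 1]

pred_positive = knn_predict(train_X, train_y, (3, 1, 0))
pred_negative = knn_predict(train_X, train_y, (0, 0, 1))
print(pred_positive, pred_negative)
```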

ANN is a subtype of artificial intelligence and has been used in many subspecialties of clinical medicine [[30]]. An ANN consists of nodes connected by weighted edges in a multilayer architecture comprising an input layer, an output layer, and one or more hidden layers [[27]]. It can help doctors identify complex TCM patterns, process large amounts of data, and reduce diagnosis time and the possibility of overlooking relevant information [[31]]. Moreover, ANN-based models usually achieve excellent accuracy and AUC values [[30]].
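
To make the layered structure concrete, the sketch below runs one forward pass through a network with three inputs, one hidden layer of two nodes, and a single output node. All weights and biases are hypothetical values chosen by hand; a real model would learn them from the training data.

```python
import math


def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Input layer -> one hidden layer -> single output probability."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical parameters (one row of weights per hidden node).
hidden_weights = [[1.0, 0.5, 0.8], [0.3, 1.2, 0.4]]
hidden_biases = [-1.0, -0.5]
output_weights = [1.5, 1.0]
output_bias = -1.5

p_no_signs = forward([0, 0, 0], hidden_weights, hidden_biases, output_weights, output_bias)
p_all_signs = forward([1, 1, 1], hidden_weights, hidden_biases, output_weights, output_bias)
print(p_no_signs, p_all_signs)
```

Because every weight in this toy example is positive, switching signs from absent to present can only raise the output probability.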

NB is based on Bayes' theorem. This simple yet effective rule calculates the probability of an event based on information gained about that event [[26]]. Like ANN, NB demonstrates robust performance in classification [[32]].
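
As a worked illustration of Bayes' theorem, the snippet below uses the counts reported in Table 1 for a single sign, slimy yellow tongue fur, to compute the probability of the dampness-heat pattern given that the sign is present. This is a one-feature calculation, not the study's NB model; a naive Bayes classifier combines all 14 items under the additional assumption that they are conditionally independent given the pattern.

```python
# Counts taken from Table 1 (whole sample).
n_total = 1021            # all patients
n_pattern = 389           # dampness-heat pattern
n_sign = 324              # slimy yellow tongue fur present
n_sign_and_pattern = 287  # sign present among dampness-heat patients

prior = n_pattern / n_total                  # P(pattern)
likelihood = n_sign_and_pattern / n_pattern  # P(sign | pattern)
evidence = n_sign / n_total                  # P(sign)

# Bayes' theorem: P(pattern | sign) = P(sign | pattern) * P(pattern) / P(sign)
posterior = likelihood * prior / evidence
print(round(posterior, 3))  # prints 0.886
```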

2.4. Statistical analysis

Python (https://www.python.org/; v3.10) was used for statistical description and analysis. Measurement data were expressed as means and standard deviations (or as medians and quartiles when not normally distributed), while enumeration data were described as frequencies and percentages. Student's t-test, the Chi-square test, or the Mann-Whitney U test was used to compare the two groups according to data type. P < 0.05 was considered statistically significant. Model building and evaluation comprised the following steps. First, we imported the electronic questionnaire data into Python, including 14 feature items and one result item. Second, we randomly divided the data into a training set and a validation set at a ratio of 70%/30%. Third, six models were initially built using the Scikit-learn package. Fourth, the optimal parameters of each model were found by 5-fold cross-validation on the training data and were used to further adjust the model. Fifth, the validation set was used to evaluate model performance by confusion matrix, receiver operating characteristic (ROC) curve, precision-recall (P-R) curve, area under the ROC curve (AUC), average precision (AP), sensitivity (recall), specificity, accuracy, and F1 score. Finally, the Shapley additive explanation (SHAP) method was used to explain individual predictions of the best-performing model in our study by quantifying and ranking the importance of each variable to the diagnosis [[33]]. The Python package SHAP was used to estimate SHAP values for the trained models and to visualize the results.
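
The split, tuning, and evaluation steps can be sketched with Scikit-learn as follows. Several things here are assumptions rather than the authors' code: the data are synthetic (random items with the same coding as the questionnaire), the split is stratified, a random forest stands in for all six models, and the small parameter grid is arbitrary. Passing `test_size=306` simply reproduces the reported 715/306 split sizes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the questionnaire data (NOT the study data).
rng = np.random.default_rng(0)
n_samples = 1021
X = np.hstack([
    rng.integers(0, 4, size=(n_samples, 10)),  # ten symptom items coded 0-3
    rng.integers(0, 2, size=(n_samples, 4)),   # four tongue/pulse signs coded 0/1
])
logit = 0.4 * X[:, :10].sum(axis=1) + 1.5 * X[:, 10:].sum(axis=1) - 9.0
y = (rng.random(n_samples) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Roughly 70%/30% split; stratification is an assumption of this sketch.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=306, stratify=y, random_state=0
)

# Hyperparameter search with 5-fold cross-validation on the training set only.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=5, scoring="roc_auc"
)
search.fit(X_train, y_train)

# The held-out validation set is used once, for the final evaluation.
val_scores = search.predict_proba(X_val)[:, 1]
val_auc = roc_auc_score(y_val, val_scores)
print(search.best_params_, round(val_auc, 3))
```

Keeping the validation set out of the cross-validation loop is what allows its AUC to serve as an estimate of performance on unseen patients.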

3. Results

3.1. Patient characteristics

During the study period, a total of 1973 cases were diagnosed with T2DM in ten CM hospitals or clinics. Of these, 952 cases were excluded: 543 with diabetic kidney disease and 409 with diabetic coronary heart disease. Ultimately, 1021 cases were included in the analysis. A total of 38.1% of the patients had the dampness-heat syndrome, the average age was 57.5 years, 592 (58.0%) of the patients were male, and the median duration of diabetes was six years. Table 1 shows the demographics and the dampness-heat-related symptoms/signs. No significant differences in age, gender, or any other demographic factor were found between the two groups, whereas all clinical symptoms and signs differed significantly between the groups.

Table 1.

Baseline demographics and items of dampness-heat-related symptoms/signs.

	Whole Sample	Non-dampness-heat pattern	dampness-heat pattern	P-Value
Sample size, n (%)	1021(100)	632(61.9)	389(38.1)
Gender, n (%)				0.119
Female	429 (42.0)	278 (44.0)	151 (38.8)
Male	592 (58.0)	354 (56.0)	238 (61.2)
Age, y	57.5 ± 13.1	57.8 ± 12.8	57.0 ± 13.5	0.318
Height, cm	166.1 ± 7.9	165.9 ± 8.2	166.5 ± 7.3	0.205
Weight, kg	68.1 ± 12.2	67.8 ± 12.8	68.6 ± 11.2	0.297
Course of T2DM, y	6.0 [2.0,12.0]	7.0 [2.0,12.0]	5.0 [2.0,10.0]	0.051
Heavy body, n (%)				<0.001
None	648 (63.5)	500 (79.1)	148 (38.0)
Mild	303 (29.7)	125 (19.8)	178 (45.8)
Moderate	54 (5.3)	4 (0.6)	50 (12.9)
Severe	16 (1.6)	3 (0.5)	13 (3.3)
The heavy sensation of the head, n (%)				<0.001
None	713 (69.8)	537 (85.0)	176 (45.2)
Mild	260 (25.5)	92 (14.6)	168 (43.2)
Moderate	39 (3.8)	3 (0.5)	36 (9.3)
Severe	9 (0.9)		9 (2.3)
Bitter taste in the mouth, n (%)				<0.001
None	687(67.3)	546(86.4)	141(36.2)
Mild	282 (27.6)	80 (12.7)	202 (51.9)
Moderate	46 (4.5)	5 (0.8)	41 (10.5)
Severe	6 (0.6)	1 (0.2)	5 (1.3)
Sticky and greasy in mouth, n (%)				<0.001
None	681(66.7)	556(88.0)	125(32.1)
Mild	270 (26.4)	74 (11.7)	196 (50.4)
Moderate	64 (6.3)	2 (0.3)	62 (15.9)
Severe	6 (0.6)		6 (1.5)
Halitosis, n (%)				<0.001
None	752(73.7)	574(90.8)	178(45.8)
Mild	221 (21.6)	56 (8.9)	165 (42.4)
Moderate	42 (4.1)	2 (0.3)	40 (10.3)
Severe	6 (0.6)		6 (1.5)
Deep-colored urine, n (%)				<0.001
None	683(66.9)	555(87.8)	128(32.9)
Mild	282 (27.6)	76 (12.0)	206 (53.0)
Moderate	51 (5.0)	1 (0.2)	50 (12.9)
Severe	5 (0.5)		5 (1.3)
Sticky stool with ungratifying defecation, n (%)				<0.001
None	761(74.5)	592(93.7)	169(43.4)
Mild	212 (20.8)	38 (6.0)	174 (44.7)
Moderate	45 (4.4)	2 (0.3)	43 (11.1)
Severe	3 (0.3)		3 (0.8)
Dry mouth and thirst, n (%)				0.003
None	199(19.5)	146(23.1)	53(13.6)
Mild	572 (56.0)	342 (54.1)	230 (59.1)
Moderate	229 (22.4)	132 (20.9)	97 (24.9)
Severe	21 (2.1)	12 (1.9)	9 (2.3)
Constipation, n (%)				<0.001
None	670(65.6)	488(77.2)	182(46.8)
Mild	280 (27.4)	117 (18.5)	163 (41.9)
Moderate	55 (5.4)	20 (3.2)	35 (9.0)
Severe	16 (1.6)	7 (1.1)	9 (2.3)
Obesity, n (%)				<0.001
None	572(56)	389(61.6)	183(47)
Mild	304 (29.8)	174 (27.5)	130 (33.4)
Moderate	107 (10.5)	53 (8.4)	54 (13.9)
Severe	38 (3.7)	16 (2.5)	22 (5.7)
Slimy yellow tongue fur, n (%)				<0.001
Absent	697(68.3)	595(94.1)	102(26.2)
Present	324 (31.7)	37 (5.9)	287 (73.8)
Thick tongue fur, n (%)				<0.001
Absent	698(68.4)	557(88.1)	141(36.2)
Present	323 (31.6)	75 (11.9)	248 (63.8)
Red tongue, n (%)				<0.001
Absent	517(50.6)	428(67.7)	89(22.9)
Present	504 (49.4)	204 (32.3)	300 (77.1)
Slippery pulse or rapid-slippery pulse, n (%)				<0.001
Absent	660(64.6)	562(88.9)	98(25.2)
Present	361 (35.4)	70 (11.1)	291 (74.8)
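
As an illustration of the group comparison behind the P-values for the binary signs, the sketch below applies Pearson's chi-square test to the 2×2 counts for slimy yellow tongue fur reported in Table 1. The paper does not state whether a continuity correction was used, so the uncorrected statistic here is an assumption, and the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Rows: non-dampness-heat, dampness-heat; columns: sign absent, sign present.
statistic, p_value = chi_square_2x2(595, 37, 102, 287)
print(round(statistic, 1), p_value < 0.001)
```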

3.2. Model building and evaluation

To create the diagnostic model, the 14 symptoms and signs were used as features for six machine learning models. A comparison of the performance of the six models is shown in Table 2. The XGBoost model had the best AUC (0.951, 95% CI 0.925–0.978), sensitivity, accuracy, average precision, F1 score, and negative predictive value, together with excellent specificity and positive predictive value. The KNN and NB models had the lowest AUC values [Fig. 2a]. Based on the diagnostic results, we plotted the P-R curves of the six models and calculated the area under each P-R curve (average precision, AP) for the diagnosis of the dampness-heat syndrome in T2DM. The XGBoost model also had the highest AP, and the KNN and NB models had the lowest [Fig. 2b].

Table 2.

Performance of the diagnostic models generated by six machine learning algorithms.

	SE (Recall)	SP	AC	AP	F1	PPV	NPV	AUC (95% CI)
SVM	0.769	0.973	0.902	0.929	0.849	0.949	0.86	0.945 (0.917–0.973)
ANN	0.826	0.919	0.883	0.931	0.848	0.87	0.891	0.947 (0.921–0.974)
KNN	0.785	0.962	0.893	0.885	0.852	0.931	0.873	0.922 (0.899–0.962)
XGBoost	0.843	0.957	0.912	0.94	0.883	0.927	0.904	0.951 (0.925–0.978)
NB	0.818	0.936	0.889	0.885	0.853	0.892	0.888	0.922 (0.892–0.958)
RF	0.826	0.941	0.896	0.927	0.862	0.901	0.893	0.942 (0.925–0.976)

*SE: sensitivity; SP: specificity; AC: accuracy; AP: average precision; F1: F1 score; PPV: positive predictive value; NPV: negative predictive value.
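
Most of the metrics abbreviated above follow directly from the four cells of a confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical round numbers chosen for illustration and are not the study's confusion matrix.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Compute standard diagnostic metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # SE, also called recall
    specificity = tn / (tn + fp)   # SP
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "SE": sensitivity, "SP": specificity, "AC": accuracy,
        "PPV": ppv, "NPV": npv, "F1": f1,
    }


# Hypothetical counts: 100 patients with the pattern, 200 without.
metrics = diagnostic_metrics(tp=80, fn=20, fp=10, tn=190)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AP and AUC are not derived from a single confusion matrix; they summarize performance across all decision thresholds.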

Fig. 2 — (a) Evaluation of the six machine learning algorithms based on the AUC of the ROC curve. AUC, area under the curve; ROC, receiver operating characteristic. (b) P-R curve and AP of the six models. P-R curve: precision-recall curve; AP: average precision.
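
The AUC has a direct probabilistic reading: it is the chance that a randomly chosen patient with the pattern receives a higher model score than a randomly chosen patient without it. The sketch below computes it that way in plain Python; the labels and scores are made-up toy values, and the study itself presumably relied on library routines.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example: three patients with the pattern, four without.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(round(auc, 3))  # prints 0.917
```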

3.3. Model performance interpretation

To interpret XGBoost, the best-performing machine learning model, and to identify the variables that were important for pattern differentiation, we used Shapley additive explanations (SHAP). SHAP values for all 1021 patients in the train set are shown in Fig. 3a. SHAP force plots show the profiles of patients with a high or low likelihood of being diagnosed with the dampness-heat pattern. One typical patient in the positive group (diagnosed with the dampness-heat pattern) and one in the negative group (not diagnosed with the dampness-heat pattern) are shown in Fig. 3b and c with the detailed SHAP values of the most important variables.

Fig. 3 — Shapley additive explanation (SHAP) values to show the interpretability of the effects of variables as the input factors for the diagnosis of the dampness-heat pattern of T2DM. (a) SHAP values for all 1021 patients in the train set. (b, c) SHAP values of two typical patients from the positive group (b) and the negative group (c) are illustrated with their most important variables.

*f(x): output SHAP value; E[f(x)]: base value.
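
The relationship between the base value and the output value can be illustrated with an exact Shapley computation on a toy scoring function. Everything in this sketch is hypothetical: the scoring function, its coefficients, and the use of a single all-zero reference patient as the baseline. The SHAP package instead averages over background data and uses efficient tree-specific algorithms, so this shows only the underlying idea.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        # Switch features from baseline to actual value one at a time.
        for i in order:
            current[i] = x[i]
            output = model(current)
            contributions[i] += output - previous_output
            previous_output = output
    return [c / len(orderings) for c in contributions]


def toy_score(v):
    """Hypothetical score with an interaction between tongue fur and pulse."""
    fur, pulse, stool = v
    return 2.0 * fur + 1.0 * pulse + 0.5 * stool + 0.5 * fur * pulse


patient = [1, 1, 2]    # fur present, pulse present, moderate sticky stool
reference = [0, 0, 0]  # baseline patient with no signs

phi = shapley_values(toy_score, patient, reference)
base_value = toy_score(reference)
print(phi, base_value + sum(phi), toy_score(patient))
```

The contributions add up exactly to the difference between the patient's score and the base value, which is the additivity property that force plots visualize.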

3.4. Explanation of variables

We used SHAP to find the features that were important for pattern differentiation. The importance matrix plot [Fig. 4a] and the SHAP summary plot [Fig. 4b] for the XGBoost model show how important each variable is for the diagnosis of the dampness-heat pattern. SHAP values greater than zero represent a higher probability of the dampness-heat pattern in T2DM.

The importance matrix plot ranked the variables contributing to dampness-heat diagnosis from most to least important and showed that slimy yellow tongue fur was the most important sign in dampness-heat pattern diagnosis. Slippery pulse or rapid-slippery pulse and sticky stool with ungratifying defecation also played an important role in this diagnostic model. Furthermore, red tongue was an important tongue sign for the dampness-heat pattern. The other symptoms and signs also contributed to the diagnostic model [Fig. 4].

4. Discussion

This study constructed a CM pattern differentiation model for dampness-heat in patients with type 2 diabetes mellitus based on machine learning and clinical variables from the four main diagnostic procedures. All models achieved high performance, with AUCs ranging from 0.922 to 0.951. The XGBoost model performed best, with the highest AUC (0.951, 95% CI 0.925–0.978), sensitivity, accuracy, average precision, F1 score, and negative predictive value, and with excellent specificity and positive predictive value. XGBoost is a high-performance model that overcomes the shortcomings (long learning times and overfitting problems) of the gradient boosting machine (GBM) that has been used for diagnosis and prediction in multiple clinical scenarios for T2DM [[34]]. Among all symptom, tongue, and pulse characters from the four main diagnostic procedures, slimy yellow tongue fur was the most important sign in dampness-heat pattern diagnosis, as determined by machine learning in our study. Our study indicates that machine learning algorithms appear to be a feasible and viable enhancement for pattern differentiation in Chinese medicine clinical practice.

The dampness-heat pattern is the most common CM pattern in patients with T2DM and is also a hot spot in research combining disease and CM pattern [[35]]. To the best of our knowledge, this is the first published study to use machine learning algorithms to distinguish the dampness-heat pattern of T2DM. Because the diagnostic criteria for the dampness-heat pattern in T2DM were not standardized in the past, its reported prevalence has ranged from 13.2% to 58.29% [[36,37]]. In our study, 38.10% of patients had a positive diagnosis, and the accuracy of the XGBoost algorithm in the validation set was 91.2%, which suggests that the model is reproducible. Our results were consistent with the previous findings. Previous studies have demonstrated the significant role of the XGBoost algorithm in other medical fields, such as electronic medical records and natural language processing for pattern diagnosis [[38]], development of risk score models [[39]], and prediction of mortality [[40]]. Our results confirmed the strong performance of the XGBoost model in the diagnosis of the CM pattern.

Recently, the application of machine learning methods in medicine has increased, which provides a viable avenue for constructing diagnostic models for pattern differentiation [[11]]. However, complex machine learning models are often presented as black boxes, and the inability of users to understand their results is problematic [[12]]. Such opaque output is no easier to accept than a doctor's own pattern differentiation. Nevertheless, pattern differentiation can be modelled as a dimensionality reduction process that deserves further exploration from the machine learning perspective [[41]]. SHAP methods are now commonly used to interpret variable importance in medical diagnostic or predictive models, especially machine learning models, and help in understanding the importance of clinical characteristics for disease diagnosis [[42,43]]. In the present study, to facilitate the interpretation of the decision-making process of the XGBoost model, we used the SHAP methodology to explain our diagnostic model [[33]]. Slimy yellow tongue fur, slippery pulse or rapid-slippery pulse, sticky stool with ungratifying defecation, and red tongue were the symptoms and signs most associated with the diagnosis of the dampness-heat pattern. Previous studies have identified slimy yellow tongue fur as one of the characteristic features of the diabetic tongue [[44]], and it is likewise representative of the hot pattern [[45]], to which the dampness-heat pattern belongs. Slimy yellow tongue fur, red tongue, and slippery pulse or rapid-slippery pulse have been used as typical signs in the diagnosis of dampness-heat patterns in diabetes in an expert consensus [[15]] and have also shown a strong correlation with the dampness-heat pattern in other diseases [[46]]. Sticky stool with ungratifying defecation, a typical symptom of the intestinal dampness-heat pattern, has been included in several expert consensuses on the diagnosis of CM patterns of digestive system diseases [[47]], serves as an objective evaluation indicator in animal models of the dampness-heat pattern [[48]], and also played an important role in the present model. Consistent with this, we used SHAP's visualization approach to provide clinical insight, inform clinical pattern differentiation, and highlight the most important symptoms and signs of the diagnostic model.

The ICD-11, the new release of the ICD, contains a supplementary chapter for Traditional Medicine Conditions [[9]]. This chapter describes various types of traditional medicine patterns, including the dampness-heat pattern in the liver-gallbladder, uterus, bladder, liver meridian, and spleen system, among others. Although ICD-11 added a chapter on TCM, and the WHO has made clear that this chapter neither refers to nor endorses any specific form of traditional medical treatment [[49]], concerns remain about how to provide objective, reliable, and reproducible assessment and how to reduce inter-rater variability in diagnostic procedures [[50]]. The basic methodology of CM practitioners for pattern diagnosis is still primarily based on experience, tacit knowledge, and possibly subjective perceptions acquired through rigorous training. This can lead to inconsistent diagnoses because doctors rely heavily on subjective experience and personal knowledge [[11]]. The inconsistency arises from two steps: the identification of symptoms and signs, and pattern differentiation. For a CM practitioner, recognizing signs and symptoms is as basic a diagnostic art as the physical examination in Western medicine and does not require over-reliance on medical diagnostic techniques and tests. A reproducibility study found reasonable to very good agreement on a range of clinical data collected by the diagnostic methods used in a TCM examination, such as inspection, auscultation, and palpation [[51]]. The greater need is therefore for an auxiliary tool that improves the accuracy and reproducibility of pattern differentiation, which is what we have done in this study by applying machine learning algorithms to assist in pattern differentiation.

In practice, successful and appropriate treatment requires accurate pattern differentiation based on the signs and symptoms collected. Moreover, pattern differentiation is critical not only for the clinical consistency and efficacy of different TCM experts but also for the development of TCM standardization. The number of studies of CM pattern differentiation using machine learning methodology has recently increased. The majority of these studies have attempted to develop diagnostic models capable of reproducing a CM doctor's diagnosis [[52,53]]. Other studies have used machine-learning-driven data mining methods to study patient pattern differentiation [[53,54]], and some have analyzed only Chinese medicine tongue images to differentiate diabetes effectively [[55]]. Palpation diagnosis is also a non-invasive and effective method for CM practitioners to check the location and extent of a patient's disease, and its data have been collected as pulse waveforms [[11,56]]. These advances mean that CM researchers face new challenges with big data as the use of instruments and sensors increases [[57]]. All patterns are theoretical profiles of symptoms and signs, and each pattern is based on the diagnostic conclusions of the four diagnostic methods of TCM. As newly emerged approaches that extract potentially useful information from large amounts of data, ML approaches are favored for their inherent advantages in handling big data [[13]]. This also suggests that multimodal data will be a future direction for machine learning in CM diagnosis.

This study has the following strengths. First, this is the first study to implement machine learning methods to differentiate the dampness-heat pattern of T2DM, and all applied models showed good differentiation performance. Second, among the models compared, the XGBoost model performed best. XGBoost is an efficient and scalable machine learning classifier with advantages such as ease of use, ease of parallelization and high predictive accuracy. Third, we applied the SHAP method to rank the importance of the included variables and found that slimy yellow tongue fur, slippery pulse or rapid-slippery pulse, sticky stool with ungratifying defecation and red tongue were the most important diagnostic factors for the dampness-heat pattern of T2DM, which compensated to some extent for the black-box, uninterpretable nature of the machine learning model. Fourth, our model can be used by CM beginners as a visual aid to support decision-making or to diagnose the dampness-heat pattern of T2DM before they become veteran CM doctors, which means it may accelerate the training of CM practitioners. Last, clinical data for pattern differentiation were collected in several provinces in China, which supports the generalizability of our findings.
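To make the idea behind the SHAP ranking concrete, the sketch below computes exact Shapley values by brute force for a toy scoring function over four signs. It is illustrative only and rests on assumptions: it is not the SHAP implementation or the XGBoost model used in this study, and the feature names, weights and interaction bonus are hypothetical values chosen for the example. It attributes the score of a patient showing all four signs, relative to a baseline with none.

```python
from itertools import combinations
from math import factorial

# Hypothetical signs and weights, invented for illustration only.
FEATURES = ["slimy_yellow_fur", "slippery_pulse", "sticky_stool", "red_tongue"]
WEIGHTS = {"slimy_yellow_fur": 0.4, "slippery_pulse": 0.3,
           "sticky_stool": 0.2, "red_tongue": 0.1}


def toy_score(present):
    """Toy model: weighted sum of the signs present plus one interaction bonus."""
    score = sum(WEIGHTS[f] for f in present)
    if "slimy_yellow_fur" in present and "slippery_pulse" in present:
        score += 0.2  # hypothetical interaction between tongue fur and pulse
    return score


def shapley_values(features, value):
    """Exact Shapley values: weighted average marginal contribution of each feature."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


phi = shapley_values(FEATURES, toy_score)
# Rank signs by their contribution, largest first.
ranking = sorted(phi, key=phi.get, reverse=True)
```

By construction the contributions sum to the difference between the full score and the empty baseline, and the interaction bonus is split equally between the two signs that produce it. Enumerating all coalitions is only feasible for a handful of features, which is why practical tools rely on faster algorithms specialized for tree models.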

This study has some limitations. First, it addressed only a single label, the dampness-heat pattern of T2DM. However, a patient may suffer from several diseases at the same time, and one disease can present with several patterns. For such complex cases, there is no satisfactory framework for diagnosing multiple coexisting patterns, so we focused on one of the typical patterns in T2DM. Second, owing to limited funding and human resources, no independent external validation cohort was used to verify the stability of our diagnostic model's performance. Nevertheless, we believe our rigorous methodology produced a robust model for diagnosing the dampness-heat pattern based on CM's four diagnostic methods. In the future, in addition to the models from the present study, we will develop applications and conduct a prospective study to further validate our results.

5. Conclusion

In conclusion, our study explored the utility of machine learning algorithms trained on datasets from the four CM diagnostic methods, estimating their diagnostic accuracy and their potential application in pattern differentiation. The XGBoost model we established as a tool to diagnose the dampness-heat pattern in patients with T2DM may help CM practitioners make quick diagnostic decisions. However, this model should be evaluated further in the future, specifically in clinical scenarios.

Declarations

Author contribution statement

Xinyu Liu and Xiaoqiang Huang: Conceived and designed the experiments; Analyzed and interpreted the data; Wrote the paper.

Jindong Zhao: Conceived and designed the experiments; Performed the experiments; Contributed reagents, materials, analysis tools or data.

Yanjin Su, Lu Shen, Yuhong Duan, Jing Gong, Zhihai Zhang, Qing Zhu and Xianglu Rong: Contributed reagents, materials, analysis tools or data; Performed the experiments.

Shenghua Piao: Conceived and designed the experiments; Performed the experiments.

Jiao Guo: Conceived and designed the experiments; Wrote the paper.

Funding statement

Jiao Guo was supported by the National Key R&D Plan [2018YFC1704200], the National Natural Science Foundation of China [81830113], and the Major Basic and Applied Basic Research Projects of Guangdong Province of China [2019B030302005].

Data availability statement

Data will be made available on request.

Additional information

No additional information is available for this paper.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Contributor Information

Xinyu Liu, Email: liuxinyu126@foxmail.com.

Jiao Guo, Email: gyguoyz@163.com.

References

1. International Diabetes Federation. IDF Diabetes Atlas. tenth ed. Brussels, Belgium; 2021. Available at: https://www.diabetesatlas.org
2. Pranata R., Henrina J., Raffaello W.M., Lawrensia S., Huang I. Diabetes and COVID-19: the past, the present, and the future. Metabolism. 2021 Aug;121. doi: 10.1016/j.metabol.2021.154814. PMID: 34119537.
3. Sardu C., Gargiulo G., Esposito G., Paolisso G., Marfella R. Impact of diabetes mellitus on clinical outcomes in patients affected by Covid-19. Cardiovasc. Diabetol. 2020 Jun 11;19(1):76. doi: 10.1186/s12933-020-01047-y. PMID: 32527257.
4. Li Y., Teng D., Shi X., Qin G., Qin Y., Quan H., et al. Prevalence of diabetes recorded in mainland China using 2018 diagnostic criteria from the American Diabetes Association: national cross sectional study. BMJ. 2020 Apr 28;369:m997. doi: 10.1136/bmj.m997. PMID: 32345662.
5. World Health Organization. WHO Expert Meeting on Evaluation of Traditional Chinese Medicine in the Treatment of COVID-19. Geneva: World Health Organization; 2022. Available from: https://www.who.int/publications/m/item/who-expert-meeting-on-evaluation-of-traditional-chinese-medicine-in-the-treatment-of-covid-19
6. Lian F., Ni Q., Shen Y., Yang S., Piao C., Wang J., et al. International traditional Chinese medicine guideline for diagnostic and treatment principles of diabetes. Ann. Palliat. Med. 2020 Jul;9(4):2237–2250. doi: 10.21037/apm-19-271. PMID: 32648463.
7. Tang J.L., Liu B.Y., Ma K.W. Traditional Chinese medicine. Lancet. 2008 Dec 6;372(9654):1938–1940. doi: 10.1016/S0140-6736(08)61354-9. PMID: 18930523.
8. Wang T., Dong J. What is “zheng” in traditional Chinese medicine. J. Tradit. Chin. Med. Sci. 2017;4(1):14–15. doi: 10.1016/j.jtcms.2017.08.005.
9. World Health Organization. International Statistical Classification of Diseases and Related Health Problems (ICD). Geneva: World Health Organization; 2022. Available from: https://www.who.int/classifications/classification-of-diseases
10. Candong L., editor. Diagnostics of Traditional Chinese Medicine. Beijing, China: China Traditional Chinese Medicine Press; 2016.
11. Zhao C., Li G.Z., Wang C., Niu J. Advances in patient classification for traditional Chinese medicine: a machine learning perspective. Evid Based Complement Alternat Med. 2015;2015. doi: 10.1155/2015/376716. PMID: 26246834.
12. Handelman G.S., Kok H.K., Chandra R.V., Razavi A.H., Lee M.J., Asadi H. eDoctor: machine learning and the future of medicine. J. Intern. Med. 2018 Dec;284(6):603–619. doi: 10.1111/joim.12822. PMID: 30102808.
13. Chen H., He Y. Machine learning approaches in traditional Chinese medicine: a systematic review. Am. J. Chin. Med. 2022;50(1):91–131. doi: 10.1142/S0192415X22500045. PMID: 34931589.
14. Wang Z., Sun S., Poon J., Poon S. CNN based multi-instance multi-task learning for syndrome differentiation of diabetic patients. In: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2018.
15. Guo J., Chen H., Song J., Wang J., Zhao L., Tong X. Syndrome differentiation of diabetes by the traditional Chinese medicine according to evidence-based medicine and expert consensus opinion. Evid Based Complement Alternat Med. 2014;2014. doi: 10.1155/2014/492193. PMID: 25132859.
16. Chinese Diabetes Society. Guideline for the prevention and treatment of type 2 diabetes mellitus in China (2020 edition). Int. J. Endocrinol. Metabol. 2021;41(5):482–548. doi: 10.3760/cma.j.cn121383-20210825-08063.
17. China Association of Chinese Medicine. Notice on the Public Announcement of Diagnostic Criteria for Dampness-Heat Syndrome of Type 2 Diabetes Mellitus Issued by China Association of Traditional Chinese Medicine. Beijing, China: China Association of Chinese Medicine; 2022. Available from: http://www.cacm.org.cn/2022/04/19/17672/
18. Choi R.Y., Coyner A.S., Kalpathy-Cramer J., Chiang M.F., Campbell J.P. Introduction to machine learning, neural networks, and deep learning. Transl Vis Sci Technol. 2020 Feb 27;9(2):14. doi: 10.1167/tvst.9.2.14. PMID: 32704420.
19. Kwekha-Rashid A.S., Abduljabbar H.N., Alhayani B. Coronavirus disease (COVID-19) cases analysis using machine-learning applications. Appl. Nanosci. 2021 May 21:1–13. doi: 10.1007/s13204-021-01868-7. PMID: 34036034.
20. Vellido A. The importance of interpretability and visualization in machine learning for applications in medicine and health care. Neural Comput. Appl. 2020 Dec 1;32(24):18069–18083. doi: 10.1007/s00521-019-04051-w.
21. Vokinger K.N., Feuerriegel S., Kesselheim A.S. Mitigating bias in machine learning for medicine. Commun. Med. 2021 Aug 23:1–25. doi: 10.1038/s43856-021-00028-w. PMID: 34522916.
22. Erickson B.J., Korfiatis P., Akkus Z., Kline T.L. Machine learning for medical imaging. Radiographics. 2017 Mar-Apr;37(2):505–515. doi: 10.1148/rg.2017160130. PMID: 28212054.
23. Paul S.G., Saha A., Biswas A.A., Zulfiker M.S., Arefin M.S., Rahman M.M., et al. Combating Covid-19 using machine learning and deep learning: applications, challenges, and future perspectives. Array (N Y). 2023 Mar;17. doi: 10.1016/j.array.2022.100271. PMID: 36530931.
24. Quist J., Taylor L., Staaf J., Grigoriadis A. Random forest modelling of high-dimensional mixed-type data for breast cancer classification. Cancers. 2021 Feb 27;13(5). doi: 10.3390/cancers13050991. PMID: 33673506.
25. Davagdorj K., Pham V.H., Theera-Umpon N., Ryu K.H. XGBoost-based framework for smoking-induced noncommunicable disease prediction. Int. J. Environ. Res. Publ. Health. 2020 Sep 7;17(18). doi: 10.3390/ijerph17186513. PMID: 32906777.
26. Garg A., Mago V. Role of machine learning in medical research: a survey. Computer Science Review. 2021 May 1;40. doi: 10.1016/j.cosrev.2021.100370.
27. Dallora A.L., Eivazzadeh S., Mendes E., Berglund J., Anderberg P. Machine learning and microsimulation techniques on the prognosis of dementia: a systematic literature review. PLoS One. 2017;12(6). doi: 10.1371/journal.pone.0179804. PMID: 28662070.
28. Yu W., Liu T., Valdez R., Gwinn M., Khoury M.J. Application of support vector machine modeling for prediction of common diseases: the case of diabetes and pre-diabetes. BMC Med. Inf. Decis. Making. 2010 Mar 22;10:16. doi: 10.1186/1472-6947-10-16. PMID: 20307319.
29. Dasarathy B.V. Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. 1991.
30. Cao B., Zhang K.C., Wei B., Chen L. Status quo and future prospects of artificial neural network from the perspective of gastroenterologists. World J. Gastroenterol. 2021 Jun 7;27(21):2681–2709. doi: 10.3748/wjg.v27.i21.2681. PMID: 34135549.
31. Shi M., Zhou C. An approach to syndrome differentiation in traditional Chinese medicine based on neural network. In: Third International Conference on Natural Computation (ICNC 2007). IEEE; 2007.
32. Saritas M.M., Yasar A. Performance analysis of ANN and naive Bayes classification algorithm for data classification. Int. J. Intell. Syst. Appl. Eng. 2019;7(2):88–91. doi: 10.18201/ijisae.2019252786.
33. Lundberg S.M., Erion G., Chen H., DeGrave A., Prutkin J.M., Nair B., et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020 Jan 1;2(1):56–67. doi: 10.1038/s42256-019-0138-9.
34. Prabha A., Yadav J., Rani A., Singh V. Design of intelligent diabetes mellitus detection system using hybrid feature selection based XGBoost classifier. Comput. Biol. Med. 2021 Sep;136. doi: 10.1016/j.compbiomed.2021.104664. PMID: 34329866.
35. Yuan S., Wang N., Wang J.L., Pan J., Xue X.Y., Zhang Y.N., et al. Gender differences in damp-heat syndrome: a review. Biomed. Pharmacother. 2021 Nov;143. doi: 10.1016/j.biopha.2021.112128. PMID: 34492424.
36. Wei J., Wu R., Zhao D. Analysis on traditional Chinese medicine syndrome elements and relevant factors for senile diabetes. J. Tradit. Chin. Med. 2013 Aug;33(4):473–478. doi: 10.1016/s0254-6272(13)60151-x. PMID: 24187868.
37. Zhang G.D., Liu X.X., Liang J.L., Hu Q.M. The distribution pattern of traditional Chinese medicine syndromes in 549 patients with type 2 diabetes. Diabetes Metab Syndr Obes. 2021;14:2209–2216. doi: 10.2147/DMSO.S295351. PMID: 34040406.
38. Geng W., Qin X., Yang T., Cong Z., Wang Z., Kong Q., et al. Model-based reasoning of clinical diagnosis in integrative medicine: real-world methodological study of electronic medical records and natural language processing methods. JMIR Med Inform. 2020 Dec 21;8(12). doi: 10.2196/23082. PMID: 33346740.
39. Chen P., Zhang L., Zhang W., Sun C., Wu C., He Y., et al. Galectin-9-based immune risk score model helps to predict relapse in stage I-III small cell lung cancer. J Immunother Cancer. 2020 Oct;8(2). doi: 10.1136/jitc-2020-001391. PMID: 33082168.
40. Hou N., Li M., He L., Xie B., Wang L., Zhang R., et al. Predicting 30-days mortality for MIMIC-III patients with sepsis-3: a machine learning approach using XGboost. J. Transl. Med. 2020 Dec 7;18(1):462. doi: 10.1186/s12967-020-02620-5. PMID: 33287854.
41. Bae H., Lee S., Lee C.Y., Kim C.E. A novel framework for understanding the pattern identification of traditional asian medicine from the machine learning perspective. Front. Med. 2021;8. doi: 10.3389/fmed.2021.763533. PMID: 35186965.
42. Zhang L., Wang Y., Niu M., Wang C., Wang Z. Machine learning for characterizing risk of type 2 diabetes mellitus in a rural Chinese population: the Henan Rural Cohort Study. Sci. Rep. 2020 Mar 10;10(1):4406. doi: 10.1038/s41598-020-61123-x. PMID: 32157171.
43. Wei L., Cao Y., Zhang K., Xu Y., Zhou X., Meng J., et al. Prediction of progression to severe stroke in initially diagnosed anterior circulation ischemic cerebral infarction. Front. Neurol. 2021;12. doi: 10.3389/fneur.2021.652757. PMID: 34220671.
44. Hsu P.C., Wu H.K., Huang Y.C., Chang H.H., Lee T.C., Chen Y.P., et al. The tongue features associated with type 2 diabetes mellitus. Medicine (Baltim.) 2019 May;98(19). doi: 10.1097/MD.0000000000015567. PMID: 31083226.
45. Jiang B., Liang X., Chen Y., Ma T., Liu L., Li J., et al. Integrating next-generation sequencing and traditional tongue diagnosis to determine tongue coating microbiome. Sci. Rep. 2012;2:936. doi: 10.1038/srep00936. PMID: 23226834.
46. Xu W.-F., Liu G.-P., Yan J.-J., Wang Y.-Q., Lu X., Zhong T. Directed cyclic graph-based feature selection and modeling of the dampness syndrome of chronic gastritis. In: 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2015.
47. Lao S., Zhou Z., Lin W., Chen G., Huang Z., Ouyang H. Establishment of diagnostic criteria of spleen-stomach damp-heat syndrome in chronic superficial gastritis. J. Guangzhou Univ. Tradit. Chin. Med. 2004.
48. Hua Y.L., Ma Q., Zhang X.S., Jia Y.Q., Peng X.T., Yao W.L., et al. Pulsatilla decoction can treat the dampness-heat diarrhea rat model by regulating glycerinphospholipid metabolism based lipidomics approach. Front. Pharmacol. 2020;11:197. doi: 10.3389/fphar.2020.00197. PMID: 32194420.
49. The Lancet. ICD-11. Lancet. 2019 Jun 8;393(10188):2275. doi: 10.1016/S0140-6736(19)31205-X. PMID: 31180012.
50. Fears R., Griffin G.E., Larhammar D., Ter Meulen V., van der Meer J.W.M. Globalization of Traditional Chinese Medicine: what are the issues for ensuring evidence-based diagnosis and therapy? J. Intern. Med. 2020 Feb;287(2):210–213. doi: 10.1111/joim.12989. PMID: 31697414.
51. O'Brien K.A., Abbas E., Zhang J., Guo Z.X., Luo R., Bensoussan A., et al. Understanding the reliability of diagnostic variables in a Chinese Medicine examination. J. Alternative Compl. Med. 2009 Jul;15(7):727–734. doi: 10.1089/acm.2008.0554. PMID: 19552599.
52. Wang Y., Shi X., Li L., Efferth T., Shang D. The impact of artificial intelligence on traditional Chinese medicine. Am. J. Chin. Med. 2021;49(6):1297–1314. doi: 10.1142/S0192415X21500622. PMID: 34247564.
53. Xu Q., Tang W., Teng F., Peng W., Zhang Y., Li W., et al. Intelligent syndrome differentiation of traditional Chinese medicine by ANN: a case study of chronic obstructive pulmonary disease. IEEE Access. 2019;7:76167–76175. doi: 10.1109/ACCESS.2019.2921318.
54. Xia S., Zhang J., Du G., Li S., Vong C.T., Yang Z., et al. A microcosmic syndrome differentiation model for metabolic syndrome with multilabel learning. Evid Based Complement Alternat Med. 2020;2020. doi: 10.1155/2020/9081641. PMID: 33294001.
55. Zhang J., Xu J., Hu X., Chen Q., Tu L., Huang J., et al. Diagnostic Method of Diabetes Based on Support Vector Machine and Tongue Images. 2017.
56. Shi Y.L., Liu J.Y., Hu X.J., Tu L.P., Cui J., Li J., et al. A new method for syndrome classification of non-small-cell lung cancer based on data of tongue and pulse with machine learning. BioMed Res. Int. 2021;2021. doi: 10.1155/2021/1337558. PMID: 34423031.
57. Mainenti D.C. Big data and traditional Chinese medicine (TCM): what's state of the art? In: 2019 IEEE International Conference on Big Data (Big Data). IEEE; 2019.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

Data will be made available on request.

[bib1] 1.International Diabetes Federation . tenth ed. Brussels, Belgium; 2021. IDF Diabetes Atlas.https://www.diabetesatlas.org Available at: [Google Scholar]

[bib2] 2.Pranata R., Henrina J., Raffaello W.M., Lawrensia S., Huang I. Diabetes and COVID-19: the past, the present, and the future. Metabolism. 2021 Aug;121 doi: 10.1016/j.metabol.2021.154814. PMID: 34119537. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] 3.Sardu C., Gargiulo G., Esposito G., Paolisso G., Marfella R. Impact of diabetes mellitus on clinical outcomes in patients affected by Covid-19. Cardiovasc. Diabetol. 2020 Jun 11;19(1):76. doi: 10.1186/s12933-020-01047-y. PMID: 32527257. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib4] 4.Li Y., Teng D., Shi X., Qin G., Qin Y., Quan H., et al. Prevalence of diabetes recorded in mainland China using 2018 diagnostic criteria from the American Diabetes Association. national cross sectional study. BMJ. 2020:369. doi: 10.1136/bmj.m997. Apr 28. m997. PMID: 32345662. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib5] 5.World Health Organization . World Health Organization; Geneva: 2022. WHO Expert Meeting on Evaluation of Traditional Chinese Medicine in the Treatment of COVID-19.https://www.who.int/publications/m/item/who-expert-meeting-on-evaluation-of-traditional-chinese-medicine-in-the-treatment-of-covid-19 Available from: [Google Scholar]

[bib6] 6.Lian F., Ni Q., Shen Y., Yang S., Piao C., Wang J., et al. International traditional Chinese medicine guideline for diagnostic and treatment principles of diabetes. Ann. Palliat. Med. 2020 Jul;9(4):2237–2250. doi: 10.21037/apm-19-271. PMID: 32648463. [DOI] [PubMed] [Google Scholar]

[bib7] 7.Tang J.L., Liu B.Y., Ma K.W. Traditional Chinese medicine. Lancet. 2008 Dec 6;372(9654):1938–1940. doi: 10.1016/S0140-6736(08)61354-9. PMID: 18930523. [DOI] [PubMed] [Google Scholar]

[bib8] 8.Wang T., Dong JJJoTCMS What is “zheng” in traditional. Chin. Med. 2017;4(1):14–15. doi: 10.1016/j.jtcms.2017.08.005. [DOI] [Google Scholar]

[bib9] 9.World Health Organization . World Health Organization; Geneva: 2022. International Statistical Classification of Diseases and Related Health Problems (ICD)https://www.who.int/classifications/classification-of-diseases Available from: [Google Scholar]

[bib10] 10.Candong L. In: Diagnostics of Traditional Chinese Medicine. Candong L., editor. China traditional Chinese Medicine Press; Beijing, China: 2016. [Google Scholar]

[bib11] 11.Zhao C., Li G.Z., Wang C., Niu J. Advances in patient classification for traditional Chinese medicine: a machine learning perspective. Evid Based Complement Alternat Med. 2015;2015 doi: 10.1155/2015/376716. PMID: 26246834. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib12] 12.Handelman G.S., Kok H.K., Chandra R.V., Razavi A.H., Lee M.J., Asadi H. eDoctor: machine learning and the future of medicine. J. Intern. Med. 2018 Dec;284(6):603–619. doi: 10.1111/joim.12822. PMID: 30102808. [DOI] [PubMed] [Google Scholar]

[bib13] 13.Chen H., He Y. Machine learning approaches in traditional Chinese medicine: a systematic review. Am. J. Chin. Med. 2022;50(1):91–131. doi: 10.1142/S0192415X22500045. PMID: 34931589. [DOI] [PubMed] [Google Scholar]

[bib14] 14.Wang Z., Sun S., Poon J., Poon S., editors. IEEE International Conference on Bioinformatics and Biomedicine (BIBM) IEEE; 2018. Cnn based multi-instance multi-task learning for syndrome differentiation of diabetic patients. 2018. [Google Scholar]

[bib15] 15.Guo J., Chen H., Song J., Wang J., Zhao L., Tong X. Syndrome differentiation of diabetes by the traditional Chinese medicine according to evidence-based medicine and expert consensus opinion. Evid Based Complement Alternat Med. 2014;2014 doi: 10.1155/2014/492193. PMID: 25132859. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib16] 16.Society C.D. Guideline for the prevention and treatment of type 2 diabetes mellitus in China (2020 edition) Int. J. Endocrinol. Metabol. 2021;41(5):482–548. doi: 10.3760/cma.j.cn121383-20210825-08063. [DOI] [Google Scholar]

[bib17] 17.China Association of Chinese Medicine . China Association of Chinese Medicine; Beijing, China: 2022. Notice on the Public Announcement of Diagnostic Criteria for Dampness-Heat Syndrome of Type 2 Diabetes Mellitus Issued by China Association of Traditional Chinese Medicine.http://www.cacm.org.cn/2022/04/19/17672/ Available from: [Google Scholar]

[bib18] 18.Choi R.Y., Coyner A.S., Kalpathy-Cramer J., Chiang M.F., Campbell J.P. Introduction to machine learning, neural networks, and deep learning. Transl Vis Sci Technol. 2020 Feb 27;9(2):14. doi: 10.1167/tvst.9.2.14. PMID: 32704420. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib19] 19.Kwekha-Rashid A.S., Abduljabbar H.N., Alhayani B. Coronavirus disease (COVID-19) cases analysis using machine-learning applications. Appl. Nanosci. 2021 May 21:1–13. doi: 10.1007/s13204-021-01868-7. PMID: 34036034. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib20] 20.Vellido A. The importance of interpretability and visualization in machine learning for applications in medicine and health care. Neural Comput. Appl. 2020 2020/12/01;32(24):18069–18083. doi: 10.1007/s00521-019-04051-w. [DOI] [Google Scholar]

[bib21] 21.Vokinger K.N., Feuerriegel S., Kesselheim A.S. Mitigating bias in machine learning for medicine. Commun. Med. 2021:1–25. doi: 10.1038/s43856-021-00028-w. Aug 23. PMID: 34522916. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib22] 22.Erickson B.J., Korfiatis P., Akkus Z., Kline T.L. Machine learning for medical imaging. Radiographics. 2017 Mar-Apr;37(2):505–515. doi: 10.1148/rg.2017160130. PMID: 28212054. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib23] 23.Paul S.G., Saha A., Biswas A.A., Zulfiker M.S., Arefin M.S., Rahman M.M., et al. Combating Covid-19 using machine learning and deep learning: applications, challenges, and future perspectives. Array (N Y). 2023 Mar;17 doi: 10.1016/j.array.2022.100271. PMID: 36530931. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib24] 24.Quist J., Taylor L., Staaf J., Grigoriadis A. Random forest modelling of high-dimensional mixed-type data for breast cancer classification. Cancers. 2021 Feb 27;13(5) doi: 10.3390/cancers13050991. PMID: 33673506. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib25] 25.Davagdorj K., Pham V.H., Theera-Umpon N., Ryu K.H. XGBoost-based framework for smoking-induced noncommunicable disease prediction. Int. J. Environ. Res. Publ. Health. 2020 Sep 7;17(18) doi: 10.3390/ijerph17186513. PMID: 32906777. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib26] 26.Garg A., Mago V. Role of machine learning in medical research: a survey. Computer Science Review. 2021;40 doi: 10.1016/j.cosrev.2021.100370. 2021/05/01/ [DOI] [Google Scholar]

[bib27] 27.Dallora A.L., Eivazzadeh S., Mendes E., Berglund J., Anderberg P. Machine learning and microsimulation techniques on the prognosis of dementia: a systematic literature review. PLoS One. 2017;12(6) doi: 10.1371/journal.pone.0179804. PMID: 28662070. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib28] 28.Yu W., Liu T., Valdez R., Gwinn M., Khoury M.J. Application of support vector machine modeling for prediction of common diseases: the case of diabetes and pre-diabetes. BMC Med. Inf. Decis. Making. 2010 Mar 22;10:16. doi: 10.1186/1472-6947-10-16. PMID: 20307319. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib29] 29.Dasarathy BVJICST . 1991. Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. [DOI] [Google Scholar]

[bib30] 30.Cao B., Zhang K.C., Wei B., Chen L. Status quo and future prospects of artificial neural network from the perspective of gastroenterologists. World J. Gastroenterol. 2021 Jun 7;27(21):2681–2709. doi: 10.3748/wjg.v27.i21.2681. PMID: 34135549. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib31] 31.Shi M., Zhou C., editors. An Approach to Syndrome Differentiation in Traditional Chinese Medicine Based on Neural Network. Third International Conference on Natural Computation (ICNC 2007) IEEE; 2007. [Google Scholar]

[bib32] 32.Saritas M.M., Ajijois Yasar, ai engineering. Performance analysis of ANN and naive Bayes. classification algorithm for data classification. 2019;7(2):88–91. doi: 10.18201//ijisae.2019252786. [DOI] [Google Scholar]

[bib33] 33.Lundberg S.M., Erion G., Chen H., DeGrave A., Prutkin J.M., Nair B., et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020 2020/01/01;2(1):56–67. doi: 10.1038/s42256-019-0138-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib34] 34.Prabha A., Yadav J., Rani A., Singh V. Design of intelligent diabetes mellitus detection system using hybrid feature selection based XGBoost classifier. Comput. Biol. Med. 2021 Sep;136 doi: 10.1016/j.compbiomed.2021.104664. PMID: 34329866. [DOI] [PubMed] [Google Scholar]

[bib35] 35.Yuan S., Wang N., Wang J.L., Pan J., Xue X.Y., Zhang Y.N., et al. Gender differences in damp-heat syndrome: a review. Biomed. Pharmacother. 2021 Nov;143 doi: 10.1016/j.biopha.2021.112128. PMID: 34492424. [DOI] [PubMed] [Google Scholar]

[bib36] 36.Wei J., Wu R., Zhao D. Analysis on traditional Chinese medicine syndrome elements and relevant factors for senile diabetes. J. Tradit. Chin. Med. 2013 Aug;33(4):473–478. doi: 10.1016/s0254-6272(13)60151-x. PMID: 24187868. [DOI] [PubMed] [Google Scholar]

[bib37] 37.Zhang G.D., Liu X.X., Liang J.L., Hu Q.M. The distribution pattern of traditional Chinese medicine syndromes in 549 patients with type 2 diabetes. Diabetes Metab Syndr Obes. 2021;14:2209–2216. doi: 10.2147/DMSO.S295351. PMID: 34040406. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib38] 38.Geng W., Qin X., Yang T., Cong Z., Wang Z., Kong Q., et al. Model-based reasoning of clinical diagnosis in integrative medicine: real-world methodological study of electronic medical records and natural language processing methods. JMIR Med Inform. 2020 Dec 21;8(12) doi: 10.2196/23082. PMID: 33346740. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib39] 39.Chen P., Zhang L., Zhang W., Sun C., Wu C., He Y., et al. Galectin-9-based immune risk score model helps to predict relapse in stage I-III small cell lung cancer. J Immunother Cancer. 2020 Oct;8(2) doi: 10.1136/jitc-2020-001391. PMID: 33082168. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib40] 40.Hou N., Li M., He L., Xie B., Wang L., Zhang R., et al. Predicting 30-days mortality for MIMIC-III patients with sepsis-3: a machine learning approach using XGboost. J. Transl. Med. 2020 Dec 7;18(1):462. doi: 10.1186/s12967-020-02620-5. PMID: 33287854. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib41] 41.Bae H., Lee S., Lee C.Y., Kim C.E. A novel framework for understanding the pattern identification of traditional asian medicine from the machine learning perspective. Front. Med. 2021;8 doi: 10.3389/fmed.2021.763533. PMID: 35186965. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib42] 42.Zhang L., Wang Y., Niu M., Wang C., Wang Z. Machine learning for characterizing risk of type 2 diabetes mellitus in a rural Chinese population: the Henan Rural Cohort Study. Sci. Rep. 2020 Mar 10;10(1):4406. doi: 10.1038/s41598-020-61123-x. PMID: 32157171. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib43] 43.Wei L., Cao Y., Zhang K., Xu Y., Zhou X., Meng J., et al. Prediction of progression to severe stroke in initially diagnosed anterior circulation ischemic cerebral infarction. Front. Neurol. 2021;12 doi: 10.3389/fneur.2021.652757. PMID: 34220671. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib44] 44.Hsu P.C., Wu H.K., Huang Y.C., Chang H.H., Lee T.C., Chen Y.P., et al. The tongue features associated with type 2 diabetes mellitus. Medicine (Baltim.) 2019 May;98(19) doi: 10.1097/MD.0000000000015567. PMID: 31083226. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib45] 45.Jiang B., Liang X., Chen Y., Ma T., Liu L., Li J., et al. Integrating next-generation sequencing and traditional tongue diagnosis to determine tongue coating microbiome. Sci. Rep. 2012;2:936. doi: 10.1038/srep00936. PMID: 23226834. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib46] 46.Xu W.-F., Liu G.-P., Yan J.-J., Wang Y.-Q., Lu X., Zhong T., editors. IEEE International Conference on Bioinformatics and Biomedicine (BIBM) IEEE; 2015. Directed cyclic graph-based feature selection and modeling of the dampness syndrome of chronic gastritis. 2015. [Google Scholar]

[bib47] 47.Lao S., Zhou Z., Lin W., Chen G., Huang Z., Ouyang HJJoGUoTCM . 2004. Establishment of Diagnostic Criteria of Spleen-Stomach Damp-Heat Syndrome in Chronic Superficial Gastritis. [Google Scholar]

[bib48] 48.Hua Y.L., Ma Q., Zhang X.S., Jia Y.Q., Peng X.T., Yao W.L., et al. Pulsatilla decoction can treat the dampness-heat diarrhea rat model by regulating glycerinphospholipid metabolism based lipidomics approach. Front. Pharmacol. 2020;11:197. doi: 10.3389/fphar.2020.00197. PMID: 32194420. [DOI] [PMC free article] [PubMed] [Google Scholar]

49. The Lancet. ICD-11. Lancet. 2019 Jun 8;393(10188):2275. doi: 10.1016/S0140-6736(19)31205-X. PMID: 31180012.

50. Fears R., Griffin G.E., Larhammar D., Ter Meulen V., van der Meer J.W.M. Globalization of Traditional Chinese Medicine: what are the issues for ensuring evidence-based diagnosis and therapy? J. Intern. Med. 2020 Feb;287(2):210–213. doi: 10.1111/joim.12989. PMID: 31697414.

51. O'Brien K.A., Abbas E., Zhang J., Guo Z.X., Luo R., Bensoussan A., et al. Understanding the reliability of diagnostic variables in a Chinese Medicine examination. J. Alternative Compl. Med. 2009 Jul;15(7):727–734. doi: 10.1089/acm.2008.0554. PMID: 19552599.

52. Wang Y., Shi X., Li L., Efferth T., Shang D. The impact of artificial intelligence on traditional Chinese medicine. Am. J. Chin. Med. 2021;49(6):1297–1314. doi: 10.1142/S0192415X21500622. PMID: 34247564.

53. Xu Q., Tang W., Teng F., Peng W., Zhang Y., Li W., et al. Intelligent syndrome differentiation of traditional Chinese medicine by ANN: a case study of chronic obstructive pulmonary disease. IEEE Access. 2019;7:76167–76175. doi: 10.1109/ACCESS.2019.2921318.

54. Xia S., Zhang J., Du G., Li S., Vong C.T., Yang Z., et al. A microcosmic syndrome differentiation model for metabolic syndrome with multilabel learning. Evid Based Complement Alternat Med. 2020;2020. doi: 10.1155/2020/9081641. PMID: 33294001.

55. Zhang J., Xu J., Hu X., Chen Q., Tu L., Huang J., et al. Diagnostic method of diabetes based on support vector machine and tongue images. 2017.

56. Shi Y.L., Liu J.Y., Hu X.J., Tu L.P., Cui J., Li J., et al. A new method for syndrome classification of non-small-cell lung cancer based on data of tongue and pulse with machine learning. BioMed Res. Int. 2021;2021. doi: 10.1155/2021/1337558. PMID: 34423031.

57. Mainenti D.C. Big data and traditional Chinese medicine (TCM): what's state of the art? In: IEEE International Conference on Big Data (Big Data). IEEE; 2019.
