Abstract
Background
Research on intelligent tongue diagnosis is a main direction in the modernization of tongue diagnosis technology. Identifying tongue shape and texture features is a difficult task in traditional Chinese medicine (TCM) tongue diagnosis. This study aimed to explore the application of deep learning techniques in tongue image analysis.
Methods
A total of 8676 tongue images were annotated by clinical experts into seven categories: fissured tongue, tooth-marked tongue, stasis tongue, spotted tongue, greasy coating, peeled coating, and rotten coating. Based on the labeled tongue images, a deep learning model, the faster region-based convolutional neural network (Faster R-CNN), was trained to classify tongue images. Four performance indices, i.e., accuracy, recall, precision, and F1-score, were selected to evaluate the model. We then applied the model to the tongue images of 3601 medical checkup participants in order to explore gender and age factors and, through complex networks, the correlations between tongue features and diseases.
Results
The average accuracy, recall, precision, and F1-score of our model were 90.67%, 91.25%, 99.28%, and 95.00%, respectively. In the tongue images from the medical checkup population, the Faster R-CNN model detected fissured tongue in 41.49%, tooth-marked tongue in 37.16%, greasy coating in 29.66%, spotted tongue in 18.66%, stasis tongue in 9.97%, peeled coating in 3.97%, and rotten coating in 1.22% of images. There were significant differences in the incidence of fissured tongue, tooth-marked tongue, spotted tongue, and greasy coating across age and gender groups. Complex networks revealed that fissured tongue and tooth-marked tongue were closely related to hypertension, dyslipidemia, overweight, and nonalcoholic fatty liver disease (NAFLD), and that greasy coating was associated with hypertension and overweight.
Conclusion
The Faster R-CNN model shows good performance in tongue image classification, and we have preliminarily revealed the relationships between tongue features and gender, age, and metabolic diseases in a medical checkup population.
1. Introduction
Tongue inspection is the most common, intuitive, and effective diagnostic method of traditional Chinese medicine (TCM) [1]. Recent TCM research has made the color features of tongue images measurable and digitized by means of color space parameters such as RGB, Lab, and HSI [2–4].
However, quantifying the shape and texture of tongue images remains a difficult point in tongue diagnosis, and much attention has focused on automatic recognition methods for these features. Obafemi-Ajayi et al. [5] proposed a feature extraction method for automated tongue shape classification based on geometric features and polynomial equations. Yang et al. [6] extracted cracks by applying the G component of the false-color image in RGB color space, with a detection accuracy of 82.00%. The Douglas–Peucker algorithm was implemented to extract features of the tooth-marked tongue and achieved an accuracy of 80% [7]. Xu et al. [8] used an RGB color range and a gray mean value to recognize acantha and ecchymosis in tongue patterns, with an overall accuracy of 77.10%. Wang et al. [9] extracted prickles from the green channel of the tongue image, with an accuracy of 88.47%. Yet, because tongue features are complex and diverse, classical image processing methods suffer from time- and space-consuming algorithms, difficulties in automated high-throughput processing, and weak transferability in correlation research [10–12], which make comprehensive analysis of tongue images unavailable.
Intelligent diagnosis based on images is a main direction of the modernization of tongue diagnosis technology [13]. As the current mainstream technology, the convolutional neural network (CNN) has a powerful capability for feature extraction and representation [14, 15], which greatly improves the accuracy and efficiency of tongue image segmentation and classification [16–20]. For example, Chen's team utilized a deep residual neural network (ResNet) to identify the tooth-marked tongue, with an accuracy of over 90% [21]. Xu et al. [22] proposed a CNN model combining a U-shaped net (U-Net) and discriminative filter learning (DFL) for the classification and recognition of different types of tongue coating, achieving an F1-score of 93%. Research on the recognition and classification of the tooth-marked tongue [23] and cracked tongue [24] has significantly improved the accuracy of tongue image identification.
However, tongue images have multi-label attributes (Figure 1(a)). Although the classical CNN performs well in recognizing single tongue features such as tooth marks or fissures (Figure 1(b)), a multi-CNN fusion model has no apparent superiority in the multi-label classification of tongue images with diverse features (Figure 1(c)), and under nonparallel conditions multiple CNN models require huge space and time. The classical CNN model cannot simultaneously and accurately identify, locate, and quantify the complex, diverse, fine-grained features of tongue images, making efficient parallel detection and recognition of tongue images difficult.
Figure 1.

Deep learning methods for tongue diagnosis analyses. (a) Fissured tongue image, tooth-marked tongue image, and tongue image with fissures and tooth marks; (b) CNN method of single-object detection; (c) CNN method of multi-label detection in tongue images.
Object detection is a method of finding a specific object in an image and determining its position. As one of the mainstream neural networks for object detection, the faster region-based convolutional neural network (Faster R-CNN) [25] can perform multi-label recognition with a single model, thus reducing the cost of training multiple models. Here, we utilized Faster R-CNN with fine-tuning to extract local features of tongue images and learn high-level semantic features. Targeting 7 categories of tongue shape and texture in TCM, ResNet [26] was used as the backbone network for feature extraction to construct a deep learning model.
In this research, we constructed a standard database for training, validation, and testing; realized efficient and accurate classification and recognition of local tongue image features; and applied the model to a population undergoing medical checkups with Chinese medicine, in order to reveal the associations between tongue image features and diseases.
2. Materials and Methods
We proposed a deep learning multi-label tongue image model based on Faster R-CNN. A total of 8676 tongue images were collected to train and test the proposed model. The collected tongue images annotated by experts were divided into seven categories. Furthermore, this approach was applied to a population undergoing medical checkups with Chinese medicine. The specific process of this study is shown in Figure 2.
Figure 2.

The workflow of the entire research. (a) Demonstration of the acquisition process of tongue image; (b) example samples of tongue image differentiation dataset calibrated by experts; (c) interest regions marked manually by TCM practitioners using the LabelImg software; (d) Faster R-CNN model trained by tongue images and object location of training set; (e) study on tongue image features of 3601 people undergoing medical checkup based on Faster R-CNN model.
2.1. Tongue Image Collection and Preprocessing
As shown in Figure 3, all the tongue images were acquired using the TFDA-1 and TDA-1 tongue diagnosis instruments designed by Xu's team at Shanghai University of TCM. The instruments were equipped with unified CCD equipment, a standard D50 light source with a color temperature of 5003 K and a color rendering index of 97 [27]. Tongue images were obtained from September 2017 to December 2018 at Shuguang Hospital. The raw tongue image size was 5568 × 3711 pixels in JPG format. To reduce the amount of deep learning computation and eliminate the interference of regions other than the tongue body, all tongue images were automatically cropped to 400 × 400 pixels by Mask R-CNN.
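The cropping step can be sketched as follows, assuming the tongue bounding box has already been produced by the segmentation model; `crop_to_square` and the nearest-neighbour resampling below are illustrative stand-ins for the actual Mask R-CNN pipeline, not the paper's code:

```python
import numpy as np

def crop_to_square(image: np.ndarray, box, out_size: int = 400) -> np.ndarray:
    """Crop the image to a bounding box, pad to a square, then resample to out_size."""
    x0, y0, x1, y1 = box
    patch = image[y0:y1, x0:x1]
    h, w = patch.shape[:2]
    side = max(h, w)
    # zero-pad the shorter side so the patch becomes square
    square = np.zeros((side, side) + patch.shape[2:], dtype=patch.dtype)
    square[:h, :w] = patch
    # nearest-neighbour resample to out_size x out_size
    idx = np.arange(out_size) * side // out_size
    return square[idx][:, idx]
```

In practice the box would come from the Mask R-CNN tongue segmentation; any interpolation scheme could replace the nearest-neighbour step.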
Figure 3.

TFDA-1 and TDA-1 tongue diagnosis instrument. (a) Side view of TFDA-1; (b) front view of TFDA-1; (c) side view of TDA-1; (d) front view of TDA-1.
2.2. Tongue Image Labeling and Dataset Construction
All tongue image labels were evaluated and screened by 10 TCM experts with normal vision and reported normal color vision [28]. To avoid chromatic differences between monitors, the experts interpreted and screened the images under uniform conditions on a 27-inch Apple Cinema HD display. With reference to the diagnostic criteria for tongue image features [1, 29], the tongue images were divided into seven categories. All 8676 tongue images were first annotated by two experts into seven folders, one per category; example samples of each typical tongue image are shown in Figure 2(b). The other eight experts then checked the labeled folders, and an image was retained only if at least 8 of the 10 experts agreed that it carried the same labels. Images with inconsistent diagnoses were excluded from this research.
The datasets for Faster R-CNN used the MS COCO format, the most popular standard format in the field of object detection [30]. We used LabelImg (Version 1.8.1) to annotate the regions of interest for shape and texture on the tongue images. The annotations were confirmed by experts; the process interface is shown in Figure 2(c). The generated XML annotation files were then converted to JSON format using Python (Version 3.6).
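Assuming LabelImg was used in its default Pascal VOC mode, the XML-to-COCO conversion described above might look like the following sketch; the `CATEGORIES` label names and the `voc_to_coco` helper are hypothetical, not taken from the paper:

```python
import xml.etree.ElementTree as ET

# Assumed label names for the seven categories (illustrative only).
CATEGORIES = ["fissured", "tooth-marked", "stasis", "spotted",
              "greasy", "peeled", "rotten"]

def voc_to_coco(xml_text: str, image_id: int, ann_start_id: int = 1):
    """Convert one LabelImg (Pascal VOC) XML document to COCO-style dicts."""
    root = ET.fromstring(xml_text)
    image = {
        "id": image_id,
        "file_name": root.findtext("filename"),
        "width": int(root.findtext("size/width")),
        "height": int(root.findtext("size/height")),
    }
    annotations = []
    for i, obj in enumerate(root.iter("object")):
        name = obj.findtext("name")
        x0 = int(obj.findtext("bndbox/xmin")); y0 = int(obj.findtext("bndbox/ymin"))
        x1 = int(obj.findtext("bndbox/xmax")); y1 = int(obj.findtext("bndbox/ymax"))
        annotations.append({
            "id": ann_start_id + i,
            "image_id": image_id,
            "category_id": CATEGORIES.index(name) + 1,
            "bbox": [x0, y0, x1 - x0, y1 - y0],  # COCO uses [x, y, w, h]
            "area": (x1 - x0) * (y1 - y0),
            "iscrowd": 0,
        })
    return image, annotations
```

Looping this over all XML files and wrapping the results with a `categories` list yields a complete COCO-format JSON file.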
2.3. Dataset Partition
The constructed datasets were randomly partitioned into an 80% training set, a 10% validation set, and a 10% testing set. The numbers of training images and labels in the 7 categories used to build the Faster R-CNN model are shown in Table 1. The number of tongue images per category in the testing set was equal to that in the validation set.
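A minimal sketch of such a random 80/10/10 partition (the helper name and seed are illustrative):

```python
import random

def split_dataset(image_ids, seed=42):
    """Shuffle image ids and split them into 80% train / 10% val / 10% test."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n = len(ids)
    n_train, n_val = int(n * 0.8), int(n * 0.1)
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])
```

With 8676 images this yields splits of roughly 6940/867/869; the paper's exact per-category counts (Table 1) additionally keep the test and validation sets balanced per category.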
Table 1.
The training set.
| Tongue image categories | Training datasets | Labels |
|---|---|---|
| Fissured tongue | 1570 | 1792 |
| Tooth-marked tongue | 1386 | 2589 |
| Spotted tongue | 746 | 920 |
| Stasis tongue | 1107 | 1942 |
| Greasy coating | 1559 | 1652 |
| Peeled coating | 478 | 639 |
| Rotten coating | 96 | 132 |
| Total | 6942 | 9666 |
Notes: one tongue image can contain multiple labels.
2.4. Faster R-CNN Model Development for Recognizing Tongue Shape and Texture
Figure 4(a) shows the network architecture of Faster R-CNN, which mainly consists of four parts: the convolution layers, the region proposal network, the region of interest (ROI) pooling layer, and the classifier and regressor layer [31]. The backbone convolution network ResNet101, shown in Figure 4(b), extracts feature maps from the input tongue images; the region proposal network (RPN), centered on each pixel of the feature maps, generates anchor boxes of different scales over the tongue images and filters them by nonmaximum suppression; the ROI pooling layer computes fixed-size feature maps for the region proposals; finally, an average pooling is applied, and the resulting features are used for classification and bounding box regression, respectively.
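The nonmaximum suppression step used by the RPN can be illustrated with a small pure-Python sketch: greedy NMS keeps the highest-scoring anchor boxes and discards boxes that overlap an already-kept box above an IoU threshold. This is a simplified stand-in for the actual implementation (the 0.7 default mirrors the original Faster R-CNN setting, which we assume here):

```python
def iou(a, b):
    """Intersection-over-union of two [x0, y0, x1, y1] boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_thresh=0.7):
    """Greedy non-maximum suppression: return indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # keep a box only if it does not heavily overlap any kept box
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

Production implementations vectorize this loop on the GPU, but the logic is the same.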
Figure 4.

Faster R-CNN Model development for recognizing tongue shape and texture. (a) The workflow of Faster R-CNN; (b) the architecture of backbone feature extraction network ResNet101; (c) the accuracy and loss of model training.
2.5. Model Training, Validation, and Testing
The Faster R-CNN model based on the Caffe framework was deployed in the Ubuntu operating system by using open-source code and was trained in a computing environment with 4 NVIDIA GTX 1080Ti GPUs, 12 Intel Core I7-6850K CPUs, and 128 GB DDR4 RAM.
The model was trained, validated, and tested according to the following steps. First, the Faster R-CNN network was fine-tuned on the tongue image dataset for 40000 iterations with a stochastic gradient descent (SGD) optimizer, a learning rate of 0.03, weight decay of 0.0001, momentum of 0.9, gamma value of 0.1, and batch size of 128. The detailed initial parameters are shown in Table 2.
Table 2.
Initial parameters of the Faster R-CNN model for training.
| Parameters | Values |
|---|---|
| Base learning rate | 0.03 |
| Weight decay | 0.0001 |
| Momentum | 0.9 |
| Gamma value | 0.1 |
| Steps | (0, 13333, 26666) |
| Max iteration | 40000 |
| Scales | 400, 500 |
| Batch size | 128 |
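The steps and gamma entries in Table 2 describe a step-decay learning rate schedule. A sketch of that schedule, assuming the learning rate is multiplied by gamma at each step boundary (how the leading 0 in the steps tuple is treated is our assumption):

```python
def learning_rate(iteration, base_lr=0.03, gamma=0.1, steps=(13333, 26666)):
    """Step-decay schedule: multiply the learning rate by gamma at each boundary."""
    decays = sum(1 for s in steps if iteration >= s)
    return base_lr * gamma ** decays
```

Under this reading, the rate is 0.03 for the first third of training, 0.003 for the second, and 0.0003 for the last.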
Then, the tongue images and the marked position information were fed into the integrated Faster R-CNN model for training. In each training iteration, features were extracted, labels and box positions were predicted, and the losses (i.e., errors) between the predicted box positions and labels and the actual object positions and labels were calculated; the parameters were then updated by error backpropagation. At the end of training, a well-trained object detection model for TCM tongue images was obtained. Validation was performed during the training process: the results of the model under different hyperparameters were collected over the validation set, the state of the model was checked, and the hyperparameters were adjusted accordingly. Training was stopped when the validation accuracy no longer increased. The loss function of Faster R-CNN sums the classification loss and the regression loss, as defined in the following equation [24]:
$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda\,\frac{1}{N_{reg}}\sum_i p_i^*\,L_{reg}(t_i, t_i^*) \qquad (1)$$
where Ncls and Nreg are the number of anchors in the minibatch and the number of anchor locations; i and λ denote the selected anchor box index and the balancing parameter; pi and pi∗ represent the predicted probability and the ground truth of the tongue feature; ti and ti∗ represent the predicted bounding box and the actual tongue feature label box; Lcls and Lreg denote the classification loss and the bounding box regression loss. The accuracy results and the loss changes during training are depicted in Figure 4(c).
Finally, after adjusting the initial learning rate and comprehensively comparing the results, the model trained with a learning rate of 0.001 for 40000 iterations was selected as the final object detection model and applied to the testing set.
2.6. Strategies for the Prevention of Overfitting
In this study, two means of preventing overfitting were deployed: regularization and dropout. During model training, L2 regularization was leveraged to constrain the weight estimates [32]. In addition, dropout was applied when training the last several classification layers of the Faster R-CNN network: convolution kernels were randomly deactivated during training [33], which dynamically balanced the importance of the kernels in the classification layers and alleviated the overfitting phenomenon.
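The two mechanisms can be illustrated in a few lines; both functions are didactic sketches (with hypothetical names), not the Caffe implementation:

```python
import random

def l2_penalty(weights, weight_decay=1e-4):
    """L2 regularization term added to the loss: weight_decay * sum(w^2)."""
    return weight_decay * sum(w * w for w in weights)

def dropout(activations, p=0.5, rng=random.Random(0), training=True):
    """Randomly zero activations with probability p; rescale survivors by 1/(1-p)."""
    if not training:
        return list(activations)  # dropout is disabled at inference time
    return [0.0 if rng.random() < p else a / (1 - p) for a in activations]
```

The L2 term penalizes large weights through the `weight_decay` of Table 2, while the inverted-dropout rescaling keeps the expected activation magnitude unchanged between training and inference.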
2.7. Indices for Model Evaluation
Based on the classification results of the Faster R-CNN model over the testing set, four indices, namely accuracy ((2)), recall ((3)), precision ((4)), and F1-score ((5)), were selected as metrics to evaluate the performance of Faster R-CNN in the multiclass classification of tongue images [34–37]. A true positive (TP) means that the detection result agrees with the expert's conclusion that a tongue feature is present; a false negative (FN) means that an existing tongue feature category is not detected; a false positive (FP) means that the algorithm reports a category that is not present; and a true negative (TN) means that the algorithm agrees with the expert's conclusion that the tongue image does not belong to a certain category. Macro-averaged measures of the above indices are calculated for the Faster R-CNN model with respect to the 7-class classification of tongue images.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (2)$$
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (3)$$
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (4)$$
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (5)$$
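For a given class, the four metrics follow directly from the confusion counts, and the macro average is simply their mean over the seven classes. The confusion counts used in the test below are hypothetical values chosen only to exercise the formulas, not figures reported by the paper:

```python
def per_class_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision, and F1-score from one class's confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f1

def macro_average(per_class):
    """Macro average: unweighted mean of each metric over the classes."""
    n = len(per_class)
    return tuple(sum(m[k] for m in per_class) / n for k in range(4))
```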
2.8. Application of Tongue Image Detection Model
The Faster R-CNN model obtained from the above training was applied to a population undergoing routine medical checkups with Chinese medicine to explore the associations between tongue features and diseases. All samples were collected from January 2019 to December 2019, and a total of 3601 subjects were included at the physical examination center of the Eastern Hospital of Shuguang Hospital affiliated with Shanghai University of TCM. Women who were pregnant or nursing and subjects who could not cooperate with the researchers were excluded. All volunteers signed informed consent, completed routine medical checkups, and had their tongue images captured with the TFDA-1 tongue diagnosis instrument.
All tongue images were analyzed by the trained Faster R-CNN model, and all analysis and test results were verified a second time by experts. Results confirmed by both were accepted directly; when the model and expert results were inconsistent, the comprehensive analysis result was adopted. The shape and texture features of the tongue images were coded as binary indicators. For the common and frequently occurring diseases in the medical checkup population, doctors at the physical examination center of Shuguang Hospital affiliated with Shanghai University of TCM made diagnoses with reference to the corresponding clinical guidelines.
2.9. Statistical Analysis Methods
Excel and Python 3.6 were used for data matching, merging, and sorting. Tongue image features were described by percentage (%) and compared using the Pearson χ2 test. Statistical analysis was performed using IBM SPSS Statistics for Windows, version 25 (IBM Corp., Armonk, NY, USA). All tests were two-tailed, and differences were considered statistically significant when P < 0.05.
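The Pearson χ2 statistic for a 2 × 2 table can be computed directly from the cell counts. Applied to the fissured tongue counts of Table 4 (978 of 2006 males vs. 516 of 1595 females), the closed-form formula below reproduces the reported statistic of 98.475:

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den
```

In practice one would use `scipy.stats.chi2_contingency` (with `correction=False` to match this formula), which also returns the P value.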
The complex network built by the improved node contraction method [38–40] is a weighted network whose edge degrees and weights are based on the computed node importance. The weight of the weighted network was defined as
| (6) |
The visualization tool Python NetworkX [41] was used to store the constructed network as an adjacency matrix and as triples, and the complex network diagram was built with diseases and tongue image features as nodes.
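The resulting weighted feature-disease network can be stored as a simple adjacency structure. The sketch below uses plain dictionaries instead of NetworkX, with the edge list transcribed from Table 6; the node-contraction weighting itself is not reproduced here:

```python
# (tongue feature, disease, weight) edges transcribed from Table 6.
EDGES = [
    ("fissured", "hypertension", 0.974), ("fissured", "dyslipidemia", 0.812),
    ("fissured", "overweight", 0.799),   ("fissured", "NAFLD", 0.775),
    ("tooth-marked", "hypertension", 0.786), ("tooth-marked", "dyslipidemia", 0.649),
    ("tooth-marked", "overweight", 0.639),   ("tooth-marked", "NAFLD", 0.623),
    ("greasy", "hypertension", 0.649), ("greasy", "overweight", 0.540),
]

def build_adjacency(edges):
    """Store the weighted, undirected feature-disease network as nested dicts."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def weighted_degree(adj, node):
    """Sum of edge weights incident to a node (its strength in the network)."""
    return sum(adj.get(node, {}).values())
```

Ranking nodes by `weighted_degree` identifies hypertension as the strongest disease hub, consistent with Figure 8.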
3. Results
3.1. Tongue Image Detection Over Testing Set
In our testing set, the average accuracy of the model reached 90.67%, with a precision of 99.28%, recall of 91.25%, and F1-score of 95.00%, indicating that the model has a good detection effect and can accomplish the multiobject recognition task well, as shown in Table 3.
Table 3.
Tongue images object detection results based on Faster R-CNN.
| Tongue feature | Precision (%) | Recall (%) | F1-score (%) | Accuracy (%) |
|---|---|---|---|---|
| Fissured | 99.49 | 99.49 | 99.49 | 98.97 |
| Tooth-marked | 100.00 | 98.84 | 99.42 | 98.84 |
| Stasis | 99.22 | 93.43 | 96.23 | 92.75 |
| Spot | 98.73 | 84.78 | 91.22 | 83.87 |
| Greasy | 99.44 | 90.72 | 94.88 | 90.26 |
| Peel | 98.11 | 88.14 | 92.86 | 86.67 |
| Rot | 100.00 | 83.33 | 90.91 | 83.33 |
| Average | 99.28 | 91.25 | 95.00 | 90.67 |
Our method detected tongue shape and texture features at different scales and ratios. Figure 5(a) shows a normal tongue image (without tooth marks, fissures, stasis, spots, greasy coating, peeled coating, or rotten coating), and there is no mark in the test result. Figures 5(b)–5(d) show tongue images with a single feature: (b) is a fissured tongue, (c) is a greasy coating (with 2 spots), and (d) is a stasis tongue (with 3 spots). Figures 5(e)–5(h) show combinations of two different tongue features: (e) is a tooth-marked tongue with fissures, (f) is a peeled tongue with fissures, (g) is a greasy tongue with fissures, and (h) is a greasy tongue with tooth marks. Figures 5(i)–5(l) show three or more different tongue features detected simultaneously: (i) greasy coating, tooth marks, and stasis; (j) greasy coating, tooth marks, and spots; (k) peeled coating, fissures, and stasis; and (l) greasy coating, tooth marks, fissures, and stasis.
Figure 5.

Examples of tongue image feature detection.
3.2. Distribution of Tongue Image Features in the Medical Checkup Population
The tongue images were input into the established optimal Faster R-CNN intelligent tongue diagnosis analysis model, which detected 1494 cases (41.49%) of fissured tongue, 1338 cases (37.16%) of tooth-marked tongue, 1068 cases (29.66%) of greasy coating, 672 cases (18.66%) of spotted tongue, 359 cases (9.97%) of stasis tongue, 143 cases (3.97%) of peeled coating, and 44 cases (1.22%) of rotten coating, as shown in Figure 6.
Figure 6.

Distribution of different tongue shape and texture features.
3.3. Statistics of Tongue Image Features on the Gender Factors and the Age Factors
The proportions of fissured tongue, tooth-marked tongue, and rotten coating in the male group were higher than those in the female group (P < 0.001), whereas the proportions of spotted tongue and stasis tongue in females were significantly higher than those in males (P ≤ 0.001). There was no significant difference between the two groups in the proportions of peeled and greasy coating, as shown in Table 4 and Figures 7(a) and 7(b).
Table 4.
Comparison of tongue image features between different genders.
| | Male (n = 2006) | Female (n = 1595) | χ2 | P |
|---|---|---|---|---|
| Fissure (yes) | 978 (48.8%) | 516 (32.4%) | 98.475 | <0.001 |
| Tooth (yes) | 794 (39.6%) | 544 (34.1%) | 11.405 | <0.001 |
| Spot (yes) | 336 (16.7%) | 336 (21.1%) | 10.904 | 0.001 |
| Stasis (yes) | 92 (4.6%) | 267 (16.7%) | 146.223 | <0.001 |
| Greasy (yes) | 569 (28.4%) | 499 (31.3%) | 3.632 | 0.057 |
| Peel (yes) | 84 (4.2%) | 59 (3.7%) | 0.556 | 0.456 |
| Rot (yes) | 37 (1.8%) | 7 (0.4%) | 14.544 | <0.001 |
Figure 7.

Comparison of tongue shape and texture features of different age ranges and genders.
The results in Table 5 illustrate that there were significant differences in the incidence of fissured tongue, tooth-marked tongue, spotted tongue, greasy coating, and rotten coating among the four age gradients, but no significant difference in the incidence of stasis tongue and peeled coating. Overall, with increasing age, the incidence of fissured tongue and greasy coating increased gradually, while the incidence of spotted tongue and tooth-marked tongue decreased gradually, as shown in Figures 7(c) and 7(d).
Table 5.
Comparison of tongue image features among different age ranges.
| | <30 years (n = 848) | 30–39 years (n = 1418) | 40–49 years (n = 826) | ≥50 years (n = 509) |
|---|---|---|---|---|
| Fissure (yes) | 299 (35.3%) | 510 (36.0%) | 382 (46.2%)∗# | 303 (59.5%)∗#▲ |
| Tooth (yes) | 321 (37.9%) | 580 (40.9%) | 302 (36.6%) | 135 (26.5%)∗#▲ |
| Spot (yes) | 249 (29.4%) | 291 (20.5%)∗ | 94 (11.4%)∗# | 38 (7.5%)∗# |
| Stasis (yes) | 81 (9.6%) | 150 (10.6%) | 92 (11.1%) | 36 (7.1%) |
| Greasy (yes) | 205 (24.2%) | 391 (27.6%) | 273 (33.1%)∗# | 199 (39.1%)∗# |
| Peel (yes) | 27 (3.2%) | 60 (4.2%) | 33 (4.0%) | 23 (4.5%) |
| Rot (yes) | 3 (0.4%) | 13 (0.9%) | 8 (1.0%) | 20 (3.9%)∗#▲ |
Note: ∗ denotes a significant difference compared to the <30 years old group, # denotes a significant difference compared to the 30–39 years old group, and ▲ denotes a significant difference compared to the 40–49 years old group.
3.4. Correlation Analysis among Tongue Features and Diseases Based on Complex Networks
Overall, the tongue features of diseases in the medical checkup population were mainly characterized by increased fissures, tooth marks, and greasy coating. Table 6 shows the ten highest-weighted relationships between tongue features and diseases. Fissured tongue, tooth-marked tongue, and greasy coating were most closely related to glucolipid metabolic diseases. Specifically, the fissured tongue had the highest weight for hypertension, reaching 0.974, and its weights for dyslipidemia, overweight, and NAFLD were 0.812, 0.799, and 0.775, respectively. For the tooth-marked tongue, the weights for hypertension, dyslipidemia, overweight, and NAFLD were 0.786, 0.649, 0.639, and 0.623, respectively. For greasy coating, the weights for hypertension and overweight were 0.649 and 0.540, respectively. As shown in Figure 8, greasy coating, tooth-marked tongue, and fissured tongue were most closely related to hypertension, dyslipidemia, NAFLD, and overweight.
Table 6.
Top 10 weight of tongue features and diseases in medical checkups.
| Tongue feature | Disease | Weight |
|---|---|---|
| Fissured tongue | Hypertension | 0.974 |
| Dyslipidemia | 0.812 | |
| Overweight | 0.799 | |
| NAFLD | 0.775 | |
| Tooth-marked tongue | Hypertension | 0.786 |
| Dyslipidemia | 0.649 | |
| Overweight | 0.639 | |
| NAFLD | 0.623 | |
| Greasy coating | Hypertension | 0.649 |
| Overweight | 0.540 | |
Figure 8.

Correlation analysis between tongue features and diseases based on complex network.
4. Discussion
Intelligent tongue diagnosis is an important part of clinical TCM diagnosis. Researchers have applied tongue image features extracted by deep learning to diabetes mellitus [4, 42, 43], NAFLD [44], lung cancer-assisted diagnosis [45], and TCM constitution recognition [46–48], with good disease classification performance [13, 49]. Professor Yang Junlin's team [50] applied an AI screening system for scoliosis developed with Faster R-CNN and quantified the severity of scoliosis, with accuracy reaching the average level of human experts. Tang et al. [51] proposed a tongue image classification model based on a multitask CNN, with a classification accuracy of 98.33%; however, due to the small sample size, the advantages of deep learning methods could not be brought into full play, and tongue features such as rotten or greasy coating, spots, stasis, dryness, and thickness remain unexplored [52]. Liu et al. [53] applied Faster R-CNN to identify tooth-marked and fissured tongues, with accuracies of 0.960 for fissured tongues and 0.860 for tooth-marked tongues. That research involved only tooth marks and fissures because of the small sample size, so the advantages of the deep learning multi-label object detection model were not fully exploited.
Compared with tongue classification models constructed with classical CNNs, Faster R-CNN is a highly integrated, end-to-end model and remains the mainstream object detection neural network at present [54–56].
In our research, we focused on the categories of tongue image features rather than their precise positions, so we applied object detection to the multiclass recognition of tongue features. Our Faster R-CNN-based tongue feature detection model had good generalization ability. With the unique advantages of deep learning and transfer learning in identifying the shape and texture features of tongue images, it can realize automatic high-throughput processing, better solve the problems of local tongue image recognition, integrate the identification and annotation of tongue images, and provide a good visualization effect. Our model accomplished multi-label object detection over 7 categories of tongue images with an average accuracy of 90.67%, showing that Faster R-CNN is well suited to clinical TCM applications. In addition, the quantitative analysis of tongue features associated with diseases is an important link in the clinical diagnosis of the tongue in TCM. The relationships among different tongue image features, their associations with gender and age [57], and their correlations with the occurrence and progression of diseases remain unclear. In this study, applying Faster R-CNN-based tongue feature diagnosis to a population undergoing routine medical checkups was a beneficial attempt to mine the implicit information linking TCM tongue images and diseases through a complex network [40].
The intelligent diagnostic analysis was applied to the 3601-person physical examination population. The results showed that the incidence of fissured tongue was 41.49%, tooth-marked tongue 37.16%, greasy coating 29.66%, spotted tongue 18.66%, stasis tongue 9.97%, peeled coating 3.97%, and rotten coating 1.22%. The incidence of fissures and tooth marks in men was higher than that in women, and the incidence of spotted tongue and stasis tongue in women was significantly higher than that in men, which may be related to deficiency of spleen qi, essence, and blood in male subjects and excessive blood heat in female subjects. With increasing age, the incidence of fissured tongue and greasy coating increased, while the incidence of spotted tongue and tooth-marked tongue decreased, which may be related to the tendency toward deficiency of both qi and yin in the elderly and excess syndrome in the young. In the population with glucose and lipid metabolic diseases such as fatty liver and metabolic syndrome, fissures and greasy coating increased, which may be related to the pathogenesis of glucose and lipid metabolism disorders, such as deficiency of qi and yin and dampness. These results are consistent with the clinical practice of TCM [58].
Although the method has some advantages, our model also has limitations.
Firstly, we will conduct further research on the multiclass classification of tongue images in the future, exploring the performance of other neural network models, such as VGGNet, ResNet, and DenseNet, in the tongue image classification task.
Secondly, the tongue image object detection model still has to be optimized. Annotating large samples requires considerable labor. Tongue image data acquired with standardized technology are highly stable but not highly scalable, and although the user-facing visualization effect is good, the extracted features remain difficult to explain [59]. More efficient model algorithms, such as unsupervised deep learning based on flow generative models [60] and the self-attention mechanism of end-to-end object detection with transformers [61], could be used to further optimize and establish a robust intelligent tongue image diagnosis and analysis model.
Thirdly, our approach to tongue image detection is a qualitative model. However, the identification of tongue images in TCM clinics is complicated: it is not only a binary problem but also a quantification of pathological change. Changes in tongue image features are also of great value in the diagnosis of disease symptoms, which will be the focus of our subsequent research.
5. Conclusions
This study was a cross-sectional study of people undergoing medical checkups. A case-control study will next be carried out on patients with major chronic diseases in order to prove the value of tongue features in the diagnosis of disease. In addition, we will optimize the Faster R-CNN model with respect to the precise localization of objects in a tongue image. This paper presents a supervised deep learning method based on a large amount of labeled data; in the future, we will explore a more robust self-supervised deep learning model for the multiclassification of tongue features.
The model Faster R-CNN shows good performance in tongue image classification. And we have preliminarily revealed the relationship between tongue features and gender, age, and metabolic diseases in a medical checkup population.
Acknowledgments
The authors would like to thank all the involved professional TCM clinicians, nurses, students, and technicians for dedicating their time and skill to the completion of this study. This study was supported by the National Key Technology R&D Program of China (grant number: 2017YFC1703301), the National Natural Science Foundation of China (grant numbers: 82104736 and 82104738), Shanghai Science and Technology Commission (grant number: 21010504400), and Shanghai Municipal Health Commission (grant numbers: 201940117 and 2020JQ003).
Contributor Information
Xinghua Yao, Email: xhyaosues@aliyun.com.
Jiatuo Xu, Email: xjt@fudan.edu.cn.
Data Availability
The datasets used and/or analyzed in this study are available upon reasonable request from the corresponding author.
Ethical Approval
This study was reviewed and approved by the Institutional Research Ethics Committee of Shuguang Hospital affiliated to Shanghai University of TCM (No. 2018-626-55-01). The clinical trial has been registered at the Chinese Clinical Trial Registry under registration number ChiCTR1900026008 (https://clinicaltrials.gov/ct2/show/ChiCTR1900026008).
Consent
The patients/participants provided their written informed consent to participate in this study.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Authors' Contributions
T.J. and J.X. conceptualized the study; X.H. and X.Y. developed the methodology; C.Z. designed the software; X.H. and X.Y. validated the study; J.H. performed the formal analysis; L.T. performed the investigation; J.C. collected resources; L.T. curated the data; T.J. and Z.L. wrote the original draft; T.J. and Z.L. reviewed and edited the manuscript; X.Y. provided visualization; C.Z. supervised the study; X.M. and L.Z. administered the project; J.X. and T.J. acquired funding. All authors have read and agreed to the published version of the manuscript. Tao Jiang and Zhou Lu contributed equally to this work.
References
- 1.Zhang D., Zhang H., Zhang B. Tongue Image Analysis . Singapore: Springer; 2017. [Google Scholar]
- 2.Tomooka K., Saito I., Furukawa S., et al. Yellow tongue coating is associated with diabetes mellitus among Japanese non-smoking men and women: the Toon Health Study. Journal of Epidemiology . 2018;28(6):287–291. doi: 10.2188/jea.je20160169. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Jiao W., Hu X. J., Tu L. P., et al. Tongue color clustering and visual application based on 2D information. International Journal of Computer Assisted Radiology and Surgery . 2020;15(2):203–212. doi: 10.1007/s11548-019-02076-z. [DOI] [PubMed] [Google Scholar]
- 4.Li J., Yuan P., Hu X., et al. A tongue features fusion approach to predicting prediabetes and diabetes with machine learning. Journal of Biomedical Informatics . 2021;115:103693. doi: 10.1016/j.jbi.2021.103693. [DOI] [PubMed] [Google Scholar]
- 5.Obafemi-Ajayi T., Kanawong R., Dong X., Shao L., Duan Y. Features for automated tongue image shape classification. Proceedings of the IEEE International Conference on Bioinformatics & Biomedicine Workshops; 2013; Philadelphia, PA, USA. pp. 273–279. [Google Scholar]
- 6.Yang Z., Zhang D., Li N. M. Kernel false-colour transformation and line extraction for fissured tongue image. Journal of Computer-Aided Design & Computer Graphics . 2010;22(5):771–776. doi: 10.3724/sp.j.1089.2010.10754. [DOI] [Google Scholar]
- 7.Zhumu L. M., Lu P., Xia C. M., Wang Y. Q. Research on the Douglas-Peucker method in feature extraction from 55 cases of tooth-marked tongue images. Chinese Archives of Traditional Chinese Medicine . 2014;32(9):2138–2140. [Google Scholar]
- 8.Xu J., Zhang Z., Sun Y., Bao Y. M., Li W. S. Recognition of Acantha and Ecchymosis in tongue pattern. Academic Journal of Shanghai University of Traditional Chinese Medicine . 2004;4:38–40. [Google Scholar]
- 9.Wang X., Wang R., Guo D., Lu X. Z., Zhou P. A research about tongue-prickled recognition method based on auxiliary light source. Chinese Journal of Sensors and Actuators . 2016;29(10):1553–1559. [Google Scholar]
- 10.Liu L. L., Zhang D. Lecture Notes in Computer Science . Berlin, Germany: Springer; 2008. Extracting tongue cracks using the wide line detector; pp. 49–56. [Google Scholar]
- 11.Huang B., Wu J. S., Zhang D., Li N. M. Tongue shape classification by geometric features. Information Sciences . 2010;180(2):312–324. doi: 10.1016/j.ins.2009.09.016. [DOI] [Google Scholar]
- 12.Li X. Q., Wang D., Cui Q. WLDF: effective statistical shape feature for cracked tongue recognition. Journal of Electrical Engineering and Technology . 2017;12(1):420–427. doi: 10.5370/jeet.2017.12.1.420. [DOI] [Google Scholar]
- 13.Wang X., Wang X., Lou Y. Constructing tongue coating recognition model using deep transfer learning to assist syndrome diagnosis and its potential in noninvasive ethnopharmacological evaluation. Journal of Ethnopharmacology . 2021;285:114905. doi: 10.1016/j.jep.2021.114905. [DOI] [PubMed] [Google Scholar]
- 14.Esteva A., Kuprel B., Novoa R. A., et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature . 2017;542(7639):115–118. doi: 10.1038/nature21056. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Ker J., Wang L., Rao J., Lim T. Deep learning applications in medical image analysis. IEEE Access . 2017;6:9375–9389. [Google Scholar]
- 16.Hu M. C., Lan K. C., Fang W. C., et al. Automated tongue diagnosis on the smartphone and its applications. Computer Methods and Programs in Biomedicine . 2019;174:51–64. doi: 10.1016/j.cmpb.2017.12.029. [DOI] [PubMed] [Google Scholar]
- 17.Zhou C., Fan H., Li Z. Tonguenet: accurate localization and segmentation for tongue images using deep neural networks. IEEE Access . 2019;7:148779–148789. doi: 10.1109/access.2019.2946681. [DOI] [Google Scholar]
- 18.Lin B., Xie J., Li C., Qu Y. DeepTongue: tongue segmentation via ResNet. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2018; Calgary, AB, Canada. IEEE; pp. 1035–1039. [DOI] [Google Scholar]
- 19.Cai Y., Wang T., Liu W., Luo Z. A robust interclass and intraclass loss function for deep learning based tongue segmentation. Concurrency and Computation: Practice and Experience . 2020;32(22):p. e5849. doi: 10.1002/cpe.5849. [DOI] [Google Scholar]
- 20.Li L., Luo Z., Zhang M., Cai Y., Li C., Li S. An iterative transfer learning framework for cross-domain tongue segmentation. Concurrency and Computation: Practice and Experience . 2020;32(14):p. e5714. doi: 10.1002/cpe.5714. [DOI] [Google Scholar]
- 21.Wang X., Liu J., Wu C., et al. Artificial intelligence in tongue diagnosis: using deep convolutional neural network for recognizing unhealthy tongue with tooth-mark. Computational and Structural Biotechnology Journal . 2020;18:973–980. doi: 10.1016/j.csbj.2020.04.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Xu Q., Zeng Y., Tang W., et al. Multi-task joint learning model for segmenting and classifying tongue images using a deep neural network. IEEE Journal of Biomedical and Health Informatics . 2020;24(9):2481–2489. doi: 10.1109/jbhi.2020.2986376. [DOI] [PubMed] [Google Scholar]
- 23.Tang W., Gao Y., Liu L., et al. An automatic recognition of tooth-marked tongue based on tongue region detection and tongue landmark detection via deep learning. IEEE Access . 2020;8:153470–153478. doi: 10.1109/access.2020.3017725. [DOI] [Google Scholar]
- 24.Weng H., Li L., Lei H., Luo Z., Li C., Li S. A weakly supervised tooth-mark and crack detection method in tongue image. Concurrency and Computation: Practice and Experience . 2021;33(16):p. e6262. doi: 10.1002/cpe.6262. [DOI] [Google Scholar]
- 25.Ren S., He K., Girshick R. B., Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence . 2017;39(6):1137–1149. doi: 10.1109/tpami.2016.2577031. [DOI] [PubMed] [Google Scholar]
- 26.He K., Zhang X., Ren S., Sun J. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision & Pattern Recognition (CVPR); 2016; Las Vegas, NV, USA. pp. 770–778. [DOI] [Google Scholar]
- 27.Jiang T., Hu X.-J., Yao X.-H., et al. Tongue image quality assessment based on a deep convolutional neural network. BMC Medical Informatics and Decision Making . 2021;21(1):p. 147. doi: 10.1186/s12911-021-01508-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Qi Z., Tu L. P., Luo Z. Y. Tongue image database construction based on the expert opinions: assessment for individual agreement and methods for expert selection. Evidence-Based Complementary and Alternative Medicine . 2018;2018:8491057. doi: 10.1155/2018/8491057. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Xu J. Clinical Illustration of Tongue Diagnosis of Traditional Chinese Medicine . Beijing, China: Chemical Industry Press; 2017. [Google Scholar]
- 30.Lin T.-Y., Maire M., Belongie S., et al. European Conference on Computer Vision . Berlin, Germany: Springer; 2014. Microsoft COCO: common objects in context; pp. 740–755. [DOI] [Google Scholar]
- 31.Zhang A., Lipton Z. C., Li M., Smola A. J. Dive into deep learning. 2021. https://arxiv.org/abs/2106.11342 .
- 32.Gu Y., Li Z., Yang F. Infrared vehicle detection algorithm with complex background based on improved faster R-CNN. Laser & Infrared . 2022;52(4):614–619. [Google Scholar]
- 33.Srivastava N., Hinton G., Krizhevsky A., Sutskever I., Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research . 2014;15(1):1929–1958. [Google Scholar]
- 34.Powers D. M. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. 2020. https://arxiv.org/abs/2010.16061 .
- 35.Olson D. L., Delen D. Advanced Data Mining Techniques . Berlin, Germany: Springer Science & Business Media; 2008. [Google Scholar]
- 36.Tharwat A. Classification assessment methods. Applied Computing and Informatics . 2021;17(1):168–192. doi: 10.1016/j.aci.2018.08.003. [DOI] [Google Scholar]
- 37.Chicco D., Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics . 2020;21(1):p. 6. doi: 10.1186/s12864-019-6413-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Zhu T., Zhang S., Guo R., Chang G.-C. Improved evaluation method for node importance based on node contraction in weighted complex networks. Systems Engineering and Electronics . 2009;31(8):1902–1905. [Google Scholar]
- 39.Shi Y., Hu X., Cui J. Clinical data mining on network of symptom and index and correlation of tongue-pulse data in fatigue population. BMC Medical Informatics and Decision Making . 2021;21(1):1–14. doi: 10.1186/s12911-021-01410-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Li C., Wang W., Li J., Xu J., Li X. Community detector on symptom networks with applications to fatty liver disease. Physica A: Statistical Mechanics and Its Applications . 2019;527:121328. doi: 10.1016/j.physa.2019.121328. [DOI] [Google Scholar]
- 41.Hagberg A., Swart P., Chult D. S. Exploring network structure, dynamics, and function using NetworkX. In: Varoquaux G., Vaught T., Millman J., editors. Proceedings of the 7th Python in Science Conference (SciPy 2008); 2008; Pasadena, CA, USA. pp. 11–15. [Google Scholar]
- 42.Balasubramaniyan S., Jeyakumar V., Nachimuthu D. S. Panoramic tongue imaging and deep convolutional machine learning model for diabetes diagnosis in humans. Scientific Reports . 2022;12(1):p. 186. doi: 10.1038/s41598-021-03879-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Zhang B., Kumar B. V., Zhang D. Detecting diabetes mellitus and nonproliferative diabetic retinopathy using tongue color, texture, and geometry features. IEEE Transactions on Biomedical Engineering . 2014;61(2):491–501. doi: 10.1109/tbme.2013.2282625. [DOI] [PubMed] [Google Scholar]
- 44.Jiang T., Guo X.-J., Tu L.-P. Application of computer tongue image analysis technology in the diagnosis of NAFLD. Computers in Biology and Medicine . 2021;135:104622. doi: 10.1016/j.compbiomed.2021.104622. [DOI] [PubMed] [Google Scholar]
- 45.Zhou J., Zhang Q., Zhang B. An automatic multi-view disease detection system via collective deep region-based feature representation. Future Generation Computer Systems . 2021;115:59–75. doi: 10.1016/j.future.2020.08.038. [DOI] [Google Scholar]
- 46.Li H. H., Wen G. H., Zeng H. B. Natural tongue physique identification using hybrid deep learning methods. Multimedia Tools and Applications . 2019;78(6):6847–6868. doi: 10.1007/s11042-018-6279-8. [DOI] [Google Scholar]
- 47.Wen G., Ma J., Hu Y., Li H., Jiang L. Grouping attributes zero-shot learning for tongue constitution recognition. Artificial Intelligence in Medicine . 2020;109:101951. doi: 10.1016/j.artmed.2020.101951. [DOI] [PubMed] [Google Scholar]
- 48.Ma J., Wen G., Wang C., Jiang L. Complexity perception classification method for tongue constitution recognition. Artificial Intelligence in Medicine . 2019;96:123–133. doi: 10.1016/j.artmed.2019.03.008. [DOI] [PubMed] [Google Scholar]
- 49.Hu Y., Wen G., Luo M. Fully-channel regional attention network for disease-location recognition with tongue images. Artificial Intelligence in Medicine . 2021;118:102110. doi: 10.1016/j.artmed.2021.102110. [DOI] [PubMed] [Google Scholar]
- 50.Yang J., Zhang K., Fan H., et al. Development and validation of deep learning algorithms for scoliosis screening using back images. Communications Biology . 2019;2(1):p. 390. doi: 10.1038/s42003-019-0635-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Tang Y., Wang L., He X., Chen P., Yuan G. Classification of tongue image based on multi-task deep convolutional neural network. Computer Science . 2021;45(12):255–261. [Google Scholar]
- 52.Zhang X., Chen Z., Gao J., Huang W., Li P., Zhang J. A two-stage deep transfer learning model and its application for medical image processing in Traditional Chinese Medicine. Knowledge-Based Systems . 2022;239:108060. doi: 10.1016/j.knosys.2021.108060. [DOI] [Google Scholar]
- 53.Liu M., Wang T., Zhou L. Study on extraction and recognition of traditional Chinese medicine tongue manifestation: based on deep learning and migration learning. Journal of Traditional Chinese Medicine . 2019;60(10):835–840. [Google Scholar]
- 54.Girshick R. Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV); 2015; Santiago, Chile. pp. 1440–1448. [Google Scholar]
- 55.Zheng L., Zhang X., Hu J., et al. Establishment and applicability of a diagnostic system for advanced gastric cancer T staging based on a faster region-based convolutional neural network. Frontiers in Oncology . 2020;10:p. 1238. doi: 10.3389/fonc.2020.01238. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.He Y., Tan J., Han X. High-resolution computer tomography image features of lungs for patients with type 2 diabetes under the faster-region recurrent convolutional neural network algorithm. Computational and Mathematical Methods in Medicine . 2022;2022:4147365. doi: 10.1155/2022/4147365. [DOI] [PMC free article] [PubMed] [Google Scholar] [Retracted]
- 57.Hsu P.-C., Wu H.-K., Huang Y.-C., et al. Gender- and age-dependent tongue features in a community-based population. Medicine . 2019;98(51):p. e18350. doi: 10.1097/MD.0000000000018350. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Hsu P. C., Huang Y. C., Chiang J. Y., Chang H. H., Liao P. Y., Lo L. C. The association between arterial stiffness and tongue manifestations of blood stasis in patients with type 2 diabetes. BMC Complementary and Alternative Medicine . 2016;16(1):p. 324. doi: 10.1186/s12906-016-1308-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Attia Z. I., Noseworthy P. A., Lopez-Jimenez F., et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. The Lancet . 2019;394(10201):861–867. doi: 10.1016/s0140-6736(19)31721-0. [DOI] [PubMed] [Google Scholar]
- 60.Kingma D. P., Dhariwal P. Glow: generative flow with invertible 1x1 convolutions. 2018. https://arxiv.org/abs/1807.03039 .
- 61.Carion N., Massa F., Synnaeve G., Usunier N., Kirillov A., Zagoruyko S. European Conference on Computer Vision . Berlin, Germany: Springer; 2020. End-to-end object detection with transformers; pp. 213–229. [DOI] [Google Scholar]
