Abstract
Different viewing or shooting situations can affect color authenticity and generally lead to visual inconsistencies for the same images. At present, deep learning has gained popularity and opened up new avenues for image processing and optimization. In this paper, we propose a novel regression model named TococoNet (Tongue Color Correction Network) that extends from CNN (convolutional neural network) to eliminate the color bias in tongue images. The TococoNet model consists of symmetric encoder-–decoder U-Blocks which are connected by M-Block through concatenation layers for feature fusion at different levels. Initially, we train our model by simulatively introducing five common biased colors. The various image quality indicators holistically demonstrate that our model achieves accurate color correction for tongue images, and simultaneously surpasses conventional algorithms and shallow networks. Furthermore, we conduct correction experiments by introducing random degrees of color bias, and the model continues to perform well for achieving excellent correction effects. The model achieves up to 84% correction effectiveness in terms of color distance ΔE for tongue images with varying degrees of random color cast. Finally, we obtain excellent color correction for actual captured images for tongue diagnosis application. Among these, the maximum ΔE can be reduced from 30.38 to 6.05. Overall, the TococoNet model possesses excellent color correction capabilities, which opens promising opportunities for clinical assistance and automatic diagnosis.
Keywords: color correction, CNNs, deep learning, tongue features
1. Introduction
Tongue diagnosis is a crucial aspect of traditional Chinese medicine (TCM) diagnosis and treatment [1,2]. It offers the advantage of being non-invasive and convenient, as it analyzes human cases through examining the tongue’s condition. Traditional tongue diagnosis mainly relies on visual description, which encounters different situations that can affect color and lead to subjectivity [3]. Therefore, the current development direction of tongue diagnosis in modernizing Chinese medicine involves the application of computer vision technology for tongue image feature recognition [4]. The tongue, rich in features, closely correlates with health status [5], with the tongue color directly indicating abnormalities. However, shooting environments and lighting conditions are often inconsistent. This inconsistency leads to color bias in captured tongue images, making them different from the actual tongue appearance. Even the same tongue may appear differently under various shooting conditions, negatively impacting diagnosis [6,7]. Consequently, color correction of captured tongue image data becomes an essential component of automated diagnostic technology.
Early studies introduced various algorithms for color correction in different environments, such as methods based on Kernel Partial Least Squares Regression (K-PLSR) correction methods [6], backpropagation (BP) neural networks [7], white balance [8], neural network mapping [9], and polynomial regression methods [10,11]. Li Zhuo et al. [6,7] successively proposed color correction models based on simulated annealing (SA)-–genetic algorithm (GA)-–back propagation (BP) and K-PLSR. The SA-GA-BP algorithm can achieve a higher convergence rate than the classical BP algorithm. Meanwhile, K-PLSR converts linear regression to nonlinear and overcomes the difficulty of small data size. Both algorithms are based on Munsell ColorChecker for model training. Then, they substitute biased color tongue images into the model for correction. However, it affects the generalization ability of the model, and neither of them provide much interpretation or analysis of the correction results. Finlayson et al. [8] assumed that after the successful correction of white points in a biased color image, the other colors would be corrected as well. They used least squares and linear transformations for fitting. However, this white balance color correction method is not suitable for tongue images with excessive white points. Wang et al. [9] proposed an optimized BP neural network based on Munsell ColorChecker, aiming to avoid color correction in a particular imaging system. They also compared polynomial regression, ridge regression, and support vector regression algorithms. Wang et al. [11] compared polynomial regression algorithms, artificial neural networks, and support vector machine algorithms. They concluded that polynomial regression algorithms are more suitable for color correction of the tongue, while artificial neural networks are the least effective.
Additionally, the perfect reflection algorithm and derived algorithms, including the gray world algorithm based on standard bias weighting, were proposed [12]. In their study, Finlayson et al. presented the drawbacks of linear color correction (LCC) and conventional polynomial color correction (PCC). Then, they proposed a new root polynomial color correction algorithm. This algorithm avoids the possibility of significant errors in some areas. It also uses the Macbeth color chart to compare color bias before and after correction. Shukeri and Chen et al. performed color correction of tongue images based on device-independent LAB color space in comparison with Munsell color charts [13,14]. However, conventional polynomial regression algorithms have limitations in tongue color correction. These limitations arise from the requirement to cross-reference with standard color blocks.
At present, the medical field widely adopts artificial intelligence technology [15]. This technology extracts information from tongue images to identify physical features imperceptible to the human eye. It also offers new possibilities for color correction. Recent advancements have introduced convolutional neural networks for the color correction of tongue images [16,17]. These networks achieve superior results in objective color correction under diverse lighting conditions and present innovative ideas in the field. Convolutional neural networks are inspired by visual perception and mimic the way the brain works. Convolutional operations are used instead of general matrix multiplication operations, and therefore are more advantageous in the field of image data processing [18]. However, their algorithm only builds five convolutional layers. It does not apply pooling and batch normalization (BN) layers, leaving much room for improvement in the final correction results.
Previous tongue color correction methods have limitations in terms of correction results and do not take into account multiple color bias situations. To address these challenges, this paper makes the following key contributions:
-
(a)
Deep Learning Architecture for Color Correction: We propose TococoNet, a deep convolutional neural network tailored for the regression task of tongue image color correction. Though TococoNet draws inspiration from U-Net’s topological structure, it adopts a more refined design. Specifically, this design incorporates symmetric U-Blocks to realize multi-scale feature extraction, alongside a dedicated M-Block that facilitates effective multi-level feature fusion. Together, these components enable the model to comprehensively integrate both local color details and global color context.
-
(b)
Simulation-to-Real Training Framework: We introduced a robust framework that trains the model on simulated color-biased images. The model’s subsequent successful application to real captured images demonstrates the efficacy and strong generalization capability of this approach.
-
(c)
Comprehensive Experimental Validation: We conduct extensive experiments, including corrections for five common and random degrees of color bias, demonstrating that TococoNet significantly outperforms conventional algorithms and an existing shallow CNN (DCCN). Furthermore, we validate the model’s practical utility by achieving excellent correction results on real captured tongue images.
The remainder of the paper is organized as follows. Section 2 describes classical color correction algorithms. Section 3 outlines the preparation of our proposed model and experiments. Section 4 presents our experimental data and comparison results. The final section discusses and concludes the proposed color correction algorithm.
2. Related Works
In this section, we present three previous color correction algorithms and a shallow convolutional neural network model, along with their respective computational formulas. The conventional algorithms rely on fixed formulas for color correction, lacking generalization capabilities and yielding subpar correction results. While the shallow convolutional neural network demonstrates improved color correction performance, it falls short in quantitative analysis indices. To address the limitations of deeper networks in tongue color correction, we propose TococoNet, a deep convolutional neural network (CNN) model, which offers excellent generalization ability for color correction.
2.1. Gray World Algorithm
Xin et al. [19] proposed a color correction method rooted in the gray world principle. Initially, the dimensionality of the captured tongue image is reduced twice. This reduction decreases the data volume substantially to merely 3.6%. It also remarkably enhances the efficiency of subsequent processing. Following this reduction, a color bias detection is executed on the streamlined image. Any images exhibiting color bias are then subjected to color correction using the gray world method. Kwok et al. [20] also utilize the gray world algorithm for color correction. They distinguish themselves from others by additionally employing a gamma correction method. It fulfills the prerequisites of the gray world algorithm while preserving image intensity and circumventing color saturation.
The fundamental premise of the gray world assumption is that average gray values of the RGB channels in a colorful image will converge to the same gray value, denoted as K. The gray world algorithm accomplishes color correction by adjusting the mean value of three channels in the biased color picture. The calculation formula is as follows:
| (1) |
Here, (Ravg, Gavg, Bavg) are the mean values of the three-color channels. Meanwhile, (R, G, B) represent the pixel value of the original image in each color channel, and (Rnew, Gnew, Bnew) are the adjusted color components of the pixel.
2.2. Perfect Reflection Algorithm
The perfect reflection algorithm, also known as the specular method, operates on the alternative assumption that a white object in an image has extremely large R, G, and B values under any light source. Therefore, this algorithm follows a clear principle. It assumes the brightest point in the image is white, then uses this white point as a reference for white balance to achieve color correction. The calculation steps are as follows:
| (2) |
Here, (Rmax, Gmax, and Bmax) represent the color components of the pixel, which are the maximum value of (R + G + B) in the image. Here, (Rw, Gw, and Bw) are generally taken as 255 each.
In references [12,21], Liu and Xu et al. proposed a color correction algorithm by combining the gray world and perfect reflection methods (SDW), leveraging the advantages of both methods. The algorithm first divides the image into 16 × 16 blocks and calculates the color component weighted mean values of the image. Assigning different weights based on the standard bias of the pixels in each block reduces the effect of large swaths of a single color on the channel mean and better corrects the image.
2.3. Polynomial Regression Algorithm
The correction principle of the polynomial regression algorithm is to calculate the polynomial matrix of the mapping between the off-color color blocks and the standard color blocks, and then substitute the matrix into the overall image to achieve image correction. The calculation process is as follows, where X is the 3-channel matrix of the surrogate and V is the polynomial regression matrix.
| (3) |
Wang et al. [11] conducted a comparative analysis of polynomial regression algorithms, artificial neural networks, and support vector regression. They ultimately concluded that polynomial regression algorithms are more suitable for the color correction of tongue images. Sui et al. [22] implemented a root polynomial regression approach. They incorporated the pth root of the pth polynomial term, effectively eliminating nonlinear variations in certain regions caused by variations in lighting conditions.
Polynomial regression-based algorithms are typically coupled with standard color blocks to correct the biased color blocks to standard color blocks. In Section 4.2, we selected the ColorChecker standard color card. This color card is produced by Menzel Laboratories, a subsidiary of AISI. It has 24 color blocks, and each block represents the actual color of natural objects.
2.4. Convolutional Neural Networks
Lu et al. [16,17] introduced a color correction model named deep-color correction network (DCCN) based on a convolutional neural network. They constructed a regression model comprising just five convolutional layers. This model is segmented into three parts: the input layer, the nonlinear transformation layers, and the output layer. Their use of a 40 × 40 patch for training the model provides valuable insights. It is noteworthy that they provide a color adjustment feature. This feature leverages three supplementary parameters to enable precise manipulation of color correction. In turn, this capability accommodates a wide range of work environments.
3. Method Overview
We employ a convolutional neural network to execute color correction on tongue images. Initially, we introduce a novel deep model capable of handling various color bias situations. Subsequently, we simulate five distinct tongue color biases, including lower brightness, higher brightness, and overall reddish, bluish, and greenish colors. Furthermore, we conduct a thorough evaluation of the correction’s efficacy using diverse metrics and compare our approach with alternative methods.
3.1. Network Architecture
U-Net, a semantic segmentation model based on convolutional neural networks as proposed in [23], is widely recognized for its efficient and accurate segmentation capability. In this work, we proposed a novel regression model for color correction inspired by the U-Net architecture. The TococoNet, combining downsampling coding and upsampling decoding, demonstrates excellent color correction abilities under various bias situations. The architecture of our model is shown in Figure 1.
Figure 1.
Overview of the TococoNet. (a) U-Block, tasked with encoder and decoder. (b) M-Block responsible for connecting the encoder and decoder. (c) Framework for the entire network. (d) Role of each layer.
The U-Block module is a symmetric encoder-–decoder structure, similar to that of U-Net. It is responsible for handling the encoding and decoding tasks. To better understand the image content, it gradually reduces the feature map size by stacking multiple convolutional and pooling layers. The encoder path employs successive convolutional and downsampling layers to capture multi-scale contextual information. The decoder path then upsamples the features to restore spatial resolution. A key feature of our U-Block is the use of internal residual connections, which help to propagate original color and texture information through the network, mitigating the information loss that can occur in very deep networks. Downsampling employs a 2 × 2 max pooling operation while upsampling uses bilinear interpolation, preserving smoothness and detail information. To prevent significant loss of contextual information, U-Block performs downsampling operations, adjusting the pooling step accordingly. Feature map size is restored to its original size after two upsampling passes. Feature map concatenation (concat) operations in each layer of downsampling and upsampling enhance the extraction of feature information, boosting the network model’s expressiveness.
The M-Block connects the encoder and decoder, achieving multi-scale and multi-level information fusion by linking bottom layer features with top layer features. Its core structure consists of two primary components: a block containing 8 convolutional layers, followed by a critical fusion operation. Maximum pooling and scales yield the final color-corrected image. The difference from U-Block is that the image scale remains unchanged during feature extraction with M-Block. The M-Block helps the model to better integrate features from different scales and levels, which is beneficial for color correction as it allows the model to consider both local and global color information.
The TococoNet model consists of two U-Blocks and one M-Block with the use of concatenation layers for fusion of features at different levels. The first U-Block uses 32 intermediate filters and outputs 64 channels. The M-Block uses 32 filters and outputs 64 channels. The second U-Block uses 32 intermediate filters and outputs 64 channels. The output path consists of a Batch Normalization (BN) layer, a 1 × 1 convolutional layer with 64 filters, and a final 3 × 3 convolutional layer with 3 filters. The architecture incorporates a total of 26 convolutional layers, with its depth providing the necessary capacity to learn the complex, non-linear mapping for color correction. A pivotal design choice is the strategic placement of Batch Normalization (BN) layers, particularly following the final feature fusion, which stabilizes training and acts as a regularizer for this deep regression network.
For offline training, we use MSEloss as the model’s loss function, calculated as:
| (4) |
where F and L represent the network output and real information, respectively, and x is the input position. The model was trained using the Adam optimization algorithm.
3.2. Training Environment
The training of this model was conducted on an Ubuntu 20.04 operating system using the PyTorch deep learning framework. The batch size was set to 32, and the model was trained for 200 epochs. We used the Adam optimizer with an initial learning rate of 1 × 10−4 and L2 weight decay of 1 × 10−5. A learning rate scheduler was employed, which reduced the learning rate by a factor of 0.5 if the validation loss plateaued for 15 epochs. Furthermore, we utilize three graphics cards for parallel training, thereby significantly boosting the training efficiency. The specific experimental environments, including the graphics card and CPU, are detailed in Table 1.
Table 1.
Experimental environment.
| Entry | Version |
|---|---|
| GPU | Nvidia RTX4090 |
| CPU | Intel Xeon Silver 4316 |
| CUDA | 11.6 |
| Python | 3.7 |
| Numpy | 1.26.4 |
| Opencv | 4.6.0.66 |
| System | Ubuntu 20.04 |
| PyTorch | 1.12.0 |
3.3. Evaluation Indicators
(a) ΔE: Chromatic distance ΔE, based on the CIE LAB color space, is an evaluation index calculated as:
| (5) |
The LAB color space, composed of one luminance and two color channels, ensures uniform perception and device independence for human color sensation. Smaller ΔE indicates closer similarity to the original image. In this research, we use the ΔE for the initial detection of color bias. We derived the general criteria for ΔE from the results of our numerous experiments and the quantitative analysis of combined metrics and visual observations. The general criteria for ΔE are classified as follows: when ΔE < 6, it can be considered the same as the original image; when ΔE < 15, it is a mild color bias; when ΔE > 15, it is a color bias. In the evaluation, we categorize chromatic distance into five levels, namely the average color moment, the highest color moment, and the first (Q1), second (Q2) and third (Q3) quartiles, to comprehensively assess image quality.
(b) MAE and RMSE: Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) measure the difference between predicted and actual values. They are, respectively, defined as:
| (6) |
| (7) |
Smaller values for both MAE and RMSE indicate a better fit of the model.
(c) PSNR: Peak Signal to Noise Ratio (PSNR) evaluates image quality, with larger values indicating less distortion. The PSNR is calculated using the Mean Squared Error (MSE):
| (8) |
| (9) |
where MAXI represents the maximum pixel value of the image, usually set to 255.
(d) SSIM: The Structural Similarity Index Measure (SSIM) determines the similarity of two images, more aligned with human visual characteristics than PSNR. It ranges between 0 and 1, with a greater value indicating higher similarity, and the value 1 indicating identical images. The SSIM is computed as:
| (10) |
where μx and μy represent the mean, σx and σy represent the variance, σxy represents the covariance, and c1 and c2 are constants used to maintain stability.
3.4. Training Dataset
For the acquisition of tongue images, we employed an instrument to simulate natural light and ensure uniform illumination. The color temperature of the instrument was set to 6600 K, the illuminance was around 3000, and the color rendering index was 90%. We collected a total of 577 tongue images and finally selected 507 of them for training and testing after quality screening. The acquired tongue images were processed for biased color, resulting in the creation of a dataset in different states. We simulated five common bias color conditions, as illustrated in Figure 2: lower brightness (lb), higher brightness (hb), and overall reddish, bluish, and greenish colors. The simulation formulas for lower and higher brightness are as follows:
| (11) |
where the color refers to the adjusted brightness, the contrast takes values in the range of 0.8–1, and the brightness refers to the offset that takes values in the range of 0–30.
Figure 2.
Datasets created by simulating bias color about five common off-colors, from left to right, are (a) origin image, (b) lower brightness image, (c) higher brightness image, (d) overall reddish image, (e) overall bluish image, and (f) overall greenish image, respectively.
The simulation formula for overall reddish, bluish and greenish is as follows:
| (12) |
where the color refers to each color component and adjustment is taken as 50. Due to the numerical limitations of the RGB format, each channel will be clipped after adjustment. If the adjusted color value exceeds 255, it will be set to 255. The brightness adjustments are not subject to this limitation, so no additional cropping was performed. After multiple adjustments to these parameters, the simulated biased color images are similar to the actual biased color.
Insufficiency of the hardware conditions in need of training with the whole image requires us to use color blocks. The images were uniformly sized at 640 × 640 and cropped into 160 × 160 color blocks with a step of 80 to construct the training, validation and test dataset. The ratio of the training set, validation set, and test set is 60%, 30%, and 10%.
4. Results
In this section, we present three distinct experimental outcomes pertaining to the proposed model. Firstly, we showcase the effectiveness of the TococoNet in correcting biased color blocks. Secondly, we conduct a comparative analysis with alternative algorithms. Finally, to further scrutinize the model’s correction capabilities, we present the color correction results for randomly biased tongue images.
4.1. Correction Results of Color Blocks
First, since the TococoNet model is trained on a dataset built around color blocks, we create new color blocks by extracting individual pixels from the tongue images after simulating bias color. The model is then used to correct these color blocks under different bias conditions. The biased color blocks and the correction results are depicted in Figure 3. According to qualitative observations, a pixel-level comparison reveals the model’s excellent color correction capabilities. It is evident that there exists a substantial disparity in color among the color blocks prior to correction. Following the application of the model’s correction, the colors of these blocks become strikingly similar, demonstrating the effective correction of the five bias situations.
Figure 3.
Correction results for color blocks, from left to right, are (a) lower brightness image, (b) higher brightness image, (c) overall reddish image, (d) overall bluish image, and (e) overall greenish image, respectively.
We choose ΔE, PSNR, and MAE for the initial evaluation of model performance with respect to color block correction. As illustrated in Figure 4, the ΔE values for the five biased color cases are reduced compared to before correction, and the PSNR values also show improvement. The MAE value is also decreased, indicating a reduction in the error between the corrected and the original images. Specifically, as depicted in Figure 4a, the ΔE difference before and after correction is minimal for a reddish situation, whereas it is most significant for a greenish situation. This is attributed to the fact that most tongue colors exhibit red, while greenish hues are quite uncommon. The PSNR curves in Figure 4b for the five color-biased scenarios show relatively flat trends. This suggests that corrected images across different color biases have similar graphic quality. As seen in Figure 4c, the case of a bluish situation yields the smallest pixel difference between the biased and standard images, while lower brightness cases result in the largest disparity.
Figure 4.
Evaluation curves of correction results for pixel color blocks. (a) Chromatic distance (ΔE). (b) Peak Signal to Noise Ratio (PSNR). (c) Mean Absolute Error (MAE). Evaluation indicators of the corrected images are significantly improved.
Furthermore, we conduct a thorough analysis of the evolution of the PSNR value and the loss function during the training of the TococoNet. We plotted the corresponding curves in Figure 5 to visually illustrate these changes. As the training progressed, the PSNR value of the training set gradually ascended, indicating an improvement in the model’s performance. It reached a peak value of 43 ultimately, suggesting a significant enhancement in image quality. The curves of the loss function also revealed that the training process was relatively stable, exhibiting gradual convergence and avoiding any signs of overfitting.
Figure 5.
Training results for color blocks. (a) PSNR. (b) Loss function.
4.2. Comparison with Other Algorithms
To further verify the validity of our model, we applied color correction to the images in the dataset using the TococoNet and other algorithms. The correction results are presented in Figure 6. The first two columns from left to right are the original and biased color images. The remaining columns, from left to right, are images of the tongue after color correction using different methods. As shown in Figure 6e, we compared the results of different algorithms. The polynomial regression algorithm introduces a contrasting color block, which is due to its correction characteristics. Among the four conventional methods, the polynomial regression algorithm yields relatively good correction results, while the other algorithms fail to meet the standard requirements for color correction. It is evident that classical methods prove less effective in correcting biased color pictures of the tongue and are unsuitable for tongue color correction.
Figure 6.
Correction results using the TococoNet and other algorithms. (a) Original tongue images. (b) Biased color images, (c) TococoNet, (d) DCCN, (e) Polynomial regression algorithm, (f) Gray world algorithm, (g) Perfect reflection algorithm, (h) SDW algorithm. TococoNet and DCCN corrections are significantly more excellent than the other algorithms.
The outcomes of both CNNs models clearly demonstrate excellent color correction ability, yielding convincing results for various cases of color bias. As depicted in Figure 6c,d, the color correction results of the TococoNet and DCCN exhibit remarkable outcomes when compared to the biased color tongue images. The distinction between the corrected tongue image and the original is imperceptible to the naked eye, necessitating a further quantitative evaluation of the models’ correction efficacy.
To further emphasize the model’s correction ability, evaluation parameters for the calibration of the TococoNet and the DCCN are provided in Table 2, offering robust evidence of the model’s correction capability. After correction, the SSIM indicators of the TococoNet model surpassed 0.99, signifying the exceptional quality of the corrected tongue images. The average ΔE dropped below 2, while the maximum ΔE hovered around 20, demonstrating that the corrected images closely mimic the original ones. Furthermore, the PSNR witnessed a significant improvement, nearing 50, significantly enhancing the color quality of the tongue image. Additionally, the MAE and RMSR values underwent substantial reductions, confirming the minimization of differences between the corrected and original images.
Table 2.
Evaluation indicators for the correction results of our model and the DCCN model. The first of these columns shows the five cases of biased color. The second column presents the different correction states, which are the initial tongue image before correction, after being corrected by our model, and DCCN, respectively. The remaining columns are different evaluation indicators.
| Color Bias | Correction State |
PSNR | SSIM | E_mean | E_Q1 | E_Q2 | E_Q3 | E_Max | MAE | RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
| lb | Original | 13.7290 | 0.9100 | 20.2729 | 19.7567 | 20.5144 | 21.0200 | 25.1379 | 0.2019 | 0.2031 |
| Ours | 43.2377 | 0.9940 | 1.3028 | 1.0000 | 1.2709 | 1.6171 | 17.9678 | 0.0050 | 0.0063 | |
| DCCN | 36.3999 | 0.9762 | 2.5581 | 1.2709 | 2.0000 | 3.0000 | 118.3315 | 0.0058 | 0.0103 | |
| hb | Original | 14.8771 | 0.9532 | 19.0591 | 17.7407 | 18.5665 | 20.6949 | 29.3753 | 0.1753 | 0.1768 |
| Ours | 36.4937 | 0.9907 | 2.4330 | 1.2709 | 2.0381 | 3.1623 | 17.0541 | 0.0059 | 0.0084 | |
| DCCN | 33.1935 | 0.9815 | 4.3246 | 2.2706 | 3.7926 | 6.0510 | 129.9648 | 0.0095 | 0.01296 | |
| red | Original | 19.1908 | 0.9871 | 21.6838 | 21.6436 | 22.3689 | 23.0614 | 33.1298 | 0.0566 | 0.0570 |
| Ours | 40.9000 | 0.9955 | 1.4789 | 1.0741 | 1.2709 | 1.6171 | 11.2078 | 0.0061 | 0.0071 | |
| DCCN | 34.0676 | 0.9622 | 4.0921 | 2.3696 | 3.6268 | 5.1590 | 106.6675 | 0.0112 | 0.0142 | |
| green | Original | 18.9848 | 0.9807 | 34.0577 | 33.6523 | 34.1437 | 34.9353 | 44.6470 | 0.1136 | 0.1139 |
| Ours | 43.5519 | 0.9957 | 1.3325 | 1.0000 | 1.2709 | 1.4676 | 20.6038 | 0.0041 | 0.0061 | |
| DCCN | 32.2688 | 0.9711 | 4.7157 | 3.4413 | 4.4547 | 5.4630 | 88.9739 | 0.0206 | 0.0237 | |
| blue | Original | 18.9457 | 0.9973 | 27.6570 | 26.9515 | 27.2481 | 27.9175 | 35.5181 | 0.0227 | 0.0228 |
| Ours | 43.2933 | 0.9961 | 1.6585 | 1.0741 | 1.4676 | 2.2361 | 17.9950 | 0.0028 | 0.0040 | |
| DCCN | 33.1320 | 0.9667 | 3.2881 | 1.4676 | 2.2361 | 3.6268 | 128.5556 | 0.0124 | 0.0207 |
However, upon examining the evaluation metrics, it becomes evident that the correction performance of DCCN lags behind that of the TococoNet. The PSNR value of DCCN’s correction results barely surpasses 30, and its SSIM value falls below 0.99, significantly trailing behind the TococoNet. Furthermore, the MAE and RMSE values of DCCN are significantly higher than those of the TococoNet, reflecting the inferior quality of its calibration. The detailed processing of the shallow network is inferior to our network, with the maximum ΔE even exceeding 100, which suggests a failure in correcting the individual pixel points. This is likely due to the model’s limited capacity and lack of batch normalization, which can lead to unstable training and poor generalization on outlier pixels. Almost all indicators of DCCN are a bit insufficient compared to the TococoNet.
To ensure the statistical robustness of our results and to quantify the variability of the model’s performance, we employed a bootstrap resampling approach. From the test set, we generated 40 bootstrap samples (with replacement) and evaluated the model on each. All reported performance metrics are subsequently presented as the mean ± standard deviation across these 40 runs. As shown in Table 3, the PSNR and ABC metrics of TococoNet were both superior to those of DCCN. And the significance of the performance differences between TococoNet and the DCCN was assessed using paired t-tests for each color bias scenario and metric. The significance of the performance differences between TococoNet and the DCCN baseline was assessed using paired t-tests for each color bias scenario and metric. A result was considered statistically significant at p < 0.001. As shown in Table 4, the performance difference between the two methods was found to be highly statistically significant.
Table 3.
Evaluation indicators for the bootstrap resampling results between our model and the DCCN model. For key metric (PSNR, ΔE), we report the mean ± standard deviation.
| Indicator | Model | lb | hb | Red | Green | Blue |
|---|---|---|---|---|---|---|
| PSNR | Ours | 43.31 ± 1.53 | 35.31 ± 1.53 | 39.85 ± 1.23 | 43.31 ± 1.53 | 42.80 ± 1.96 |
| DCCN | 36.4 ± 1.48 | 32.92 ± 1.19 | 34.44 ± 0.89 | 31.92 ± 1.19 | 34.85 ± 2.60 | |
| ΔE | Ours | 1.69 ± 1.46 | 2.60 ± 1.40 | 1.69 ± 1.46 | 1.78 ± 1.45 | 1.67 ± 1.47 |
| DCCN | 3.22 ± 1.44 | 4.72 ± 1.44 | 4.22 ± 1.42 | 4.36 ± 1.38 | 3.57 ± 1.33 |
Table 4.
Evaluation indicators for the paired t-tests comparing the bootstrap distributions of TococoNet against the DCCN. The t-value and p-value are used to evaluate the statistical significance of the performance difference (p < 0.001 indicates an extremely significant difference).
| Indicator | t/p | lb | hb | Red | Green | Blue |
|---|---|---|---|---|---|---|
| PSNR | t | 18.65 | 7.11 | 20.63 | 33.93 | 15.19 |
| p | 4.93 × 10−21 | 1.54 × 10−8 | 1.37 × 10−22 | 1.45 × 10−30 | 5.69 × 10−18 | |
| ΔE | t | −13.77 | −20.48 | −22.78 | −31.90 | −24.42 |
| p | 1.46 × 10−16 | 1.79 × 10−22 | 3.86 × 10−24 | 1.47 × 10−29 | 3.06 × 10−25 |
4.3. Correction Results of Random Biased Color Images
To verify the stability and generalization ability of the model, we constructed a second dataset with a random degree of color bias using lower brightness biased images as an example. The element of uncertainty introduced by the random bias posed a greater challenge to the model’s generalization capabilities. Testing the trained model with randomly biased color images, as illustrated in Figure 7, reveals the model’s capability of correct color correction for different degrees of bias. After correction, the details of color in the tongue images were preserved, and tongues with varying degrees of bias color exhibited similar color performance. Evaluation using ΔE is labeled at the top of each picture, clearly indicating that our correction results have a lower ΔE compared to the off-color images. Specifically, among the five tongue images with ΔE ranging from 10.41 to 25.07, their ΔE values were all reduced to around 4 after color correction, with the most significant improvement achieving an 84% reduction.
Figure 7.
(a) Original tongue images. (b) Corrected images. Focused on lower brightness biased images, a significant reduction in the ΔE indicator indicates excellent correction results for random degrees of color bias.
4.4. Correction Results of Actual Biased Color Images
To evaluate the correction effect of the trained model on real images, we captured color-biased tongue images under different colored lighting conditions and compared them with images obtained in a non-color-biased lighting environment. Since lower and higher brightness conditions are unfavorable for images, only color-biased scenarios were selected. For this initial investigation, we captured images from two different tongues under three distinct colored light sources (red, green, blue). This resulted in a total of 6 real color-biased tongue images for testing. As shown in Figure 8, (a) represents the images under non-color-biased lighting, (b) shows the color-biased images, and (c) displays the corrected images. We continued to use ΔE as the evaluation metric, which is annotated above each image. The correction outcomes were favorable for images biased towards blue and green colors, while overall reddish images exhibited slightly less effective correction. Among them, the ΔE of the overall bluish tongue images can be reduced from 23.97 and 30.38 to around 6, while that of the images with overall greenish colors can drop from 32.44 and 34.19 to approximately 7 and 8, respectively.
Figure 8.
The correction effect of the biased color images taken in real life can have the same excellent correction effect. (a) Standard color images. (b) Biased images. (c) Corrected images.
More comprehensive and extensive results of the real-world experiments are shown in Figure 9. We visualized the correction results utilizing both ΔE and MAE metrics. Notably, the correction of overall bluish tongue images exhibits superior performance based on the ΔE metric, whereas the reddish color correction falls behind. Conversely, according to the MAE metric, the reddish correction yields the highest effect, with the bluish correction demonstrating the opposite trend.
Figure 9.
Comparative results of real-world color correction testing. (a) Chromatic distance (ΔE). (b) Mean Absolute Error (MAE).
However, for images with reddish colors, the ΔE can only be lowered to about 14, indicating significant room for further improvement. The reason why reddish images showed less effective correction is that the color of tongue itself is red. And the tongue color is similarly difficult to label when categorizing red features. Specific red, deep red, pale red and other colors are difficult to distinguish when they are close to each other [15]. In tongue diagnosis, different shades of red, such as pale red, bright red, and crimson red, correspond to different physiological states. During training, the model may receive contradictory signals: for an image with a reddish tint, the red color in some areas is a “bias” that needs correction, while the red color in other areas is a “feature” that needs retention. This ambiguity at the semantic level makes the model’s learning objective unclear, causing it to tend toward conservative processing and thus resulting in insufficient correction.
5. Discussion
The experimental results in Section 4 show that our proposed model can effectively correct tongue images after bias coloring. In comparison to other methods, our utilization of a deep convolutional neural network for regression exhibits superior performance. The advantage of our model lies in its deeper architecture compared to Lu et al. [16]’s model. It not only enhances its performance but also effectively avoids the issue of overfitting. This approach is exemplified by Steffens et al. [24], who employed a deep network to conduct correction experiments on images with inadequate exposure.
However, it is noteworthy that this study also possesses certain limitations and holds the potential for further improvement. Firstly, owing to the unique nature of the deep learning approach, our experiments are simulated bias color experiments that are conducted within a controlled laboratory setting. While the experimental data unequivocally demonstrates the model’s effective color correction capabilities, there are numerous challenges that may arise when applying the model for correction in real-world scenarios. Secondly, the model demonstrates relatively inferior performance in correcting reddish color casts compared to blue and green ones, as evidenced by the higher residual ΔE in real-image experiments. Quantitative and more in-depth research will be conducted to analyze the causes in the future. Furthermore, our network lacks optimization mechanisms and holds significant potential for further enhancement and improvement.
As part of our future research, we intend to initially fine-tune the network’s details and introduce structure optimization mechanisms. Optimizing the processes of training and prediction is also paramount, encompassing the improvement of the loss function and hyperparameters, as well as minimizing the training cost. We will explore perceptual loss functions, such as combining MS-SSIM (Multi-Scale Structural Similarity Index Measure) with L1 loss, with the aim of better preserving texture structures during the color conversion process. Secondly, we plan to experiment with other neural network architectures for designing a color correction model, and select the most suitable one through comparison for further optimization. Additionally, we aim to simulate various other biased color situations to enhance the model’s generalization capability and robustness. We will continue to conduct authentic bias color experiments to account for the influence of practical factors on tongue images. On the flip side, there is a notable scarcity of research concerning the detection of biased colors, which necessitates the urgent exploration of innovative ideas in this area. As part of our endeavor, we will design a lightweight color cast detection network to automatically identify the type and degree of color cast before correction. This pursuit is of equal importance in advancing the digitization of tongue diagnosis.
6. Conclusions
The widespread application of deep learning has opened new avenues for research directions in the field of tongue diagnosis. In response to the challenge of color correction in tongue images, a deep regression model named TococoNet is proposed to eliminate the color bias in these images. Initially, we simulate five common cases of color bias to train the model. The correction results of TococoNet are significantly effective, lifting the PSNR of tongue images to 40 and shrinking ΔE to 2. The accurate color correction results demonstrate a higher performance than conventional algorithms and shallow networks. Furthermore, we conduct correction experiments for random degrees of color bias, achieving an excellent correction effect as well. Finally, we also achieve excellent results when applying the correction to actual captured images. Among these, the correction effect is better for the bluish and greenish tongue images, while the effect on the reddish images is relatively average. In summary, the TococoNet model possesses excellent color correction capabilities, which can offer promising opportunities for clinical support and automatic diagnosis.
Author Contributions
Writing—original draft, X.C. and D.Z.; Writing—review, X.C.; Methodology, X.C. and D.Z.; Data curation, C.J. and W.Z.; Visualization, Z.Z.; Supervision, Z.Z.; Funding acquisition, X.C., Z.Z. and C.X. All authors have read and agreed to the published version of the manuscript.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The datasets used and analyzed during the current study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare no conflicts of interest.
Funding Statement
This work was supported by the Key Research and Development Program of Shanxi Province [No. 202102130501011], the National Natural Science Foundation of China [No. 62204231], and the Key Research and Development Program of Hainan Province [No. ZDYF2022SHFZ304].
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
References
- 1.Li J., Zhang Z., Zhu X., Zhao Y., Ma Y., Zang J., Li B., Cao X., Xue C. Automatic classification framework of tongue feature based on convolutional neural networks. Micromachines. 2022;13:501. doi: 10.3390/mi13040501. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Wang H., Zhang X., Cai Y. Research on teeth marks recognition in tongue image; Proceedings of the 2014 International Conference on Medical Biometrics; Shenzhen, China. 30 May–1 June 2014; pp. 80–84. [Google Scholar]
- 3.Hu M.C., Cheng M.H., Lan K.C. Color correction parameter estimation on the smartphone and its application to automatic tongue diagnosis. J. Med. Syst. 2016;40:18. doi: 10.1007/s10916-015-0387-z. [DOI] [PubMed] [Google Scholar]
- 4.Qi Z., Tu L., Chen J., Hu X., Xu J., Zhang Z. The classification of tongue colors with standardized acquisition and ICC profile correction in traditional Chinese medicine. BioMed Res. Int. 2016;2016:3510807. doi: 10.1155/2016/3510807. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Li X., Zhang Y., Cui Q., Yi X., Zhang Y. Tooth-marked tongue recognition using multiple instance learning and CNN features. IEEE Trans. Cybern. 2018;49:380–387. doi: 10.1109/TCYB.2017.2772289. [DOI] [PubMed] [Google Scholar]
- 6.Zhuo L., Zhang P., Qu P., Peng Y., Zhang J., Li X. A K-PLSR-based color correction method for TCM tongue images under different illumination conditions. Neurocomputing. 2016;174:815–821. doi: 10.1016/j.neucom.2015.10.008. [DOI] [Google Scholar]
- 7.Zhuo L., Zhang J., Dong P., Zhao Y., Peng B. An SA–GA–BP neural network-based color correction algorithm for TCM tongue images. Neurocomputing. 2014;134:111–116. doi: 10.1016/j.neucom.2012.12.080. [DOI] [Google Scholar]
- 8.Finlayson G.D., Drew M.S. White-point preserving color correction; Proceedings of the Color Imaging Conference; Scottsdale, AZ, USA. 17–20 November 1997; pp. 258–261. [Google Scholar]
- 9.Wang X., Zhang D. An optimized tongue image color correction scheme. IEEE Trans. Inf. Technol. Biomed. 2010;14:1355–1364. doi: 10.1109/TITB.2010.2076378. [DOI] [PubMed] [Google Scholar]
- 10.Su Q., Cheng H., Sun W., Zhang F. A novel correction algorithm based on polynomial and TPS models; Proceedings of the 2011 International Conference of Information Technology, Computer Engineering and Management Sciences; Nanjing, China. 24–25 September 2011; pp. 52–55. [Google Scholar]
- 11.Wang X., Zhang D. A comparative study of color correction algorithms for tongue image inspection; Proceedings of the Medical Biometrics: Second International Conference (ICMB); Hong Kong, China. 28–30 June 2010; pp. 392–402. [Google Scholar]
- 12.Xu X., Cai Y., Liu C. Color cast detection and color correction methods based on image analysis. Meas. Control Technol. 2008;27:10–12. [Google Scholar]
- 13.Shukeri A.S.A., Kamarudin N.D., Makhtar S.N., Rahayu S.B., Ramli S. Preliminary Study On Colour Correction Analysis For Medical Imaging Application. Zulfaqar J. Def. Sci. Eng. Technol. 2019;2:1–8. [Google Scholar]
- 14.Chen H., Chen S., Jiang C., Zhang Y., Lin C., Lee J. Computational tongue color simulation in tongue diagnosis. Color Res. Appl. 2022;47:121–134. doi: 10.1002/col.22717. [DOI] [Google Scholar]
- 15.Zhu X., Ma Y., Guo D., Men J., Xue C., Cao X., Zhang Z. A Framework to Predict Gastric Cancer Based on Tongue Features and Deep Learning. Micromachines. 2022;14:53. doi: 10.3390/mi14010053. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Lu Y., Li X., Zhuo L., Zhang J., Zhang H. Dccn: A deep-color correction network for traditional Chinese medicine tongue images; Proceedings of the 2018 IEEE International Conference on Multimedia & Expo Workshops (ICMEW); San Diego, CA, USA. 23–27 July 2018; pp. 1–6. [Google Scholar]
- 17.Lu Y., Li X., Gong Z., Zhuo L., Zhang H. TDCCN: A two-phase deep color correction network for traditional Chinese medicine tongue images. Appl. Sci. 2020;10:1784. doi: 10.3390/app10051784. [DOI] [Google Scholar]
- 18.Li Z., Liu F., Yang W., Peng S., Zhou J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Trans. Neural Netw. Learn. Syst. 2021;33:6999–7019. doi: 10.1109/TNNLS.2021.3084827. [DOI] [PubMed] [Google Scholar]
- 19.Xin G., Zhu L., Liang H., Ding C. A fast tongue image color correction method based on gray world method; Proceedings of the Artificial Intelligence and Security: 7th International Conference (ICAIS); Dublin, Ireland. 19–23 July 2021; pp. 724–735. [Google Scholar]
- 20.Kwok N., Wang D., Jia X., Chen S., Fang G., Ha Q. Gray world based color correction and intensity preservation for image enhancement; Proceedings of the 2011 4th International Congress on Image and Signal Processing; Shanghai, China. 15–17 October 2011; pp. 994–998. [Google Scholar]
- 21.Liu Q., Huang X., Wang B., Wang Y. A method for color cast detection and color correction of tongue inspection images under natural environment. J. Xiamen Univ. (Nat. Sci.) 2016;55:278–284. (In Chinese) [Google Scholar]
- 22.Sui J., Xia C., Yang K., Zhang Y., Wang Y., Yan H., Qian P. Tongue image color correction method based on root polynomial regression; Proceedings of the 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC); Chongqing, China. 24–26 May 2019; pp. 1337–1342. [Google Scholar]
- 23.Ronneberger O., Fischer P., Brox T. U-net: Convolutional networks for biomedical image segmentation; Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference; Munich, Germany. 5–9 October 2015; pp. 234–241. [Google Scholar]
- 24.Steffens C.R., Huttner V., Messias L.R.V., Drews P.L.J., Botelho S.S.C., Guerra R.S. CNN-based luminance and color correction for ill-exposed images; Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP); Taipei, Taiwan. 22–25 September 2019; pp. 3252–3256. [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Data Availability Statement
The datasets used and analyzed during the current study are available from the corresponding author upon request.









