Abstract
Background
Sensitive skin (SenS) is a syndrome causing unpleasant sensations with few visible signs. Grading its severity generally relies on questionnaires or subjective ratings.
Materials and methods
The SenS status of 183 subjects was determined by trained assessors. Answers to a four‐item questionnaire were converted into numerical scores, yielding a 0–15 SenS index; the questionnaire was administered two or three times. Parameters from hyperspectral images were used as input for a multi‐layer perceptron (MLP) neural network to predict subjects' four‐item questionnaire score. The resulting model was used to evaluate the soothing effect of a cosmetic cream applied to one hemiface, compared with that of a placebo applied to the other hemiface.
Results
The four‐item questionnaire score accurately predicts the assessors' SenS classification (92.9%) while providing insight into SenS severity. Most subjects providing repeatable replies are non‐SenS, but accepting some variability in answers identifies subjects with consistent replies, a majority of whom are SenS. The MLP neural network model predicts the SenS score of subjects with consistent replies from full‐face hyperspectral images (validation set R² = 0.969). Similar quality is obtained with hemiface images. Comparing the effect of a soothing cosmetic with that of a placebo revealed significant SenS improvement in subjects with the highest instrumental index (> 5).
Conclusion
A four‐item questionnaire enables calculating a SenS index grading its severity. Objective evaluation using hyperspectral images with an MLP neural network accurately predicts SenS severity and its favourable evolution upon the application of a soothing cream.
Keywords: artificial intelligence, hyperspectral imaging, index, instrumental evaluation, multi‐layer perceptron, questionnaire, sensitive skin
1. INTRODUCTION
Sensitive skin (SenS) is a sensory syndrome in which stimuli that usually do not induce a reaction lead, in the absence of lesions, to unpleasant subjective sensations: burning, pain, pruritus, and/or tingling. 1 Irritants can cause these reactions, but they can also be triggered by environmental and/or physiological factors. 2 , 3 Although the skin can appear less supple, dehydrated, more erythematous and with some telangiectasias, 4 SenS is generally not associated with any objective clinical signs. 5 Yet, epidemiological surveys show that the prevalence of self‐reported SenS ranges from 23% to 92%, depending on the study. The pooled proportion reaches 71%, 6 a figure that appears to have increased in recent years. 7 , 8 , 9 , 10 Accordingly, SenS is the subject of intensive research by academics, clinicians, and the healthcare/cosmetic industry.
A major problem complicating SenS evaluation is its subjective nature and the absence of objective signs. Therefore, a widely used method to assess SenS relies on patients' self‐reported symptoms, and several subjective assessment questionnaires have been developed. They generally rely on yes/no answers or ratings for questions concerning self‐perceived reactions to various factors. The cumulative score is then used to classify skin into categories, generally four: not sensitive, slightly sensitive, sensitive, or very sensitive. 3 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , 18
Another approach to evaluating SenS is semi‐subjective and based on the reaction induced by a chemical stimulus, either vasodilation or stinging. 5 , 7 , 19 Vasodilation is explored using sodium lauryl sulfate (SLS). However, the test most often considered the reference assesses the stinging sensation induced by a 5–10% lactic acid solution, generally applied to the nasolabial fold: the lactic acid stinging test (LAST). By scoring the intensity of the reaction, it differentiates subjects who respond (“stingers”) from those who do not. LAST results globally correlate with self‐reported SenS. 20 Nevertheless, the prediction is not perfect: patients reporting SenS do not always respond to LAST, while individuals who report non‐sensitive skin can react significantly. 15 , 21 , 22 , 23 , 24
Objective measurement of epidermal biophysical properties by non‐invasive devices is also used. Given the impaired skin barrier of SenS subjects, one such approach is based on transepidermal water loss (TEWL), which shows higher values in the unchallenged skin of SenS subjects and a greater increase upon stimulation. 25 Another approach is the assessment of stratum corneum hydration, which is lower in SenS subjects and stingers. 26 In vivo microscopy devices and amplitude‐scan ultrasound can identify specific structures and differences in epidermal thickness. 27 , 28 , 29 Finally, objective SenS evaluation can take advantage of the frequently reported skin redness. 25 In addition to visual assessment, several devices quantify skin redness by determining the a* value, the erythema index or the cutaneous blood flow. Although results are conflicting, SenS subjects appear to have lower baseline values for a*, the erythema index and the cutaneous blood flow. 25
Whether subjective, semi‐subjective or objective, each approach has limitations. Subjective evaluation is suited to epidemiological studies, since it allows answers to be gathered from large cohorts, and it is also the method of choice for screening subjects for clinical studies. However, the four‐grade scale generally used lacks the sensitivity needed to precisely evaluate SenS severity or monitor the efficacy of a treatment. Objective measures require devices, and their results are better suited to determining SenS/non‐SenS status than to assessing severity. Semi‐subjective methods, especially LAST and SLS occlusion tests, enable fine grading of SenS. However, although score comparisons allow changes in SenS severity to be monitored, results only partially relate to self‐reported SenS.
Analyses of SenS would benefit from new evaluation methods. These should enable reproducible evaluation and be sensitive enough to faithfully detect slight variations in SenS severity. Although some new evaluation tools may emerge from a better understanding of SenS pathophysiology, there is still room to improve existing approaches. A self‐assessment questionnaire yielding an extended, almost continuous grading scale rather than a few discrete classes would be helpful. A simple objective method that precisely rates SenS severity would also be useful. This study presents new approaches that address both needs.
2. MATERIALS AND METHODS
2.1. Subjects
This non‐invasive evaluation study was performed following the principles of the Declaration of Helsinki. Participants were informed of the purpose of the study, received detailed information, and all gave their written informed consent before enrolment.
A total of 183 Caucasian women living in the region of Lyon (France) were recruited and participated in the elaboration/evaluation of the questionnaire‐based SenS assessment and the construction of the instrumental SenS index. This study was carried out between January 2019 and November 2022. Exclusion criteria included, among others, severe skin alterations or skin diseases and the use of aspirin and its derivatives, antibiotics, steroids, and anti‐inflammatory or antihistaminic drugs. Subjects with excessive solar/UV exposure during the month preceding the study were also excluded.
The effect of a soothing cosmetic cream was evaluated during the winter of 2023 on 30 subjects (mean ± SD age = 45.1 ± 13.6 years). Fifteen of these subjects had participated in the elaboration/evaluation of the questionnaire‐based SenS assessment and the construction of the instrumental SenS index. The inclusion/exclusion criteria for this study were identical to those mentioned above. These subjects also had to meet the ± 3 consistency criterion on two successive questionnaires administered a few days apart.
2.2. Questionnaire‐based assessment of SenS by a trained assessor
Upon recruitment, subjects' SenS status (sensitive or non‐sensitive) was determined by a trained assessor based on their replies to a questionnaire. This questionnaire, used for some years by a Contract Research Organization specialized in dermatological and cosmetic evaluations (IEC, France), was established through literature searches, 3 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , 18 practical experience, drafts and trial runs. Four of its questions evaluate the intensity of adverse and repeated facial skin reactions to stimuli. These questions are: “Do you experience repeated adverse facial reactions to…” (1) “…care products?”, (2) “…hygiene products?”, (3) “…the environment?”, and (4) “…other factors?”. For these four questions, participants rate their reactions as no reactions, slight reactions, moderate reactions, or marked reactions.
2.3. Four‐item questionnaire assessment of SenS
Based on the assessors' feedback, our questionnaire used the same four questions they relied on to classify subjects as SenS or non‐SenS. The reported reaction intensity was converted into numerical values according to the following scale: no reactions = 0, slight reactions = 1, moderate reactions = 2, and marked reactions = 3. As cosmetics are reported to be frequent inducers of SenS, answers to the question on reactions to cosmetics were given a weight of 2, and answers to all other questions a weight of 1. The numerical conversion of answers to these four questions therefore yields a scale ranging from 0 (no reaction) to 15 (marked reaction to all stimuli).
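The scoring rule above can be expressed as a small function. This is a minimal sketch; the names `INTENSITY`, `WEIGHTS` and `sens_score` are hypothetical, and it assumes that the doubled weight applies to question 1 (“…care products?”), which is our reading of “reactions to cosmetics”.

```python
# Hypothetical helper illustrating the four-item SenS score (0-15).
# Assumption: question 1 ("...care products?") is the cosmetics item weighted 2.

INTENSITY = {"no": 0, "slight": 1, "moderate": 2, "marked": 3}
WEIGHTS = {"care_products": 2, "hygiene_products": 1, "environment": 1, "other": 1}


def sens_score(answers: dict[str, str]) -> int:
    """Convert the four reported reaction intensities into a 0-15 SenS score."""
    missing = WEIGHTS.keys() - answers.keys()
    if missing:
        raise ValueError(f"missing answers: {sorted(missing)}")
    # Weighted sum: 2 * 3 + 1 * 3 + 1 * 3 + 1 * 3 = 15 at most.
    return sum(WEIGHTS[q] * INTENSITY[answers[q]] for q in WEIGHTS)


print(sens_score({"care_products": "marked", "hygiene_products": "marked",
                  "environment": "marked", "other": "marked"}))  # 15
print(sens_score({"care_products": "slight", "hygiene_products": "no",
                  "environment": "moderate", "other": "no"}))  # 4
```

The maximum of 15 is reached only when all four answers are “marked reactions”, and a single “slight” reaction to care products already contributes 2 points.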
2.4. Subject repeatability
The four‐item questionnaire was used to evaluate the repeatability/reproducibility of replies from 181 subjects (the two subjects who answered the questionnaire only once were excluded from this analysis). It was administered twice (62 subjects) or three times (119 subjects); each time, the SenS scores of all subjects were calculated, as were the 300 score differences between two successive assessments (119 × 2 + 62).
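The count of 300 differences follows from taking each assessment minus the previous one for every subject. A minimal sketch, assuming scores are stored in chronological order per subject and differences are taken as later minus earlier (the sign convention is an assumption); the panel below contains hypothetical placeholder scores, not study data.

```python
def successive_differences(scores: list[int]) -> list[int]:
    """Score differences between each assessment n and the following one (n+1)."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


# Placeholder panel: 62 subjects assessed twice, 119 assessed three times.
panel = [[3, 5]] * 62 + [[6, 4, 7]] * 119
all_diffs = [d for subject in panel for d in successive_differences(subject)]
print(len(all_diffs))  # 300 = 119 * 2 + 62
```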
These different assessments were carried out with variable delays. In 233 cases, this delay was between a few hours and 31 days (mean ± SD = 11.2 ± 7.5 days). It was between 33 and 288 days in 45 cases (mean ± SD = 87.4 ± 61.9 days). Twenty‐two assessments were performed with a delay of 965 to 1023 days (mean ± SD = 997.2 ± 19.3 days).
2.5. Hyperspectral imaging of subjects’ faces
Pictures of subjects were taken with the SpectraFace® (Newtone Technologies, France), a full‐face hyperspectral imaging device. Equipped with white and blue LEDs that provide homogeneous lighting and with polarizing filters that suppress specular reflection, it acquires series of 31 images. Each image of the series corresponds to a specific 10 nm‐wide wavelength band; together, they cover the entire visible spectrum (400–700 nm).
After washing their face with a gentle cleanser, subjects rested for 30 min in a climate‐controlled room (21 ± 12°C and 45 ± 5% relative humidity) before front and side (left and right) facial images were taken. The resulting images were stored as 2048 × 2048‐pixel TIFF files, and the 31‐image series were used to reconstruct colour pictures. 30 Four regions of interest (ROIs) were considered: the nasolabial folds, the side of the nose, the cheek, and the entire half‐face (Figure 1). These ROIs were manually drawn on the images from the first acquisition. Rigid, intensity‐based spatial registration enabled precise positioning of the ROIs on the images from the second acquisition.
FIGURE 1.

Regions of interest (ROI) used to calculate the average L*, a*, b*, H76, and haemoglobin parameters from hyperspectral images: (A) nasolabial folds, (B) side of the nose, (C) cheek, and (D) entire half face.
Reconstructed colour pictures enabled extracting the average L*, a*, and b* values of each ROI. They also allowed computing the average colour homogeneity H76 parameter of the different ROIs:
In the above formula, n is the number of pixels within a ROI; L*i, a*i and b*i are the L*, a* and b* values of pixel i; and μL*, μa*, μb* are the average L*, a*, and b* values of the ROI.
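The expression for H76 is not reproduced in this text. Judging from the variables defined above and the “76” suffix, a plausible reading is the CIE76 colour difference between each pixel and the ROI mean, averaged over the ROI: H76 = (1/n) Σ sqrt((L*i − μL*)² + (a*i − μa*)² + (b*i − μb*)²). The sketch below assumes exactly this definition, which may differ from the authors' formula; the function name `h76` is hypothetical and NumPy is used for vectorization.

```python
import numpy as np


def h76(lab_pixels: np.ndarray) -> float:
    """Colour homogeneity of a ROI, ASSUMED to be the mean CIE76 distance
    between each pixel's (L*, a*, b*) and the ROI's mean (L*, a*, b*).

    lab_pixels: array of shape (n, 3) holding L*, a*, b* for the n ROI pixels.
    """
    lab = np.asarray(lab_pixels, dtype=float)
    mu = lab.mean(axis=0)                              # (muL*, mua*, mub*)
    delta_e = np.sqrt(((lab - mu) ** 2).sum(axis=1))   # per-pixel CIE76 distance
    return float(delta_e.mean())


roi = np.array([[60.0, 12.0, 15.0],
                [62.0, 12.0, 15.0]])
print(h76(roi))  # 1.0
```

Under this reading, a perfectly uniform ROI gives H76 = 0, and larger values indicate a less homogeneous skin colour.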
The hyperspectral image series were also processed to quantify the average amount of oxygenated and deoxygenated haemoglobin within a ROI according to previously published procedures. 31 , 32 , 33
2.6. Instrumental skin sensitivity index
The six parameters calculated from the hyperspectral images (L*, a*, b*, H76, oxygenated hemoglobin rate, deoxygenated hemoglobin rate) for each of the four ROIs were used to build an instrumental skin sensitivity index. The resulting 24 parameters served as input for a multi‐layer perceptron (MLP) neural network. 34 The objective was to predict subjects' SenS score from the four‐item questionnaire, using as ground truth the mean of their scores when evaluated twice or the median of the scores upon triple assessment. The MLP neural network was trained on a dedicated set of subjects (training set), and the performance of the resulting model was tested on a validation set.
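The inputs and targets described above can be assembled as follows. This is a sketch with hypothetical names (`PARAMETERS`, `ROIS`, `build_features`, `ground_truth`); the parameter order, the left/right averaging (as used for the main model) and the data layout are assumptions. The MLP itself is not reproduced, since its architecture and training settings are not specified here.

```python
from statistics import mean, median

PARAMETERS = ("L*", "a*", "b*", "H76", "HbO2", "Hb")          # 6 per ROI (assumed order)
ROIS = ("nasolabial_fold", "nose_side", "cheek", "half_face")  # 4 ROIs


def build_features(left: dict, right: dict) -> list[float]:
    """24 MLP inputs: for each ROI and parameter, the mean of left and right values.

    left/right: {roi: {parameter: value}} for one subject (hypothetical layout).
    """
    return [(left[roi][p] + right[roi][p]) / 2 for roi in ROIS for p in PARAMETERS]


def ground_truth(scores: list[int]) -> float:
    """Target SenS score: mean of two assessments, median of three."""
    if len(scores) == 2:
        return float(mean(scores))
    if len(scores) == 3:
        return float(median(scores))
    raise ValueError("expected two or three questionnaire scores")


side = {roi: {p: 1.0 for p in PARAMETERS} for roi in ROIS}
print(len(build_features(side, side)))                 # 24
print(ground_truth([4, 7]), ground_truth([3, 9, 5]))   # 5.5 5.0
```

Using the median for three assessments limits the influence of a single outlying reply, whereas with two assessments the mean is the only symmetric choice.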
2.7. Evaluation of the soothing effect of a cosmetic
This evaluation was conducted double‐blind, comparing the effect of topical application of a placebo and of an active cream, each applied twice daily to a randomly assigned hemiface. The INCI composition of the placebo is the following: Water/Aqua, Cetearyl Ethylhexanoate, Ethylhexyl Isononanoate, Glyceryl Stearate Citrate, Cetearyl Alcohol, Pentylene Glycol, Phenoxyethanol, Decylene Glycol, 1,2 Hexanediol, Pentylene Glycol, Dimethicone, Sodium Hydroxide, Acrylates/C10‐30 Alkyl Acrylates Crosspolymer, Xanthan Gum, Fragrance. The soothing cream had the same base but was supplemented with SymRelief 100 (Symrise, Germany).
SenS was evaluated on each hemiface using the instrumental skin sensitivity index. For this purpose, front and side facial pictures were taken with the SpectraFace (Newtone Technologies, France) at baseline (D0) and after 21 days (D21). The previously described ROIs were identified on the reconstructed color pictures, and the different parameters (average L*, a*, b*, H76, oxygenated and deoxygenated hemoglobin) were calculated for each of them. These parameters were used as input for the MLP model to compute each subject's instrumental SenS score at D0 and D21.
2.8. Statistical analysis
Results are presented as mean and standard deviation (SD), except for the analysis of the effect of the soothing cream, for which the standard error of the mean (SEM) is used. Nominal data were compared using Chi‐square tests. Depending on the results of the Shapiro‐Wilk test, the outcome of the soothing/placebo cream applications was analyzed using Student's t‐tests when data were normally distributed and the Wilcoxon test otherwise. For all statistical analyses, p < 0.05 was considered significant.
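The test‐selection rule above can be sketched with SciPy. This sketch assumes a paired design (D0 vs D21 on the same hemiface), normality checked on the paired differences, and alpha = 0.05; these details are assumptions, and `compare_d0_d21` is a hypothetical name. Only `scipy.stats.shapiro`, `ttest_rel` and `wilcoxon` are used.

```python
import numpy as np
from scipy import stats


def compare_d0_d21(d0, d21, alpha: float = 0.05):
    """Paired comparison of one cream's instrumental SenS index at D0 vs D21.

    Assumption: Shapiro-Wilk is applied to the paired differences; a paired
    Student t-test is used if normality is not rejected, otherwise the
    Wilcoxon signed-rank test.
    """
    d0 = np.asarray(d0, dtype=float)
    d21 = np.asarray(d21, dtype=float)
    if stats.shapiro(d21 - d0).pvalue >= alpha:
        result, test = stats.ttest_rel(d0, d21), "paired t-test"
    else:
        result, test = stats.wilcoxon(d0, d21), "Wilcoxon signed-rank"
    p = float(result.pvalue)
    return test, p, p < alpha


# Hypothetical D0/D21 values for eight hemifaces (placeholders, not study data).
print(compare_d0_d21([8.9, 9.1, 8.7, 9.4, 8.8, 9.0, 8.6, 9.2],
                     [7.2, 7.9, 7.5, 8.1, 7.0, 7.8, 7.4, 7.6]))
```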
3. RESULTS
3.1. Study participants
The descriptive statistics of the cohort of subjects are presented in Table 1. Statistical analysis revealed no significant difference between groups or between a group and the entire cohort.
TABLE 1.
Descriptive statistics of the initial cohort of 183 subjects.
| | Total cohort | SenS subjects | Non‐SenS subjects |
|---|---|---|---|
| Number of subjects | 183 | 136 | 47 |
| Age a | 45.8 ± 14.0 | 44.7 ± 14.1 | 49.3 ± 13.4 |
| Phototype b | 3 / 54 / 87 / 39 | 3 / 41 / 66 / 26 | 0 / 13 / 21 / 13 |
| Skin type c | 44 / 60 / 33 / 41 / 5 | 35 / 47 / 22 / 29 / 3 | 9 / 13 / 11 / 12 / 2 |
a Mean ± SD (years).
b Phototypes are given as: I / II / III / IV.
c Skin types are provided in the following order: dry / mixed dry / normal / mixed oily / oily.
3.2. A four‐item questionnaire is sufficient to predict SenS classification by assessors
Using a 25‐item questionnaire, we recently reported that the trained assessors' binary classification (SenS vs. non‐SenS) can be predicted by converting and combining replies into 0–10 numerical scores. 35 Although limited, this number of questions could still hinder rapid and reproducible prediction of subjects' SenS. Moreover, assessors using this questionnaire acknowledged that they based their evaluation primarily on the four questions assessing adverse and repeated facial skin reactions to different stimuli. We therefore evaluated whether the assessors' classification could be predicted using a 0–15 scale, whose score derives from the numerical conversion and combination of the replies to these four questions.
The distribution of the 183 subjects' four‐item questionnaire SenS scores at inclusion, together with their SenS status determined by assessors, is presented in Figure 2. Among the 36 subjects with a score of 0 or 1, only one was classified by assessors as SenS. Most subjects with a score of 2 (14 out of 20, 70%) were declared SenS by assessors. For scores of 3 and higher, only six subjects were non‐SenS, while the large majority (121, 95.3%) were SenS according to assessors.
FIGURE 2.

Distribution of the SenS score of the 183 subjects at the time of their inclusion in the study and SenS status determined by assessors.
To determine how subjects with a score of 2 should be considered, confusion matrices were calculated (Table 2). Consistent with the fact that most subjects scoring 2 are categorized as SenS by assessors, the confusion matrix considering scores of 2 and higher as SenS yields better SenS and overall classification accuracy.
TABLE 2.
Confusion matrices between assessors' evaluation and the SenS score calculated from the four‐item questionnaire upon the first assessment.
Considering a score of 2 and over as SenS:
| Assessor assessment | SenS score: Non‐SenS | SenS score: SenS |
|---|---|---|
| Non‐SenS | 35 | 12 |
| SenS | 1 | 135 |
| Non‐SenS accuracy | 74.5% | |
| SenS accuracy | 99.3% | |
| Overall accuracy | 92.9% | |

Considering a score of 3 and over as SenS:
| Assessor assessment | SenS score: Non‐SenS | SenS score: SenS |
|---|---|---|
| Non‐SenS | 41 | 6 |
| SenS | 15 | 121 |
| Non‐SenS accuracy | 87.2% | |
| SenS accuracy | 89.0% | |
| Overall accuracy | 88.5% | |
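The accuracies in Table 2 follow directly from the four cell counts, taking the assessors' classification as the reference (non‐SenS accuracy corresponds to specificity and SenS accuracy to sensitivity). A minimal sketch with a hypothetical helper name:

```python
def classification_accuracies(tn: int, fp: int, fn: int, tp: int) -> dict[str, float]:
    """Accuracies (%) from a 2x2 confusion matrix, assessors as reference.

    tn: assessor non-SenS, score non-SenS    fp: assessor non-SenS, score SenS
    fn: assessor SenS, score non-SenS        tp: assessor SenS, score SenS
    """
    return {
        "non_sens": round(100 * tn / (tn + fp), 1),
        "sens": round(100 * tp / (tp + fn), 1),
        "overall": round(100 * (tn + tp) / (tn + fp + fn + tp), 1),
    }


print(classification_accuracies(tn=35, fp=12, fn=1, tp=135))
# {'non_sens': 74.5, 'sens': 99.3, 'overall': 92.9}
print(classification_accuracies(tn=41, fp=6, fn=15, tp=121))
# {'non_sens': 87.2, 'sens': 89.0, 'overall': 88.5}
```

Lowering the threshold from 3 to 2 trades 12.7 points of non‐SenS accuracy for 10.3 points of SenS accuracy; because SenS subjects dominate the cohort, overall accuracy rises.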
3.3. Upon repeated assessment, only very few subjects provide repeatable answers
Having an easy‐to‐use SenS evaluation questionnaire, we tested the repeatability of subjects’ answers by assessing them twice (62 subjects) or three times (119 subjects). The overall proportions of non‐SenS and SenS subjects remain stable at each assessment (Table 3), and a Chi‐square test indicates no significant variations in the frequencies of both categories (p = 0.512).
TABLE 3.
Number and proportion of non‐SenS/SenS subjects at each assessment and distribution of the SenS grades.
| | Non‐SenS | SenS |
|---|---|---|
| Assessment 1 (183 subjects) | 36 (19.7%); 0: 19 / 1: 17 | 147 (80.3%); 2: 20 / 3: 13 / 4: 15 / 5: 10 / 6: 23 / 7: 15 / 8: 13 / 9: 13 / 10: 12 / 11: 7 / 12: 5 / 13: 0 / 14: 1 / 15: 0 |
| Assessment 2 (181 subjects) | 39 (21.5%); 0: 19 / 1: 20 | 142 (78.5%); 2: 17 / 3: 14 / 4: 6 / 5: 19 / 6: 18 / 7: 12 / 8: 14 / 9: 13 / 10: 18 / 11: 8 / 12: 1 / 13: 1 / 14: 1 / 15: 0 |
| Assessment 3 (119 subjects) | 13 (10.9%); 0: 6 / 1: 7 | 106 (89.1%); 2: 17 / 3: 10 / 4: 7 / 5: 9 / 6: 12 / 7: 16 / 8: 6 / 9: 9 / 10: 11 / 11: 4 / 12: 4 / 13: 1 / 14: 0 / 15: 0 |
Nevertheless, the reality behind these numbers is more complex. When considering the score difference between two successive evaluations (Figure 3), 63 out of 300 reveal no score difference. Only 40 subjects present no score variations between the first and second assessments. Among them, 27 were assessed a third time, and only 15 (55%) still had the exact same SenS score with absolutely no variations in their answers within a total time frame ranging from a few hours to 65 days.
FIGURE 3.

Distribution of score differences between an assessment (n) and the following one (n+1), and inter‐assessment score differences considered for the different panels of subjects.
Among all subjects presenting no score variation, only two were among the 11 subjects evaluated a few hours apart, which strictly corresponds to repeatability: a frequency of 18.2%. Five of them (8.9%) were among the 56 subjects assessed more than 32 days apart (35–65 days), which corresponds to reproducibility rather than repeatability. Most of them were non‐SenS (score below 2, 15 subjects, 56.6%), and one‐third of these (five subjects) declared themselves not sensitive at all (score of 0).
3.4. Repetition of the four‐item questionnaire enables identifying subjects with consistent answers
Repeated assessments with the four‐item questionnaire indicate that most subjects present score variations between evaluations. These variations can be substantial, sometimes reaching differences of 10 or more within a few days. Therefore, it is essential to identify subjects presenting reasonable score differences between assessments.
The plot of the 300 score differences obtained upon two successive assessments (first and second as well as second and third) is presented in Figure 3. One hundred ninety‐eight score differences meet a ± 2 consistency criterion, 37 of which came from subjects evaluated twice, who contribute a single score difference. All other score differences are from subjects assessed three times. We included these subjects in the panel providing consistent answers only if all their score differences (between the first and second assessments, the second and third, and also the first and third) were equal to or below a ± 2 threshold. This led to the identification of 57 additional subjects. Thus, of the 181 subjects assessed several times, only 94 (37 + 57) can be considered as providing “reasonably” varying answers leading to consistent SenS scores. Only 38.3% (36 subjects) of them are non‐SenS. Therefore, working with these subjects increases the proportion of SenS subjects compared with the panel of subjects providing repeatable answers.
We also considered a slightly less stringent consistency criterion corresponding to a maximum ± 3 score difference between the score from the first and second assessment but also, when applicable, between those of the second and third as well as the first and third assessments. This led to the identification of an enlarged group of 117 subjects, 56 of whom (30.6%) are non‐SenS. Among them, 42 were assessed twice and 75 three times.
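The ± 2 and ± 3 consistency criteria compare every pair of assessments, including the first and third, not only successive ones. A minimal sketch of that rule (the function name `is_consistent` is hypothetical):

```python
from itertools import combinations


def is_consistent(scores: list[int], max_diff: int) -> bool:
    """True if every pair of assessments (1-2, 2-3 and 1-3) differs by <= max_diff."""
    return all(abs(a - b) <= max_diff for a, b in combinations(scores, 2))


print(is_consistent([5, 7, 6], max_diff=2))  # True
print(is_consistent([5, 7, 8], max_diff=2))  # False: 1st vs 3rd differs by 3
print(is_consistent([5, 7, 8], max_diff=3))  # True
```

Checking all pairs matters: a subject drifting by 2 points at each step would pass a successive‐only check while accumulating a 4‐point change overall.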
3.5. Repetition of the four‐item questionnaire also enables identifying subjects with acceptable answer variations
Changes in answers given by subjects, and, therefore, in their SenS scores, do not have the same impact on the non‐SenS/SenS classification if subjects have low or high SenS scores. While small changes in replies from subjects with low SenS scores can lead to a change in their non‐SenS/SenS classification, this is not the case for subjects with high SenS scores. Therefore, if the focus is to discriminate between non‐SenS and SenS subjects, it is possible to envision a less stringent type of panel: a panel of subjects providing acceptable answer variations.
Based on this rationale, we allowed small inter‐assessment differences for low SenS scores (± 3) and an accepted score difference that increases as the SenS score rises, while limiting the probability that subjects change from SenS to non‐SenS classification or vice versa. The distribution of the corresponding score differences is presented in Figure 3. As for the previous panels, subjects belonging to the panel providing acceptable answer variations had to have a score difference within the accepted limits if evaluated twice, and all score differences within these limits (between the first and second assessments, the second and third, and also the first and third) when assessed three times. A total of 128 subjects meet these criteria (44 evaluated twice and 84 evaluated three times), leaving 53 subjects (29.3%) whose answers varied too much to be included.
3.6. Objective SenS evaluation can be achieved using colour and hyperspectral information of facial images
Several attempts have been made to objectively predict SenS based on one or a few skin parameters. We previously showed that a multi‐layer perceptron (MLP) neural network can do so using several facial parameters. 33 Still, this first attempt relied on 90 subjects whose SenS variation over time had not been assessed. We therefore applied a similar approach to the panels of subjects providing consistent answers and to the panel of subjects presenting acceptable reply variations.
To construct the most robust predictive model possible, we included as many subjects providing consistent answers as possible. We thus used the panel of 117 subjects whose replies vary by ± 3, rather than the 94 subjects providing ± 2 consistent answers. Ninety‐eight (82.3%) were included in the training set, and the remaining 21 subjects in the validation set. As input, we first used the average values from the left and right ROIs: the average value from the left and right nasolabial folds of the front face image, as well as the average values of the ROIs from the left and right facial side images. In that case, the MLP neural network yields a model in which the R² between ground truth and the predicted instrumental SenS index is 0.971 on the training set and 0.969 on the validation set (Figure 4). On the validation set, the confusion matrix reveals a 92.9% overall precision of the non‐SenS/SenS classification. Moreover, the mean relative difference between ground truth and the predicted instrumental SenS index in the validation set is 0.12, indicating a very slight tendency of the model to overestimate SenS scores.
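The two agreement measures quoted above can be computed as below. This sketch assumes that R² is the squared Pearson correlation between ground truth and prediction, and that the “mean relative difference” is the mean of (predicted − ground truth). This matches the sign convention in the text (positive values indicate overestimation) but remains an assumption about the exact definitions. Function names are hypothetical; only the standard library is used.

```python
from statistics import fmean


def r_squared(truth: list[float], pred: list[float]) -> float:
    """Squared Pearson correlation between ground truth and predicted scores."""
    mt, mp = fmean(truth), fmean(pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(truth, pred))
    var_t = sum((t - mt) ** 2 for t in truth)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_t * var_p)


def mean_difference(truth: list[float], pred: list[float]) -> float:
    """Mean of (predicted - ground truth); > 0 means the model overestimates."""
    return fmean(p - t for t, p in zip(truth, pred))


truth = [0.0, 2.0, 5.0, 9.0]   # hypothetical questionnaire scores
pred = [0.5, 2.0, 5.5, 9.0]    # hypothetical instrumental indices
print(round(r_squared(truth, pred), 3), mean_difference(truth, pred))  # 0.995 0.25
```

Note that a squared correlation is insensitive to a constant bias, which is why a signed mean difference is reported alongside it.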
FIGURE 4.

Correlation between the SenS score from the four‐item questionnaire and the instrumental SenS index predicted by MLP neural network model in the case of subjects whose answers vary according to the ± 3 consistency criterion. (A) Training set. (B) Validation set.
We also computed MLP neural network models considering the left or right ROIs independently. These models present characteristics very similar to those obtained when using the mean values of the left and right ROIs, yet with a slightly decreased correlation and overall precision, as well as a slight tendency to underestimate subjects’ SenS scores (Table 4).
TABLE 4.
Main parameters of MLP neural network models obtained when considering left and right ROIs independently.
| | Left ROIs, training set | Left ROIs, validation set | Right ROIs, training set | Right ROIs, validation set |
|---|---|---|---|---|
| R² a | 0.889 | 0.825 | 0.907 | 0.932 |
| Overall precision b | 88.1% | 88.1% | 88.1% | 90.8% |
| Mean relative difference c | −0.08 | −0.48 | −0.06 | −0.42 |
a R² between ground truth and predicted score.
b Overall precision of the non‐SenS/SenS prediction between ground truth and predicted score.
c Mean difference between ground truth and predicted scores.
Finally, we computed an MLP neural network model using the 128 subjects from the panel presenting acceptable answer variations, 107 (83.6%) being included in the training set and the other 21 (the same subjects as for the panel presenting ± 3 consistent replies) in the validation set. This model was calculated only with the mean values of left and right ROIs. Results (Figure 5) show that, although the R² between ground truth and the predicted instrumental SenS index is 0.959 for the training set, it decreases to 0.795 for the validation set. Nevertheless, the mean relative difference between ground truth and predicted score remains low for both sets (−0.04 for the training set and 0.033 for the validation set), and the overall accuracy of the non‐SenS/SenS classification is similarly high in both sets (93.0% for the training set and 92.7% for the validation set).
FIGURE 5.

Correlation between the SenS score from the four‐item questionnaire and the instrumental SenS index predicted by MLP neural network model in the case of the subjects presenting acceptable answer variations. (A) Training set. (B) Validation set.
3.7. The instrumental SenS index can reveal the soothing effect of a cosmetic cream
Having an objective evaluation tool for SenS, we tested if it was possible to detect the effect of a cosmetic cream claiming a soothing effect. To do so, we selected 30 subjects meeting the ± 3 consistency criterion and evaluated how topical applications of the soothing cosmetic influence their SenS, comparing this effect to that of a placebo cream.
Considering all 30 subjects (Figure 6A), the 21‐day application of the placebo cream does not lead to any significant change in the instrumental SenS index (−1%, p = 1.000), while the soothing cream decreases the SenS index by 7%, although not significantly (p = 0.2621).
FIGURE 6.

Variation of the instrumental SenS index upon topical application of the placebo or soothing cream over 21 days. (A) In the case of the entire cohort analysed. (B) For subjects presenting an average left and right baseline instrumental SenS index over 5. Results are presented as mean ± SEM with ** p < 0.01.
As the baseline average left and right SenS index of the 30 subjects ranges from 1.06 to 9.53 (mean ± SD = 5.50 ± 2.77), we hypothesized that only a subset of subjects could account for the trend towards decreased SenS. We therefore analyzed groups of increasing size, progressively including subjects with a higher baseline average SenS index. Starting with a group of subjects whose baseline average SenS index ranges from 1 to 4, we analyzed groups including subjects whose baseline average SenS index ranges from 1 up to 8. None of these groups reveals any significant difference between the soothing and the placebo cream. Even excluding non‐SenS subjects (SenS index below 2), who are unlikely to perceive any change, does not modify the outcome of these analyses.
We therefore focused on subjects with a high baseline average SenS index. When considering subjects with an index over 7 (11 subjects out of 30), the soothing cream induces a significant SenS index decrease (8.87 ± 0.22 at D0 vs. 7.15 ± 0.62 at D21, −24.2%, p = 0.0082), while the placebo cream does not (7.67 ± 0.21 at D0 vs. 7.20 ± 0.70 at D21, −6.6%, p = 0.5443). Similar results are obtained for the 15 subjects presenting a baseline average SenS index over 6 (−22.4%, p = 0.0084 for the soothing cream vs. −10.3%, p = 0.4887 for the placebo cream). The same holds for the 19 subjects with a baseline average SenS index over 5 (Figure 6B), for whom the soothing cream induces a 15.1% reduction in the SenS index (p = 0.0406), while the 3.6% reduction achieved with the placebo cream is not significant (p = 0.8906). Including more subjects with a lower baseline SenS index results in non‐significant differences between the two creams.
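The subgroup selection described above (subjects whose baseline average of left and right indices exceeds 5, 6 or 7) can be sketched as follows. Function names and data are hypothetical placeholders. The percentage is computed here as the mean of per‐subject relative changes; it could instead be derived from group means, and which convention the study used is not stated. The D0 vs D21 test for each subgroup would then follow Section 2.8.

```python
def subgroup_above(baseline: dict[str, float], threshold: float) -> list[str]:
    """IDs of subjects whose baseline index (mean of both hemifaces) exceeds threshold."""
    return sorted(sid for sid, value in baseline.items() if value > threshold)


def mean_percent_change(d0: dict[str, float], d21: dict[str, float],
                        subjects: list[str]) -> float:
    """Mean of per-subject relative changes (%) from D0 to D21 for one cream."""
    changes = [100 * (d21[s] - d0[s]) / d0[s] for s in subjects]
    return sum(changes) / len(changes)


# Hypothetical placeholder data for three subjects (not study values).
baseline = {"S01": 9.1, "S02": 4.2, "S03": 7.5}
soothing_d0 = {"S01": 10.0, "S02": 4.0, "S03": 8.0}
soothing_d21 = {"S01": 9.0, "S02": 4.0, "S03": 6.0}

for threshold in (4, 7):
    group = subgroup_above(baseline, threshold)
    print(threshold, group, mean_percent_change(soothing_d0, soothing_d21, group))
```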
4. DISCUSSION
Subjective assessment of SenS is a pertinent approach, and many questionnaires have been developed to classify subjects as sensitive versus non‐sensitive. Only a few were designed to grade SenS perception, 3 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , 18 and a single one has been shown to monitor SenS changes upon repeated applications of a soothing cream. 13 Based on this analysis, we previously used a 25‐item questionnaire evaluating the intensity of reactions to different inducing factors. 35 Converting replies into numerical values resulted in a 0–10 SenS score that accurately predicts assessors' sensitive versus non‐sensitive binary classification and provides an improved description of SenS severity. Having access to an enlarged group of subjects, our first aim was to confirm this initial finding. Two changes were made to the approach: the number of questions was reduced from 25 to four to simplify subjects' assessment and make it more robust by limiting the variability of replies, and the question relating to reactions to cosmetics was given a weight of two because of their importance in eliciting SenS. 3 , 18 , 36 These changes led to a total score ranging from 0 to 15. With this approach, a 92.9% overall accuracy relative to the SenS status defined by assessors was achieved, with an almost perfect classification of SenS subjects. Moreover, with subjects scoring 2 and higher classified as sensitive, the 0–15 score also refines the grading of SenS severity.
Having an easy‐to‐use and accurate questionnaire to assess SenS, we analyzed the repeatability and reproducibility of subjects' replies. This aspect has been little studied, even though self‐assessment, being subjective, inevitably entails some variability. Using a different self‐assessment questionnaire, a previous study nonetheless showed reasonable agreement between replies given 3 weeks apart, with a Cohen's kappa value of 0.7. 18 We were surprised to find that, even with the simple four‐item questionnaire, only a minority of subjects provided repeatable answers and that most of those who did were non‐SenS subjects. Therefore, a subject stating the absence of SenS is more likely to reply repeatably, whereas the severity of the symptoms reported by SenS subjects is prone to variability.
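Cohen's kappa, the agreement statistic quoted above, corrects the observed agreement between two sessions for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below implements that standard formula for categorical replies, such as the binary SenS status obtained in two sessions; the function name and the data are hypothetical and do not come from the cited study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(first: Sequence[Hashable], second: Sequence[Hashable]) -> float:
    """Cohen's kappa for two categorical ratings of the same subjects."""
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(first)
    # Observed agreement: share of subjects placed in the same category both times.
    p_o = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance from each session's category frequencies.
    c1, c2 = Counter(first), Counter(second)
    p_e = sum(c1[k] * c2[k] for k in c1) / (n * n)
    if p_e == 1.0:
        # Every reply falls in one single category in both sessions: kappa is
        # undefined, and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical SenS status (1 = SenS) of ten subjects in two sessions.
session_1 = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
session_2 = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1]
print(round(cohens_kappa(session_1, session_2), 3))  # 0.583
```

Here 8 of 10 subjects agree (p_o = 0.8), but with 60% SenS in each session chance alone gives p_e = 0.52, so kappa is noticeably lower than the raw agreement.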
To circumvent this low repeatability, subjects providing consistent answers were identified, namely those whose scores differed by no more than ± 2 or ± 3 points between assessments. Allowing such score variability led to the inclusion of a few subjects whose status shifted from SenS to non‐SenS or vice versa. The interest of these panels, however, is that they encompass a majority of SenS subjects whose scores vary slightly but who essentially remain correctly classified. This also holds for the panel of subjects showing acceptable answer variations, since greater score variability is only allowed for higher SenS levels. These panels are therefore suited to analyzing the evolution of SenS over time, whether or not it results from the application of a soothing product. Indeed, in a cohort including subjects whose replies vary substantially over time, any change is difficult to highlight, as it can only be detected if the effect largely exceeds the variability of answers. Minimizing this variability makes changes in SenS severity easier to highlight. Thus, the most suitable panel for a given study will depend on the size of the volunteer pool that can be screened, the size of the cohort to be evaluated, and the expected magnitude of the effect.
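The consistency filter described above can be sketched as follows. The exact rule behind "± 2 or ± 3" is not spelled out in this section, so reading it as the spread between a subject's highest and lowest repeated score is an assumption of this sketch; the rule for "acceptable answer variations" (larger tolerance at higher SenS) is not specified and is therefore not implemented. Function names and data are hypothetical.

```python
from typing import Dict, List

SENS_THRESHOLD = 2  # scores of 2 or higher are classified as SenS (from the text)


def is_consistent(scores: List[int], max_diff: int) -> bool:
    """Assumed reading of the "± k" criterion: the highest and lowest of a
    subject's repeated 0-15 scores differ by at most max_diff points."""
    return max(scores) - min(scores) <= max_diff


def consistent_panel(replies: Dict[str, List[int]], max_diff: int) -> List[str]:
    """Identifiers of subjects meeting the consistency criterion (max_diff=0 keeps
    only strictly repeatable subjects)."""
    return [sid for sid, scores in replies.items() if is_consistent(scores, max_diff)]


def status_changed(scores: List[int]) -> bool:
    """True when a subject's SenS/non-SenS class differs between sessions."""
    return len({s >= SENS_THRESHOLD for s in scores}) > 1


# Hypothetical replies collected two or three times per subject.
replies = {"S01": [0, 0, 0], "S02": [9, 11, 8], "S03": [1, 3], "S04": [4, 10, 6]}
for k in (0, 2, 3):
    print(k, consistent_panel(replies, k))
print({sid: status_changed(s) for sid, s in replies.items()})
```

In this toy panel, only the non‐SenS subject S01 is strictly repeatable, while a tolerance of 3 also retains the highly sensitive subject S02, mirroring the observation that tolerating some variability brings SenS subjects into the panel; S03 illustrates a subject whose status shifts across the threshold.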
The variability inherent in the subjective assessment of SenS highlights the need for robust objective methods. In a preliminary work, we used parameters extracted or computed from hyperspectral images to build a model predicting the four‐item SenS score with an MLP neural network. 35 To maximize the chance of success, this initial model used all available color parameters from the hyperspectral images of 90 subjects whose answer variability was not evaluated. We therefore performed a similar experiment with the 117 subjects matching the ± 3 consistency criterion and the 128 subjects presenting acceptable answer variations, two groups large enough to create robust training sets and reasonable validation sets. Only the most relevant non‐collinear parameters from hyperspectral images were used, and enlarged facial regions were considered to improve the reliability of input parameters. Applying this approach to data from the entire face of subjects matching the ± 3 consistency criterion yielded an MLP neural network model with a higher coefficient of determination on the validation set (R² = 0.97) than in the previous study (R² = 0.81). 35 In contrast, an MLP model based on full‐face data from subjects providing acceptable answer variations led to a lower coefficient of determination on the validation set (R² = 0.91) than for subjects matching the ± 3 consistency criterion. Together, these two results indicate that selecting subjects whose perception of their SenS varies only within a limited range is essential to guarantee high‐quality results. It should also be noted that, even though the best results are obtained using the entire face, those obtained with the left or right hemiface are still good enough, opening up the possibility of comparing the evolution of SenS on hemifaces, which can hardly be done with a questionnaire‐based assessment.
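The modelling workflow described above (discard collinear image parameters, train an MLP regressor on the SenS score, report R² on a held‐out validation set) can be sketched with scikit‐learn. This is not the authors' model: their network architecture, feature‐selection procedure and split are not given in this section, so the greedy correlation filter `drop_collinear`, the 0.9 cutoff, the hidden layer sizes, the 80/20 split and the synthetic "image parameters" are all assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def drop_collinear(X: np.ndarray, max_abs_corr: float = 0.9) -> list:
    """Greedy filter (assumed): keep a column only if its |Pearson r| with every
    previously kept column is below the cutoff."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < max_abs_corr for k in kept):
            kept.append(j)
    return kept


def fit_sens_mlp(X: np.ndarray, y: np.ndarray, seed: int = 0):
    """Fit an MLP on non-collinear columns and return (model, columns, validation R^2)."""
    cols = drop_collinear(X)
    X_tr, X_va, y_tr, y_va = train_test_split(X[:, cols], y, test_size=0.2, random_state=seed)
    model = make_pipeline(
        StandardScaler(),  # MLPs train poorly on unscaled inputs
        MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=seed),
    )
    model.fit(X_tr, y_tr)
    return model, cols, r2_score(y_va, model.predict(X_va))


# Synthetic stand-ins for 117 subjects: five parameters plus a near-duplicate of the first.
rng = np.random.default_rng(0)
base = rng.normal(size=(117, 5))
X = np.column_stack([base, base[:, 0] + 0.01 * rng.normal(size=117)])
y = np.clip(7.5 + 2.0 * base[:, 0] - 1.5 * base[:, 2] + rng.normal(0, 0.5, 117), 0, 15)

model, cols, r2 = fit_sens_mlp(X, y)
print("kept columns:", cols, "validation R^2:", round(r2, 3))
```

Removing the near‐duplicate column before training mirrors the use of non‐collinear parameters; the same function could be refitted on hemiface parameters to obtain a separate model per half‐face.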
Using an MLP neural network proved useful for building relevant objective predictive models of SenS. Several parameters have been reported as indicative of SenS. Increased TEWL and decreased skin hydration are two of them, both related to the impaired skin barrier function of SenS subjects. 25 , 26 Skin color and blood flow have also been described as relevant, despite conflicting results. 25 Given the predictive power of the MLP neural network model, the parameters we used as input (color, color homogeneity, and hemoglobin levels) appear sufficient. Nevertheless, this approach has a drawback. Because MLP neural network models are highly complex, it is very difficult to know whether they use all parameters and what their relative contributions are. It is also almost impossible to determine how the different models relate to one another. Such an approach therefore gives no clue about the biological rationale or the visible signs that could help assessors evaluate SenS and its severity.
To test the relevance of comparing the evolution of the objective assessment on both hemifaces, we compared changes in the instrumental SenS index upon application of a soothing cosmetic cream on one hemiface with those occurring upon application of a placebo cream on the other hemiface. When all subjects were analyzed, the soothing cream only led to a non‐significant decrease in SenS. Only subjects with the highest SenS showed a significant improvement in their instrumental SenS index after 21 days of twice‐daily application. However, these results were obtained on a limited cohort of 30 subjects and should be consolidated by a similar evaluation on a larger number of subjects. This limited number of subjects might also explain why the effect of the cream was highlighted in only a subset of the population analyzed. Nevertheless, the results not only validate the use of the instrumental SenS index on hemifaces but also indicate that the soothing cream has a noticeable effect on highly sensitive subjects.
In conclusion, the four‐item questionnaire presented in this work not only predicts assessors' SenS/non‐SenS binary classification but also grades SenS severity. Still, the success of a questionnaire‐based approach depends on subjects' replies and their variability. There is a real need for an objective assessment of SenS severity, which hyperspectral images combined with an MLP neural network model provide. This instrumental approach yields continuous SenS scores that accurately reflect subjects' perceptions. It applies to the entire face and also enables comparing the evolution of SenS severity on both hemifaces, opening new possibilities for highlighting the effects of a soothing treatment, as demonstrated here.
CONFLICT OF INTEREST STATEMENT
Juliette Rengot, Marie Cherel, and Elodie Prestat‐Marquis are full‐time employees of Newtone Technologies. Imke Meyer, Nathalie Chevrot, Marielle Le Maire, and Dominik Stuhlmann are full‐time employees of Symrise AG or Symrise SAS.
ACKNOWLEDGMENTS
The authors wish to thank IEC (Lyon, France) and Dermatech (Lyon, France) for recruiting subjects and performing the skin sensitivity assessments. They are also thankful to Dr Philippe Crouzet, PhD, Estium‐Concept, for providing scientific writing services.
The present study was co‐funded by Newtone Technologies, Symrise AG, and Symrise SAS.
Rengot J, Meyer I, Chevrot N, et al. From consistent subjective assessment of skin sensitivity severity to its accurate objective scoring. Skin Res Technol. 2024;30:e13635. 10.1111/srt.13635
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.
REFERENCES
- 1. Misery L, Ständer S, Szepietowski JC, et al. Definition of sensitive skin: an expert position paper from the special interest group on sensitive skin of the international forum for the study of Itch. Acta Derm Venereol. 2017;97(1):4‐6.
- 2. Misery L, Loser K, Ständer S. Sensitive skin. J Eur Acad Dermatol Venereol. 2016;30(Suppl 1):2‐8.
- 3. Duarte I, Silveira JEPS, Hafner MFS, Toyota R, Pedroso DMM. Sensitive skin: review of an ascending concept. An Bras Dermatol. 2017;92(4):521‐525.
- 4. Seidenari S, Francomano M, Mantovani L. Baseline biophysical parameters in subjects with sensitive skin. Contact Dermatitis. 1998;38(6):311‐315.
- 5. Draelos ZD. Sensitive skin: perceptions, evaluation, and treatment. Am J Contact Dermat. 1997;8(2):67‐78.
- 6. Chen W, Dai R, Li L. The prevalence of self‐declared sensitive skin: a systematic review and meta‐analysis. J Eur Acad Dermatol Venereol. 2020;34(8):1779‐1788.
- 7. Loffler H, Dickel H, Kuss O, Diepgen TL, Effendy I. Characteristics of self‐estimated enhanced skin susceptibility. Acta Derm Venereol. 2001;81(5):343‐346.
- 8. Farage MA, Miller KW, Wippel AM, Berardesca E, Misery L, Maibach H. Sensitive skin in the United States: survey of regional differences. Fam Med Med Sci Res. 2013;2(3):112.
- 9. Halvorsen JA, Olesen AB, Thoresen M, Holm JØ, Bjertness E, Dalgard F. Comparison of self‐reported skin complaints with objective skin signs among adolescents. Acta Derm Venereol. 2008;88(6):573‐577.
- 10. Vanoosthuyze K, Zupkosky PJ, Buckley K. Survey of practicing dermatologists on the prevalence of sensitive skin in men. Int J Cosmet Sci. 2013;35(4):388‐393.
- 11. Guinot C, Malvy D, Mauger E, et al. Self‐reported skin sensitivity in a general adult population in France: data of the SU.VI.MAX cohort. J Eur Acad Dermatol Venereol. 2006;20(4):380‐390.
- 12. Gougerot A, Vigan M, Bourrain JL, et al. Le SIGL: un outil d’évaluation clinique des peaux réactives? Nouv Dermatol. 2007;26:13‐15.
- 13. Misery L, Jean‐Decoster C, Mery S, Georgescu V, Sibaud V. A new ten‐item questionnaire for assessing sensitive skin: the sensitive scale‐10. Acta Derm Venereol. 2014;94(6):635‐639.
- 14. Buhé V, Vié K, Guéré C, et al. Pathophysiological study of sensitive skin. Acta Derm Venereol. 2016;96(3):314‐318.
- 15. Pan Y, Ma X, Song Y, Zhao J, Yan S. Questionnaire and lactic acid sting test play different role on the assessment of sensitive skin: a cross‐sectional study. Clin Cosmet Investig Dermatol. 2021;14:1215‐1225.
- 16. Polena H, Chavagnac‐Bonneville M, Misery L, Sayag M. Burden of Sensitive Skin (BoSS) Questionnaire and current perception threshold: use as diagnostic tools for sensitive skin syndrome. Acta Derm Venereol. 2021;101:adv00606.
- 17. Legeas C, Misery L, Fluhr JW, Roudot AC, Ficheux AS, Brenaut E. Proposal for cut‐off scores for sensitive skin on Sensitive Scale‐10 in a group of adult women. Acta Derm Venereol. 2021;101:adv00373.
- 18. Corazza M, Guarneri F, Montesi L, Toni G, Donelli I, Borghi A. Proposal of a self‐assessment questionnaire for the diagnosis of sensitive skin. J Cosmet Dermatol. 2022;21(6):2488‐2496.
- 19. Berardesca E, Farage M, Maibach H. Sensitive skin: an overview. Int J Cosmet Sci. 2013;35(1):2‐8.
- 20. Darlenski R, Kazandjieva J, Fluhr JW, Maurer M, Tsankov N. Lactic acid sting test does not differentiate between facial and generalized skin functional impairment in sensitive skin in atopic dermatitis and rosacea. J Dermatol Sci. 2014;76(2):151‐153.
- 21. Bowman JP, Floyd AK, Znaniecki A, et al. The use of chemical probes to assess the facial reactivity of women, comparing their self‐perception of sensitive skin. J Cosmet Sci. 2000;51(5):267‐273.
- 22. Cho HJ, Chung BY, Lee HB, et al. Quantitative study of stratum corneum ceramides contents in patients with sensitive skin. J Dermatol. 2012;39(3):295‐300.
- 23. Hernández‐Blanco, et al. Prevalence of sensitive skin and its biophysical response in a Mexican population. World J Dermatol. 2013;2(1):1‐7.
- 24. Ding DM, Tu Y, Man MQ, et al. Association between lactic acid sting test scores, self‐assessed sensitive skin scores and biophysical properties in Chinese females. Int J Cosmet Sci. 2019;41(4):398‐404.
- 25. Richters R, Falcone D, Uzunbajakava N, Verkruysse W, van Erp P, van de Kerkhof P. What is sensitive skin? A systematic literature review of objective measurements. Skin Pharmacol Physiol. 2015;28(2):75‐83.
- 26. An S, Lee E, Kim S, et al. Comparison and correlation between stinging responses to lactic acid and bioengineering parameters. Contact Dermatitis. 2007;57(3):158‐162.
- 27. Farage MA, Katsarou A, Maibach HI. Sensory, clinical and physiological factors in sensitive skin: a review. Contact Dermatitis. 2006;55(1):1‐14.
- 28. Zha WF, Song WM, Ai JJ, Xu AE. Mobile connected dermatoscope and confocal laser scanning microscope: a useful combination applied in facial simple sensitive skin. Int J Cosmet Sci. 2012;34(4):318‐321.
- 29. Ma YF, Yuan C, Jiang WC, Wang XL, Humbert P. Reflectance confocal microscopy for the evaluation of sensitive skin. Skin Res Technol. 2017;23(2):227‐234.
- 30. Vergnaud H, Cherel M, François G, et al. Lip color measurement: a new hyperspectral imaging device. Skin Res Technol. 2023;29(8):e13418.
- 31. Seroul P, Hébert M, Jomier M. Hyperspectral imaging system for in‐vivo quantification of skin pigments. Proceedings of the 28th IFSCC Congress. 2014:213‐232.
- 32. Seroul P, Hébert M, Cherel M, Vernet R, Clerc R, Jomier M. Model‐based skin pigment cartography by high‐resolution hyperspectral imaging. J Imaging Sci Technol. 2016;60(6):060404‐1‐060404‐7.
- 33. Nkengne A, Robic J, Seroul P, Gueheunneux S, Jomier M, Vié K. SpectraCam®: A new polarized hyperspectral imaging system for repeatable and reproducible in vivo skin quantification of melanin, total hemoglobin, and oxygen saturation. Skin Res Technol. 2018;24(1):99‐107.
- 34. Murtagh F. Multilayer perceptrons for classification and regression. Neurocomputing. 1991;2(5‐6):183‐197.
- 35. Rengot J, Stuhlmann D, Meyer I, et al. Exploring sensitive skin to design reliable measurements. Skin Res Technol. 2023;29(10):e13449.
- 36. Richters RJ, Uzunbajakava NE, Hendriks JC, Bikker JW, van Erp PE, van de Kerkhof PC. A model for perception‐based identification of sensitive skin. J Eur Acad Dermatol Venereol. 2017;31(2):267‐273.
