Abstract
Background
The Consultation and Relational Empathy (CARE) Measure is an internationally used 10-item questionnaire, with each item rated on a five-point scale, that assesses a physician’s empathy from the patient’s perspective. A 2-item version that uses item 6 “Showing care and compassion” and item 9 “Helping you to take control” from the Japanese 10-item version without changing the text was preliminarily developed through secondary data analysis.
Objective
To examine the validity and reliability of the Japanese 2-item version.
Methods
Selectively sampled patients who visited 11 collaborating general practitioners working in primary care clinics in both urban and rural areas of the Tokai region of Japan completed the 2-item and 10-item versions, and a patient background questionnaire. Face validity, criterion validity, construct validity, and internal consistency were examined. The 10-item version was used to assess the criterion validity of the 2-item version. Inter-rater reliability was examined using generalizability theory. The correlation between each item in the 2-item version was identified. Missing data analysis was performed.
Results
Among all 349 participating patients, questionnaires completed by 347 patients who gave clear consent to participate in the study were analyzed. The 2-item version showed high face validity, with few missing and “does not apply” values. Criterion validity was supported by a strong correlation between the 2-item and 10-item scores (Pearson’s correlation coefficient 0.805, P < 0.001), and construct validity by a correlation between the 2-item score and patient satisfaction (Spearman’s rho 0.583, P < 0.001). The two items showed acceptable internal consistency (Cronbach’s α 0.919). Eighty-five patients per GP were required to achieve adequate inter-rater reliability of the 2-item measure. A mean score of 7.5 and a score range of 7–8 were estimated for the 2-item total. A strong positive correlation was identified between the two items (Pearson’s correlation coefficient 0.852, P < 0.001). Missing data analysis revealed that the 2-item version, 10-item version, and consultation characteristics were missing completely at random, whereas patient characteristics were not.
Conclusions
Although the high number of patients required per doctor to reliably discriminate between doctors’ empathy may limit its feasibility in practice and careful interpretation is warranted, the Japanese 2-item version has high validity and internal reliability.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12875-025-03018-2.
Keywords: Empathy, Surveys and questionnaires, Japan, Primary care, General practice, Patients
Background
Empathy in healthcare has a positive effect on patients’ health [1]. Empathy has various definitions, such as the understanding and response of healthcare professionals to the thoughts and feelings of others [1]. Physician empathy is fundamental to patient–physician communication and influences patient satisfaction [2, 3]. Because physician empathy is not necessarily related to physicians’ views of themselves, it is important to evaluate it from the patient’s perspective [4, 5].
The original English version of the Consultation and Relational Empathy (CARE) Measure was developed in 2004 as a questionnaire to assess physician empathy from the patient’s perspective [6]. The questionnaire was developed as a 10-item, 50-point process measure based on a 5-point Likert scale from “Poor” to “Excellent,” originally intended for use with general practitioners (GPs) in primary care [6, 7] (Supplementary Fig. 1). The empathy that patients experience from their GP is a predictor of patient well-being [8]. GPs’ empathy can lower the incidence of coronary events in patients with type 2 diabetes [9]. Various translated versions of the CARE Measure for GPs in primary care have been developed, such as Chinese, Japanese, Croatian, Dutch, Spanish, and Arabic versions [7, 10–15]. We developed the Japanese version in 2014 and examined its inter-rater reliability in 2018 [11, 16].
As the 10-item measure is considered practical for use, simplified versions have not been developed, except for a 5-item version for children [7, 17]. However, for clinical empathy, there is a need for a simpler assessment questionnaire, from the perspectives of both the patient and the general practitioner [18]. Efforts have been made to create shorter versions to reduce respondent burden in consideration of respondents’ physical conditions [19]. Physical difficulties in performing actions such as reading a paper questionnaire and writing or ticking answers will impede completion of the questionnaire [20]. As the main recipients of primary care are older people, shorter versions of the questionnaire have also been developed considering older people’s potential limitations [21–24]. Moreover, the 10-item Japanese version has overlap between questionnaire items as indicated by a high value of Cronbach’s α (0.984) [11, 25]. As Cronbach’s α in the range of 0.70–0.95 suggests item consistency at acceptable levels, this higher value implies item redundancy, with different items testing the same factor but in different guises [25, 26]. We thus conducted a secondary data analysis of the 10-item Japanese version to explore candidate questionnaire items comprehensively for inclusion in a shorter version with a focus on two aspects: (i) correlation between the total score on each candidate questionnaire and the 10-item total score, and (ii) the values of Cronbach’s α [25]. The values of (i) and (ii) for all of the 1,023 possible combinations, obtained by decomposing the 10 items without changing the text of the 10-item questions, were examined. The candidate combinations were then assessed, with expert opinion taken into account, in terms of higher correlation with the 10-item total score, higher Cronbach’s α, fewer question items, and validity for clinical use.
After that, we preliminarily found that item 6 “Showing care and compassion” and item 9 “Helping you to take control” in the 10-item Japanese version could be reasonable candidates for a 2-item version [25]. (Supplementary Fig. 1 is the complete 10-item version of the CARE measure, and items 6 and 9 in the 10-item version are candidate questions for a 2-item version.) However, to develop a shorter version, testing should be performed on the short-form candidate items in isolation to avoid any influence of the excluded questionnaire items because every completed questionnaire influences the respondent’s understanding [28].
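The exhaustive search over item combinations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in the preliminary study: the response data are synthetic, and the function names (`pearson`, `cronbach_alpha`, `score_subsets`) are hypothetical. It enumerates every non-empty subset of the 10 items (2^10 − 1 = 1,023) and computes, for each subset, (i) the correlation of its total score with the 10-item total and (ii) Cronbach’s α.

```python
from itertools import combinations
from statistics import mean, variance
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows, items):
    """Cronbach's alpha for the chosen item indices (None for one item)."""
    k = len(items)
    if k < 2:
        return None  # alpha is undefined for a single item
    item_var_sum = sum(variance([r[i] for r in rows]) for i in items)
    total_var = variance([sum(r[i] for i in items) for r in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def score_subsets(rows, n_items=10):
    """Return (items, correlation with full total, alpha) for every subset."""
    full_total = [sum(r) for r in rows]
    results = []
    for k in range(1, n_items + 1):
        for items in combinations(range(n_items), k):
            sub_total = [sum(r[i] for i in items) for r in rows]
            results.append(
                (items, pearson(sub_total, full_total), cronbach_alpha(rows, items))
            )
    return results


# Synthetic 5-point responses from 200 hypothetical patients (illustration only).
random.seed(0)
rows = []
for _ in range(200):
    base = random.randint(2, 5)
    rows.append([min(5, max(1, base + random.choice((-1, 0, 0, 1)))) for _ in range(10)])

results = score_subsets(rows)
print(len(results))  # 1023
```

Ranking the resulting list by correlation and α would only shortlist candidates; the final choice in the text also weighed the number of items and clinical validity, which a script like this does not capture.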
Methods
Study aim
This study was implemented to examine the validity and reliability of the 2-item Japanese version of the CARE Measure by obtaining new data.
Study design
We conducted a cross-sectional study with a questionnaire survey.
Setting of the study
Ten clinics practicing primary care in the Tokai region of Japan were invited to participate. Six clinics agreed to collaborate. Of 17 doctors affiliated with these clinics, 11 (64.7%) became collaborating GPs. They were categorized according to the following characteristics: establishment body (public or private), location (urban or rural), sex, years of clinical experience, and qualification of family medicine specialists certified by the Japanese Primary Care Association [29].
Questionnaires were collected between January and March 2023. The first author (NT) visited all clinics to provide a study overview to the collaborating GPs and medical staff. The GPs selected patients to be candidate participants in the research after performing a consultation with the patient and determining that they matched the study’s inclusion criteria, while medical staff provided the patients with an instruction sheet and questionnaire in paper form. Exclusion criteria were patients with conditions such as dementia, psychiatric disorders, or severe acute illnesses. After completing the questionnaire, the patients submitted it to the clinic anonymously. The questionnaires were kept by the clinic and collected by the authors (NT, HN) at a later date. The patients could also initially send the questionnaires by post to the researchers’ office. However, this option was discontinued owing to the administrative issue that this would complicate the procedure of paying rewards to the participants. All distributed questionnaires answered by 349 patients were collected; 347 were collected at the clinics, while 2 were sent in by post. Two patients who did not indicate their willingness to participate were excluded. Overall, 336 patients (96.8%) completed the 2-item version. For one patient, identification of the GP was impossible.
Inclusion criteria of participants
Patients aged 18 years or older who were selectively sampled were included in this study; those who met the exclusion criteria were excluded. It was up to the GP in charge to decide which patients to contact among those who were eligible. The number of diseases afflicting the patients was not considered in the process of participant recruitment. Each patient was only allowed to submit one questionnaire.
Variables
Patients completed the 2-item and 10-item versions of the measure in that order. Responses to the 2-item version could not be modified upon starting the 10-item version. Patients then completed the patient background questionnaire, providing information such as on the doctor–patient relationship, and the patient’s sex, age, and family environment. Patients also indicated the time at which they started filling in each questionnaire. Patients were allowed to not complete the patient background questionnaire after completing the 2-item and 10-item versions of the measure.
The collected data were converted to digital data and checked by an assistant. As part of the data cleaning process, typographical errors and omissions were corrected.
Sample size
The target number of GPs was set at 10 based on the previous Japanese version of the CARE Measure study [11]. The number of questionnaires per GP was determined based on a preliminary survey of the 2-item version that estimated 45 questionnaires were needed to ensure adequate inter-rater reliability [25]. When the 10-item Japanese version was developed, all questions were answered appropriately in approximately 80% of the questionnaires [11]. We thus set a target of collecting 60 questionnaires from each GP. However, we terminated the collection for administrative reasons despite GPs not having reached the target number.
Statistical methods
Validity of the questionnaire. The “does not apply” and missing values for each item of the 2-item version were examined to assess face validity for judging the appropriateness of the questionnaire [27]. The following evaluations were conducted for the questionnaire items and patient background questionnaire.
Criterion validity is an approach to examine the correlation between an existing scale established as a gold standard and a new scale [27]. Although not established, the 10-item version of the CARE Measure was considered the gold standard for the 2-item version of the CARE Measure in this study. Criterion validity was assessed by examining the correlation between the 2-item and 10-item versions (Pearson’s correlation coefficient, two-tailed, 5% significance level). We measured both criterion and construct validities because the use of the 10-item version of the measure is not a well-established way of examining criterion validity for the development of a simplified version.
Construct validity is an approach that attempts to assess the validity of a measure by using hypotheses about its constructs in a validating manner when no other scale has been established [27]. It is well known that patients’ perception of a physician’s empathy is related to patient satisfaction [3]. Correlations with patient satisfaction have been widely used to assess construct validity in the CARE Measure [11, 17, 30]. In the present study, construct validity was evaluated using the correlation coefficients between the 2-item version and overall satisfaction or its supplementary items [number of consultations, consultation time (minutes), satisfaction with consultation time, whether the patient knew their doctor, and whether they would recommend their doctor to family or friends] (Spearman’s rho, two-tailed test, 5% significance level). Especially for the supplementary item of whether or not the patients were willing to make regular visits to the clinic, a group comparison was conducted on the 2-item version (Kruskal–Wallis test with independent samples, 5% significance level). Patient satisfaction was used because previous studies of this measure revealed a high correlation between the 10-item score and patient satisfaction [10, 11, 14, 25]. Spearman’s rho and Kruskal–Wallis test with independent samples were used because patient satisfaction scores were not normally distributed [11, 17, 25, 30].
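The two correlation measures used for criterion and construct validity can be sketched as follows. This is an illustrative, standard-library-only sketch, not the SPSS procedure used in the study; the function names and the small data vectors are hypothetical, and P values are not computed here. Spearman’s rho is obtained as Pearson’s coefficient applied to average ranks, which handles the tied values that are common with Likert-type scores.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for six patients (illustration only, not study data).
two_item = [4, 6, 7, 8, 8, 10]         # 2-item total (range 2-10)
ten_item = [22, 30, 36, 41, 39, 50]    # 10-item total (range 10-50)
satisfaction = [2, 2, 3, 3, 4, 4]      # overall satisfaction (1-4)

print(round(pearson(two_item, ten_item), 3))       # criterion validity
print(round(spearman(two_item, satisfaction), 3))  # construct validity
```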
Questionnaire reliability. Cronbach’s α of the 2-item version was examined for internal consistency. Inter-rater reliability was examined using generalizability theory. Generalizability theory was developed from classical test theory and consists of three steps: (1) one-way analysis of variance (ANOVA) to identify all factors potentially causing measurement error, (2) Generalizability (G) study to obtain intra-cluster correlation (ICC) [i.e., generalizability (G) coefficient], and (3) Decision (D) study to estimate the number of items satisfying an adequate ICC [27]. To measure inter-rater reliability, we listed the variance component $\sigma^2_{GP}$ among GPs in the 2-item version and the chance error variance $\sigma^2_{P}$ generated by the patients, based on previous research on variance components [7, 16, 25]. The following equation holds for this study on generalizability theory.
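$$\mathrm{ICC} = \frac{\sigma^2_{GP}}{\sigma^2_{GP} + \sigma^2_{P}}$$
The following equation for $\sigma^2_{P}$ is known to hold with $\sigma^2$ as the variance within each patient’s 2-item score and n as the number of questionnaires [7].
$$\sigma^2_{P} = \frac{\sigma^2}{n}$$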
We used one-way ANOVA to calculate $\sigma^2_{GP}$, $\sigma^2$, and the effect size (partial $\eta^2$). We then used the G study to estimate the generalizability coefficient with the number of questionnaires n set to the harmonic mean. We used the D study to estimate ICCs for various numbers of questionnaires. Based on previous studies examining the measure’s inter-rater reliability, in this study it was assumed that the number of questionnaires meeting ICC of 0.8 would ensure adequate inter-rater reliability [7, 16]. To perform interval estimation with respect to the questionnaire scores, the number of questionnaires calculated by the D study was used to calculate a score that could distinguish the top and bottom 25% of GPs [7, 16].
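The G study and D study calculations can be sketched as follows, assuming that ICC is the between-GP variance divided by the sum of the between-GP variance and the within-GP variance over the number of questionnaires n. This is a hedged illustration rather than the G-string IV procedure used in the study; the function names are hypothetical, and the variance components (0.127 and 2.678) and harmonic mean (18.01) are the values reported in the Results.

```python
import math


def icc(var_gp, var_within, n):
    """ICC for n questionnaires per GP: var_gp / (var_gp + var_within / n)."""
    return var_gp / (var_gp + var_within / n)


def min_questionnaires(var_gp, var_within, target=0.8):
    """Smallest whole number of questionnaires per GP with ICC >= target.

    Solving var_gp / (var_gp + var_within / n) >= target for n gives
    n >= target * var_within / ((1 - target) * var_gp).
    """
    return math.ceil(target * var_within / ((1 - target) * var_gp))


# Variance components reported in the Results of this study.
VAR_GP = 0.127      # variance among GPs
VAR_WITHIN = 2.678  # variance within each patient's 2-item score

# G study: ICC at the harmonic mean number of questionnaires per GP.
print(round(icc(VAR_GP, VAR_WITHIN, 18.01), 3))  # 0.461

# D study: ICC for a range of questionnaires per GP.
for n in (1, 10, 30, 50, 80, 84, 85, 90):
    print(n, round(icc(VAR_GP, VAR_WITHIN, n), 3))

print(min_questionnaires(VAR_GP, VAR_WITHIN))  # 85
```

Because ICC depends on n only through the ratio of within-GP variance to n, the required number of questionnaires scales directly with the ratio of within-GP to between-GP variance.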
Factor analysis
Confirmatory factor analysis (CFA) is performed to ensure a factor structure is the same as a hypothesized one [27]. CFA requires a minimum of three questionnaire items to ensure reliable results, so it does not fit with a 2-item questionnaire [31]. Therefore, the correlation between each item in the 2-item version was identified by an alternative approach (Pearson’s correlation coefficient, two-tailed, 5% significance level) instead of performing CFA. In addition, exploratory factor analysis is usually conducted when the factor structure of the questionnaire items is not yet known [27]. The CARE Measure has been identified as having a single factor in both the 10-item English and Japanese versions [11, 30]. Two of the 10 items, questions 6 and 9, were identified as candidates for simplification without reworking the text [25]. As it was evident that the number of factors was 1, exploratory factor analysis was not conducted.
Supplementary analysis
Comparisons of the 2-item score by patient characteristics (age, sex, marital status, educational level) and GP characteristics (establishment category, location, sex, years of clinical experience, family medicine specialist certification) were performed by Mann–Whitney’s U test and Kruskal–Wallis test (both at the 5% level of significance, Bonferroni correction). The times to complete both the 2-item and 10-item versions were calculated and compared by Wilcoxon’s signed rank test (5% significance level).
For the 10 items, we calculated each GP’s score, examined the correlation coefficient between the 10-item score and overall satisfaction, and examined inter-rater reliability, and used the same method as for the 2-item version.
The correlation coefficient between the 2-item score and the scores of questions 6 and 9 from the 10-item version was determined (Pearson’s correlation coefficient, two-tailed, significance level 5%). The overall agreement was calculated after creating a cross-tabulation table.
Statistical analyses were conducted using SPSS Ver. 28 (IBM, Armonk, NY, USA) for descriptive statistics and G-string IV for generalizability theory [32]. Missing data were handled on a per-variable basis: a questionnaire was excluded only from the analyses involving the variable for which its value was missing. For the 2-item and 10-item versions of the questionnaire, only data with all items answered were used. Missing data analysis was performed by Little’s missing completely at random (MCAR) test (significance level 5%) [33] for the 2-item and 10-item versions, and for patient and consultation characteristics. Missing data analysis was performed using R (version 4.4.1), with programming script writing assisted by ChatGPT (4o, OpenAI).
Results
Among all 349 participating patients, the questionnaires completed by 347 patients who gave clear consent to participate in the study were analyzed.
Patient characteristics
Patients’ mean age was 67.41 years [standard deviation (SD) 14.55], 55.6% were women, 63.4% were married, 42.4% were high school graduates, and 42.1% were in employment (Table 1).
Table 1.
Characteristics of 347 patients included in the study
| Age (years) | n | % | |
|---|---|---|---|
| ≤ 39 | 18 | 5.2 | |
| 40–69 | 137 | 39.5 | |
| > 69 | 187 | 53.9 | |
| Missing | 5 | 1.4 | |
| Sex | |||
| Male | 150 | 43.2 | |
| Female | 193 | 55.6 | |
| Missing | 4 | 1.2 | |
| Marital status | |||
| Single | 52 | 15.0 | |
| Married | 220 | 63.4 | |
| Separated | 2 | 0.6 | |
| Divorced | 10 | 2.9 | |
| Widowed | 53 | 15.3 | |
| Missing | 10 | 2.9 | |
| Educational level | |||
| Junior high school | 52 | 15.0 | |
| High school | 147 | 42.4 | |
| Vocational college | 33 | 9.5 | |
| Junior college | 25 | 7.2 | |
| University | 73 | 21.0 | |
| Graduate school | 6 | 1.7 | |
| Others | 2 | 0.6 | |
| Missing | 9 | 2.6 | |
| Employment status | |||
| Employed (full- or part-time, including self-employed) | 146 | 42.1 | |
| Unemployed or looking for work | 34 | 9.8 | |
| Unable to work because of long-term sickness or disability | 3 | 0.9 | |
| Retired from paid work | 59 | 17.0 | |
| Looking after one’s home/family | 64 | 18.4 | |
| At school or in full-time education | 3 | 0.9 | |
| Others | 9 | 2.6 | |
| Missing | 29 | 8.4 | |
Consultation characteristics
Patient responses regarding their consultations are shown in Table 2. Thirty-four patients (9.8%) consulted with the GP for the first time, and 96.2% of patients indicated that they visited their GP regularly or had already seen them several times. In total, 82.7% of patients reported that the consultation lasted less than 15 min, and 82.5% of patients were very/completely satisfied with the consultation’s duration. Meanwhile, 57.7% of patients indicated that they knew their GP well/very well, and 83.9% of patients indicated that they would recommend their GP to family or friends. No patient reported being dissatisfied with the consultation overall; overall satisfaction had a mean of 3.16 and SD of 0.58.
Table 2.
Characteristics of consultations with general practitioners by 347 patients
| n | % | ||
|---|---|---|---|
| Number of consultations with today’s doctor | |||
| First time | 34 | 9.8 | |
| 2–4 times | 36 | 10.4 | |
| 5 times or more | 220 | 63.4 | |
| Missing | 57 | 16.4 | |
| Preference for seeing today’s doctor as your usual doctor | |||
| Yes | 287 | 82.7 | |
| No | 2 | 0.6 | |
| I have previously seen today’s doctor | 47 | 13.5 | |
| Missing | 11 | 3.2 | |
| Duration of consultation | |||
| Less than 5 min | 110 | 31.7 | |
| 5–10 min | 136 | 39.2 | |
| 10–15 min | 41 | 11.8 | |
| 15–20 min | 23 | 6.6 | |
| 20 min or more | 24 | 6.9 | |
| Missing | 13 | 3.7 | |
| Patient satisfaction with the length of time with the doctor | |||
| Dissatisfied (1) | 0 | 0 | |
| Fairly satisfied (2) | 55 | 15.9 | |
| Very satisfied (3) | 215 | 62 | |
| Completely satisfied (4) | 71 | 20.5 | |
| Missing | 6 | 1.7 | |
| How well the patient knew the doctor | |||
| Not at all (1) | 22 | 6.3 | |
| Somewhat (2) | 41 | 11.8 | |
| Fairly well (3) | 72 | 20.7 | |
| Well (4) | 154 | 44.4 | |
| Very well (5) | 46 | 13.3 | |
| Missing | 12 | 3.5 | |
| Would you recommend the doctor to your family or friends? | |||
| Definitely not (1) | 0 | 0 | |
| Probably not (2) | 2 | 0.6 | |
| Not sure (3) | 46 | 13.3 | |
| Probably (4) | 207 | 59.7 | |
| Definitely (5) | 84 | 24.2 | |
| Missing | 8 | 2.3 | |
| Patient satisfaction with the consultation with the doctor | |||
| Dissatisfied (1) | 0 | 0 | |
| Fairly satisfied (2) | 33 | 9.5 | |
| Very satisfied (3) | 218 | 62.8 | |
| Completely satisfied (4) | 89 | 25.6 | |
| Missing | 7 | 2 | |
The 2-item Japanese version of the CARE measure
The distribution of scores was skewed toward higher values, but there was no ceiling effect (Table 3). The rates of “Does not apply” and “Missing” were also low for all questions (Does not apply: 0.3%, 0.9%, Missing: 2.0%, 1.4%, for questions 6 and 9, respectively). The mean score for the 2-item version was 7.5 (SD 1.67) out of a possible maximum score of 10. There was a strong correlation between the 2-item version and 10-item version of the CARE Measure in terms of total scores (Pearson’s correlation coefficient 0.805, P < 0.001). Scores for each GP are listed in Table S1.
Table 3.
Distributions of the 2-item version of the Japanese consultation and relational empathy (CARE) measure scores (n = 347)
| Item | 1 Poor | 2 Fair | 3 Good | 4 Very Good | 5 Excellent | Does not apply | Missing | Mean score | SD |
|---|---|---|---|---|---|---|---|---|---|
| 6. Showing care and compassion (seeming genuinely concerned, connecting with you on a human level; not being indifferent or “detached”) | 5 | 10 | 104 | 152 | 68 | 1 | 7 | 3.79 | 0.85 |
| % | 1.4 | 2.9 | 30 | 43.8 | 19.6 | 0.3 | 2.0 | | |
| 9. Helping you to take control (exploring with you what you can do to improve your health yourself; encouraging rather than “lecturing” you) | 7 | 15 | 116 | 136 | 65 | 3 | 5 | 3.70 | 0.90 |
| % | 2 | 4.3 | 33.4 | 39.2 | 18.7 | 0.9 | 1.4 | | |
Abbreviation: SD, standard deviation
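The item means and SDs in Table 3 can be recomputed from the response counts, as sketched below. This is a hedged illustration: it assumes that “Does not apply” and missing responses are excluded from the denominator and that the sample SD (n − 1 denominator) is used, and the function name `mean_sd` is hypothetical.

```python
def mean_sd(counts):
    """Return (n, mean, sample SD) from a mapping of score -> frequency."""
    n = sum(counts.values())
    m = sum(score * c for score, c in counts.items()) / n
    var = sum(c * (score - m) ** 2 for score, c in counts.items()) / (n - 1)
    return n, m, var ** 0.5


# Response counts for scores 1-5 taken from Table 3.
item6 = {1: 5, 2: 10, 3: 104, 4: 152, 5: 68}
item9 = {1: 7, 2: 15, 3: 116, 4: 136, 5: 65}

for label, counts in (("Item 6", item6), ("Item 9", item9)):
    n, m, sd = mean_sd(counts)
    print(label, n, round(m, 2), round(sd, 2))
```

Under these assumptions, each item has 339 scored responses and the recomputed values agree with the tabulated means and SDs to two decimal places.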
A strong correlation between the 2-item total score and overall satisfaction was identified (Spearman’s rho 0.583, P < 0.001). Spearman’s rho values of correlations between the 2-item total scores and the number of consultations, consultation time, satisfaction with consultation time, whether the patient knew the doctor, and whether the patient would recommend the doctor to family or friends were statistically significant at 0.269 (P < 0.001), 0.169 (P < 0.01), 0.551 (P < 0.001), 0.248 (P < 0.001), and 0.517 (P < 0.001), respectively. There was no statistically significant difference in the score regarding whether patients had a preference for seeing the particular doctor that they saw.
Cronbach’s α for the 2-item version was 0.919. For the inter-rater reliability study based on generalizability theory, one-way ANOVA revealed $\sigma^2_{GP}$ of 0.127, $\sigma^2$ of 2.678, and a moderate effect size (partial $\eta^2$ = 0.069). The harmonic mean was 18.01, and the G study revealed ICC of 0.461. Meanwhile, the D study revealed that ICC exceeded 0.8 for 85 patients (Table 4). Interval estimation was then performed using 85 patients and revealed a mean score of 7.5 and a score range of 7–8 (Fig. 1). A strong positive correlation was identified between the two items of the 2-item version (Pearson’s correlation coefficient 0.852, P < 0.001).
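For a two-item scale, Cronbach’s α is determined by the inter-item correlation and the two item SDs, which offers a quick consistency check on the reported figures. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the rounded item SDs from Table 3 (0.85 and 0.90) approximate those of the patients who completed both items.

```python
def alpha_two_items(r, sd1, sd2):
    """Cronbach's alpha for two items from their correlation and SDs.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total),
    with k = 2 and var(total) = sd1^2 + sd2^2 + 2 * r * sd1 * sd2.
    """
    item_var_sum = sd1 ** 2 + sd2 ** 2
    total_var = item_var_sum + 2 * r * sd1 * sd2
    return 2 * (1 - item_var_sum / total_var)


# Reported inter-item correlation and rounded item SDs from Table 3.
print(round(alpha_two_items(0.852, 0.85, 0.90), 3))  # 0.919

# With equal SDs the expression reduces to 2r / (1 + r).
print(round(alpha_two_items(0.852, 1.0, 1.0), 3))  # 0.92
```

The first value matches the reported α of 0.919, so the α and the inter-item correlation are mutually consistent under the stated assumption.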
Table 4.
Generalizability theory decision study results for the 2-item version
| Number of patients per GP | ICC |
|---|---|
| 1 | 0.045 |
| 10 | 0.322 |
| 30 | 0.587 |
| 50 | 0.703 |
| 70 | 0.768 |
| 80 | 0.791 |
| 84 | 0.799 |
| 85 | 0.801 |
| 90 | 0.810 |
The decision study identified the smallest number of patients per GP at which the intra-cluster correlation (ICC) exceeded 0.8 (85 patients).
Abbreviations: GP, general practitioner; ICC, intra-cluster correlation
Fig. 1.

Interval estimates for the score of each GP on the validated 2-item Japanese CARE Measure
The vertical axis shows each general practitioner (GP)’s scores on the validated 2-item version of the Japanese version of the Consultation and Relational Empathy (Japanese CARE) Measure. The total score is 10. The horizontal axis represents each GP. Plots are interval estimates of the mean at n = 85 and whiskers indicate 95% confidence intervals. N = 85 was calculated using the generalizability theory Decision study. The broken lines show the standard range (7–8 points)
Supplementary analysis revealed no differences in 2-item total scores by patient characteristics (data not shown). There were also no significant differences in the 2-item total score by GP characteristics, such as the type of establishment, location, and sex. However, the group with 10 to 20 years of clinical experience and the group with family medicine specialist certification scored significantly higher than the other groups (Table S2). The median time to complete the 2 items was 1 min (interquartile range: 1–2 min) and that of the 10 items was 2 min (interquartile range: 1–3 min), which were significantly different (P < 0.001). The mean of the 10-item score was 38.4 (SD 7.35) (Table S1). There was a strong correlation between the 10 items and overall satisfaction (Spearman’s rho, 0.580, P < 0.001, n = 317). The results of the inter-rater reliability study for the 10 items are shown in Table S3. The 2-item total scores and total scores for items 6 and 9 among the 10 items were strongly positively correlated (Pearson’s correlation coefficient 0.768, P < 0.001). The overall agreement of these total scores was 56.7% (cross-tabulation table: Table S4).
Missing data analysis by Little’s MCAR test revealed that the 2-item version, 10-item version, and consultation characteristics were missing completely at random (P = 0.735, P = 0.363, and P = 0.06, respectively), whereas the hypothesis of MCAR was rejected for the patient characteristics (P = 0.01).
Discussion
Summarized key results
This study examined the validity and reliability of the 2-item Japanese version of the CARE Measure in primary care. The 2-item version showed high face validity as indicated by few missing or “does not apply” values, high construct validity as indicated by a strong correlation with overall satisfaction, and high criterion validity as indicated by a strong correlation with total 10-item score. The 2-item version also exhibited acceptable internal consistency. However, the inter-rater reliability study revealed that 85 patients per doctor would be needed to reliably discriminate between doctors’ mean CARE scores.
Validity of the 2-item questionnaire
The rates of missing and “does not apply” values were similar to those of the previous 10-item version administered to GPs in primary care, which ranged from 0 to 10% [7, 10–15]. The correlations between the total 2-item score and overall satisfaction were similar to those of the English 10-item version and the 5-item version for children (Spearman’s rho: 0.6 and 0.5968, respectively) [17, 30], although the correlations were lower than for the 10-item Japanese version (Spearman’s rho: 0.740) [11]. Therefore, the 2-item version may not adequately reflect patient satisfaction compared with the Japanese 10-item version. However, the supplementary analysis in this study revealed that the correlation between the Japanese 10-item version and patient satisfaction was also lower than in the previous study, even though the means of the 10-item score (38.41, SD 8.60, n = 272) and patient satisfaction (3.10, SD 0.73) in the previous study were similar to those here [11]. This study had greater proportions of elderly patients, high school graduates, and men than the previous Japanese 10-item study, and these differences may have influenced the identified correlations.
The strong correlation between the scores of the 2-item and 10-item versions indicated that the 2-item scores were a reasonable predictor of the 10-item scores. However, the correlation coefficient between the two versions was lower than predicted [25]. Two possible reasons for this were proposed: (i) the two items in the preliminary study were included in the 10 items and (ii) respondents were unable to correct the response in the 2-item version, even when they noticed errors.
Reliability of the 2-item questionnaire
With regard to internal consistency, α values in this study were comparable to those estimated in the 2-item preliminary study and for the English 10-item version (Cronbach α: 0.920 and 0.92, respectively) [7, 25]. In addition, the correlation coefficients between the two items were also highly positive, suggesting that the two items measured the same characteristic. This characteristic was considered to be empathy, as in the originals [6, 11]. In the 10-item versions, items 1–6 reflect emotive components, while items 7–9 reflect cognitive/behavioral components [6]. Thus, items 6 and 9 used in the 2-item version are representative of the emotive and cognitive/behavioral components of empathy, respectively.
This study found that 85 patients completing the 2-item version are required to ensure an appropriate level of inter-rater reliability. This is more than the 50 patients for the original 10-item English version and 38 patients for the 10-item Japanese version [7, 16]. This result suggests that the 2-item version may have limited practical feasibility to distinguish individual GPs’ empathy. This may limit the use of the 2-item measure in ‘high-stakes assessment’ such as appraisal or revalidation of individual doctors, but this may be less of an issue if the measure is used to gain an overview of patients’ views on consultations within a particular clinic, rather than at the individual doctor level.
Generalizability and interpretation of the 2-item questionnaire
As the 2-item version created here was developed via a Japanese preliminary survey, it cannot be guaranteed that the findings of this study can be extrapolated to other language versions [25]. However, each translated version has a trend toward greater uniformity than the original English version [10, 13, 14, 34, 35]. On one hand, simplifying the questionnaire has the benefit of increasing the opportunities for its use on a daily basis as a screening tool while still conveying the key features of the full version of the questionnaire [6, 19, 24]. Reducing the number of questionnaire items may also reduce response burden, especially for older patients, and has the potential to increase the response rate, even if the reduction in time required to complete the questionnaire is marginal [36, 37]. On the other hand, this study revealed that the 2-item measure has the disadvantage of more questionnaires being needed per doctor to reliably discriminate between doctors’ empathy, which may increase the clinical burden associated with offering the questionnaire. Although the appropriate balance between patient burden and clinical burden in this context should be considered, this study has succeeded in showing high validity and reliability of the 2-item questionnaire and demonstrates the potential value of simplifying the measure to increase its utility.
Little’s MCAR test for missing data was statistically significant for patient characteristics, indicating that these characteristics were not missing completely at random. One possible explanation for this is that patients were unwilling to answer sensitive questions: for example, data were missing at a rate of 8.4% for the question asking about the patients’ employment, whereas the rates of missing values for other patient characteristics were less than 5%. However, we do not consider this to be problematic given that up to 10% missing values is generally considered acceptable [38].
Limitations
There are several limitations of this study. First, some GPs collected fewer completed questionnaires than expected, which may have affected the robustness of the results. Second, selection bias may have occurred, with GPs who routinely provide empathetic care being more likely to participate. This bias may also have occurred among the patients because the GPs did not consecutively invite every eligible patient; they could have selected patients whom they expected to give favorable scores. In particular, because this research did not record the number of patients invited by the GPs, the participation rate could not be determined. However, the 10-item total scores in this supplementary study were almost the same as in a previous study [11], which should assuage concerns about bias. This study was conducted on patients who visited a GP in a primary care setting in Japan. Regarding their characteristics, the patients’ age and sex ratio were more similar to those of outpatients in Japan in 2020 (41.5% above 70 years old, 42.7% men and 57.3% women) than in previous research in Japan [11, 39]. Third, this study did not show superiority of the 2-item version over the 10-item version because we used the 10-item version as a gold standard when developing the shorter version. Although statistical significance was reached, there was only a small difference in the time required to complete the two questionnaires. The time taken to complete the 10-item questionnaire in this study appeared to be shorter than the usual 5–10 min [12]. Further study comparing the 2-item and 10-item versions is needed. Finally, it should be noted that the two items selected for the short measure were chosen by doctors, not by patients. Further work is required to see whether patients would select the same two items as being most important, and whether these choices differ by patients’ age, sex, health conditions, and socioeconomic status, as has recently been done for the original English version of the CARE Measure [40].
Conclusions
The 2-item Japanese version of the CARE Measure was shown to be valid and reliable as a questionnaire for Japanese primary care, although careful interpretation of the research is warranted. As the 2-item version needs a large number of patients per doctor to reliably differentiate between GPs in terms of their empathy, the feasibility of using this version is limited, especially in a clinical context.
Acknowledgements
We would like to thank Ms. Atsuko Matsuda, Department of Education for Community-Oriented Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan, for checking the anonymized input data. We also thank Edanz (https://jp.edanz.com/ac) for editing a draft of this manuscript.
Abbreviations
- ANOVA: Analysis of variance
- CFA: Confirmatory factor analysis
- D study: Decision study
- GP: General practitioner
- G study: Generalizability study
- ICC: Intra-cluster correlation
- MCAR: Missing completely at random
- SD: Standard deviation
- The CARE Measure: The Consultation and Relational Empathy Measure
Authors’ contributions
NT, TM, KT, MA, TK, MS, KM and NB contributed to conception and design of the work. NT, TM, HN, KT and JS contributed to data acquisition and analysis. NT, TM, HN, KT, MS, KM, MK, JS, SWM and NB contributed to interpretation of data. NT, TM, KT and SWM have drafted the work or substantively revised it. All authors approved the final version of the manuscript.
Funding
This work was supported by Japan Society for the Promotion of Science KAKENHI grants (JP20K10375, JP24K13338). The funding body had no involvement in the study design, data collection, administration, interpretation of the data, or writing of the paper.
Data availability
The data underlying this article will be shared on reasonable request to the corresponding author.
Declarations
Ethics approval and consent to participate
This study was conducted with the approval of the Ethics Review Committee to which the first author belongs (approval number: 2022-0355). Written informed consent to participate was obtained from all of the participants in the study. Patients expressed their willingness to participate in the study by checking a box on the front of the submission envelope. Patients received a 500-yen coupon as a reward. At the request of our institution, the participants were asked to sign their names on the receipts related to the payment of rewards to ensure the correct receipt of the rewards. The questionnaires and receipts were kept separately and were never linked.
Consent for publication
Not applicable.
Competing interests
NT, MS, and KM report that their affiliated institution was established by donations from Aichi Prefecture and Nagoya City, Japan. NT has received grants and personal fees outside the submitted work from Novartis Pharma K.K. These institutions had no involvement in the study design, data collection, administration, interpretation of the data, or writing of the paper. The remaining authors declare no conflicts of interest.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Nembhard IM, David G, Ezzeddine I, Betts D, Radin J. A systematic review of research on empathy in health care. Health Serv Res. 2023;58:250–63. 10.1111/1475-6773.14016.
- 2. Derksen F, Bensing J, Lagro-Janssen A. Effectiveness of empathy in general practice: a systematic review. Br J Gen Pract. 2013;63:e76–84. 10.3399/bjgp13X660814.
- 3. Kim SS, Kaplowitz S, Johnston MV. The effects of physician empathy on patient satisfaction and compliance. Eval Health Prof. 2004;27:237–51. 10.1177/0163278704267037.
- 4. Bernardo MO, Cecílio-Fernandes D, Costa P, Quince TA, Costa MJ, Carvalho-Filho MA. Physicians’ self-assessed empathy levels do not correlate with patients’ assessments. PLoS ONE. 2018;13(5):e0198488. 10.1371/journal.pone.0198488.
- 5. Hermans L, Olde Hartman TC, Dielissen PW. Differences between GP perception of delivered empathy and patient-perceived empathy: a cross-sectional study in primary care. Br J Gen Pract. 2018;68:e621–6. 10.3399/bjgp18X698381.
- 6. Mercer SW, Maxwell M, Heaney D, Watt G. The consultation and relational empathy (CARE) measure: development and preliminary validation and reliability of an empathy-based consultation process measure. Fam Pract. 2004;21:699–705. 10.1093/fampra/cmh621.
- 7. Mercer SW, McConnachie A, Maxwell M, Heaney D, Watt GC. Relevance and practical use of the consultation and relational empathy (CARE) measure in general practice. Fam Pract. 2005;22:328–34. 10.1093/fampra/cmh730.
- 8. Mercer SW, Higgins M, Bikker AM, Fitzpatrick B, McConnachie A, Lloyd SM, et al. General practitioners’ empathy and health outcomes: a prospective observational study of consultations in areas of high and low deprivation. Ann Fam Med. 2016;14:117–24. 10.1370/afm.1910.
- 9. Dambha-Miller H, Feldman AL, Kinmonth AL, Griffin SJ. Association between primary care practitioner empathy and risk of cardiovascular events and all-cause mortality among patients with type 2 diabetes: a population-based prospective cohort study. Ann Fam Med. 2019;17:311–8. 10.1370/afm.2421.
- 10. Fung CS, Hua A, Tam L, Mercer SW. Reliability and validity of the Chinese version of the CARE measure in a primary care setting in Hong Kong. Fam Pract. 2009;26:398–406. 10.1093/fampra/cmp044.
- 11. Aomatsu M, Abe H, Abe K, Yasui H, Suzuki T, Sato J, et al. Validity and reliability of the Japanese version of the CARE measure in a general medicine outpatient setting. Fam Pract. 2014;31(1):118–26. 10.1093/fampra/cmt053.
- 12. Hanzevacki M, Jakovina T, Bajic Z, Tomac A, Mercer S. Reliability and validity of the Croatian version of consultation and relational empathy (CARE) measure in primary care setting. Croat Med J. 2015;56:50–6. 10.3325/cmj.2015.56.50.
- 13. van Dijk I, Scholten Meilink Lenferink N, Lucassen PL, Mercer SW, van Weel C, Olde Hartman TC, et al. Reliability and validity of the Dutch version of the consultation and relational empathy measure in primary care. Fam Pract. 2017;34:119–24. 10.1093/fampra/cmw116.
- 14. García del Barrio L, Rodríguez-Díez C, Martín-Lanas R, Costa P, Costa MJ, Díez N. Reliability and validity of the Spanish (Spain) version of the consultation and relational empathy measure in primary care. Fam Pract. 2020;38:353–9. 10.1093/fampra/cmaa135.
- 15. Al-Habbal K, Djoundourian A, Nassar E, Tayara Z, Mercer SW, Abi-Habib R. Reliability and validity of the Arabic version of the consultation and relational empathy (CARE) measure. Fam Pract. 2022;39:1176–82. 10.1093/fampra/cmac047.
- 16. Matsuhisa T, Takahashi N, Aomatsu M, Takahashi K, Nishino J, Ban N, et al. How many patients are required to provide a high level of reliability in the Japanese version of the CARE measure? A secondary analysis. BMC Fam Pract. 2018;19:138. 10.1186/s12875-018-0826-2.
- 17. Arigliani M, Castriotta L, Pusiol A, Titolo A, Petoello E, Brun Peressut A, et al. Measuring empathy in pediatrics: validation of the visual CARE measure. BMC Pediatr. 2018;18:57. 10.1186/s12887-018-1050-x.
- 18. Bennett-Weston A, Howick J. Patient and practitioner perspectives on the definition and measurement of therapeutic empathy: qualitative study. J Particip Med. 2025;17:e71610. 10.2196/71610.
- 19. Broadbent E, Petrie KJ, Main J, Weinman J. The brief illness perception questionnaire. J Psychosom Res. 2006;60:631–7. 10.1016/j.jpsychores.2005.10.020.
- 20. Ampt ES. Respondent burden. In: Jones P, Stopher PR, editors. Transport survey quality and innovation. Leeds: Emerald Group Publishing Limited; 2003. pp. 507–21. 10.1108/9781786359551-030.
- 21. World Health Organization. The Global Health Observatory: Explore a world of health data. Accessed December 31, 2023. Available from: https://www.who.int/data/gho/data/themes/topics/topic-details/GHO/world-health-statistics
- 22. Boeckxstaens P, De Graaf P. Primary care and care for older persons: position paper of the European forum for primary care. Qual Prim Care. 2011;19:369.
- 23. Hoyl MT, Alessi CA, Harker JO, Josephson KR, Pietruszka FM, Koelfgen M, et al. Development and testing of a five-item version of the geriatric depression scale. J Am Geriatr Soc. 1999;47:873–8. 10.1111/j.1532-5415.1999.tb03848.x.
- 24. Gornemann I, Zunzunegui MV, Martínez C, del Carmen Onís M. Screening for impaired cognitive function among the elderly in Spain: reducing the number of items in the short portable mental status questionnaire. Psychiatry Res. 1999;89:133–45.
- 25. Takahashi N, Matsuhisa T, Takahashi K, Aomatsu M, Mercer SW, Ban N. A 2-item version of the Japanese consultation and relational empathy measure: a pilot study using secondary analysis of a cross-sectional survey in primary care. Fam Pract. 2022;39:1169–75. 10.1093/fampra/cmac034.
- 26. Tavakol M, Dennick R. Making sense of Cronbach’s alpha. Int J Med Educ. 2011;2:53–5. 10.5116/ijme.4dfb.8dfd.
- 27. Streiner DL, Norman GR, Cairney J. Health measurement scales: A practical guide to their development and use. 5th ed. New York, NY: Oxford University Press; 2015.
- 28. Smith GT, McCarthy DM, Anderson KG. On the sins of short-form development. Psychol Assess. 2000;12(1):102–11. 10.1037//1040-3590.12.1.102.
- 29. Japan Primary Care Association. About the Japan Primary Care Association (April 2023). Accessed December 27, 2023. Available from: https://www.primarycare-japan.com/assoc/about/ab_index/#detail
- 30. Mercer SW, Murphy DJ. Validity and reliability of the CARE measure in secondary care. Clin Govern. 2008;13:269–83. 10.1108/14777270810912969.
- 31. DiStefano C, McDaniel HL, Zhang L, Shi D, Jiang Z. Fitting large factor analysis models with ordinal data. Educ Psychol Meas. 2019;79:417–36. 10.1177/0013164418818242.
- 32. McMaster University. McMaster Education Research, Innovation and Theory (MERIT) Program, Generalizability theory tool. Accessed December 31, 2023. Available from: https://merit.healthsci.mcmaster.ca/research/generalizability-theory-tool/
- 33. Little RJ. A test of missing completely at random for multivariate data with missing values. J Am Stat Assoc. 1988;83:1198–202.
- 34. Mercer SW, Fung CS, Chan FW, Wong FY, Wong SY, Murphy D. The Chinese-version of the CARE measure reliably differentiates between doctors in primary care: a cross-sectional study in Hong Kong. BMC Fam Pract. 2011;12:1–9. 10.1186/1471-2296-12-43.
- 35. Park K-Y, Shin J, Park H-K, Kim YM, Hwang SY, Shin J-H, et al. Validity and reliability of a Korean version of the consultation and relational empathy (CARE) measure. BMC Med Educ. 2022;22:1–8. 10.1186/s12909-022-03478-5.
- 36. Crosta Ahlforn K, Bojner Horwitz E, Osika W. A Swedish version of the consultation and relational empathy (CARE) measure. Scand J Prim Health Care. 2017;35:286–92. 10.1080/02813432.2017.1358853.
- 37. Rolstad S, Adler J, Rydén A. Response burden and questionnaire length: is shorter better? A review and meta-analysis. Value Health. 2011;14:1101–8. 10.1016/j.jval.2011.06.003.
- 38. Marino M, Lucas J, Latour E, Heintzman JD. Missing data in primary care research: importance, implications and approaches. Fam Pract. 2021;38:199–202. 10.1093/fampra/cmaa134.
- 39. Ministry of Health, Labour and Welfare. Patient survey overview in 2020. Accessed December 31, 2023. Available from: https://www.mhlw.go.jp/toukei/saikin/hw/kanja/20/index.html
- 40. Ng L, Sweeney KD, Mercer SW. Challenges in reducing the 10-item CARE Measure to a two-item version: comparison of patients’ preferences with psychometric evaluation in a cross-sectional survey in Scotland. BJGP Open. 2025. 10.3399/BJGPO.2025.0085.


