Abstract
Objective:
To (1) characterize the agreement between patient and proxy responses on a multidimensional computerized adaptive testing (MCAT) measure of function, and to (2) determine whether patient, proxy, or MCAT score characteristics identify when a proxy report can be used as a substitute for patient report in clinical decision making.
Design:
A psychometric study of the Functional Assessment in Acute Care MCAT (FAMCAT) and its three scales (Applied Cognition, Daily Activity, and Basic Mobility).
Setting:
An Upper Midwestern quaternary academic medical center.
Participants:
A total of 300 pairs of patients [average age 60.9 years (range 19 to 89)] hospitalized on general medical services or readmitted to surgical services for postoperative complications and their proxies [average age 60.5 years (range 20 to 88)].
Intervention:
Not applicable.
Main Outcome Measures:
There were three outcomes: 1) agreement between patient and proxy scores on the FAMCAT domains, as well as age and gender, analyzed with univariate and multivariate analysis of variance (MANOVA); 2) associations of patient-proxy relationship and FAMCAT score characteristics with patient-proxy score agreement; and 3) presence of psychometrically significant intra-dyad differences in FAMCAT scores.
Results:
The results of the MANOVA and follow-up ANOVAs indicated that there were no statistically significant differences in FAMCAT scale scores between patient and proxy estimates for either the Daily Activity or Basic Mobility scales. There were significant differences for the Applied Cognition scale (p < .005) between mean patient and proxy scores, with proxies rating patients as functioning at a higher level (mean = 0.42) than patients did themselves (mean = 0.00). However, psychometrically significant intra-dyadic Applied Cognition score differences occurred in only 14% of dyads, compared to 25% in the other two scales. Gender and age were associated with patient-proxy agreement, but the patterns were not sufficiently consistent to permit generalizations regarding the likely validity of a proxy’s scores.
Conclusions:
Patient and proxy FAMCAT Daily Activity and Basic Mobility scores did not differ significantly, and proxy reporting offers a credible surrogate for patient report on these domains. Low rates of psychometrically significant intra-dyadic score differences suggest that proxy report may serve as a low-resolution screen for functional deficits in all FAMCAT domains. Approximately half the proxies provided multi-domain profile ratings on the three scales that did not differ significantly from those of the associated patients, but more research is needed to identify situations in which proxy profiles could be used in place of those provided by patients.
Keywords: Outcomes Assessment (Health Care), Item-response theory, Proxy, FAMCAT
The administration and interpretation of patient-reported outcome measures (PROMs) are common elements of outpatient encounters and have become an important component of clinic-based care delivery. Despite this, their use in the inpatient setting, even when clinician rated as with the 6-Clicks,1–3 remains limited. This disjuncture is striking and potentially important as PROM-based assessments might be able to provide useful information that is currently not being systematically captured or requires a significant amount of clinical effort and cost to capture.
Clinical workflows on most hospital wards are saturated, affording little time for additional data collection or incentive for clinicians to adopt novel, potentially time-intensive modes of gathering data.4 This reality partially explains why, despite wide acknowledgement that PROMs can accurately assess important determinants of health outcomes such as mood and function,5,6 they are inconsistently used in acute care. Pain appears to be a noteworthy exception. The current practice of using a simple 0 to 10 numerical rating scale to assess patients’ pain was adopted on a widespread basis only after a mandate from the Joint Commission on Accreditation of Healthcare Organizations. The result has been improved rates of pain management and treatment, though challenges persist.7
Limited extension of PROMs to the hospital-based assessment of key functional domains—mobility, daily activities, and cognition—might be partially due to the fact that these domains, unlike pain, often have objective aspects that have long been measured by well accepted, but often time/resource intensive, clinician ratings such as the 6-Minute Walk Test.8 Further, there are compelling reasons to question the accuracy of a hospitalized patient’s rating of their functional status given the high prevalence of delirium, encephalopathy, and pharmacologically-induced sedation. Efforts to advance the use of PROMs in the inpatient setting must confront the reality that a patient’s ability to interpret questions and formulate responses might be blunted. Rather than developing means to circumvent this limitation, hospital-based approaches have reflexively relied on clinician report and, consequently, might have undervalued the patient’s perspective.
The development of the Functional Assessment in Acute Care Multidimensional Computerized Adaptive Test (FAMCAT) is a recent effort to promote efficient and precise PROM-based functional assessment in acute care settings. The development of the FAMCAT has been detailed elsewhere.9 It was derived from the Activity Measure for Post-Acute Care (AM-PAC) and designed to concurrently measure the Applied Cognition, Daily Activity, and Basic Mobility domains among hospitalized patients. Several strategies were considered in its development to address the problem of inaccurate patient self-reporting, including use of proxy reports. Administering the FAMCAT to patient proxies was considered as a potentially viable adjunct to patient report since hospitalized patients are frequently accompanied by family and friends who might be uniquely positioned to observe their physical and mental activities prior to and during hospitalization. Proxy reports have been validated as viable surrogates for patient report, particularly for physical function and other domains with objective dimensions.10–12 Given the negative consequences of relying on inaccurate proxy reports, investigators have sought to characterize instances of greater or lesser patient-proxy agreement.13,14 High caregiver burden, for example, has emerged as an assessable marker of potential patient-proxy disagreement.15
The use of proxy reporting has salient advantages in acute care settings, in part, because patients might be unavailable for prolonged periods due to testing or treatment. To explore the feasibility of FAMCAT proxy reporting, a proxy-focused study was embedded in the larger FAMCAT validation effort.9 This study assessed two patient-proxy related factors: 1) the agreement between scores resulting from patient and proxy ratings for each of the three FAMCAT domains, and 2) the association of proxy and patient score agreement with patient, proxy, or FAMCAT score characteristics.
METHODS
Participants.
This study was reviewed and approved by the Institutional Review Board of the Mayo Clinic (Rochester, Minnesota). A total of 300 dyads of adult patients hospitalized on the medical services of a large quaternary medical center were recruited, along with their proxies. Patients and proxies provided oral consent to participate in the study. Since no protected health information was collected from proxies, they were not required to sign a HIPAA form.
Methods to identify and recruit participants in the FAMCAT validation studies, as well as specific descriptions of the sample of patients, have been described elsewhere.9 In brief, patients were identified using an electronic health record (EHR) search algorithm. Patient participants were required to be at least 18 years of age, to have been hospitalized on a medical service or re-hospitalized on a surgical service, and to have at least one chronic condition. While purposive sampling was used to recruit the cohorts and to calibrate and validate the FAMCAT, this approach was not used to enroll the subset of patients who participated in the proxy study. Following manual chart review of potentially eligible patients, their nurses were queried to see if a patient had received sedating medications or distressing news, was scheduled for an impending off-ward test or treatment, or was experiencing intense symptoms. Patients who met these criteria were considered temporarily ineligible and were approached at a later time.
Data collection.
Patients and their proxies were encouraged to complete the FAMCAT using tablet computers within 2 hours of each other. In three cases, due to clinical interruptions, the patient was unable to complete the FAMCAT on the same date as their proxy but did so within 24 hours. These cases were reviewed for major clinical or therapeutic events within the 24-hour interval. Finding none, these three dyads were retained and included in the analyses. Patient participants’ demographic and clinical characteristics were electronically abstracted from the Epic® EHR. Proxies were asked how long they had known the patient and in what capacity (e.g., spouse, relative, friend, roommate). Additionally, proxies were asked about the duration of their co-residence and when they last co-resided with the patient. Patients and proxies were instructed not to discuss the FAMCAT items until both had completed the assessment. They were observed by a research assistant while completing the FAMCAT. FAMCATs were administered between May 2016 and June 2017.
Instrument.
All patients and their proxies completed a tablet-based version of the FAMCAT, described elsewhere in detail.16,17 The FAMCAT is a PROM that concurrently estimates the Applied Cognition (AC), Daily Activity (DA), and Basic Mobility (BM) functional domains. FAMCAT scoring was developed to assign ability-matched mobility/autonomy preservation plans for hospitalized patients across the full trait ranges of the three domains. The FAMCAT was derived from the Activity Measure for Post-Acute Care (AM-PAC) for use in acute care settings by refining item saliency and domain coverage for hospitalized patients, recalibrating the expanded item banks based on responses from medically ill patients or those hospitalized for post-operative complications, and parameterizing the CAT algorithm for multidimensional assessment.9 The MCAT algorithm selects for each patient, in real time, the optimal set of items for measuring that patient. Item banks for each of the three scales had over 100 items (in the form of 4-point ordinal, adjectival ratings) from which to select. Items are selected using multivariate item information with content balancing across the three scales. Patients’ trait levels and their standard errors are estimated using a Bayesian procedure with an informative prior. The MCAT is terminated when (1) the multivariate precision of the estimates reaches a pre-specified minimum, (2) a patient’s θ estimates stabilize, or (3) 60 items (20 per scale) have been administered. Details of the development of the FAMCAT are presented elsewhere.17,18
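The three termination rules can be sketched as a small decision function. This is only an illustration: the threshold values, the `CatState` fields, and the reading of the second rule as "the trait estimates changed by less than a tolerance after the last item" are assumptions, not the FAMCAT's actual settings. Only the 60-item cap comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatState:
    """Hypothetical snapshot of an MCAT session after an item is scored."""
    items_given: int          # total items administered so far
    info_determinant: float   # determinant of the Fisher information matrix
    max_theta_change: float   # largest absolute change in any theta estimate


def stop_reason(state: CatState,
                det_target: float = 500.0,   # assumed precision target
                change_tol: float = 0.01,    # assumed stability tolerance
                max_items: int = 60) -> Optional[str]:
    """Return the first stopping rule that fires, or None to keep testing."""
    if state.info_determinant >= det_target:
        return "precision"
    if state.max_theta_change < change_tol:
        return "stabilized"
    if state.items_given >= max_items:
        return "max_items"
    return None


# Example: precision target not met, estimates still moving, 24 items given.
print(stop_reason(CatState(items_given=24, info_determinant=120.0,
                           max_theta_change=0.08)))
```

Checking the precision rule first mirrors the report that most administrations ended on the determinant criterion, but the ordering here is a design choice of the sketch.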
Data Analysis.
Scale Score Analysis.
The purpose of the first analysis was to determine whether FAMCAT scores [i.e., IRT θ (trait level) estimates] from the proxies were significantly different from those obtained from patients. A repeated measures multivariate analysis of variance (MANOVA) with a significance level of α = .05 was implemented with gender and the age of the patient, as well as the patient versus proxy variable, as independent variables. Age was discretized into four groups defined by its quartiles. The eleven dependent variables were the three θ estimates obtained from the patients (θ1, θ2, θ3), the observed standard errors of measurement for the three θ estimates (SE1, SE2, SE3), the determinant of the Fisher information matrix, the number of items administered for each subscale (n1, n2, n3), and the time used to complete the FAMCAT. The same set of dependent variables was collected from the proxies and was used as the repeated measure. Termination reason for the variable-length MCATs was also available as a dependent variable but was not analyzed because 95% of the patients’ and 93% of the proxies’ FAMCAT MCATs were terminated by the determinant criterion (multidimensional composite of standard errors), with the remaining numbers being too small for a meaningful analysis.19 Variables were assessed for skewness, and log transformations were performed for variables with skew greater than 1.0. Univariate ANOVAs were conducted post hoc for statistically significant (p < .05) effects to examine which dependent variables differed for the independent variables, using Bonferroni-adjusted significance levels of .05 / 11 ≈ .005. All analyses were performed using either IBM SPSS 25.0a software or the e1071 R package.b,c
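The skewness screen and the Bonferroni adjustment described above can be illustrated with a short sketch. It uses the plain moment-based skewness formula, which is an assumption; the software used in the study may apply a small-sample correction. The `times` values are made up for illustration and are not study data.

```python
import math


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_if_skewed(values, threshold=1.0):
    """Natural-log transform a positive variable if its skewness exceeds threshold."""
    if sample_skewness(values) > threshold:
        return [math.log(v) for v in values]
    return list(values)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni adjustment."""
    return alpha / n_tests


# Made-up testing times (seconds) with one long outlier.
times = [88, 120, 150, 160, 175, 190, 210, 1309]
transformed = log_if_skewed(times)

# Eleven dependent variables, family-wise alpha of .05.
per_test_alpha = bonferroni_alpha(0.05, 11)
print(round(per_test_alpha, 4))
```

The exact per-test level is .05 / 11, which the text rounds to .005.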
Psychometric Significance of Patient-Proxy Dyad Profiles.
The second analysis examined the similarity of the profiles of θ estimates obtained from the patients and their proxies. This was done by using the three hypothesis testing methods proposed in Wang and Weiss.20 Although the methods were designed to test whether the latent trait profile of an individual (i.e., their θs) has changed significantly over two measurement occasions, in this application the measurements for the first “occasion” were based on the FAMCAT administered to the patient and the second set of θs was the θ estimates obtained from their proxy. Other than the usual assumptions required for the use of IRT and MCAT, the methods assume that the two sets of measurements are on the same scales. Because both the patient and their proxy responded to the same instrument, which placed their scores on the same scales, the methods were deemed to be appropriate for this application.
Details for the calculation of three indices reflecting the psychometric significance of the two sets of score profiles within a patient-proxy dyad are in the Supplementary File. Three indices—a likelihood ratio test (LRT), a multivariate Wald test (MWT), and a score test (ST)—were computed using the data from each dyad and tested for psychometric significance. A significant result indicated that the two sets of θ estimates were reliably different from each other; a non-significant result indicated that the two sets of PRO scores did not differ from each other in a psychometrically significant manner.
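To give a sense of what a within-dyad profile test involves, the following is a simplified Wald-type sketch, not the published indices. It assumes that measurement errors are independent across the three domains and between the two raters, so only the reported standard errors are needed. The actual methods use the full information matrices and are specified in the Supplementary File. The function names and the example values are hypothetical.

```python
import math


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


def wald_profile_test(theta_patient, se_patient, theta_proxy, se_proxy):
    """Simplified Wald-type test of a three-domain profile difference.

    Assumes a diagonal error covariance (independent errors), which is a
    simplification made for this sketch only.
    """
    if not (len(theta_patient) == len(se_patient)
            == len(theta_proxy) == len(se_proxy) == 3):
        raise ValueError("expected three domains per profile")
    stat = sum((tx - tp) ** 2 / (sp ** 2 + sx ** 2)
               for tp, sp, tx, sx in zip(theta_patient, se_patient,
                                         theta_proxy, se_proxy))
    return stat, chi2_sf_3df(stat)


# Hypothetical dyad: (Applied Cognition, Daily Activity, Basic Mobility).
stat, p_value = wald_profile_test(
    theta_patient=(0.10, -0.40, -0.20), se_patient=(0.55, 0.35, 0.25),
    theta_proxy=(0.90, -0.50, -0.10), se_proxy=(0.60, 0.30, 0.25),
)
significant = p_value < 0.05
```

Because each squared difference is scaled by the two raters' error variances, a large gap on an imprecisely measured domain counts for less than the same gap on a precisely measured one. This is the sense in which the test reflects individual-level precision.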
Three variables were analyzed to investigate the factors relating to differences in psychometric significance within the dyads: (1) the proxy’s relationship to the patient, (2) the duration of co-residence, and (3) the interval since the proxy last co-resided with the patient.
RESULTS
Participants.
Patient and proxy ages are summarized in Table 1. The patient group consisted of 133 women and 162 men with an average age of 60.9 (range 19 to 89) years. The proxy group was weighted more heavily toward women and consisted of 175 women and 119 men but had a similar mean age (60.5 years) and range (20–88) to the patient group. (A female proxy erroneously reported her age as 352 years old; this data point was excluded from Table 1.) Figure 1S (“S” indicates that the figure is available in this paper’s Supplementary Materials File) shows the complete distribution of patient and proxy ages by gender.
Table 1.
Descriptive Statistics of the Age of Patients and Proxies, By Gender
| Group | Mean | SD | Min | Max | Skewness |
|---|---|---|---|---|---|
| Patients | | | | | |
| Female (N = 133) | 58.06 | 13.65 | 22 | 84 | −0.56 |
| Male (N = 162) | 63.27 | 16.02 | 19 | 89 | −0.56 |
| Total | 60.92 | 14.97 | 19 | 89 | −0.62 |
| Proxies | | | | | |
| Female (N = 175) | 60.03 | 12.91 | 20 | 88 | −0.57 |
| Male (N = 119) | 61.19 | 15.02 | 26 | 84 | −0.54 |
| Total | 60.50 | 13.79 | 20 | 88 | −0.55 |
Four patients and three proxies dropped out either because they later decided not to participate or because an activity in the hospital interrupted their participation. The data for the 295 pairs with FAMCAT scores from both the patient and the proxy were used for analyses. The majority, N = 253 (86%), of the proxies were spouses of the patients; the remaining proxies were parents, N=17 (6%); children, N=15 (5%); significant others, N=10 (3%); roommates, N=2 (0.7%); and second-degree relatives or in-laws, N=2 (0.7%). Figure 2S (Panel A) displays the distribution of number of years that the proxy co-resided with the patient. The modal number was zero, indicating a lack of co-residence for about 30 proxies; for the remaining proxies the distribution ranged from 3 to over 60 years. As illustrated in Figure 2S (Panel B), 92% of the proxies co-resided with their patient within the last 20 days, with the modal response of only a few days.
Inter-domain correlations among participant FAMCAT variables.
Supplementary Tables 1S and 2S provide the correlation matrices of the dependent variables for the patients and the proxies, respectively. For the patients, the three θ estimates correlated from 0.32 (Applied Cognition and Basic Mobility) to 0.85 (Daily Activity and Basic Mobility), with correlations of −0.17 to 0.71 among their SEs. The numbers of items administered for the three scales were moderately to highly correlated (−0.42 to 0.93). Testing time correlations with the SEs were low (0.01 to 0.12), but moderate with the θ estimates (0.19 to 0.28) and the numbers of items (−0.16 to 0.41). The proxies’ dependent variables displayed a similar pattern of intercorrelation.
FAMCAT score characteristics and distributions.
Table 2 displays descriptive statistics for patient and proxy θ estimates, SEs, number of items administered, and time used to complete the FAMCAT (in seconds). Several variables (SEs, n, and time used) had high degrees of skewness in both groups (except for n1 in the proxy group), and log transformations were performed on the three SEs, n1, n2, n3, and testing time. Table 3 tabulates the means and standard deviations for the three FAMCAT θ estimates cross-classified by age and gender (means and standard deviations of the SEs are in Table 3S).
Table 2.
Descriptive statistics for patient and proxy θ estimates, standard errors of measurement, number of items administered, and total testing times, for all FAMCAT Scales
| Variable | Patient Mean | Patient SD | Patient Min | Patient Max | Patient Skewness | Proxy Mean | Proxy SD | Proxy Min | Proxy Max | Proxy Skewness |
|---|---|---|---|---|---|---|---|---|---|---|
| Theta Estimate | | | | | | | | | | |
| Applied Cognition | 0 | 1.24 | −3.24 | 3.71 | 0.33 | 0.42 | 1.41 | −3.3 | 3.73 | 0.17 |
| Daily Activity | −0.37 | 1.19 | −3.71 | 4 | 0.26 | −0.58 | 1.2 | −2.88 | 3.71 | 0.56 |
| Basic Mobility | −1.45 | 1.04 | −3.34 | 4 | 0.07 | −0.29 | 1.08 | −3.67 | 3.59 | 0.25 |
| Standard Error of Measurement | | | | | | | | | | |
| Applied Cognition | 0.58 | 0.17 | 0.28 | 1.38 | 2.06 | 0.66 | 0.23 | 0.34 | 1.39 | 1.26 |
| Daily Activity | 0.35 | 0.08 | 0.22 | 1.34 | 6.36 | 0.32 | 0.07 | 0.21 | 1.07 | 4.14 |
| Basic Mobility | 0.24 | 0.07 | 0.15 | 1.22 | 10.15 | 0.23 | 0.07 | 0.14 | 0.92 | 4.98 |
| Number of items administered | | | | | | | | | | |
| Applied Cognition | 7.66 | 3.97 | 5 | 21 | 1.4 | 7.77 | 4.11 | 5 | 21 | 0.22 |
| Daily Activity | 8.07 | 3.81 | 5 | 20 | 1.97 | 8.8 | 4.05 | 5 | 20 | 1.85 |
| Basic Mobility | 8.45 | 4 | 5 | 21 | 1.75 | 9.1 | 4.3 | 5 | 21 | 1.41 |
| Time used (secs.) | 364.1 | 170.05 | 88 | 1309 | 1.93 | 312.15 | 191.57 | 81 | 2230 | 4.41 |
Table 3.
Means and standard deviations for the Three FAMCAT scales, cross-classified by age and gender
| Age quartile | Male: Patient | Male: Proxy | Female: Patient | Female: Proxy |
|---|---|---|---|---|
| Applied Cognition mean (SD) | | | | |
| Min – Q1 | 0.74 (1.46) | 1.20 (1.54) | 0.34 (1.40) | 1.15 (1.59) |
| Q1 – Q2 | 0.11 (1.18) | 0.59 (1.68) | 0.11 (1.34) | 0.28 (1.12) |
| Q2 – Q3 | −0.04 (0.99) | 0.20 (1.17) | −0.30 (1.17) | 0.07 (0.90) |
| Q3 – Max | −0.59 (1.08) | −0.40 (1.12) | −0.37 (0.70) | 0.28 (1.09) |
| Total | −0.01 (1.25) | 0.34 (1.49) | 0.01 (1.24) | 0.51 (1.31) |
| Daily Activity mean (SD) | | | | |
| Min – Q1 | 0.11 (1.60) | −0.26 (1.69) | −0.69 (1.09) | −0.77 (1.11) |
| Q1 – Q2 | 0.26 (1.12) | −0.27 (1.27) | −0.60 (1.14) | −0.70 (1.13) |
| Q2 – Q3 | −0.32 (1.17) | −0.44 (1.28) | −0.98 (0.82) | −1.00 (0.99) |
| Q3 – Max | −0.15 (0.91) | −0.61 (1.06) | −0.94 (1.04) | −0.78 (0.70) |
| Total | −0.03 (1.20) | −0.40 (1.31) | −0.79 (1.03) | −0.81 (1.02) |
| Basic Mobility mean (SD) | | | | |
| Min – Q1 | 0.22 (1.43) | 0.02 (1.30) | −0.37 (1.10) | −0.43 (1.05) |
| Q1 – Q2 | 0.34 (0.91) | 0.00 (1.08) | −0.16 (0.96) | −0.21 (0.98) |
| Q2 – Q3 | −0.01 (0.97) | −0.26 (1.30) | −0.69 (0.90) | −0.65 (1.09) |
| Q3 – Max | −0.01 (0.75) | −0.33 (0.83) | −0.82 (0.70) | −0.60 (0.57) |
| Total | 0.13 (1.01) | −0.15 (1.13) | −0.48 (0.97) | −0.46 (0.98) |
Comparison of patient and proxy FAMCAT scores and test characteristics.
Results of the repeated measures MANOVA that included all FAMCAT domains are shown in Table 4S. There was a significant (p < .05) proxy effect and a significant proxy × gender interaction, indicating that at least one dependent variable had significant differences between the patients and the proxies. Follow-up univariate ANOVAs (Table 5S) show significant effects at p < .005 for Applied Cognition and time used. Specifically, as shown in Table 2, for Applied Cognition, mean proxy scores were significantly higher (mean = 0.42 versus 0.00), indicating that proxy scores reflected significantly less reported impairment than patient scores. Proxies required less time to complete the FAMCAT (mean = 312.15 seconds) than the patients (mean = 364.14 seconds). Statistically significant differences between patients and proxies were also observed for the SEs for Applied Cognition and Daily Activity. For the Applied Cognition SE, Table 2 shows that patients had a lower mean, whereas for Daily Activity patients’ SEs were significantly higher.
Using a significance level of .05, Table 4S shows that the (within subjects) proxy effect did not differ by age (p = .090) but it differed by gender (p = .011). Table 6S shows the univariate ANOVA results for each of the dependent variables associated with the within-subjects dyads for the proxy × gender interaction. As the table shows, none of the univariate ANOVAs was statistically significant at the p = .005 level. The three-way interaction was not significant (p = .655).
Psychometric Profile Differences.
It was found that 47%, 48%, and 49% of the patients’ profiles, which included all three θ estimates, were significantly different from their proxies’ profiles using the LRT, MWT, and ST indices, respectively. Two methods were said to agree when they both rejected or both failed to reject the hypothesis of no difference between the two profiles of scores. The three methods showed strong agreement with each other, as the proportion agreement ranged from 0.89 to 0.95.
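The agreement measure described here, the share of dyads on which two indices reach the same reject or do-not-reject decision, can be sketched as follows. The decision vectors are made up for illustration and do not reproduce study data.

```python
def proportion_agreement(decisions_a, decisions_b):
    """Share of dyads for which two tests reach the same decision.

    Each list holds one boolean per dyad: True means the test flagged a
    significant profile difference, False means it did not.
    """
    if len(decisions_a) != len(decisions_b):
        raise ValueError("decision lists must cover the same dyads")
    matches = sum(a == b for a, b in zip(decisions_a, decisions_b))
    return matches / len(decisions_a)


# Hypothetical decisions from two indices on five dyads.
lrt_flags = [True, True, False, False, True]
mwt_flags = [True, False, False, False, True]
print(proportion_agreement(lrt_flags, mwt_flags))  # 4 of 5 dyads match
```

Raw proportion agreement does not correct for chance agreement, so a chance-corrected statistic would be a natural companion if the base rates of the two indices differed markedly.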
Association of proxy-patient relationship characteristics with intra-dyad agreement.
Based on the high level of agreement among the three indices, values of the LRT index were used to determine whether patient-proxy relationship and FAMCAT θ estimates were related to intra-dyad rating differences. All non-spouse categories were combined since their frequencies were very low. Spouses’ profiles were slightly more often not significantly different from patients’ profiles [N = 135 (54%) Not Significant versus N = 114 (46%) Significant], whereas non-spouses’ frequencies were essentially equal [N = 22 (48%) Not Significant and N = 24 (52%) Significant]; this difference between spouses and non-spouses did not reach statistical significance (p = 0.43).
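The spouse versus non-spouse comparison can be checked from the reported counts with a 2 × 2 test of independence. The text does not state which test was used, so the uncorrected Pearson chi-square below is an assumption. With these counts it should land near the reported p-value, and a different test or a continuity correction would shift the result somewhat.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2 x 2 table.

    Table layout: row 1 = (a, b), row 2 = (c, d).
    Returns the test statistic and its p-value on 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = (a, b, c, d)
    expected = (row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df, written via the error function.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: spouses, non-spouses. Columns: Not Significant, Significant.
stat, p_value = pearson_chi2_2x2(135, 114, 22, 24)
print(round(stat, 2), round(p_value, 2))
```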
Duration of co-residence (in years) was compared between the Significant and Not Significant groups using a two-sample t-test. Results showed that the Not Significant group (i.e., more similar intra-dyad ratings) had a slightly longer period of co-residence (31.4 years, SD 18.4), versus 29.8 years (SD 18.7), but this difference was not statistically significant (p = 0.46). The mean number of days since last co-residence was lower for the Not Significant group (mean 3.21 days, SD 2.9) than the Significant group (mean 3.61, SD 6.14), but this was also not significant (p = 0.47).
Patient-proxy agreement as function of θ and SE estimates.
Patient-proxy agreement was evaluated as both a function of multi- and single-domain agreement. The former simultaneously evaluated agreement across all three domains: Applied Cognition, Daily Activity, and Basic Mobility. The degree of overall profile agreement—concordance of all three θ estimates within dyads—appeared to be unrelated to θ for any of the three scales, as the LRT index was stable across the range of each domain, as shown in Figure 1, Panel A. Patient-proxy overall agreement was also unrelated to the observed standard error of measurement for any domain, as illustrated in Figure 1, Panel B.
Figure 1.
LRT values for patient-proxy dyads across all three domains as a function of θ estimates, Panel A, and standard error of measurement, Panel B.
The relative proportion of significant patient-proxy θ differences was markedly lower when estimates were examined at the single-domain level rather than the profile level: Applied Cognition N = 41 (14%), Daily Activity N = 73 (25%), and Basic Mobility N = 74 (25%). Applied Cognition was the only scale in which most of the significant differences reflected proxies rating patients higher than patients rated themselves, 30/41 (73%). For the Daily Activity and Basic Mobility domains, roughly one-third of the proxies with significantly different scores rated the patient higher. These results suggest greater similarity (lack of significant intra-dyad differences) for single scales than for the three-variable profile of θ estimates.
Multi-domain Proxy-patient Profile Concordance.
Within-dyad profile differences were graphically represented for both the Significant and Not Significant groups. Figure 2, Panel A, shows the dyad profiles for the ten patients whose LRT p-values were the highest, reflecting non-significant differences. As the figure shows, for this group of patients, the two sets of scores were virtually identical, with only very minor differences between patients and proxies. Among the 138 dyads in the Significant group, in 51 dyads patients had significantly higher scores than their proxy, while in 46 dyads proxy scores were significantly higher. Examples of both of these cases are illustrated in Figure 2, Panel B.
Figure 2.
Example dyadic profiles in the group that did not differ significantly, Panel A, and the group that did, Panel B.
DISCUSSION
This study analyzed concordance between patient and non-clinical proxy functional domain scores as assessed by the FAMCAT in acute care settings. The ability to rely on proxy assessments of hospitalized patients’ functional status is desirable because patients might be unavailable due to diagnostic testing and treatment requirements, as well as pharmaceutically-induced sedation. The findings based on analyses within patient-proxy dyads include that: 1) mean patient and proxy scores differed significantly only in the Applied Cognition domain; however, significant psychometric intra-dyadic differences were least likely to occur in the Applied Cognition domain (14%); 2) when within-dyad differences were significant, proxy scores were more likely to be higher than those of patients only in the Applied Cognition domain; and 3) patient-proxy relationship characteristics, and θ and SE estimates, were not significantly associated with level of dyadic agreement.
Agreement between patients and their non-clinical proxies has been examined in a range of contexts;21,22 however, PROM performance in general, and patient-proxy concordance in particular, are under-researched in acute care settings. Reports from epidemiological and quality of life (QoL) studies are encouraging. Specifically, agreement between non-clinical proxies’ and elderly patients’ assessments of the patients’ dependency status has exceeded 90% in some studies.23,24 Reasonable agreement has also been reported between patients and their non-clinical proxies despite the presence of chronic illness and disability on the part of the patient,22,25–27 and mild cognitive impairment among both proxies and patients.21 Many of these reports focus on QoL PROMs. These PROMs include a limited number of items related to physical and cognitive functioning, and their patient-proxy concordances should be generalized cautiously to measures of function. Among patients with dementia, reports indicate that patient-proxy agreement is moderate to good for some domains, including physical function, but poor for others,28,29 suggesting that patient and proxy characteristics, as well as proxy-by-domain interactions, should be considered when gauging the accuracy and precision of a PROM score.
In this study, mean patient-proxy scores differed significantly only for the Applied Cognition domain. Specifically, the results of the MANOVA and follow-up ANOVAs indicated that at the FAMCAT scale score level there were no significant differences between mean patient and proxy θ estimates for either the Daily Activity or Basic Mobility scales. There were significant differences for the Applied Cognition scale where proxies rated patients as functioning at a higher level than the patients rated themselves. However, the proportion of significant psychometric patient-proxy score differences was lowest in the Applied Cognition domain (14%). This discrepancy highlights the limitations of frequentist statistical tests as, unlike psychometric tests, they do not take into account measured precision of the scale scores at the individual patient level. For Daily Activity, proxies provided non-significantly lower mean scores than the patients, whereas for Basic Mobility proxy scores were non-significantly higher.
Statistically significant patient-proxy differences were also observed for measurement precision, as reflected in the psychometric standard errors for both the Applied Cognition and Daily Activity scales. For Applied Cognition, scores generated by patients were more precise (lower standard error of measurement) than those of the proxies, whereas for Daily Activity, proxy scores had a significantly lower standard error of measurement than those of the patients. Taken together, these results suggest that proxies underestimated the reported Applied Cognition impairment level of their patients (i.e., provided higher mean ratings) and did so with a lower level of precision, whereas proxies provided more precise estimates of Daily Activity level but their ratings were, on average, not different in level from those of the patients. There were no significant differences in either level or precision for the FAMCAT’s Basic Mobility scale.
The within-dyad profile analysis (all three FAMCAT domains) yielded a complex set of findings. For about half of the dyads, the profiles of FAMCAT scores of the patients and the proxies were not psychometrically significantly different. For this group of patient-proxy dyads, proxy ratings could reliably be used in place of patient ratings. However, analyses of patient-proxy relationship characteristics (proxy relationship [spouse or non-spouse], duration of co-residence, and interval since last co-residence) that might distinguish these dyads from those with psychometrically significant differences did not yield any statistically significant or clinically useful results. Among the dyads with significant profile differences between the patients and their proxy, patients’ scores were higher in 53% and proxies’ scores were higher in the remaining 47%.
A strength of this study is its novelty, as agreement between patients and their non-clinical proxies has not been previously assessed among a hospital-based cohort, or with an IRT-modeled, MCAT-administered measure of function. Another novelty lies in the methodology in which both traditional statistical significance testing (i.e., MANOVA) and within-dyad psychometric significance were evaluated. The advantage of the latter analysis is it takes into account measured precision of the scale scores at the individual patient level whereas the former analysis focuses only on mean score differences ignoring psychometric precision. The study sample was large and highly representative of patients hospitalized for medical conditions and post-operative complications.
Limitations
Sample generalizability is always a concern. Our goal was to assess a large and representative sample of hospitalized patients (those hospitalized on general medical services or rehospitalized on surgical services following surgical complications). As aging, frailty, and chronic disease account for a large portion of U.S. health care spending, we also wished to focus on those at greater risk for the well-recognized problem of a hospitalization leading to an often irreversible acceleration of functional loss in these groups. 30–32 Hence, we added the requirement that patients have at least one chronic medical condition. While it can be argued that recruitment from a single quaternary medical center is a limitation, we feel that this concern is lessened by our focus on a large group of patients common to all hospitals: those on general medical services or rehospitalized following surgical complications. Cognitive interviewing and collection of qualitative data to identify determinants of patient-proxy agreement were beyond the scope of this study. Although this omission could be considered a limitation, we believe that it represents an important direction for future study.
CONCLUSIONS
Basic Mobility and Daily Activity FAMCAT scores did not differ significantly between patients and their proxies, with patients being more likely to rate function higher than their proxy for Daily Activity and lower for Basic Mobility. In contrast, patient-proxy scores differed significantly for the Applied Cognition domain with proxies generating higher scores.
Supplementary Material
Acknowledgments
This research was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health under Award Number R01HD079439 to the Mayo Clinic in Rochester, Minnesota, through subcontracts to the University of Minnesota and the University of Washington. The content is solely the responsibility of the authors and does not necessarily reflect the official views of the National Institutes of Health.
Abbreviations:
- AM-PAC: Activity Measure for Post-Acute Care
- ANOVA: Analysis of Variance
- EHR: Electronic Health Record
- FAMCAT: Functional Assessment in Acute Care MCAT
- HIPAA: Health Insurance Portability and Accountability Act
- IRT: Item Response Theory
- LRT: Likelihood Ratio Test
- MANOVA: Multivariate Analysis of Variance
- MCAT: Multidimensional Computerized Adaptive Testing
- MWT: Multivariate Wald Test
- n1, n2, n3: Number of Items Administered by the FAMCAT for Each Scale
- PROM: Patient-Reported Outcome Measure
- PROMIS: Patient-Reported Outcomes Measurement Information System
- QoL: Quality of Life
- SEM: Standard Error of Measurement (standard error of the θ estimate for each of the three FAMCAT scales)
- ST: Score Test
- θ: IRT Estimated Scale Score
Footnotes
Conflict of interest information
The authors report no conflicts of interest.
IBM SPSS 25.0.a (IBM Corp, 2017). IBM Passport Advantage Online (PAO).
R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., & Leisch, F. (2019). e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.7-1. https://CRAN.R-project.org/package=e1071.
References
- 1. Jette DU, Stilphen M, Ranganathan VK, Passek SD, Frost FS, Jette AM. AM-PAC “6-Clicks” functional assessment scores predict acute care hospital discharge destination. Phys Ther 2014;94:1252–61.
- 2. Menendez ME, Schumacher CS, Ring D, Freiberg AA, Rubash HE, Kwon YM. Does “6-Clicks” Day 1 Postoperative Mobility Score Predict Discharge Disposition After Total Hip and Knee Arthroplasties? J Arthroplasty 2016;31:1916–20.
- 3. Covert S, Johnson JK, Stilphen M, Passek S, Thompson NR, Katzan I. Use of the Activity Measure for Post-Acute Care “6 Clicks” Basic Mobility Inpatient Short Form and National Institutes of Health Stroke Scale to Predict Hospital Discharge Disposition After Stroke. Phys Ther 2020;100:1423–33.
- 4. Field J, Holmes MM, Newell D. PROMs data: can it be used to make decisions for individual patients? A narrative review. Patient Relat Outcome Meas 2019;10:233–41.
- 5. Kroenke K, Stump TE, Chen CX, et al. Responsiveness of PROMIS and Patient Health Questionnaire (PHQ) Depression Scales in three clinical trials. Health Qual Life Outcomes 2021;19:41.
- 6. Crins MHP, van der Wees PJ, Klausch T, van Dulmen SA, Roorda LD, Terwee CB. Psychometric properties of the PROMIS Physical Function item bank in patients receiving physical therapy. PLoS One 2018;13:e0192187.
- 7. Cohen MZ, Easley MK, Ellis C, et al. Cancer pain management and the JCAHO’s pain standards: an institutional challenge. J Pain Symptom Manage 2003;25:519–27.
- 8. Agarwala P, Salzman SH. Six-Minute Walk Test: Clinical Role, Technique, Coding, and Reimbursement. Chest 2020;157:603–11.
- 9. Cheville A. Design and methods to develop and validate the Functional Assessment in Acute Care Multidimensional Computerized Adaptive Test (FAMCAT) to Improve Delivery of Function-Directed Care in Hospitals. Archives of Rehabilitation Research and Clinical Translation. In press.
- 10. Alvarez-Nebreda ML, Heng M, Rosner B, et al. Reliability of Proxy-reported Patient-reported Outcomes Measurement Information System Physical Function and Pain Interference Responses for Elderly Patients With Musculoskeletal Injury. J Am Acad Orthop Surg 2019;27:e156–e65.
- 11. Howland M, Allan KC, Carlton CE, Tatsuoka C, Smyth KA, Sajatovic M. Patient-rated versus proxy-rated cognitive and functional measures in older adults. Patient Relat Outcome Meas 2017;8:33–42.
- 12. Sonder JM, Bosma LV, van der Linden FA, Knol DL, Polman CH, Uitdehaag BM. Proxy measurements in multiple sclerosis: agreement on different patient-reported outcome scales. Mult Scler 2012;18:196–201.
- 13. Sonder JM, Balk LJ, Bosma LV, Polman CH, Uitdehaag BM. Do patient and proxy agree? Long-term changes in multiple sclerosis physical impact and walking ability on patient-reported outcome scales. Mult Scler 2014;20:1616–23.
- 14. Ediebah DE, Reijneveld JC, Taphoorn MJ, et al. Impact of neurocognitive deficits on patient-proxy agreement regarding health-related quality of life in low-grade glioma patients. Qual Life Res 2017;26:869–80.
- 15. Sonder JM, Holman R, Knol DL, Bosma LV, Polman CH, Uitdehaag BM. Analyzing differences between patient and proxy on Patient Reported Outcomes in multiple sclerosis. J Neurol Sci 2013;334:143–7.
- 16. Weiss DJ, Wang C, Suen K, Basford J, Cheville AL. Mode Effects in the Functional Assessment in Acute Care Multidimensional Computer Adaptive Test (FAMCAT) Among Patients Hospitalized with Medical Conditions. Archives of Physical Medicine and Rehabilitation. In press.
- 17. Wang C, Weiss D, Cheville A. Multidimensional Computerized Adaptive Testing for Efficient and Precise Assessment of Applied Cognition, Daily Activity, and Mobility for Hospitalized Patients. Archives of Physical Medicine and Rehabilitation. In review.
- 18. Cheville AL, Wang C, Yost KJ, et al. Improving the Delivery of Function-Directed Care During Acute Hospitalizations: Methods to Develop and Validate the Functional Assessment in Acute Care Multidimensional Computerized Adaptive Test (FAMCAT). Arch Rehabil Res Clin Transl 2021;3:100112.
- 19. Wang C, Weiss D, Shang Z. Variable-length stopping rules for multidimensional computerized adaptive testing. Psychometrika 2019;84:749–71.
- 20. Wang C, Weiss DJ. Multivariate hypothesis testing methods for evaluating significant individual change. Applied Psychological Measurement 2018;42:221–39.
- 21. Kamitani H, Umegaki H, Okamoto K, et al. [Agreement in the responses to self-reported and proxy-reported versions of QOL-HC: a new quality-of-life scale for patients receiving home-based medical care]. Nihon Ronen Igakkai Zasshi 2018;55:98–105.
- 22. Hilari K, Owen S, Farrelly SJ. Proxy and self-report agreement on the Stroke and Aphasia Quality of Life Scale-39. J Neurol Neurosurg Psychiatry 2007;78:1072–5.
- 23. Bouscaren N, Dartois L, Boutron-Ruault MC, Vercambre MN. How do self and proxy dependency evaluations agree? Results from a large cohort of older women. Age Ageing 2018;47:619–24.
- 24. Maxwell CA, Dietrich MS, Minnick AF, Mion LC. Preinjury Physical Function and Frailty in Injured Older Adults: Self- Versus Proxy Responses. J Am Geriatr Soc 2015;63:1443–7.
- 25. Jones CA, Feeny DH. Agreement between patient and proxy responses of health-related quality of life after hip fracture. J Am Geriatr Soc 2005;53:1227–33.
- 26. Davis JC, Hsiung GY, Bryan S, et al. Agreement between Patient and Proxy Assessments of Quality of Life among Older Adults with Vascular Cognitive Impairment Using the EQ-5D-3L and ICECAP-O. PLoS One 2016;11:e0153878.
- 27. Pickard AS, Johnson JA, Feeny DH, Shuaib A, Carriere KC, Nasser AM. Agreement between patient and proxy assessments of health-related quality of life after stroke using the EQ-5D and Health Utilities Index. Stroke 2004;35:607–12.
- 28. Boyer F, Novella JL, Morrone I, Jolly D, Blanchard F. Agreement between dementia patient report and proxy reports using the Nottingham Health Profile. Int J Geriatr Psychiatry 2004;19:1026–34.
- 29. Romhild J, Fleischer S, Meyer G, et al. Inter-rater agreement of the Quality of Life-Alzheimer’s Disease (QoL-AD) self-rating and proxy rating scale: secondary analysis of RightTimePlaceCare data. Health Qual Life Outcomes 2018;16:131.
- 30. Davydow DS, Hough CL, Levine DA, Langa KM, Iwashyna TJ. Functional disability, cognitive impairment, and depression after hospitalization for pneumonia. Am J Med 2013;126:615–24 e5.
- 31. Sager MA, Franke T, Inouye SK, et al. Functional outcomes of acute medical illness and hospitalization in older persons. Arch Intern Med 1996;156:645–52.
- 32. Boyd CM, Ricks M, Fried LP, et al. Functional decline and recovery of activities of daily living in hospitalized, disabled older women: the Women’s Health and Aging Study I. J Am Geriatr Soc 2009;57:1757–66.