Reliability |
Test-retest |
Stability of scores over time when no change has occurred in the concept of interest |
Does the PRO instrument reliably measure the concepts it was designed to measure? |
|
Internal consistency |
Whether the items in a domain are intercorrelated, as evidenced by an internal consistency statistic (e.g., coefficient alpha) |
Were appropriate reliability tests conducted? |
|
Inter-interviewer reproducibility (for interviewer-administered PROs only) |
Agreement between responses when the PRO is administered by two or more different interviewers |
What was the quality of the evidence of reliability? |
Validity |
Content-related |
Whether items and response options are relevant and are comprehensive measures of the domain or concept |
Do items in the verbatim copy of the PRO instrument appear to measure the concepts they are intended to measure in a useful way? |
|
|
|
Have patients similar to those participating in the clinical trial confirmed the completeness and relevance of all items? |
|
Ability to measure the concept (also known as construct-related validity; can include tests for discriminant, convergent, and known-groups validity) |
Whether relationships among items, domains, and concepts conform to what is predicted by the conceptual framework for the PRO instrument itself and its validation hypotheses. |
Do observed relationships between the items and domains confirm the hypotheses in the conceptual framework? Do results compare favorably with results from a similar but independent measure? |
|
|
|
Do results distinguish one group from another based on a prespecified variable that is relevant to the concept of interest? |
|
Ability to predict future outcomes (also known as predictive validity) |
Whether future events or status can be predicted by changes in the PRO scores |
Do PRO scores predict subsequent events or outcomes accurately? |
Ability to detect change |
Includes calculations of effect size and standard error of measurement, among others |
Whether PRO scores are stable when the patient has not changed, and whether they move in the predicted direction when the patient has changed notably, as evidenced by an effect size statistic. Ability to detect change is always specific to a time interval. |
Has ability to detect change been demonstrated in a comparative trial setting, comparing mean group scores or the proportion of patients who experienced a response to the treatment? |
|
|
|
Has ability to detect change been assessed for the time interval appropriate to the study? |
Interpretability |
Smallest difference that is considered clinically important; this can be a specified difference (the minimum important difference, or MID) or, in some cases, any detectable difference. The MID is used as a benchmark to interpret mean score differences between treatment arms in a clinical trial |
Difference in mean score between treatment groups that provides convincing evidence of a treatment benefit. Can be based on experience with the measure using a distribution-based approach, a clinical or nonclinical anchor, an empirical rule, or a combination of approaches. An MID defined using a clinical anchor is sometimes called a minimum clinically important difference (MCID). |
The FDA is specifically requesting comment on appropriate review of derivation and application of an MID in the clinical trial setting. |
|
Responder definition – used to identify responders in clinical trials for analyzing differences in the proportion of responders between treatment arms |
Change in score that would be clear evidence that an individual patient experienced a treatment benefit. Can be based on experience with the measure using a distribution-based approach, a clinical or nonclinical anchor, an empirical rule, or a combination of approaches. |
The FDA is specifically requesting comment on appropriate review of derivation and application of responder definitions when used in clinical trials. |
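
Several of the statistics named in the table (coefficient alpha, standard error of measurement, effect size, and the proportion of responders) can be illustrated with a short calculation. The following is a minimal sketch in Python using commonly cited textbook formulas; the table does not prescribe any particular formula. The function names, the choice of the baseline standard deviation as the effect size denominator, and the convention that a higher score means improvement are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev, variance


def cronbach_alpha(items):
    """Coefficient alpha for one domain.

    `items` is a list of per-item score lists, one score per respondent,
    all the same length. Uses sample variances (n - 1 denominator).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("every item needs one score per respondent")
    # Total score per respondent across all items in the domain.
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_variance_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def standard_error_of_measurement(scores, reliability):
    """SEM = SD * sqrt(1 - reliability), a common distribution-based quantity."""
    return stdev(scores) * sqrt(1 - reliability)


def standardized_effect_size(baseline, follow_up):
    """Mean change divided by the baseline SD (an assumed, illustrative choice)."""
    return (mean(follow_up) - mean(baseline)) / stdev(baseline)


def responder_proportion(changes, threshold):
    """Share of patients whose change score meets a prespecified threshold.

    Assumes higher change scores indicate improvement; the threshold itself
    must come from a separately justified responder definition.
    """
    return sum(1 for c in changes if c >= threshold) / len(changes)
```

In a two-arm trial, `responder_proportion` would be computed separately for each arm with the same prespecified threshold, and the two proportions would then be compared. `standardized_effect_size` applies to one arm over one time interval, consistent with the point that ability to detect change is specific to a time interval.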