2006 Oct 11;4:79. doi: 10.1186/1477-7525-4-79

Table 4.

Measurement Properties Reviewed for PRO Instruments Used in Clinical Trials

Measurement Property	Test	What is Assessed	FDA Review Considerations
Reliability	Test-retest	Stability of scores over time when no change has occurred in the concept of interest	Does the PRO instrument reliably measure the concepts it was designed to measure?
	Internal consistency	Whether the items in a domain are intercorrelated, as evidenced by an internal consistency statistic (e.g., coefficient alpha)	Were appropriate reliability tests conducted?
	Inter-interviewer reproducibility (for interviewer-administered PROs only)	Agreement between responses when the PRO is administered by two or more different interviewers	What was the quality of the evidence of reliability?
Validity	Content-related	Whether items and response options are relevant and are comprehensive measures of the domain or concept	Do items in the verbatim copy of the PRO instrument appear to measure the concepts they are intended to measure in a useful way?
			Have patients similar to those participating in the clinical trial confirmed the completeness and relevance of all items?
	Ability to measure the concept (also known as construct-related validity; can include tests for discriminant, convergent, and known-groups validity)	Whether relationships among items, domains, and concepts conform to what is predicted by the conceptual framework for the PRO instrument itself and its validation hypotheses.	Do observed relationships between the items and domains confirm the hypotheses in the conceptual framework? Do results compare favorably with results from a similar but independent measure?
			Do results distinguish one group from another based on a prespecified variable that is relevant to the concept of interest?
	Ability to predict future outcomes (also known as predictive validity)	Whether future events or status can be predicted by changes in the PRO scores	Do PRO scores predict subsequent events or outcomes accurately?
Ability to detect change	Includes calculations of effect size and standard error of measurement among others	Whether PRO scores are stable when there is no change in the patient, and the scores change in the predicted direction when there has been a notable change in the patient as evidenced by some effect size statistic. Ability to detect change is always specific to a time interval.	Has ability to detect change been demonstrated in a comparative trial setting, comparing mean group scores or proportion of patients who experienced a response to the treatment?
			Has ability to detect change been assessed for the time interval appropriate to the study?
Interpretability	Smallest difference that is considered clinically important; this can be a specified difference (the minimum important difference (MID)) or, in some cases, any detectable difference. The MID is used as a benchmark to interpret mean score differences between treatment arms in a clinical trial.	Difference in mean score between treatment groups that provides convincing evidence of a treatment benefit. Can be based on experience with the measure using a distribution-based approach, a clinical or nonclinical anchor, an empirical rule, or a combination of approaches. The definition of an MID using a clinical anchor is sometimes called a minimum clinically important difference (MCID).	The FDA is specifically requesting comment on appropriate review of derivation and application of an MID in the clinical trial setting.
	Responder definition – used to identify responders in clinical trials for analyzing differences in the proportion of responders between treatment arms	Change in score that would be clear evidence that an individual patient experienced a treatment benefit. Can be based on experience with the measure using a distribution-based approach, a clinical or nonclinical anchor, an empirical rule, or a combination of approaches.	The FDA is specifically requesting comment on appropriate review of derivation and application of responder definitions when used in clinical trials.
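
Three of the statistics named in Table 4 (coefficient alpha for internal consistency, the standard error of measurement, and an effect size for ability to detect change) can be illustrated with a minimal Python sketch. The function names, the formula SEM = SD × sqrt(1 − reliability), and the use of the baseline standard deviation as the effect-size denominator are assumptions chosen for illustration; the table does not prescribe any particular formula, and other definitions are in use.

```python
from math import sqrt
from statistics import mean, stdev, variance


def cronbach_alpha(items):
    """Coefficient alpha for one domain.

    `items` is a list of per-item score lists; each inner list holds one
    score per respondent, in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("coefficient alpha needs at least two items")
    # Total domain score for each respondent.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def standard_error_of_measurement(scores, reliability):
    """SEM under the assumed formula SD * sqrt(1 - reliability)."""
    return stdev(scores) * sqrt(1 - reliability)


def standardized_effect_size(baseline, followup):
    """Mean change divided by the baseline SD (one of several conventions)."""
    changes = [after - before for before, after in zip(baseline, followup)]
    return mean(changes) / stdev(baseline)


if __name__ == "__main__":
    # Hypothetical data: three items answered by four respondents.
    domain_items = [[1, 2, 3, 4], [2, 2, 3, 5], [1, 3, 3, 4]]
    print("alpha:", round(cronbach_alpha(domain_items), 3))

    # Hypothetical baseline and follow-up domain scores for five patients.
    baseline = [10, 20, 30, 40, 50]
    followup = [15, 25, 35, 45, 55]
    print("SEM:", round(standard_error_of_measurement(baseline, 0.75), 3))
    print("effect size:", round(standardized_effect_size(baseline, followup), 3))
```

Because ability to detect change is always specific to a time interval, the baseline and follow-up scores passed to `standardized_effect_size` should come from the interval relevant to the study.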