Table 1.

| Instrument property | Description from FDA guidance [5] | Review criteria: notation | Review criteria: detail |
|---|---|---|---|
| Conceptual framework | An explicit description or diagram of the relationships between the items in a PRO instrument and the concepts measured, developed from empirical evidence to support item grouping and scores. | ✓ | Published conceptual framework. |
| | | ✘ | No published conceptual framework. |
| Validity: content – patient input | Evidence that the instrument measures the concept of interest, including evidence from qualitative studies that the items and domains of an instrument are appropriate and comprehensive relative to its intended measurement concept, population and use. Item generation should include input from the target population. | ✓ | Patient involvement in concept elicitation/item generation AND conduct of cognitive debrief with patients. |
| | | Partial | Some patient involvement in concept elicitation/item generation or cognitive debrief, but not both. |
| | | ✘ | No patient involvement in instrument development and evaluation of content. |
| Validity: content – literature & clinician input | See Validity: content (above). In addition to focus groups and individual interviews with patients and family members, PRO instrument items can be generated from literature reviews, interviews with clinicians and other sources. | ✓ | Use of literature to guide instrument/item development OR involvement of clinical experts to guide instrument/item development or evaluate content validity (not necessary to have both). |
| | | ✘ | No use of literature to guide instrument/item development AND no involvement of clinical experts to guide instrument/item development or evaluate content validity. |
| Validity: construct | Evidence that relationships among items, domains, and concepts conform to a priori hypotheses concerning logical relationships that should exist with other measures (discriminant and convergent validity) or characteristics of patients and patient groups (known groups validity). | ✓ | Clear hypotheses for relationships with measures (PRO or clinical) of related concepts, tested with the hypothesised relationships found (convergent validity). Can be supported by evidence that measures of concepts that should not be related show the hypothesised lack of correlation (discriminant validity). Can also be supported by known groups validity (hypothesised, tested and found), but known groups validity alone is not sufficient evidence of construct validity. |
| | | Partial | Construct validity tested without clear hypotheses OR mixed results in terms of the extent to which observed relationships match those hypothesised OR a limited number of tests undertaken (e.g. if the instrument is correlated against only one other PRO and that is the extent of the testing, this constitutes only partial evidence of construct validity). |
| | | ✘ | Construct validity tested but observed relationships do not match those hypothesised. |
| | | – | Construct validity (convergent, discriminant and known groups validity) not tested. |
| Reliability: test-retest | Stability of scores over time when no change is expected in the concept of interest. | ✓ | Correlations ≥0.7 for all scores (including domain scores). |
| | | Partial | Correlations for some scores <0.7 OR good test-retest reliability found for the total score but domain scores not evaluated. |
| | | ✘ | Correlations <0.7 for all scores evaluated. |
| | | – | Not tested for any scores. |
| Reliability: internal consistency | The extent to which the items comprising a scale measure the same concept; the intercorrelation of items that contribute to a score. | ✓ | Cronbach's alpha ≥0.8 for all scores (including domain scores). |
| | | Partial | Cronbach's alpha for some scores <0.8 OR good internal consistency found for the total score but domain scores not evaluated. |
| | | ✘ | Cronbach's alpha <0.8 for all scores evaluated. |
| | | – | Not tested for any scores. |
| Ability to detect change | Evidence that a PRO instrument can identify within-person changes over time in individuals or groups (similar to those in clinical trials) who are known to have changed with respect to the measurement concept. | ✓ | Specific aim of the analysis was to test within-group responsiveness to change (e.g. set criteria for change, such as effect sizes); tested and criteria met for all scores (including domain scores). Of key importance is clear evidence/reason to believe that change has occurred in a group (e.g. clinical outcome, anchor-based approach) and that the PRO instrument scores detect this change. |
| | | Partial | Within-group sensitivity-to-change criteria met for some but not all scores OR criteria met for a total score but responsiveness of domain scores not tested. |
| | | ✘ | Within-group sensitivity to change tested but criteria not met. |
| | | – | Not tested for any scores. This includes claims of an instrument's sensitivity to change based on between-group change (e.g. difference in change between different arms of a clinical trial) and observed change in a group without clear evidence/reason to believe that change has occurred in the group, or without the clear aim of evaluating sensitivity to change (e.g. observed change from baseline within one arm of a clinical trial when not evaluated in relation to observed clinical change). |
| Interpretation of change | The MID is the smallest change in score that can be regarded as important [23]. The FDA guidance uses the term 'responder definition' rather than MID to denote the change in an individual PRO score that indicates a treatment benefit. Responder definitions are trial/treatment specific and should be derived empirically using anchor-based methods (clinical anchors or patient ratings of change). Statistically derived responder definitions (e.g. distribution-based methods commonly used to establish the MID) can be used to support anchor-based approaches, but are not appropriate as the sole basis for determining a responder definition. | ✓ | Published values for interpretation of change for all scores (including domain scores). Methodological details about how the values were derived (e.g. statistically or using anchor-based methods) provided and discussed in the results text. |
| | | Partial | Values for interpretation of change for the total score but not domain scores. Methodological details about how these were derived (e.g. statistically or using anchor-based methods) provided and discussed in the results text. |
| | | – | No published evidence for interpretation of change. |
Glossary of Terms:

- Cognitive debrief: a qualitative research tool used to determine whether concepts and items are understood by patients in the same way that instrument developers intend.
- Concept: the specific measurement goal (i.e. the thing that is to be measured by the PRO instrument).
- Item: an individual question, statement or task (and its standardized response options) that is evaluated by the patient to address a particular concept.
- Reliability: the ability of a PRO instrument to yield consistent, reproducible estimates of the true treatment effect.
- Responder definition: a change in score on a measure, experienced by an individual patient over a predetermined time period, that has been demonstrated in the target population to indicate a significant treatment benefit.
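As the table's "Interpretation of change" row notes, responder definitions should be anchored empirically, with distribution-based values used only as support. A minimal sketch of that logic, assuming entirely hypothetical trial data and a global rating-of-change anchor coded −2 to +2 (where +1 means "minimally improved"):

```python
import numpy as np

# Hypothetical data: each patient's PRO change score and their
# self-rated global change on the anchor (-2..+2, +1 = minimally improved).
change = np.array([8.0, 1.0, 5.0, 12.0, -2.0, 6.0, 4.0, 0.0, 7.0, 3.0])
anchor = np.array([2,   0,   1,   2,    -1,   1,   1,   0,   2,   1])

# Anchor-based responder definition: mean PRO change among patients
# who rated themselves "minimally improved" on the anchor.
responder_threshold = change[anchor == 1].mean()

# Distribution-based value (supportive only, per the criteria above):
# half the standard deviation of baseline scores is one common choice.
baseline_sd = 10.0            # hypothetical baseline SD
half_sd = 0.5 * baseline_sd

print(f"anchor-based responder definition ~ {responder_threshold:.1f} points")
print(f"distribution-based (0.5 SD) support value = {half_sd:.1f} points")
```

In practice the anchor-based estimate would be derived in the trial's own target population and reported with its methodological details, which is what the ✓ criterion in the table requires.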