Author manuscript; available in PMC: 2021 Aug 1.
Published in final edited form as: Dis Colon Rectum. 2020 Aug;63(8):1156–1167. doi: 10.1097/DCR.0000000000001717

Patient Reported Outcomes Measures in Colon and Rectal Surgery: A Systematic Review and Quality Assessment

Alexander T Hawkins 1, Russell L Rothman 2, Timothy M Geiger 1, Juan Canedo 2, Kamren Edwards-Hollingsworth 1, David C LaNeve 1, David F Penson 3
PMCID: PMC8029646  NIHMSID: NIHMS1676536  PMID: 32692077

Abstract

BACKGROUND:

There is growing interest in using patient-reported outcome measures to support value-based care in colorectal surgery. To draw valid conclusions regarding patient-reported outcomes data, measures with robust measurement properties are required.

OBJECTIVE:

To assess the use and quality of patient-reported outcome measures in colorectal surgery.

DATA SOURCES:

Three major databases were searched for studies using patient-reported outcome measures in the context of colorectal surgery.

STUDY SELECTION:

Articles that used patient-reported outcome measures as an outcome for a colorectal surgical intervention in a comparative effectiveness analysis were included. Exclusion criteria were: articles older than 11 years, non-English language, age less than 18, fewer than 40 patients, case reports, review articles, and studies without comparison.

MAIN OUTCOME MEASURES:

Quality assessment using a previously reported checklist of psychometric properties.

RESULTS:

From 2007–2018, 368 studies were deemed to meet inclusion criteria. These studies used 165 distinct patient-reported outcome measures. Thirty were used five or more times and were selected for quality assessment. Overall, the measures were generally high quality, with 21 (70%) scoring 14 or greater on an 18-point scale. Notable weaknesses included management of missing data (14%), and description of literacy level (0%).

LIMITATIONS:

Use of original articles for quality assessment. Measures were selected for quality analysis based on frequency of use rather than other factors such as impact of the article or number of patients in the study.

CONCLUSIONS:

Patient-reported outcome measures are widely used in colorectal research. There was a wide range of measures available and many were used only once. The most frequently used measures are generally high quality, but a majority lack details on how to deal with missing data and information on literacy levels. As the use of patient-reported outcome measures to assess colorectal surgical intervention increases, researchers and practitioners need to become more knowledgeable about the measures available and their quality.

Keywords: Colorectal surgery, Patient reported outcomes

INTRODUCTION

Determining success of colorectal surgery is difficult. Traditional methods such as survival and 30-day outcomes fail to tell the complete story of the effect of an operative intervention on a patient. This is true not only for malignant conditions, but for benign colorectal disease as well. As such, the use of Patient Reported Outcomes Measures (PROMs) has gained in popularity over the past decade.1,2

A comprehensive assessment of the value of clinical care requires patients to describe their own experience regarding their symptoms and feelings. These “patient-reported outcomes” can pertain to anything a patient may experience: pain, bowel dysfunction, reductions in the activities of daily living, stress/anxiety, financial toxicity, etc. Patient reported outcome measures (PROMs) are the tools or instruments used to measure these outcomes. PROMs use validated questionnaires to turn symptoms into a numerical score that can be used longitudinally to assess either the natural history of a disease process or the result of a surgical intervention. They can be collected on paper, via telephone or face-to-face encounters, or through a number of e-platforms. They must be tested using rigorous psychometric methods to ensure their validity and reliability before use in a clinical or research setting. PROMs may include tools or instruments that measure function, health related quality of life, symptom or symptom burden, personal experience of care, and health-related behaviors such as anxiety and depression.2 They can seek to capture overall health or health states from specific diseases. PROMs are critical in identifying the benefit of a surgical outcome. They capture the very reason that most patients seek care: to address a troublesome symptom, limited function, or poor overall health.

Recently, a National Institutes of Health (NIH)/Food and Drug Administration (FDA) committee cited three categories (feeling, function, and survival) as primary patient-centered outcomes to be addressed and included in all clinical trials that seek FDA approval.3 The goal then becomes to seek out and utilize high-quality PROMs that accurately measure effectiveness, facilitate decision making and inform health policy.4 As the focus has shifted to patient-centered care, PROMs have proliferated.5 But their testing, quality and intended use can vary widely.6 This creates a problem, as both researchers and clinicians traditionally lack training in the methodology used to create and validate PROMs and may be unable to assess the quality of tools available for use in both the clinical and research setting. The aim of this study is to determine which PROMs are being utilized in colorectal research and to perform a quality assessment of the most widely used instruments. To accomplish this, a systematic review was performed of all colorectal surgery studies that have utilized a PROM over the past 11 years. A quality assessment of the most widely used PROMs was performed using a strategy developed and published by Francis et al.7

METHODS

Literature search

This systematic review was conducted by following the preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) guidelines.8 The protocol was reviewed and deemed ineligible by the International Prospective Register of Systematic Reviews (140186) given that we were not using a specific pre-specified primary outcome. A comprehensive and contemporary search of PubMed, Embase, and Web of Science Core Collection was conducted from January 2007 through September 2018. The systematic search strategies are available in Appendix 1.

Inclusion/Exclusion

All articles were included that used PROMs as an outcome for a colorectal surgical intervention in a comparative effectiveness analysis. These included observational studies with a comparison group, case-control studies and randomized controlled trials. As we were interested in contemporary utilization, we excluded articles older than 11 years. It should be noted that the PROMs themselves could be older than 11 years. Articles needed to be written in the English language and include human subjects over the age of 18. To ensure the articles were impactful, we excluded any article with fewer than 40 patients. PROMs were excluded from analysis if they were created solely for the study without any evidence of psychometric testing. All case reports, review articles and studies without comparison were excluded.
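As an illustration only, the eligibility criteria above can be written as a single screening predicate. The sketch below is not the authors' screening tool: the `Article` record, its field names, and the design labels are hypothetical, publication date is simplified to a calendar year within the 2007 to 2018 search window, and the age criterion follows the abstract (participants younger than 18 excluded).

```python
from dataclasses import dataclass

# Study designs the review accepted (labels are hypothetical).
ELIGIBLE_DESIGNS = {"observational", "case_control", "rct"}


@dataclass
class Article:
    year: int                # publication year (simplification of the search window)
    language: str
    min_age: int             # age of the youngest participant
    n_patients: int
    design: str              # e.g. "rct", "observational", "case_report", "review"
    has_comparison: bool
    uses_prom_outcome: bool  # a PROM is an outcome of the surgical intervention


def is_eligible(a: Article) -> bool:
    """Return True when an article passes every inclusion/exclusion rule."""
    return (
        a.uses_prom_outcome
        and a.design in ELIGIBLE_DESIGNS
        and a.has_comparison
        and 2007 <= a.year <= 2018
        and a.language == "English"
        and a.min_age >= 18
        and a.n_patients >= 40
    )


# Hypothetical records: the second fails the 40-patient minimum.
large_rct = Article(2015, "English", 18, 120, "rct", True, True)
small_cohort = Article(2015, "English", 18, 35, "observational", True, True)
print(is_eligible(large_rct), is_eligible(small_cohort))  # True False
```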

Data Extraction

Abstracts from identified studies were read by three authors (ATH, KEH, DCL) and any discrepancies on inclusion were adjudicated by the first author (ATH) in consultation with the senior author (DFP). A full examination of all included articles was then conducted (ATH, DCL, KEH). A data extraction tool was created using Google Documents (Mountain View, CA) to standardize abstracted information regarding date of publication, study type, organ involved (colon, rectum, anus), disease process, surgical intervention, and PROM used. Extraction was performed by two independent researchers (DCL, KEH) with any discrepancies resolved by a senior author (ATH). PROMs were then tallied by use in each of the eligible articles.
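The tallying step can be sketched as follows. This is a minimal illustration rather than the extraction tool used in the study: the study identifiers and PROM lists are hypothetical, each PROM is counted at most once per study, and the default threshold of 5 uses is the one applied later to select PROMs for quality assessment.

```python
from collections import Counter


def tally_proms(studies):
    """Count the number of studies in which each PROM appears."""
    counts = Counter()
    for proms in studies.values():
        counts.update(set(proms))  # count each PROM once per study
    return counts


def widely_used(counts, min_uses=5):
    """Names of PROMs used in at least `min_uses` studies."""
    return sorted(name for name, n in counts.items() if n >= min_uses)


# Hypothetical extraction results: study id -> PROMs reported.
studies = {
    "study_1": ["SF-36", "FISI"],
    "study_2": ["SF-36"],
    "study_3": ["SF-36", "LARS"],
    "study_4": ["SF-36", "FISI"],
    "study_5": ["SF-36", "SF-36"],  # duplicate entry within one study
}
counts = tally_proms(studies)
print(counts["SF-36"], widely_used(counts))  # 5 ['SF-36']
```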

Quality Assessment

We performed a quality assessment on all PROMs that had 5 or more uses in eligible studies. This number was chosen as it represented broad dissemination of the PROM, consistent with the goal of focusing on PROMs with wide-ranging use. Quality of the PROM was based on the initial article describing the PROM as well as any available supplemental manuscripts by the original authors describing psychometric properties. If any elements were missing, they were assumed not to have been collected or assessed. Authors were not contacted for clarification of missing elements. The quality assessment was based on a checklist developed by Francis et al that analyzes PROMs with 18 elements over 6 domains – conceptual model, content validity, reliability, construct validity, scoring & interpretation and respondent burden & presentation.7 Each element is scored either present (1 point) or absent (0 points) with a total possible score of 18. Higher numbers represent higher quality. The full checklist is available in Appendix 2. Quality assessment was performed by two independent researchers (ATH and JC) with any discrepancies resolved by a senior author (DFP). As one weakness of this approach is that an excellent but newly developed PROM would not be identified by widespread utilization, we also report on two PROMs that were not identified by this analysis but that the authors deemed important for readers to be aware of. SAS statistical software (version 9.2; SAS Institute Inc., Cary NC, USA) was used for all quantitative analysis.
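The scoring rule (1 point per element present, 0 if absent, summed to a maximum of 18) can be sketched as below. The per-domain item counts are taken from the 18 items listed in Table 2; the function name and the example ratings are hypothetical and do not describe any specific PROM in this review.

```python
# Number of checklist items per domain, counted from Table 2 (18 in total).
ITEMS_PER_DOMAIN = {
    "conceptual model": 3,
    "content validity": 3,
    "reliability": 2,
    "construct validity": 4,
    "scoring & interpretation": 3,
    "respondent burden & presentation": 3,
}


def francis_score(ratings):
    """Sum the elements rated present; absent elements contribute 0 points."""
    total = 0
    for domain, n_items in ITEMS_PER_DOMAIN.items():
        answers = ratings[domain]
        if len(answers) != n_items:
            raise ValueError(
                f"{domain}: expected {n_items} ratings, got {len(answers)}"
            )
        total += sum(bool(a) for a in answers)
    return total


# Hypothetical PROM: everything is documented except a plan for
# missing data and a description of literacy level.
example = {
    "conceptual model": [True, True, True],
    "content validity": [True, True, True],
    "reliability": [True, True],
    "construct validity": [True, True, True, True],
    "scoring & interpretation": [True, False, True],
    "respondent burden & presentation": [True, False, True],
}
print(francis_score(example))  # 16
```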

RESULTS

The initial systematic search revealed 1968 studies. After review and exclusion, 368 studies met inclusion criteria (Figure 1). Of these, 329 (89%) were observational studies and 39 (11%) were randomized controlled studies. There were no case-control studies. The median number of patients per study was 107 (IQR 68–205). These studies used 165 distinct PROMs (Figure 2). Utilization by anatomic location and disease process is presented in Table 1. Thirty PROMs were used 5 or more times and were selected for quality assessment (Supplemental Table 1). Overall, the PROMs were generally high quality, with 21 (70%) scoring 14 or greater on an 18-point scale. Items with fewer than 75% compliance include: evidence that members of the respondent population were involved (66%), description of question development methodology (72%), description of change over time psychometrics (72%), documentation on how to score the measure (55%), management of missing data (14%), interpretation of PROM score (69%) and description of literacy level (0%) (Table 2). A brief description, overall quality score and summary of each PROM can be found in Table 3.8–39
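Item-level compliance is the fraction of assessed PROMs for which a checklist item was rated present, and items below 75% are flagged as weaknesses. A minimal sketch of that calculation follows; the item labels and ratings are hypothetical and are not the study data.

```python
def item_compliance(assessments):
    """Fraction of PROMs rated present (1) for each checklist item."""
    n = len(assessments)
    items = assessments[0].keys()
    return {item: sum(a[item] for a in assessments) / n for item in items}


def weak_items(compliance, threshold=0.75):
    """Items whose compliance falls below the threshold."""
    return sorted(item for item, frac in compliance.items() if frac < threshold)


# Hypothetical ratings for four PROMs on three checklist items.
assessments = [
    {"scoring documented": 1, "missing data plan": 0, "literacy level": 0},
    {"scoring documented": 1, "missing data plan": 0, "literacy level": 0},
    {"scoring documented": 1, "missing data plan": 1, "literacy level": 0},
    {"scoring documented": 1, "missing data plan": 0, "literacy level": 0},
]
compliance = item_compliance(assessments)
print(weak_items(compliance))  # ['literacy level', 'missing data plan']
```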

Figure 1. PRISMA flow diagram.

Figure 2. Histogram of PROM utilization among the 368 studies.

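Table 1. PROM Utilization by Anatomic Location and Disease Process

| Category | Subgroup | n=165 |
|---|---|---|
| Anatomic Location | Colon | 114 (69%) |
| Anatomic Location | Rectum | 141 (85%) |
| Anatomic Location | Anus | 25 (15%) |
| Disease Process | Neoplasia | 95 (58%) |
| Disease Process | IBD | 32 (19%) |
| Disease Process | Pelvic Floor | 22 (13%) |
| Disease Process | Anorectal | 20 (12%) |
| Disease Process | Diverticular Disease | 14 (8%) |
| Disease Process | Constipation | 15 (9%) |
| Disease Process | Other | 60 (36%) |

IBD: Inflammatory Bowel Disease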

Table 2. Mean Quality Assessment Scores of Most Widely Used PROMs

| Concept | Item | Mean Score |
|---|---|---|
| Conceptual Model | Has the PRO construct to be measured been specifically defined? | 1 |
| Conceptual Model | Has the intended respondent population been described? | 1 |
| Conceptual Model | Does the conceptual model address whether a single scale or multiple subscales are expected? | 0.97 |
| Content Validity | Is there evidence that members of the respondent population were involved in the development of the PRO measure? | 0.66 |
| Content Validity | Is there evidence that content experts were involved in development of the PRO measure? | 1 |
| Content Validity | Is there a description of the methodology by which items/questions were derived? | 0.72 |
| Reliability | Is there evidence that the reliability of the PRO measure was tested (e.g., test-retest, internal consistency)? | 0.93 |
| Reliability | Are reported indices of reliability adequate (e.g., ideal: r>=0.80; adequate r>=0.70; or lower if justified)? | 0.93 |
| Construct Validity | Is there reported mathematical justification that a single scale or multiple subscales exist in the PRO measure (e.g., factor analysis, item response theory)? | 0.86 |
| Construct Validity | Is the PRO measure intended to measure change over time? | 0.72 |
| Construct Validity | Are there findings supporting expected correlations with existing PRO measures or other clinical data? | 0.86 |
| Construct Validity | Are there findings supporting expected differences in scores between known groups? | 0.93 |
| Scoring & Interpretation | Is there documentation how to score the PRO measure? | 0.55 |
| Scoring & Interpretation | Has a plan for managing and/or interpreting missing responses been described? | 0.14 |
| Scoring & Interpretation | Is there information provided on how to interpret the PRO measure scores? | 0.69 |
| Respondent Burden & Presentation | Is time to complete reported and reasonable? If not, are number of questions appropriate for the intended application? | 0.9 |
| Respondent Burden & Presentation | Is there a description of the literacy level of the PRO measure? | 0 |
| Respondent Burden & Presentation | Is the entire PRO measure available for public viewing? | 1 |
| Total | | 13.79 |

Table 3. Details and Summary of Most Widely Used PROMs

| Focus | Disease | Name | Number of Studies | Francis Score | Number of Items | Domains/Subscales | Number of Subjects Used in Development | Scoring | Notes |
|---|---|---|---|---|---|---|---|---|---|
| Global | | Short Form 36 (SF-36) aka Rand 36, MOS36⁹ | 83 | 15 | 36 | 8 | N/A | Scale between 0–100 with 100 being the best possible score | Contains eight domains: physical, role and social functioning, mental health, patient health, perceptions, vitality, body pain and change in health; designed to be a shorter and less complex version of the SF-36 |
| Global | | Short Form 12-General Health Status Survey (SF-12)¹⁰ | 8 | 15 | 12 | 8 | 232 | Scale between 0–100 with 100 being the best possible score | Designed to be a shorter and less complex version of the SF-36 |
| Global | | The European Organization for Research and Treatment of Cancer Quality of Life Questionnaire – EORTC QLQ-C30¹¹ | 66 | 14 | 30 | 14 | 305 | All scales range from 0 to 100. A high score for functional scales represents a high level of functioning; a high score for a symptom scale represents a high level of symptomatology. | Five functional scales (physical, role, emotional, cognitive, and social functioning), three symptom scales (fatigue, nausea and vomiting, and pain), and six single items (dyspnea, insomnia, loss of appetite, constipation, diarrhea, and financial difficulties) |
| Global | | European Quality of Life – EQ5D¹² | 15 | 13 | 6 | 5 | 592 | 5-digit number can be converted into a preference weight, which is also referred to as a single weighted index score | 100-point visual analog scale; creates a health state based on the weighted time trade-off method that can then be used for cost-effectiveness analyses |
| Global | | Cleveland Global QoL – CGQL¹³ | 12 | 13 | 3 | N/A | 977 | Responses range from 0 to 10 with higher values implying a better quality of life. Scores are calculated by dividing the sum of the three item scores by 30 for a final range between 0 and 1 | Originally designed to measure QoL after restorative proctocolectomy, but has evolved to become a validated and widely used QoL score for colorectal disease |
| Global | | Gastrointestinal Quality of Life Index – GIQLI¹⁴ | 37 | 16 | 36 | i | 661 | Each question is scored from 0 to 4 with higher numbers representing a more desirable health state. The overall score ranges from 0 to 144 | Focuses on symptoms experienced in the past 2 weeks and their effect on overall health |
| Global | | Memorial Sloan Kettering Cancer Center Bowel Function Instrument – MSKCC-BFI¹⁵ | 8 | 16 | 18 | 3 | 198 | Each subscale is scored by summing the responses to the items in the subscale; a global score is calculated by adding the subscale scores. | Three subscales (Frequency, Dietary, and Urgency/Soilage) with a 4-week recall period |
| Global | | Colorectal Functional Outcome (COREFO)¹⁶ | 7 | 16 | 27 | N/A | 179 | All questions can be answered by choosing from five response options: No, never; Yes, less than once a week; Yes, 1–2 days per week; Yes, 3–5 days per week; Yes, 6–7 days per week. Scoring is 1–5 for each question with a total score ranging from 27–135 with lower scores indicating better function. | 2-week recall period; analyzes colorectal function rather than overall quality of life |
| Disease Specific | Colorectal Cancer | European Organization for Research and Treatment of Cancer Colorectal cancer-specific quality of life questionnaire module – EORTC QLQ-CR38¹⁷ | 44 | 16 | 38 | 16 | 354 | All scale and single-item scores are linearly transformed to a 0–100 scale with lower scores representing lower functioning. | Five functional domain scales: Physical (P), Role (R), Emotional (E), Social (S) and Cognitive (C); two items evaluate global health status (GHS); in addition three Symptom scales assess: Fatigue, Pain and Emesis; and six single items assess further symptoms; clinically valid supplementary questionnaire for assessing specific QoL issues not covered by the QLQ-C30 in patients with colorectal cancer |
| Disease Specific | Colorectal Cancer | EORTC QLQ-CR29¹⁸,¹⁹ | 13 | 16 | 29 | 4 | 79 | All scale and single-item scores are linearly transformed to a 0–100 scale with lower scores representing lower functioning. | Refined to four scales assessing urinary frequency, fecal seepage, stool consistency and body image and single items assessing other common problems following treatment for colorectal cancer |
| Disease Specific | Colorectal Cancer | Functional Assessment of Cancer Therapy-Colorectal Questionnaire (FACT-C)²⁰ | 14 | 15 | 9 | 5 | 245 | Scores can be produced as a combined total of all domains (FACT-C total) and as the Colorectal Cancer Score (CCS); a Treatment Outcome Index (TOI) can be calculated by summing the FACT-G physical and functional domains and the CCS. | Subscales for physical well-being, social-family well-being, emotional well-being, functional well-being, and colorectal cancer-specific concerns |
| Disease Specific | Fecal Incontinence | The Wexner Incontinence Scale aka Cleveland Clinic Incontinence Score – CCI, WCCS, CCF-F²¹ | 53 | 7 | 5 | N/A | N/A | Scale of 0 (never) to 4 (always). Overall, scores range between 0 (no incontinence) and 20 (complete incontinence). | Original manuscript provided little data on psychometric properties, but subsequent articles have demonstrated evidence for reliability,²² validity²³ and sensitivity.²⁴ |
| Disease Specific | Fecal Incontinence | Fecal Incontinence Quality of Life – FIQOL, FIQL²⁵ | 29 | 16 | 29 | 4 | 190 | Scored by averaging responses to items in each domain, producing four domain-specific scores. Lower scores indicate lower quality of life. | Assesses four domains of incontinence: lifestyle (10 items), coping (9 items), depression (7 items), and embarrassment (3 items). |
| Disease Specific | Fecal Incontinence | Fecal Incontinence Severity Index (FISI)²⁶ | 53 | 15 | 4 | N/A | 34 | Total score is the sum of the item scores, and the patient-based score ranges from 0 (no incontinence) to 61 (severe incontinence). | Examines the frequency of four types of fecal incontinence (gas, mucus, liquid, and solid); two sets of scoring are available: a patient-based system and a surgeon-based system, of which the patient-based system is more widely used |
| Disease Specific | Fecal Incontinence | Low Anterior Resection Syndrome score – LARS²⁷ | 11 | 14 | 5 | N/A | 961 | Individual score values are summed to form the LARS score, which is divided into "no LARS" (0–20), "minor LARS" (21–29) and "major LARS" (30–42). | For assessing bowel function after sphincter-preserving surgery with or without radiotherapy for rectal cancer |
| Disease Specific | Fecal Incontinence | Vaizey scale aka St. Mark's incontinence score²⁷ | 7 | 14 | 8 | N/A | 33 | Overall scores run from 0 (perfect continence) to 24 (total incontinence). | Modified the Wexner scale by including assessment of the ability to defer bowel movements and utilization of constipating medications |
| Disease Specific | Sexual Function | International Index of Erectile Function/Dysfunction – IEF-5, IIED, sexual health inventory for men²⁸ | 36 | 16 | 15 | 5 | 278 | Each question scored 0–5. Higher scores represent better function. | Brief, self-administered measure of erectile dysfunction with five subsets (erectile function, orgasmic function, sexual desire, intercourse satisfaction, and overall satisfaction) |
| Disease Specific | Sexual Function | Female Sexual Function Index – FSFI²⁹ | 27 | 16 | 19 | 6 | 259 | Scoring is achieved by summing individual items that comprise the subscale and multiplying the sum by a factor. | Comprises six domains: desire [two items], arousal [four items], lubrication [four items], orgasm, satisfaction, pain [three items each]. |
| Disease Specific | Constipation | Knowles-Eccersley-Scott Symptom Questionnaire – KESS³⁰ | 10 | 14 | 11 | N/A | 91 | The KESS uses four- to five-point Likert scales that are scored on an unweighted linear integer scale. Total scores can range from 0 (no symptoms) to 39 (highest constipation). A threshold score of > 11 denotes constipation. | Advantage of distinguishing between various subtypes of constipation |
| Disease Specific | Constipation | Wexner/Cleveland Clinic Constipation Score – CCCS, CCF-CS, Agachan-Wexner Score²² | 11 | 7 | 7 | | 232 | Scored using a five-point Likert scale that ranges from 0 (none of the time) to 4 (all of the time) and one item that is rated on a 0–2 scale.²⁶ Scores range from 0 to 30, with 0 indicating normal and 30 indicating severe constipation. A cutoff score of 15 suggests constipation. | |
| Disease Specific | Constipation | Patient Assessment of Constipation Quality of Life – PAC-QOL³¹ | 9 | 14 | 27 | 4 | 223 | The scores for each question are recoded as scores of 0–4, with lower scores indicating fewer problems. Symptom scores and sub-scores are then calculated as averages of the relevant questions and symptoms, and the overall score is computed as the average score across the 12 symptoms. | Four subscales (worries and concerns, physical discomfort, psychosocial discomfort, and satisfaction); developed to evaluate the effect of constipation on quality of life over time |
| Disease Specific | Urinary Function | International Prostatic Symptoms Score – IPSS³² | 20 | 14 | 8 | N/A | 158 | Each answer is scored from 0 to 5 for a maximum score of 35 points. Higher scores indicate worse symptoms. | Originally designed to assess symptoms of benign prostatic hyperplasia |
| Disease Specific | Ostomy | City of Hope Quality of Life-Ostomy Questionnaire – COHQOL-OQ³³ | 11 | 13 | 90 | 2 | 1512 | Subscale scores are produced by adding the scores on each item within the subscale and then dividing by the number of items in that subscale. A total QOL score is obtained by adding the scores on all 10-point items and dividing by the total number of items. Higher scores indicate better QoL. | First component consists of 47 forced-choice and open-ended items that relate to patient sociodemographic characteristics as well as work-related items, health insurance, sexual activity, psychological support, clothing, diet, and daily ostomy care |
| Disease Specific | Inflammatory Bowel Disease | Inflammatory Bowel Disease Questionnaire (IBDQ)³⁴ | 9 | 15 | 32 | 4 | 77 | Graded responses for each item of 1 (poorest) to 7 (best) | |
| Disease Specific | Inflammatory Bowel Disease | Short Inflammatory Bowel Disease Questionnaire (SIBDQ)³⁵ | 6 | 14 | 10 | 4 | 299 | All scores were reported with a 7-point scale. The total score is divided by 10 for a range from 1 to 7 with 1 being poor and 7 being optimum health. | |
| Disease Specific | Anxiety & Depression | Hospital Anxiety and Depression Scale – HADS³⁶ | 7 | 13 | 14 | N/A | 100 | Each item on the questionnaire is scored from 0–3, so a person can score between 0 and 21 for either anxiety or depression. A cut-off point of 8/21 for anxiety or depression has been described.³⁷ | Used to assess anxiety and depression symptoms over the past week. |
| Disease Specific | Pelvic Floor | Pelvic Floor Distress Inventory – PFDI-20³⁸ | 6 | 12 | 20 | 3 | 100 | Scoring is on a 4-point scale. Overall scoring for each subset is the mean value of all questions multiplied by 25 (range 0–100). The summary score is the sum of subscales (range 0–300) with higher scores representing greater perceived impact of pelvic floor dysfunction. | Measures the degree of disruption to quality of life caused by pelvic floor symptoms |
| Disease Specific | Pouch Function | Modified Oresland score³⁹ | 6 | 16 | 12 | N/A | 100 | Item ratings are summarized to produce a score ranging from 0 to 15 (15 is worst). A functional score of 8 or more is associated with an impaired health-related quality of life (HRQL) | Validated tool for the assessment of pouch function after restorative proctectomy |
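Several of the scoring rules summarized in Table 3 are simple arithmetic. The sketch below illustrates three of them (CGQL, LARS categories, and the PFDI-20 summary score) exactly as described in the table. It is not a validated scoring implementation: the function names are hypothetical, PFDI-20 items are assumed to be coded 0 to 4 so that each subscale spans 0 to 100, and the instruments' own manuals should be consulted for missing-data and item-coding rules.

```python
from statistics import mean


def cgql_score(items):
    """CGQL: three items rated 0-10; their sum is divided by 30 (range 0 to 1)."""
    if len(items) != 3 or any(not 0 <= x <= 10 for x in items):
        raise ValueError("CGQL expects three item scores between 0 and 10")
    return sum(items) / 30


def lars_category(score):
    """Map a summed LARS score (0-42) to the category given in Table 3."""
    if not 0 <= score <= 42:
        raise ValueError("LARS score must be between 0 and 42")
    if score <= 20:
        return "no LARS"
    if score <= 29:
        return "minor LARS"
    return "major LARS"


def pfdi20_summary(subscales):
    """PFDI-20: each subscale is the mean item value x 25 (0-100);
    the summary score is the sum of the three subscales (0-300).
    Assumes items are coded 0-4."""
    if len(subscales) != 3:
        raise ValueError("PFDI-20 expects three subscales")
    return sum(mean(items) * 25 for items in subscales)


print(cgql_score([5, 5, 5]))                    # 0.5
print(lars_category(21))                        # minor LARS
print(pfdi20_summary([[4, 4], [0, 4], [0, 0]]))  # 150
```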

Other Potentially Useful PROMs

Certain important colorectal PROMs were not included in the quality assessment due to the infrequency of their utilization over the study period. This is likely due to their more recent development. By author consensus, we identified two PROMs of note. They are reported here, but are not officially part of the review, and therefore this description is not an endorsement. The Patient-Reported Outcomes Measurement Information System (PROMIS®) (Francis score=N/A) is a set of patient-centered measures, developed by the National Institutes of Health, that describe and track physical, mental, and social health.40 It can be used with both healthy patients and those with chronic disease states. PROMIS was established to fill a gap in the clinical research setting for a high-quality instrument to measure such PROs as pain, fatigue, physical functioning, emotional distress, and social role participation that have a major impact on quality of life across a range of chronic diseases.41,42

PROMIS is a free-to-use collection of rigorously developed, accurate measures of patient-reported health status for physical, mental, and social well-being. These measures can be leveraged to describe health symptoms and health-related quality of life domains such as pain, fatigue, depression, and physical function, which are important for both research and clinical care in a variety of chronic diseases, including numerous colorectal diseases. One of the goals of PROMIS was to establish a comprehensive set of tools to broadly capture PROs, as there had previously been little standardization among existing tools.

PROMIS has developed over 70 domains that capture pain, fatigue, depression, anxiety, sleep disturbance, physical function, social function, and sexual function that are important in the assessment of disease states and treatment success. To promote broad adoption, the tests have been translated into over 40 different languages commonly used in the United States and across the globe. PROMIS measures can be captured in a number of different ways, including paper versions, web-based versions suitable for tablets and smartphones and integration in the electronic medical record. Utilization in both the US and internationally has been steadily increasing, both for research and use in the clinical realm. As there are multiple sub measures and psychometric testing is widely dispersed, it is impossible to assign a Francis score.

The Diverticulitis Quality of Life instrument (DV-QOL; Francis score=15) is another, newer PROM designed to measure QoL in patients with diverticulitis.43 It is a 17-item instrument with scales for physical symptoms, concerns, emotions and behavioral changes. All scale scores and sub-scale scores are converted linearly to a 10-point scale, where lower scores denote improved HRQOL.

Disease Specific PROMs

To help readers select a PROM by disease, we present a brief description of the PROMs in each disease category where more than one PROM was identified.

Colorectal Cancer

We identified three potential PROMs for utilization in patients with colorectal cancer. These include the EORTC QLQ-CR38, EORTC QLQ-CR29, and the FACT-C. All scored highly in quality assessment. The EORTC QLQ-CR38 was the most widely used but has the highest number of items. The EORTC QLQ-CR29 focuses mainly on bowel function, while the EORTC QLQ-CR38, and the FACT-C include domains and subscales for emotional and social well-being. All appear appropriate for use in both clinical and research settings.

Fecal Incontinence

Five PROMs assessed fecal incontinence. All scored highly in quality assessment with the exception of the Wexner Incontinence Scale aka Cleveland Clinic Incontinence Score (Francis Score 7). This was due to the original article providing little data on psychometric properties. Subsequent articles have demonstrated evidence for reliability, validity and sensitivity. The FISI and the Wexner Incontinence Scale were the most widely used. All appear appropriate for use in both clinical and research settings. The FISI and the Wexner Incontinence Scale focus purely on function and have the advantage of the lowest number of items and an associated lower respondent burden. The FIQOL adds domains for the effect on lifestyle, coping, depression and embarrassment and would be a more appropriate measure for assessing the impact of the change in function. The LARS was designed to assess patients who have undergone sphincter-preserving surgery.

Sexual Function

Two PROMs were identified that measure sexual function- the International Index of Erectile Function/Dysfunction- IEF-5, IIED and the Female Sexual Function Index – FSFI. Both scored highly on the Francis scale. Both are gender specific with the IIED assessing male function and the FSFI assessing female function. Because of this, selection depends on the gender of the study population. They can be used in tandem.

Constipation

Constipation was assessed with three identified PROMs: the Knowles-Eccersley-Scott Symptom Questionnaire (KESS), the Wexner/Cleveland Clinic Constipation Score (CCCS, CCF-CS, Agachan-Wexner Score), and the Patient Assessment of Constipation Quality of Life (PAC-QOL). The KESS and PAC-QOL both scored highly on the Francis scale, while the CCCS scored lower due to lack of data on psychometric properties. The KESS has the advantage of being able to distinguish between various subtypes of constipation. While the CCCS focuses solely on function, the PAC-QOL was developed to evaluate the effect of constipation on quality of life.

Inflammatory Bowel Disease

The review of the literature identified two PROMs that were widely used for inflammatory bowel disease: the IBDQ and its abbreviated version, the SIBDQ. As they have similar psychometric qualities, the SIBDQ may be a better choice, as its smaller number of items imposes less of a respondent burden.

DISCUSSION

This study conducted a systematic review of studies assessing colorectal surgery utilizing PROMs according to PRISMA-P guidelines. Over 300 studies were identified that used 165 distinct PROMs. Thirty PROMs (18%) were found to be used 5 or more times, while 102 (62%) were used only once. The majority of PROMs (58%) were used in studies investigating neoplasia. A quality assessment of the 30 PROMs used 5 times or more found the majority of the PROMs to be of high quality.

This study demonstrates the wide range of PROMs that researchers have at their disposal for assessment of colorectal surgery. It also suggests a burden that clinicians have in interpreting studies that use different PROMs. This comes both from understanding the data that the PROMs collect and from understanding their psychometric qualities. By restricting studies to larger numbers and surgical interventions, this study attempts to capture only the most widely used and applied PROMs. Even with these exclusion criteria, 368 studies and 165 PROMs were identified. These assessed a wide range of colorectal surgical experiences, from cancer resection to fecal incontinence. One of the aims of this study was to provide synthesis and quality assessment of the daunting variety of PROMs.

While this is the first systematic analysis of all colorectal PROMs, a review of the literature reveals other studies that have systematically analyzed PROMs in specific colorectal issues. McNair et al examined PROMs in colorectal cancer surgery. Similar to this study, they found the EORTC QLQ-C30 to be a popular PROM and that most PROMs (69%) were used in only one study.44 Fiore et al studied PROMs in abdominal surgery and found that limited evidence supports the measurement properties of existing PROMs used in the context of recovery after abdominal surgery.45 Andeweg et al performed a meta-analysis of PROMs comparing conservative or surgical management of diverticulitis.46 They noted that patients have better QOL and fewer symptoms after laparoscopic surgery vs conservative treatment. This study differs from those mentioned above by providing a broad overview of PROMs in colorectal surgery and a dedicated quality assessment of the most popular ones. A strength of our review was the use of the Francis checklist to critically appraise the methodological quality of the most widely used PROMs.7 The Francis approach provides a structured protocol for assessing studies on measurement properties and has been cited in the literature to improve the selection of PROMs.47–49 In addition, this review was conducted and reported according to the PRISMA guidelines to reduce reporting and publication bias.

In examining the most widely used PROMs as a group, a number of areas of weakness were identified by the Francis score. Items with less than 75% compliance included: evidence that members of the respondent population were involved, description of question development methodology, description of change over time psychometrics, documentation on how to score the measure, management of missing data, and interpretation of PROM score. Additionally, none of the index papers offered a description of literacy level. While subsequent studies may add further information, management of missing data and literacy levels must be acknowledged. Missing data is a byproduct of real-world data collection. Its management, especially related to test validity, is important to researchers using the measure. Predetermined literacy levels will help ensure that subjects are able to understand and accurately reply to questionnaires. Researchers and clinicians using these measures, as well as developers of future PROMs, should be aware of these shortcomings. Improving these areas should be a priority of future research.
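The item-level compliance described above can be illustrated with a minimal sketch. It assumes each checklist item is recorded as met (1) or not met (0) for each PROM's index paper. The item names, the eight-instrument sample, and every value below are hypothetical placeholders rather than data from this review; only the 75% threshold comes from the text.

```python
from typing import Dict, List

# Hypothetical checklist results, for illustration only (not study data).
# Each list holds one entry per PROM: 1 if the item was met, 0 otherwise.
CHECKLIST: Dict[str, List[int]] = {
    "missing_data_management": [1, 0, 0, 0, 0, 0, 0, 0],
    "literacy_level_described": [0, 0, 0, 0, 0, 0, 0, 0],
    "scoring_documented": [1, 1, 0, 1, 0, 1, 1, 0],
    "internal_consistency": [1, 1, 1, 1, 1, 1, 1, 0],
}


def item_compliance(results: Dict[str, List[int]]) -> Dict[str, float]:
    """Return the fraction of PROMs that met each checklist item."""
    return {item: sum(met) / len(met) for item, met in results.items()}


def weak_items(results: Dict[str, List[int]], threshold: float = 0.75) -> List[str]:
    """Return, in alphabetical order, items whose compliance is below threshold."""
    compliance = item_compliance(results)
    return sorted(item for item, rate in compliance.items() if rate < threshold)


if __name__ == "__main__":
    for item, rate in item_compliance(CHECKLIST).items():
        print(f"{item}: {rate:.0%}")
    print("Below 75% compliance:", weak_items(CHECKLIST))
```

With real checklist data substituted for the placeholders, the same two functions would reproduce the per-item percentages and the list of items falling below the 75% cutoff.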

While this study has a number of strengths, including broad assessment and dedicated analysis by authors with training in psychometric properties (ATH, RR, JC & DFP), there are limitations to be acknowledged. First, the study based the quality assessment on the original articles. Ancillary analyses can address initially missing psychometric items, such as validity, test-retest properties and readability, that this analysis may not capture. However, the bulk of the PROMs we assessed were already high quality. Second, although the Francis checklist is comprehensive and is the result of expert consensus, it has yet to be externally validated.

Third, PROMs were selected for quality analysis based on frequency of use rather than other factors such as impact of the article or number of patients in the study. This approach may result in some high-quality and useful PROMs being excluded from the quality assessment. An attempt was made to address this via the description of two PROMs, PROMIS and DV-QOL, that the authors viewed as relevant. We cannot exclude that this review is subject to language bias, as non-English language studies were not included. Additionally, authors were not contacted for additional information. Fourth, it should be noted that while a construct can be defined by a specification equation, a PROM is simply a set of relevant items. Several instruments may have the same or similar content, but if these are presented in different ways or with different response formats, they may assess alternative constructs. As such, most PROMs are not truly comparable. Fifth, we chose the Francis scale to perform our quality assessment. There are other models and criteria available to evaluate the quality of PROMs (Rasch model, COSMIN criteria). Use of these tools may result in different conclusions. Finally, the quality assessment showed something of a ceiling effect, with most PROMs clustered together with high scores. This would suggest a threshold level of quality rather than a spectrum distribution.

Going forward, this study demonstrates the wide range of PROMs used in colorectal surgery research. Traditionally, development of PROMs relied on expert opinion and common sense rather than rigorous psychometric analysis.50 The use of PROMs has evolved in recent years after the US FDA51 and the European Medicines Agency (EMA)52 published standards for regulatory approvals based on data from PROMs. The EMA emphasized the need for adequate measurement properties in PROMs, while the US FDA has been more expansive in terms of requirements. The FDA has focused on requiring a protocol-driven process of PROM development specific to the patient population or disease process.51 This has resulted in a staggering number of PROMs of varying quality being used to assess colorectal surgery. To address this, the authors call on societies, such as the American Society of Colon and Rectal Surgeons, the American College of Surgeons, the Association of Academic Surgeons, and others, to develop guidelines to assess and recommend PROMs for both clinical and research adoption. In addition, PROM instrument developers should focus on producing conceptual models of the outcomes they need to measure and employ response models that generate unidimensional measurement. This will ensure the use of high-quality PROMs in future research that will in turn guide clinical decision making in the years to come.

CONCLUSION

PROMs are widely used in colorectal research. A wide range of PROMs was available, and many were used only once. The most frequently used PROMs are generally high quality, but a majority lack details on how to deal with missing data and information on literacy levels. As the use of PROMs to assess colorectal surgical intervention increases, researchers and practitioners need to become more familiar with assessment of quality and increase their knowledge of these important and impactful tools.

Supplementary Material

Appendix 1
Appendix 2

ACKNOWLEDGMENT

The authors thank Rachel Walden for help with the search terms; David Francis, MD, MPH, and Irene Feurer, PhD, for study design; Melissa McPheeters, PhD, MPH, for her review of the final manuscript; and Ricky Shinall, MD, for his assistance with revision.

PROMIS, Patient-Reported Outcomes Measurement Information System, and the PROMIS logo are marks owned by the U.S. Department of Health and Human Services.

Funding/Support: Dr Hawkins' work on this manuscript was supported by the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health under grant number K23DK118192. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Footnotes

Financial Disclosures: None reported.

REFERENCES

  • 1. Basch E. Patient-reported outcomes - harnessing patients’ voices to improve clinical care. N Engl J Med. 2017;376:105–108.
  • 2. Black N. Patient reported outcome measures could help transform healthcare. BMJ. 2013;346:f167.
  • 3. Patrick DL, Burke LB, Powers JH, et al. Patient-reported outcomes to support medical product labeling claims: FDA perspective. Value Health. 2007;10(suppl 2):S125–S137.
  • 4. Reeve BB, Wyrwich KW, Wu AW, et al. ISOQOL recommends minimum standards for patient-reported outcome measures used in patient-centered outcomes and comparative effectiveness research. Qual Life Res. 2013;22:1889–1905.
  • 5. Johnston BC, Ebrahim S, Carrasco-Labra A, et al. Minimally important difference estimates and methods: a protocol. BMJ Open. 2015;5:e007953.
  • 6. Terwee CB, Mokkink LB, Knol DL, Ostelo RW, Bouter LM, de Vet HC. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Qual Life Res. 2012;21:651–657.
  • 7. Francis DO, McPheeters ML, Noud M, Penson DF, Feurer ID. Checklist to operationalize measurement characteristics of patient-reported outcome measures. Syst Rev. 2016;5:129.
  • 8. Shamseer L, Moher D, Clarke M, et al.; PRISMA-P Group. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation. BMJ. 2015;350:g7647.
  • 9. Ware JE Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30:473–483.
  • 10. Ware J Jr, Kosinski M, Keller SD. A 12-item short-form health survey: construction of scales and preliminary tests of reliability and validity. Med Care. 1996;34:220–233.
  • 11. Aaronson NK, Ahmedzai S, Bergman B, et al. The European Organization for Research and Treatment of Cancer QLQ-C30: a quality-of-life instrument for use in international clinical trials in oncology. J Natl Cancer Inst. 1993;85:365–376.
  • 12. EuroQol Group. EuroQol--a new facility for the measurement of health-related quality of life. Health Policy. 1990;16:199–208.
  • 13. Fazio VW, O’Riordain MG, Lavery IC, et al. Long-term functional outcome and quality of life after stapled restorative proctocolectomy. Ann Surg. 1999;230:575–584.
  • 14. Eypasch E, Williams JI, Wood-Dauphinee S, et al. Gastrointestinal Quality of Life Index: development, validation and application of a new instrument. Br J Surg. 1995;82:216–222.
  • 15. Temple LK, Bacik J, Savatta SG, et al. The development of a validated instrument to evaluate bowel function after sphincter-preserving surgery for rectal cancer. Dis Colon Rectum. 2005;48:1353–1365.
  • 16. Bakx R, Sprangers MA, Oort FJ, et al. Development and validation of a colorectal functional outcome questionnaire. Int J Colorectal Dis. 2005;20:126–136.
  • 17. Sprangers MA, te Velde A, Aaronson NK; European Organization for Research and Treatment of Cancer Study Group on Quality of Life. The construction and testing of the EORTC colorectal cancer-specific quality of life questionnaire module (QLQ-CR38). Eur J Cancer. 1999;35:238–247.
  • 18. Gujral S, Conroy T, Fleissner C, et al.; European Organisation for Research and Treatment of Cancer Quality of Life Group. Assessing quality of life in patients with colorectal cancer: an update of the EORTC quality of life questionnaire. Eur J Cancer. 2007;43:1564–1573.
  • 19. Whistance RN, Conroy T, Chie W, et al.; European Organisation for the Research and Treatment of Cancer Quality of Life Group. Clinical and psychometric validation of the EORTC QLQ-CR29 questionnaire module to assess health-related quality of life in patients with colorectal cancer. Eur J Cancer. 2009;45:3017–3026.
  • 20. Ward WL, Hahn EA, Mo F, Hernandez L, Tulsky DS, Cella D. Reliability and validity of the Functional Assessment of Cancer Therapy-Colorectal (FACT-C) quality of life instrument. Qual Life Res. 1999;8:181–195.
  • 21. Agachan F, Chen T, Pfeifer J, Reissman P, Wexner SD. A constipation scoring system to simplify evaluation and management of constipated patients. Dis Colon Rectum. 1996;39:681–685.
  • 22. Vaizey CJ, Carapeti E, Cahill JA, Kamm MA. Prospective comparison of faecal incontinence grading systems. Gut. 1999;44:77–80.
  • 23. Rothbarth J, Bemelman WA, Meijerink WJ, et al. What is the impact of fecal incontinence on quality of life? Dis Colon Rectum. 2001;44:67–71.
  • 24. Pucciani F, Iozzi L, Masi A, Cianchi F, Cortesini C. Multimodal rehabilitation for faecal incontinence: experience of an Italian centre devoted to faecal disorder rehabilitation. Tech Coloproctol. 2003;7:139–147.
  • 25. Rockwood TH, Church JM, Fleshman JW, et al. Fecal Incontinence Quality of Life Scale: quality of life instrument for patients with fecal incontinence. Dis Colon Rectum. 2000;43:9–16.
  • 26. Rockwood TH, Church JM, Fleshman JW, et al. Patient and surgeon ranking of the severity of symptoms associated with fecal incontinence: the fecal incontinence severity index. Dis Colon Rectum. 1999;42:1525–1532.
  • 27. Emmertsen KJ, Laurberg S. Low anterior resection syndrome score: development and validation of a symptom-based scoring system for bowel dysfunction after low anterior resection for rectal cancer. Ann Surg. 2012;255:922–928.
  • 28. Rosen RC, Riley A, Wagner G, Osterloh IH, Kirkpatrick J, Mishra A. The international index of erectile function (IIEF): a multidimensional scale for assessment of erectile dysfunction. Urology. 1997;49:822–830.
  • 29. Rosen R, Brown C, Heiman J, et al. The Female Sexual Function Index (FSFI): a multidimensional self-report instrument for the assessment of female sexual function. J Sex Marital Ther. 2000;26:191–208.
  • 30. Knowles CH, Eccersley AJ, Scott SM, Walker SM, Reeves B, Lunniss PJ. Linear discriminant analysis of symptoms in patients with chronic constipation: validation of a new scoring system (KESS). Dis Colon Rectum. 2000;43:1419–1426.
  • 31. Marquis P, De La Loge C, Dubois D, McDermott A, Chassany O. Development and validation of the Patient Assessment of Constipation Quality of Life questionnaire. Scand J Gastroenterol. 2005;40:540–551.
  • 32. Barry MJ, Fowler FJ Jr, O’Leary MP, Bruskewitz RC, Holtgrewe HL, Mebust WK; Measurement Committee of The American Urological Association. Measuring disease-specific health status in men with benign prostatic hyperplasia. Med Care. 1995;33(suppl):AS145–AS155.
  • 33. Grant M, Ferrell B, Dean G, Uman G, Chu D, Krouse R. Revision and psychometric testing of the City of Hope Quality of Life-Ostomy Questionnaire. Qual Life Res. 2004;13:1445–1457.
  • 34. Guyatt G, Mitchell A, Irvine EJ, et al. A new measure of health status for clinical trials in inflammatory bowel disease. Gastroenterology. 1989;96:804–810.
  • 35. Irvine EJ, Zhou Q, Thompson AK. The Short inflammatory bowel disease questionnaire: a quality of life instrument for community physicians managing inflammatory bowel disease. CCRPT Investigators. Canadian Crohn’s relapse prevention trial. Am J Gastroenterol. 1996;91:1571–1578.
  • 36. Zigmond AS, Snaith RP. The hospital anxiety and depression scale. Acta Psychiatr Scand. 1983;67:361–370.
  • 37. Bjelland I, Dahl AA, Haug TT, Neckelmann D. The validity of the Hospital Anxiety and Depression Scale. An updated literature review. J Psychosom Res. 2002;52:69–77.
  • 38. Barber MD, Kuchibhatla MN, Pieper CF, Bump RC. Psychometric evaluation of 2 comprehensive condition-specific quality of life instruments for women with pelvic floor disorders. Am J Obstet Gynecol. 2001;185:1388–1395.
  • 39. Oresland T, Fasth S, Nordgren S, Hultén L. The clinical and functional outcome after restorative proctocolectomy. A prospective study in 100 patients. Int J Colorectal Dis. 1989;4:50–56.
  • 40. HealthMeasures. Available at: http://www.healthmeasures.net/explore-measurement-systems/promis. Accessed 3/28/19.
  • 41. Cella D, Riley W, Stone A, et al.; PROMIS Cooperative Group. The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. J Clin Epidemiol. 2010;63:1179–1194.
  • 42. Liu H, Cella D, Gershon R, et al. Representativeness of the patient-reported outcomes measurement information system internet panel. J Clin Epidemiol. 2010;63:1169–1178.
  • 43. Spiegel BM, Reid MW, Bolus R, et al. Development and validation of a disease-targeted quality of life instrument for chronic diverticular disease: the DV-QOL. Qual Life Res. 2015;24:163–179.
  • 44. McNair AG, Whistance RN, Forsythe RO, et al.; CONSENSUS-CRC (Core Outcomes and iNformation SEts iN SUrgical Studies - ColoRectal Cancer) Working Group. Synthesis and summary of patient-reported outcome measures to inform the development of a core outcome set in colorectal cancer surgery. Colorectal Dis. 2015;17:O217–O229.
  • 45. Fiore JF Jr, Figueiredo S, Balvardi S, et al. How do we value postoperative recovery? A systematic review of the measurement properties of patient-reported outcomes after abdominal surgery. Ann Surg. 2018;267:656–669.
  • 46. Andeweg CS, Berg R, Staal JB, ten Broek RP, van Goor H. Patient-reported outcomes after conservative or surgical management of recurrent and chronic complaints of diverticulitis: systematic review and meta-analysis. Clin Gastroenterol Hepatol. 2016;14:183–190.
  • 47. Poku E, Aber A, Phillips P, et al. Systematic review assessing the measurement properties of patient-reported outcomes for venous leg ulcers. BJS Open. 2017;1:138–147.
  • 48. Nowinski CJ, Miller DM, Cella D. Evolution of patient-reported outcomes and their role in multiple sclerosis clinical trials. Neurotherapeutics. 2017;14:934–944.
  • 49. Lamarche L, Tejpal A, Mangin D. Self-efficacy for medication management: a systematic review of instruments. Patient Prefer Adherence. 2018;12:1279–1287.
  • 50. McKenna SP. Measuring patient-reported outcomes: moving beyond misplaced common sense to hard science. BMC Med. 2011;9:86.
  • 51. US Department of Health and Human Services Food and Drug Administration. Guidance for industry. Patient-Reported Outcome measures: use in medical product development to support labeling claims. Available at: http://www.fda.gov/downloads/Drugs/Guidances/UCM193282.pdf. Accessed May 28, 2019.
  • 52. Venkatesan P. New European guidance on patient-reported outcomes. Lancet Oncol. 2016;17:e226.
