Abstract
BACKGROUND
Accurately predicting the risk of no-show for a scheduled colonoscopy can help target interventions to improve compliance with colonoscopy, and thereby reduce the disease burden of colorectal cancer and enhance the utilization of resources within endoscopy units.
OBJECTIVES
We aimed to utilize information available in an electronic medical record (EMR) and endoscopy scheduling system to create a predictive model for no-show risk, and to simultaneously evaluate the role for natural language processing (NLP) in developing such a model.
DESIGN
This was a retrospective observational study using discovery and validation phases to design a colonoscopy non-adherence prediction model. An NLP-derived variable called the Non-Adherence Ratio (“NAR”) was developed, validated, and included in the model.
PARTICIPANTS
Patients scheduled for outpatient colonoscopy at an Academic Medical Center (AMC) that is part of a multi-hospital health system, 2009 to 2011, were included in the study.
MAIN MEASURES
Odds ratios for non-adherence were calculated for all variables in the discovery cohort, and an Area Under the Receiver Operating Characteristic Curve (AUC) was calculated for the final non-adherence prediction model.
KEY RESULTS
The non-adherence model included six variables: 1) gender; 2) history of psychiatric illness; 3) NAR; 4) wait time in months; 5) number of prior missed endoscopies; and 6) education level. The model achieved discrimination in the validation cohort (AUC = 70.2 %). At a threshold non-adherence score of 0.46, the model’s sensitivity and specificity were 33 % and 92 %, respectively. Removing the NAR from the model significantly reduced its predictive power (AUC = 64.3 %, difference = 5.9 %, p < 0.001).
CONCLUSIONS
A six-variable model using readily available clinical and demographic information demonstrated accuracy for predicting colonoscopy non-adherence. The NAR, a novel variable developed using NLP technology, significantly strengthened this model’s predictive power.
Electronic supplementary material
The online version of this article (doi:10.1007/s11606-014-3165-6) contains supplementary material, which is available to authorized users.
KEY WORDS: adherence, cancer screening, colorectal cancer, quality improvement, prediction rules
INTRODUCTION
Colorectal cancer (CRC) remains the third most commonly diagnosed cancer and the second most common cause of cancer-related deaths in the United States.1 Colonoscopy can reduce CRC-related mortality by up to 90 %, and is cost-effective.2–6
Despite clear benefits, screening colonoscopy is underutilized, and rates of non-adherence with scheduled colonoscopy and flexible sigmoidoscopy are high, ranging from 5 to 40 %.7–12 Consistent identification of patients at risk for non-adherence with scheduled colonoscopy is a critical first step towards improving adherence. A number of studies have identified predictors of non-adherence with colonoscopy scheduling and completion.9–11,13–17 However, no study has described a validated multivariable model for predicting non-adherence with scheduled colonoscopy. Furthermore, no automated method to assess a patient’s prior record of non-adherence has been developed, an essential step for any implementation strategy involving large patient populations.
Natural Language Processing (NLP) systems—which enable the accurate extraction and interpretation of information from free text electronic documents—may facilitate the incorporation of clinically important predictive factors that are not readily available in structured databases. NLP has been used to facilitate numerous assessments of health care quality and safety.18–22
This project had two major aims: 1) to develop an automated method utilizing NLP that evaluates patients’ prior non-adherence with health care services; 2) to use this adherence metric and readily available clinical, demographic, and scheduling data to develop and validate a prediction model that accurately identifies individuals at high risk for non-adherence with scheduled colonoscopy.
METHODS
Study Design
Our study included a discovery phase and a validation phase. The discovery phase consisted of a matched case–control analysis. Using results from this analysis, we built a logistic regression model for predicting non-adherence with scheduled diagnostic or screening colonoscopy. We then validated the model’s ability to predict adherence with scheduled colonoscopy in a distinct, randomly selected patient cohort.
Per standard practice at this institution, all patients scheduled to undergo outpatient colonoscopy received a phone call reminder about their upcoming colonoscopy approximately 1 week prior. Patients who scheduled their appointment within 7 days of the procedure date did not always receive this reminder.
This project was designed and carried out from 2009 to 2011 at an academic medical center (AMC) in the Northeastern United States and was approved by this AMC’s Institutional Review Board (IRB).
Patients
We used IDX, the gastroenterology department’s ambulatory scheduling database (IDX systems, Burlington, VT; USA), to identify all patients with scheduled outpatient colonoscopies between 1 December 2009 and 15 October 2010 (“index colonoscopy”). The discovery cohort included 110 patients from this list who were non-adherent with scheduled diagnostic or screening colonoscopy (“cases”) and 220 adherent patients (“controls”), matched for procedure date. The validation cohort consisted of 1,200 randomly selected individuals with scheduled outpatient colonoscopies at our institution between 1 December 2010 and 31 November 2011 (“index colonoscopy”). Exclusion criteria included: 1) cancellation of scheduled colonoscopy; 2) receipt of inpatient colonoscopy in place of outpatient colonoscopy; 3) matching of a control to two cases (in such instances, the control was not double-counted); 4) primary residence outside of the United States; and 5) incomplete chart review.
The Non-Adherence Ratio and Non-Adherence Query
The Queriable Patient Inference Dossier,23 or QPID, is a natural language search engine developed at our institution. QPID allows the development and deployment of regular-expression based NLP queries, which allows users to search free text in a patient’s Electronic Medical Record (EMR) for specific words, phrases, data, and combinations thereof. Using QPID, we developed a unique chart review variable, which we named the Non-Adherence Ratio (NAR).24 The NAR includes an automated assessment of a patient’s prior non-adherence with health care services, called the Non-Adherence Query (NAQ). The NAQ searches a patient’s EMR for phrases that clinicians commonly use to document non-adherence, including “no show,” “did not present,” “failed to attend,” and “missed appointment.” The NAQ also looks for pre-specified combinations of these phrases. The development and validation of the NAQ and NAR are described in eMethods 1 (see electronic supplementary material).
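The phrase search performed by the NAQ can be illustrated with a minimal sketch. It uses only the four phrases quoted above; the pre-specified phrase combinations and the NAR calculation itself are described in the supplement and are not reproduced here. The function names and the pattern are illustrative assumptions, not QPID's actual implementation.

```python
import re

# The four phrases quoted in the text. The real NAQ also searches for
# pre-specified combinations that are described only in the supplement.
NON_ADHERENCE_PHRASES = [
    "no show",
    "did not present",
    "failed to attend",
    "missed appointment",
]


def _phrase_to_regex(phrase):
    # Allow any run of whitespace or hyphens between words,
    # so "no-show" and "no  show" both match "no show".
    return r"[\s-]+".join(re.escape(word) for word in phrase.split())


# Hypothetical pattern: whole-phrase, case-insensitive matching.
# Inflected forms such as "no shows" are deliberately not matched here.
NAQ_PATTERN = re.compile(
    r"\b(?:" + "|".join(_phrase_to_regex(p) for p in NON_ADHERENCE_PHRASES) + r")\b",
    re.IGNORECASE,
)


def note_documents_non_adherence(note_text):
    """Return True if the note contains at least one non-adherence phrase."""
    return NAQ_PATTERN.search(note_text) is not None


def count_flagged_notes(notes):
    """Count how many notes (an iterable of strings) contain a phrase."""
    return sum(1 for note in notes if note_documents_non_adherence(note))
```

A count of flagged notes like this one would be only the raw input to a ratio such as the NAR; how that count is normalized is specified in eMethods 1.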
Discovery Cohort
Data Collection
Chart reviews were performed by a research assistant and two physicians who were blinded to case/control status. Chart reviewers used standardized methods and definitions. Chart review variables were chosen based on prior evidence of an association with colonoscopy non-adherence, a plausible association with colonoscopy non-adherence, and ease of consistent, objective measurement.8–10,25,26 Chart review variables included: age, gender, insurance status (Private, Medicare, Mass Health, Medicaid, or None), education level, personal history of bright red blood per rectum (BRBPR) or positive fecal occult blood test (FOBT), history of CRC in a first-degree relative, history of colon polyps, and history of psychiatric illness. History of psychiatric illness was defined as documentation in the medical record of a mood, anxiety, and/or psychotic disorder. Education level was stratified into “no more than high school” education and “college or graduate school” education (i.e., anyone with at least some college education). Data about the number of missed endoscopies prior to the index colonoscopy, the index colonoscopy booking date, and the procedure date were obtained from IDX. A missed endoscopy was defined as non-attendance at a scheduled outpatient esophagogastroduodenoscopy, endoscopic ultrasound, endoscopic retrograde cholangiopancreatography, flexible sigmoidoscopy, or colonoscopy. Chart review data were collected and managed using REDCap electronic data capture tools (Vanderbilt University; Nashville, TN).27
Statistical Analysis
Univariate and unconditional multivariate logistic regression analyses both with and without the NAR were performed using SAS Analytics Software (version 9.3, SAS Institute Inc.; Cary, North Carolina). Model building was guided by clinical judgment and results of univariate and multivariate regression analyses of potential predictors. The non-adherence model’s output, called the “Non-Adherence Score,” was a decimal, which, when converted to a percentage, represented the predicted probability of non-adherence with scheduled colonoscopy. With α = 0.05 and β = 0.20, we calculated that we would need 107 cases and 214 controls to correctly identify a variable associated with a 100 % difference in the odds of colonoscopy adherence. When setting our validation cohort’s size, we attempted to balance the benefits of a large sample size with the significant time commitment required to perform chart reviews. We considered a two-tailed p value of 0.05 or less to be statistically significant.
Validation Phase
Data Collection
Chart reviews were performed for each patient in the validation cohort to gather information for demographic and clinical variables included in the discovery analysis (listed above). Data on booking date, scheduling date, and number of missed endoscopies were obtained from IDX. NAR and wait time were calculated for each patient.
Statistical Analysis
We compared no-shows and attendees in the discovery and validation cohorts using chi-square tests for categorical variables and t-tests for differences of means for continuous variables. Positive predictive value (PPV) and negative predictive value (NPV) for the model were estimated (Note: PPV refers to the likelihood of non-adherence with colonoscopy, while NPV refers to the likelihood of adherence). Area under the receiver operating characteristic curve (AUC) and net reclassification index were used to compare models with and without NAR. Each patient’s Non-Adherence Score was calculated using the non-adherence model designed in the discovery cohort (with discovery cohort regression coefficients and intercept). Patients whose non-adherence scores were higher than a threshold value (identified below) were deemed to be at high risk for non-adherence. Sensitivity analyses assessed how the population no-show rate influenced the model’s PPV and NPV, and the model’s performance in the validation cohort after excluding patients with a prior history of CRC and/or endoscopy.
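As an illustration of the discrimination measure used here, the AUC can be read as the probability that a randomly chosen no-show receives a higher score than a randomly chosen attendee. The sketch below computes it by direct pairwise comparison. The function name and the toy scores are hypothetical and are not study data, and the significance test for a difference between two AUCs is not shown.

```python
def auc(scores, labels):
    """AUC as the fraction of (no-show, attendee) pairs ranked correctly.

    scores: predicted non-adherence scores
    labels: 1 for a no-show, 0 for an attendee
    Ties between a no-show and an attendee count as half a correct pair.
    """
    no_show_scores = [s for s, y in zip(scores, labels) if y == 1]
    attendee_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not no_show_scores or not attendee_scores:
        raise ValueError("need at least one no-show and one attendee")

    correct = 0.0
    for p in no_show_scores:
        for n in attendee_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(no_show_scores) * len(attendee_scores))


# Hypothetical toy data: two attendees followed by two no-shows.
toy_labels = [0, 0, 1, 1]
toy_scores_full_model = [0.10, 0.40, 0.35, 0.80]
toy_scores_reduced_model = [0.30, 0.40, 0.35, 0.35]

auc_full = auc(toy_scores_full_model, toy_labels)
auc_reduced = auc(toy_scores_reduced_model, toy_labels)
```

The pairwise loop is quadratic in cohort size, which is acceptable for cohorts of about a thousand patients. A rank-based formula would be preferable for much larger data sets.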
RESULTS
Characteristics of Discovery Cohort
The final discovery cohort included 318 patients: 107 cases and 211 controls (Table 1). Twelve patients were excluded from the discovery cohort analysis (eFigure 1a, see electronic supplementary material). Cases were significantly more likely than controls to have public health insurance (OR 3.11, p < 0.001 for Medicare vs. Private; OR 3.66, p < 0.001 for Mass Health, Medicaid, or None vs. Private), a history of BRBPR or a (+) FOBT (OR 1.86, p = 0.03), and a history of psychiatric illness (OR 2.25, p < 0.001) (Table 2). Forty-one percent and 66.8 % of cases and controls attended college, respectively (OR 0.35 for college attendance, p < 0.001). The mean NAR for no-shows was 7.2 %, compared with 2.0 % for controls (p < 0.001) (Table 1). A 10 % increase in the NAR was associated with a 297 % increase in non-attendance (p < 0.001) (Table 2). Missing one prior endoscopy appointment was associated with a 160 % increase in no-show risk (p < 0.001).
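The percentage increases quoted above follow directly from the univariate odds ratios in Table 2. As a worked conversion (note that these are changes in the odds of non-attendance, not in absolute risk):

```latex
\begin{align*}
\%\ \text{change in odds} &= (\mathrm{OR} - 1) \times 100 \\
\text{NAR, per 10\,\% increase:}\quad (3.97 - 1) \times 100 &= 297\,\% \\
\text{Per prior missed endoscopy:}\quad (2.60 - 1) \times 100 &= 160\,\%
\end{align*}
```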
Table 1.
Characteristic | Discovery cohort: no-shows (N = 107) | Discovery cohort: attendees (N = 211) | Validation cohort: no-shows (N = 89) | Validation cohort: attendees (N = 1025) | OR or difference of means (validation vs. discovery cohort)h: no-shows | OR or difference of means (validation vs. discovery cohort)h: attendees |
---|---|---|---|---|---|---|
Age—mean (yrs) | 57 | 57 | 57 | 57 | −0.15 (−3.18, 2.87) | −0.69 (−2.12, 0.73) |
Gender—% male | 51 | 49 | 54 | 47 | 0.81 (0.46, 1.42) | 0.92 (0.69, 1.24) |
Insurance—% | ||||||
Private | 42 | 71 | 52 | 68 | 0.68 (0.39, 1.20) | 1.15 (0.83, 1.59) |
Medicare | 26 | 14 | 25 | 22 | 0.77 (0.36, 1.63) | 1.60 (1.04, 2.53)* |
Mass health/medicaid/nonea | 32 | 15 | 23 | 10 | 0.60 (0.29, 1.26) | 0.71 (0.45, 1.14) |
Education level—% | ||||||
No more than high schoolb | 59 | 33 | 51 | 26 | 1.4 (0.80, 2.47) | 1.42 (1.04, 1.96)* |
College or graduate schoolc | 41 | 67 | 49 | 74 | 0.71 (0.41, 1.26) | 0.70 (0.51, 0.97)* |
Hx. colon polypsj—% | 14 | 16 | 26 | 41 | 2.14 (1.04, 4.41)* | 3.57 (2.42, 5.26)*** |
Hx. BRBPR or (+) FOBTd,j—% | 29 | 18 | 19 | 30 | 0.58 (0.30, 1.14) | 1.93 (1.32, 2.81)*** |
Personal Hx. CRCe,j—% | 3 | 2 | 3 | 3 | 0.83 (0.16, 4.20) | 0.71 (0.25, 2.06) |
CRC in first degree relativee—% | 8 | 8 | 16 | 16 | 2.03 (0.83, 4.95) | 2.39 (1.40, 4.08)** |
Hx. psychiatric illnessj—% | 53 | 34 | 52 | 42 | 0.94 (0.53, 1.65) | 1.44 (1.05, 1.96)* |
Prior missed endoscopies—% | ||||||
0 | 76 | 93.4 | 98 | 99.7 | 0.08 (0.01, 0.34)i,*** | 0.04 (0.01, 0.16)i,*** |
1 | 9 | 5.2 | 1 | 0.3 | 13.3 (2.91, 83.8)i,*** | 24.2 (6.46, 107)i,*** |
2 | 6 | 1.4 | – | – | ||
3 | 3 | – | 1 | – | ||
4 | 3 | – | – | – | ||
5 | – | – | – | – | ||
6 | 1 | – | – | – | ||
7 | – | – | – | – | ||
8 | 2 | – | – | – | ||
Mean NARf—% | 7.2 | 2 | 5.9 | 1.8 | 0.13 (−0.10, 0.35) | 0.03 (−0.03, 0.09) |
Mean wait time—monthsg | 1.8 | 1.5 | 1.6 | 1.6 | 0.24 (−0.16, 0.64) | −0.12 (−0.33, 0.09) |
Mean number of prior missed endoscopies | 0.6 | 0.1 | 0.04 | 0.00 | 0.54 (0.26, 0.83)*** | 0.08 (0.03, 0.12)*** |
aOne attendee from discovery cohort lacked health insurance
bCompleted grade school but no high school, some high school, and high school graduates
cCompleted some college, college graduates, and those who have attended graduate school
d BRBPR bright red blood per rectum; FOBT fecal occult blood test
e CRC colorectal cancer
f NAR non-adherence ratio
g1 month = 30 days
h p values for categorical variables are based on chi-square tests of odds ratios. p values for continuous variables are based on two-way t-tests for differences of means. Odds ratios were calculated as: odds X in discovery cohort/odds X in validation cohort
iOdds of at least one missed endoscopy in discovery cohort/odds of at least one missed endoscopy in validation cohort
jHx. = “History of”
* p < 0.05; ** p < 0.01; *** p < 0.001
Table 2.
Characteristic | Univariate regression: odds ratio (95 % CI) | Univariate regression: p value | Multivariate regression: odds ratio (95 % CI) | Multivariate regression: p value |
---|---|---|---|---|
Age—per year | 1.01 (0.98, 1.03) | 0.62 | 0.98 (0.94,1.01) | 0.16 |
Gender* | ||||
Female | 1.00 | 1.00 | ||
Male | 1.11 (0.70, 1.77) | 0.66 | 1.81 (1.03,3.18) | 0.04 |
Insurance type | ||||
Private carrier | 1.00 | <0.001 | 1.00 | 0.10 |
Medicare | 3.11 (1.68, 5.75) | <0.001 | 2.23 (0.97,5.12) | 0.06 |
MassHealth/medicaid/none | 3.66 (2.03, 6.59) | <0.001 | 1.74 (0.86,3.54) | 0.13 |
Education level* | ||||
No more than high school† | 1.00 | 1.00 | ||
College or graduate school‡ | 0.35 (0.21, 0.56) | <0.001 | 0.65 (0.35,1.19) | 0.16 |
History of colon polyps | 0.85 (0.44, 1.64) | 0.63 | 0.79 (0.34,1.82) | 0.60 |
History of BRBPR or (+) FOBT§ | 1.86 (1.08, 3.20) | 0.03 | 1.58 (0.81, 3.08) | 0.18 |
CRC in first degree relative‖ | 1.12 (0.48, 2.62) | 0.80 | 0.73 (0.25, 2.17) | 0.57 |
History of psychiatric illness* | 2.25 (1.40, 3.61) | <0.001 | 1.90 (1.08, 3.36) | 0.03 |
NAR—per 10 % increase¶,* | 3.97 (2.48, 6.35) | <0.001 | 2.79 (1.68, 4.64) | <0.001 |
Wait time—per month#,* | 1.14 (0.98, 1.32) | 0.09 | 1.28 (1.07, 1.52) | 0.006 |
Number of prior missed endoscopies—per missed endoscopy* | 2.60 (1.60, 4.24) | <0.001 | 1.78 (1.07, 2.95) | 0.03 |
This table presents results of univariate and multivariate logistic regression analyses of all potential predictors of colonoscopy non-adherence evaluated for inclusion in the non-adherence model. Analyses were performed in the discovery cohort
*Included in final non-adherence prediction model
†Completed grade school but no high school, some high school, and high school graduates
‡Completed some college, college graduates, and those who have attended graduate school
§ BRBPR bright red blood per rectum; FOBT fecal occult blood test
‖ CRC colorectal cancer
¶ NAR non-adherence ratio
#1 month = 30 days
Characteristics of Validation Cohort
The final validation cohort included 1114 patients: 89 no-shows and 1,025 adherent patients. Eighty-six patients were excluded from the initial 1,200 patient validation cohort (eFigure 1b, see electronic supplementary material). No-shows were more likely than attendees to have public insurance (Medicare insurance: 24.7 % vs. 21.9 %; Mass health or Medicaid: 23.6 % vs. 9.9 %; Table 1). Approximately 50 % of no-shows and 26 % of attendees had no more than a high school education. A prior history of colon polyps was noted in 25.8 and 40.7 % of no-shows and attendees, respectively, while 19.1 % of no-shows and 29.8 % of attendees had a history of BRBPR or (+) FOBT. The mean NAR was 5.9 and 1.8 % for non-adherent and adherent patients, respectively.
Comparison of Discovery and Validation Cohorts
No-shows in the validation cohort were more likely to have a history of polyps (OR 2.14, p < 0.05), and had missed fewer prior endoscopies on average (difference in mean number of prior missed endoscopies 0.54, p < 0.001) (Table 1). Moreover, attendees in the validation cohort were significantly more likely to have attended college, and more likely to have a history of polyps (OR 3.57, p < 0.001), BRBPR or a (+) FOBT (OR 1.93, p < 0.001), CRC in a first degree relative (OR 2.39, p < 0.01), or psychiatric illness (OR 1.44, p < 0.05) (Table 1).
Development of Non-Adherence Model
Five variables were significantly associated with colonoscopy non-adherence in multivariate regression analysis: 1) male gender (OR 1.81, p = 0.04); 2) history of psychiatric illness (OR 1.90, p = 0.03); 3) NAR (OR 2.79 for 10 % increase in NAR, p < 0.001); 4) wait time in months (OR 1.28 for each month, p = 0.006); and 5) number of prior missed endoscopies (OR 1.78 per missed endoscopy, p = 0.03) (Table 2). The final non-adherence model included these five variables and education level (Table 3). Education level was added to this model based on clinical judgment, and observed associations between this variable, CRC screening, and colonoscopy adherence.28–30 The final model’s AUC in the discovery cohort was 75.5 % (95 % CI 70–80.2 %) (Fig. 1a). Logistic regression coefficients for these variables, the model intercept, and the final model equation are presented in Table 3. The model intercept yields a non-adherence score equal to the predicted probability of non-adherence in a population with the same prevalence as our discovery cohort, but the absolute magnitude of the non-adherence score does not affect its operating characteristics.
Table 3.
Parameter | Coefficient estimate | SE | OR | 95 % CI |
---|---|---|---|---|
Intercept | −1.89 | 0.35 | N/A | N/A |
Male gender | 0.55 | 0.28 | 1.73 | 1.00–3.01 |
Education level | ||||
No more than high school* | 0 | N/A | 1.00 | N/A |
College or graduate school† | −0.60 | 0.29 | 0.54 | 0.31–0.96 |
History of psychiatric illness | 0.80 | 0.28 | 2.26 | 1.30–3.90 |
Number of prior missed endoscopies (per missed endoscopy) | 0.63 | 0.25 | 1.87 | 1.14–3.08 |
Wait time (per month)‡ | 0.23 | 0.09 | 1.26 | 1.07–1.49 |
NAR (per 10 % increase)§ | 1.03 | 0.26 | 2.82 | 1.70–4.66 |
Calculating non-adherence score | ||||
Non-adherence score‖ = 1/(1 + 1/exp(0.55 × G − 0.60 × E + 0.80 × P + 0.63 × Q + 0.23 × W + 1.03 × N − 1.89)) | ||||
Key for variables | ||||
G = Gender ( 1 if male; 0 if female) | ||||
E = Education level (1 if completed some college, college graduate, and/or attended graduate school; 0 if no more than high school graduate) | ||||
P = History of psychiatric illness (1 if history of psychiatric illness present; 0 if no history of psychiatric illness) | ||||
Q = Number of missed endoscopies | ||||
W = Wait time in months (wait time in days divided by 30) | ||||
N = Patient’s NAR/10 %¶ (thus, if NAR = 20 %, N = 2) |
SE standard error; OR odds ratio; CI confidence interval
The estimate refers to the estimate of the logistic regression coefficient for each variable, and the SE represents this coefficient’s standard error
*Completed grade school but no high school, some high school, and high school graduates
†Completed some college, college graduates, and those who have attended graduate school
‡1 month = 30 days
§ NAR non-adherence ratio
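‖Format for Non-Adherence Score equation: Score = 1/(1 + 1/exp(m1 × x1 + … + m6 × x6 + b)), where m1 to m6 are the regression coefficients, x1 to x6 are the variable values, and b is the intercept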
¶Coefficients and odds ratios for NAR are based on 10 % change in NAR. Thus, NAR must be divided by 10 % before being incorporated into the Non-Adherence Score calculation
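The Table 3 equation can be sketched in code as follows. The coefficients and intercept are copied from Table 3 and the 0.46 threshold is the one examined in the validation analysis; the function and argument names are illustrative assumptions rather than part of the published model.

```python
import math

# Coefficients and intercept as reported in Table 3.
INTERCEPT = -1.89
COEF_MALE = 0.55
COEF_COLLEGE = -0.60
COEF_PSYCH = 0.80
COEF_MISSED_ENDOSCOPY = 0.63   # per prior missed endoscopy
COEF_WAIT_MONTH = 0.23         # per month of wait time (1 month = 30 days)
COEF_NAR_PER_10PCT = 1.03      # per 10 % increase in NAR

HIGH_RISK_THRESHOLD = 0.46     # score threshold examined in the validation cohort


def non_adherence_score(male, college, psych_history,
                        missed_endoscopies, wait_days, nar_percent):
    """Predicted probability of non-adherence from the Table 3 equation.

    male, college, psych_history: booleans (or 0/1)
    missed_endoscopies: number of prior missed endoscopies
    wait_days: days between booking and procedure (converted to months)
    nar_percent: NAR expressed in percent, e.g. 20 for 20 %
    """
    linear = (
        INTERCEPT
        + COEF_MALE * int(male)
        + COEF_COLLEGE * int(college)
        + COEF_PSYCH * int(psych_history)
        + COEF_MISSED_ENDOSCOPY * missed_endoscopies
        + COEF_WAIT_MONTH * (wait_days / 30)
        + COEF_NAR_PER_10PCT * (nar_percent / 10)
    )
    # 1 / (1 + exp(-x)) is algebraically identical to 1 / (1 + 1 / exp(x)).
    return 1.0 / (1.0 + math.exp(-linear))


def is_high_risk(score, threshold=HIGH_RISK_THRESHOLD):
    """Flag patients whose score is at or above the chosen threshold."""
    return score >= threshold
```

For example, a woman with no college education, no psychiatric history, no missed endoscopies, no wait, and an NAR of 0 % has a score of about 0.13, determined by the intercept alone. Whether the threshold is applied as "at or above" or "strictly above" is an assumption of this sketch.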
Validation of Non-Adherence Model
The non-adherence model’s AUC in the validation cohort was 70.2 % (CI: 64.4–76.1 %) (Fig. 1b). At a Non-Adherence Score of 0.46, the model’s sensitivity (Sn) and specificity (Sp) were 33 and 92 %, respectively. At this threshold, the model correctly classified 29 of 89 no-shows, and 940 of 1,025 attendees, yielding a PPV and NPV of 25 and 94 %, respectively (Table 4). Assuming a constant Sn of 33 % and Sp of 92 %, an increase in the population non-adherence rate from 8 % (the rate in the validation cohort) to 20 % resulted in a PPV of 50 %, and an NPV of 85 % (eTable 1, see electronic supplementary material). Removing the NAR from the model reduced the AUC from 70.2 to 64.3 % (Difference = 5.9 %, p < 0.001). In sensitivity analyses, the model’s AUC was 70 % (95 % CI: 64–76 %) in validation cohort patients with no prior history of CRC (N = 1084), and 73 % (95 % CI: 65–80 %) in validation cohort patients with no prior history of endoscopy or CRC (N = 487).
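The dependence of PPV and NPV on the population no-show rate follows from Bayes' rule and can be checked with a short sketch. The function name is an illustrative assumption; the inputs are the counts and rates reported above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given outcome prevalence.

    Here a "positive" is a patient flagged as high risk, and prevalence is
    the population non-adherence (no-show) rate.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Validation cohort at the 0.46 threshold: 29 of 89 no-shows and
# 940 of 1,025 attendees classified correctly, out of 1,114 patients.
ppv_observed, npv_observed = predictive_values(29 / 89, 940 / 1025, 89 / 1114)

# Same (rounded) operating point applied to a 20 % no-show rate.
ppv_at_20, npv_at_20 = predictive_values(0.33, 0.92, 0.20)
```

With the observed counts this gives a PPV of about 25 % and an NPV of 94 %. At a 20 % no-show rate the PPV rises to roughly 50 % while the NPV falls to roughly 85 %; the exact integer depends on whether the rounded or unrounded sensitivity and specificity are used.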
Table 4.
Non-adherence score* | No-shows classified correctly by model (N = 89) | Attendees classified correctly by model (N = 1025) | Sn (%) | Sp (%) | PPV (%) | NPV (%) |
---|---|---|---|---|---|---|
0.08 | 89 | 14 | 100 | 1.4 | 8 | 100 |
0.10 | 88 | 71 | 99 | 7 | 8 | 99 |
0.12 | 86 | 119 | 97 | 12 | 9 | 98 |
0.14 | 86 | 194 | 97 | 19 | 9 | 99 |
0.16 | 77 | 291 | 87 | 28 | 10 | 96 |
0.18 | 74 | 397 | 83 | 39 | 11 | 96 |
0.20 | 70 | 471 | 79 | 46 | 11 | 96 |
0.22 | 68 | 528 | 76 | 52 | 12 | 96 |
0.24 | 64 | 581 | 72 | 57 | 13 | 96 |
0.26 | 59 | 646 | 66 | 63 | 14 | 96 |
0.28 | 53 | 703 | 60 | 69 | 14 | 95 |
0.30 | 47 | 747 | 53 | 73 | 15 | 95 |
0.32 | 45 | 785 | 51 | 77 | 16 | 95 |
0.34 | 42 | 813 | 47 | 79 | 17 | 95 |
0.36 | 39 | 836 | 44 | 82 | 17 | 94 |
0.38 | 38 | 859 | 43 | 84 | 19 | 94 |
0.40 | 37 | 893 | 42 | 87 | 22 | 95 |
0.42 | 33 | 910 | 37 | 89 | 22 | 94 |
0.44 | 30 | 928 | 34 | 91 | 24 | 94 |
0.46 | 29 | 940 | 33 | 92 | 25 | 94 |
0.48 | 28 | 948 | 32 | 93 | 27 | 94 |
0.50 | 24 | 959 | 27 | 94 | 27 | 94 |
0.52 | 21 | 966 | 24 | 94 | 26 | 93 |
0.54 | 19 | 972 | 21 | 95 | 26 | 93 |
0.56 | 18 | 983 | 20 | 96 | 30 | 93 |
0.58 | 17 | 991 | 19 | 97 | 33 | 93 |
0.60 | 16 | 993 | 18 | 97 | 33 | 93 |
0.62 | 14 | 998 | 16 | 97 | 34 | 93 |
0.64 | 13 | 1002 | 15 | 98 | 36 | 93 |
0.66 | 13 | 1007 | 15 | 98 | 42 | 93 |
0.68 | 10 | 1009 | 11 | 98 | 39 | 93 |
0.70 | 9 | 1010 | 10 | 99 | 38 | 93 |
0.72 | 7 | 1011 | 8 | 99 | 33 | 93 |
0.74 | 7 | 1012 | 8 | 99 | 35 | 93 |
0.76 | 6 | 1014 | 7 | 99 | 35 | 92 |
0.78 | 6 | 1014 | 7 | 99 | 35 | 92 |
0.80 | 3 | 1014 | 3.4 | 99 | 21 | 92 |
Patients who are “classified correctly:” Patients who no showed for or attended their scheduled colonoscopy, and whom the model correctly identified as a no show or attendee, respectively
Patients who are “classified incorrectly:” Patients who no showed for or attended their scheduled colonoscopy, and whom the model incorrectly identified as an attendee or a no show, respectively
Sn sensitivity; Sp specificity; PPV positive predictive value; NPV negative predictive value
*Non-Adherence Score was calculated using non-adherence model
In multivariable regression analyses in the validation cohort, three model variables significantly predicted no shows: education level (OR 0.55 for college education, 95 % CI 0.33–0.91), NAR (OR 2.70 per 10 % increase, 95 % CI 1.85–3.93), and number of missed endoscopies (OR 4.04 per endoscopy, 95 % CI: 1.11–14.7) (eTable 2, see electronic supplementary material). Having Masshealth, Medicaid, or no insurance predicted an increased risk of non-adherence, and a history of colon polyps significantly reduced the likelihood of non-adherence.
In eTable 3 (see electronic supplementary material), we describe four hypothetical patients with a predicted Non-Adherence Score of at least 0.46 in order to highlight “real life” examples of individuals who would be considered to be at high risk for non-adherence using this model.
DISCUSSION
We developed and validated a six variable model for predicting non-adherence with scheduled colonoscopy. Two of the strongest predictors of non-attendance were the NAR and prior non-attendance with scheduled endoscopy. The NAR is a novel parameter that we designed and measured using an NLP system. Calculating the NAR through this automated and systematic approach can help to prospectively identify individuals who are at high-risk for colonoscopy non-adherence. Removing the NAR from the model significantly reduced its validated AUC.
These findings expand upon previous work by Turner et al., who identified an association between non-attendance with lower endoscopy and health care appointment non-adherence.9 A history of clinic appointment non-attendance has also been shown to predict future clinic appointment non-attendance, hospital readmission for congestive heart failure, and HIV viral load in patients on antiviral treatment.18,31,32 Thus, appointment-keeping behavior is an important predictor of adherence with many health care services.9
Measuring adherence with scheduled appointments across different departments requires access to a central scheduling database, which many clinicians and clinical support staff do not have. Our study demonstrates that NLP systems can enable real-time evaluation of a patient’s history of health service adherence using readily available unstructured data. We are not aware of any studies investigating a role for NLP in measuring appointment adherence.19–21,32–40 Thus, our work highlights a novel method using NLP for this purpose.
Although others have identified a relationship between appointment keeping behavior and endoscopy attendance, we demonstrate that such information can be incorporated into a non-adherence prediction model, a critical step towards improving health care quality. The NAR can be calculated quickly in real time by QPID using basic regular expression search technology that has been widely available since the 1960s and is common to virtually all existing NLP systems. This approach can therefore be generalized.41–43 More broadly, this work highlights one way in which health information technology may facilitate process and quality improvement work.44,45
This model’s low sensitivity, high specificity, and high NPV make it well suited to help medical centers more effectively deploy resources to improve colonoscopy adherence. For example, “high-risk” patients, as identified by the model, could be assigned to patient navigators, who have been shown to significantly improve colonoscopy adherence.46–48 Moreover, while sensitivity and specificity are inherent characteristics of a model, its PPV and NPV depend upon the population-level incidence of the outcome of interest. Our model’s low PPV (25 %) may reflect the low baseline non-adherence rate in our population (8 %). Using a 20 % baseline non-adherence rate, as reported in previous studies, the model’s estimated PPV improves significantly (50 %), with only a modest reduction in the NPV.7,9,10,15,49 Moreover, the model is equally effective at predicting non-adherence by patients with no prior history of endoscopy, who may be at higher risk of colonoscopy non-adherence.9
Changes in the model’s NPV and PPV have implications for the allocation of resources, like patient navigators, that may improve colonoscopy adherence. For example, a higher model PPV increases the likelihood that a navigator will be appropriately assigned to a patient who otherwise would have no-showed. However, if the NPV drops, the model may also misclassify high-risk patients as low in risk. Nonetheless, in systems with limited resources for improving colonoscopy adherence, focusing on patients with higher non-adherence scores, which correspond to higher PPVs, will maximize effective resource utilization.
The differences between the discovery and validation cohorts may help to explain why only three model variables significantly predicted non-adherence in the validation sample. These differences likely stem from how the cohorts were constructed (i.e., randomly chosen cases with matched controls for the discovery sample and randomly selected cohort of all-comers for the validation sample), rather than from systematic differences between the populations from which they were derived. Indeed, the endoscopy suite’s patient population, colonoscopy referral and scheduling systems, reminder policies, and indications for colonoscopy did not change from 2009 to 2011. While chart reviewers were trained to use a consistent coding system, we cannot rule out minor variation between reviewers in the operationalization of certain variables.
Our work adds to existing literature about colonoscopy non-adherence in other ways. Higher education level has previously been associated with higher rates of CRC screening.50 Associations between endoscopy non-adherence, male gender, psychiatric illness, and appointment wait time have all been reported previously.17,22,26
This study has several limitations. We did not investigate certain variables that may be associated with non-attendance, including socio-economic status, ethnicity, indication for colonoscopy, colonoscopy referral source, or patient-specific barriers to attendance.10,14,16,25,32 Furthermore, because few patients in this study were uninsured, on Medicaid, or lacked college education, our findings may not extend to these populations. Likewise, because the NAR relies on data from prior interactions with a health system, it may have limited utility for predicting adherence for patients who are new to a health system, or rarely see a physician. In addition, our findings may not be generalizable to other health systems. Nonetheless, the significant differences between our discovery and validation cohorts, and our sensitivity analyses, provide some evidence that our model will perform adequately in other populations. Finally, our discovery analysis may not have been powered to detect certain clinically relevant predictors of adherence, or to precisely bound odds ratios for predictors that were identified. Further research is necessary to determine prospectively if implementation of this non-adherence model improves adherence with scheduled colonoscopy.
CONCLUSION
We developed and validated the first published model for predicting non-adherence with scheduled colonoscopy using data that are readily available in patient medical records and easily extractable with an NLP system. Our work confirms that a patient’s prior history of health care service non-adherence is a strong predictor of adherence with scheduled colonoscopy, and offers a novel tool that utilizes an NLP system to quickly and reliably assess a patient’s history of health care service adherence.
Acknowledgements
Contributors
We would like to thank Lara Novak and Patricia Stevens for their assistance with this project.
Funding Sources
None.
Prior Presentations
“Predicting Non-adherence with Scheduled Elective Colonoscopy” oral abstract presented at Digestive Diseases Week, 2012. 22 May 2012. San Diego, CA.
Conflict of Interest
The authors declare that they do not have a conflict of interest.
REFERENCES
1. Walsh JM, Terdiman JP. Colorectal cancer screening: scientific review. JAMA. 2003;289(10):1288–1296. doi: 10.1001/jama.289.10.1288.
2. Cesare H, Di Emilio G, Perry JP, et al. Cost effectiveness of colonoscopy, based on the appropriateness of an indication. Clin Gastroenterol Hepatol. 2008;6(11):1231–1236. doi: 10.1016/j.cgh.2008.06.009.
3. Pignone M, Saha S, Hoerger T, Mandelblatt J. Cost-effectiveness analyses of colorectal cancer screening. Ann Intern Med. 2002;137(2):96–104. doi: 10.7326/0003-4819-137-2-200207160-00007.
4. Doubeni CA, Weinmann S, Adams K, et al. Screening colonoscopy and risk for incident late-stage colorectal cancer diagnosis in average-risk adults. Ann Intern Med. 2013;158(5):312–320. doi: 10.7326/0003-4819-158-5-201303050-00003.
5. Winawer SJ, Zauber AG, Ho MN, et al. Prevention of colorectal cancer by colonoscopic polypectomy. N Engl J Med. 1993;329(27):1977–1981. doi: 10.1056/NEJM199312303292701.
6. Zauber AG, Winawer SJ, O’Brien MJ, et al. Colonoscopic polypectomy and long-term prevention of colorectal-cancer deaths. N Engl J Med. 2012;366(8):687–696. doi: 10.1056/NEJMoa1100370.
7. Kazarian ES, Carreira FS, Toribara NW, Denberg TD. Colonoscopy completion in a large safety net health care system. Clin Gastroenterol Hepatol. 2008;6(4):438–442. doi: 10.1016/j.cgh.2007.12.003.
8. Kelly RB, Shank C. Adherence to screening flexible sigmoidoscopy in asymptomatic patients. Med Care. 1992;30(11):1029–1042. doi: 10.1097/00005650-199211000-00006.
9. Turner BJ, Weiner M, Yang C, TenHave T. Predicting adherence to colonoscopy or flexible sigmoidoscopy on the basis of physician appointment-keeping behavior. Ann Intern Med. 2004;140(7):528–532. doi: 10.7326/0003-4819-140-7-200404060-00013.
10. Denberg TD, Melahdo TV, Coombes JM, et al. Predictors of nonadherence to screening colonoscopy. J Gen Intern Med. 2005;20(11):989–995. doi: 10.1111/j.1525-1497.2005.00164.x.
11. Turner B, Weiner M, Berry S, Lillie K, Fosnocht K, Hollenbeak C. Overcoming poor attendance to first scheduled colonoscopy: a randomized trial of peer coach or brochure support. J Gen Intern Med. 2008;23(1):58–63. doi: 10.1007/s11606-007-0445-4.
12. Goodwin JS, Singh A, Reddy N, Riall TS, Kuo Y-F. Overuse of screening colonoscopy in the Medicare population. Arch Intern Med. 2011;171(15):1335–1343. doi: 10.1001/archinternmed.2011.212.
13. Adams LA, Pawlik J, Forbes GM. Nonattendance at outpatient endoscopy. Endoscopy. 2004;36(5):402–404. doi: 10.1055/s-2004-814329.
14. Green AR, Peters-Lewis A, Percac-Lima S, et al. Barriers to screening colonoscopy for low-income Latino and white patients in an urban community health center. J Gen Intern Med. 2008;23(6):834–840. doi: 10.1007/s11606-008-0572-6.
15. Sola-vera J, Sáez J, Laveda R, et al. Factors associated with non-attendance at outpatient endoscopy. Scand J Gastroenterol. 2008;43(2):202–206. doi: 10.1080/00365520701562056.
16. Gupta M, Holub JL, Eisen G. Do indication and demographics for colonoscopy affect completion? A large national database evaluation. Eur J Gastroenterol Hepatol. 2010;22(5):620–627. doi: 10.1097/MEG.0b013e3283352cd6.
17. Beydoun H, Beydoun M. Predictors of colorectal cancer screening behaviors among average-risk older adults in the United States. Cancer Causes Control. 2008;19(4):339–359. doi: 10.1007/s10552-007-9100-y.
18. Watson AJ, O’Rourke J, Jethwani K, et al. Linking electronic health record-extracted psychosocial data in real-time to risk of readmission for heart failure. Psychosomatics. 2011;52(4):319–327. doi: 10.1016/j.psym.2011.02.007.
19. Melton GB, Hripcsak G. Automated detection of adverse events using natural language processing of discharge summaries. J Am Med Inform Assoc. 2005;12(4):448–457. doi: 10.1197/jamia.M1794.
20. Denny JC, Peterson JF, Choma NN, et al. Extracting timing and status descriptors for colonoscopy testing from electronic medical records. J Am Med Inform Assoc. 2010;17(4):383–388. doi: 10.1136/jamia.2010.004804.
21. Harkema H, Chapman WW, Saul M, Dellon ES, Schoen RE, Mehrotra A. Developing a natural language processing application for measuring the quality of colonoscopy procedures. J Am Med Inform Assoc. 2011;18(Suppl 1):i150–i156. doi: 10.1136/amiajnl-2011-000431.
22. Murff HJ, FitzHenry F, Matheny ME, et al. Automated identification of postoperative complications within an electronic medical record using natural language processing. JAMA. 2011;306(8):848–855. doi: 10.1001/jama.2011.1204.
23. Zalis M, Harris M. Advanced search of the electronic medical record: augmenting safety and efficiency in radiology. J Am Coll Radiol. 2011;7(8):625–634. doi: 10.1016/j.jacr.2010.03.011.
24. Blumenthal DM, Singal G, Mangla S, Macklin EA, Chung DC. 983 predicting noncompliance with scheduled outpatient elective colonoscopy. Gastroenterology. 2012;142(5, Supplement 1):S-173.
25. Inadomi JM. Taishotoyama symposium barriers to colorectal cancer screening: economics, capacity and adherence. J Gastroenterol Hepatol. 2008;23:S198–S204. doi: 10.1111/j.1440-1746.2008.05556.x.
26. Holden DJ, Jonas DE, Porterfield DS, Reuland D, Harris R. Systematic review: enhancing the use and quality of colorectal cancer screening. Ann Intern Med. 2010;152(10):668–676. doi: 10.7326/0003-4819-152-10-201005180-00239.
27. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap)—a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(2):377–381. doi: 10.1016/j.jbi.2008.08.010.
28. Frederiksen BL, Jorgensen T, Brasso K, Holten I, Osler M. Socioeconomic position and participation in colorectal cancer screening. Br J Cancer. 2010;103(10):1496–1501. doi: 10.1038/sj.bjc.6605962.
29. Guessous I, Dash C, Lapin P, Doroshenk M, Smith RA, Klabunde CN. Colorectal cancer screening barriers and facilitators in older persons. Prev Med. 2010;50(1–2):3–10. doi: 10.1016/j.ypmed.2009.12.005.
30. Anderson J, Fortinsky R, Kleppinger A, Merz-Beyus A, Huntington C, III, Lagarde S. Predictors of compliance with free endoscopic colorectal cancer screening in uninsured adults. J Gen Intern Med. 2011;26(8):875–880. doi: 10.1007/s11606-011-1716-7.
31. Lucas GM, Chaisson RE, Moore RD. Highly active antiretroviral therapy in a large urban clinic: risk factors for virologic failure and adverse drug reactions. Ann Intern Med. 1999;131(2):81–87. doi: 10.7326/0003-4819-131-2-199907200-00002.
32. Goldman L, Freidin R, Cook E, Eigner J, Grich P. A multivariate approach to the prediction of no-show behavior in a primary care center. Arch Intern Med. 1982;142(3):563–567. doi: 10.1001/archinte.1982.00340160143026.
33. Barron WM. Failed appointments: who misses them, why they are missed, and what can be done. Prim Care. 1980;7(4):563–574.
34. Alaeddini A, Yang K, Reddy C, Yu S. A probabilistic model for predicting the probability of no-show in hospital appointments. Health Care Manag Sci. 2011;14(2):146–157. doi: 10.1007/s10729-011-9148-9.
35. Chariatte V, Berchtold A, Akré C, Michaud P-A, Suris J-C. Missed appointments in an outpatient clinic for adolescents, an approach to predict the risk of missing. J Adolesc Health. 2008;43(1):38–45. doi: 10.1016/j.jadohealth.2007.12.017.
36. Daggy J, Lawley M, Willis D, et al. Using no-show modeling to improve clinic performance. Health Inform J. 2010;16(4):246–259. doi: 10.1177/1460458210380521.
37. Henderson R. Encouraging attendance at outpatient appointments: can we do more? Scott Med J. 2008;53(1):9–12. doi: 10.1258/rsmsmj.53.1.9.
38. Neal RD, Lawlor DA, Allgar V, et al. Missed appointments in general practice: retrospective data analysis from four practices. Br J Gen Pract. 2001;51:830–832.
39. Dove HG, Schneider KC. The usefulness of patients’ individual characteristics in predicting no-shows in outpatient clinics. Med Care. 1981;19(7):734–740. doi: 10.1097/00005650-198107000-00004.
40. Lee V, Earnest A, Chen M, Krishnan B. Predictors of failed attendances in a multi-specialty outpatient centre using electronic databases. BMC Health Serv Res. 2005;5(51):1–8. doi: 10.1186/1472-6963-5-51.
41. Thompson K. Programming techniques: regular expression search algorithm. Commun ACM. 1968;11(6):419–422. doi: 10.1145/363347.363387.
42. Friedman C, Johnson SB, Forman B, Starren J. Architectural requirements for a multipurpose natural language processor in the clinical environment. Proc Annu Symp Comput Appl Med Care. 1995;347–351.
43. Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. J Am Med Inform Assoc. 2011;18(5):544–551. doi: 10.1136/amiajnl-2011-000464.
44. Buntin MB, Burke MF, Hoaglin MC, Blumenthal D. The benefits of health information technology: a review of the recent literature shows predominantly positive results. Health Aff. 2011;30(3):464–471. doi: 10.1377/hlthaff.2011.0178.
45. DeVore S, Champion RW. Driving population health through accountable care organizations. Health Aff. 2011;30(1):41–50. doi: 10.1377/hlthaff.2010.0935.
46. Elkin EB, Shapiro E, Snow JG, Zauber AG, Krauskopf MS. The economic impact of a patient navigator program to increase screening colonoscopy. Cancer. 2012;118(23):5982–5988. doi: 10.1002/cncr.27595.
47. Jandorf L, Gutierrez Y, Lopez J, Christie J, Itzkowitz S. Use of a patient navigator to increase colorectal cancer screening in an urban neighborhood health clinic. J Urban Health. 2005;82(2):216–224. doi: 10.1093/jurban/jti046.
48. Lebwohl B, Neugut A, Stavsky E, et al. Effect of a patient navigator program on the volume and quality of colonoscopy. J Clin Gastroenterol. 2011;45(5):e47–e53. doi: 10.1097/MCG.0b013e3181f595c3.
49. Lee CS, McCormick PA. Telephone reminders to reduce non-attendance rate for endoscopy. J R Soc Med. 2003;96(11):547–548. doi: 10.1258/jrsm.96.11.547.
50. Liang SY, Phillips KA, Nagamine M, Ladabaum U, Haas JS. Rates and predictors of colorectal cancer screening. Prev Chron Dis. 2006;3(4):A117–A129.