Author manuscript; available in PMC: 2017 Jul 28.
Published in final edited form as: Prev Sci. 2016 May;17(4):461–471. doi: 10.1007/s11121-015-0628-x

Derivation and Evaluation of a Risk-Scoring Tool to Predict Participant Attrition in a Lifestyle Intervention Project

Luohua Jiang 1,2, Jing Yang 2, Haixiao Huang 3, Ann Johnson 3, Edward J Dill 4, Janette Beals 3, Spero M Manson 3, Yvette Roubideaux 5; the Special Diabetes Program for Indians Diabetes Prevention Demonstration Project
PMCID: PMC5532883  NIHMSID: NIHMS881491  PMID: 26768431

Abstract

Participant attrition in clinical trials and community-based interventions is a serious, common, and costly problem. In order to develop a simple predictive scoring system that can quantify the risk of participant attrition in a lifestyle intervention project, we analyzed data from the Special Diabetes Program for Indians Diabetes Prevention Program (SDPI-DP), an evidence-based lifestyle intervention to prevent diabetes in 36 American Indian and Alaska Native communities. SDPI-DP participants were randomly divided into a derivation cohort (n = 1600) and a validation cohort (n = 801). Logistic regressions were used to develop a scoring system from the derivation cohort. The discriminatory power and calibration properties of the system were assessed using the validation cohort. Seven independent factors predicted program attrition: gender, age, household income, comorbidity, chronic pain, site’s user population size, and average age of site staff. Six factors predicted long-term attrition: gender, age, marital status, chronic pain, site’s user population size, and average age of site staff. Each model exhibited moderate to fair discriminatory power (C statistic in the validation set: 0.70 for program attrition, and 0.66 for long-term attrition) and excellent calibration. The resulting scoring system offers a low-technology approach to identify participants at elevated risk for attrition in future similar behavioral modification intervention projects, which may inform appropriate allocation of retention resources. This approach also serves as a model for other efforts to prevent participant attrition.

Keywords: Lifestyle modifications, Multi-site study, Retention, Risk prediction models, Weight loss program


Participant attrition in clinical trials is a serious, common, and costly problem (Probstfield and Frye 2011). A range of factors has been associated with risk of participant attrition in longitudinal studies (Blanton et al. 2006; Brown et al. 2006; Clark et al. 1996; Dalle Grave et al. 2005; Fabricatore et al. 2009; Fitzpatrick et al. 2014; Honas et al. 2003; Kong et al. 2010; Manson et al. 2011; McGuigan et al. 2003; O’Brien et al. 2012; Warren-Findlow et al. 2003; Williams et al. 2008). With respect to lifestyle interventions to promote weight loss, multiple studies have attempted to identify baseline factors that predict the risk of participant attrition (Fabricatore et al. 2009; Fitzpatrick et al. 2014; Honas et al. 2003; Rothberg et al. 2015; Spring et al. 2014). Most lifestyle intervention projects that investigated this issue had relatively small sample sizes, with younger age being the most consistent predictor of attrition in those studies. Based on our experience in implementing such interventions, we hypothesize that participants with fewer resources (in terms of both time and money) would be less likely to be retained in a lifestyle intervention. Indeed, using data from a multi-site diabetes prevention initiative among American Indians and Alaska Natives (AI/ANs) (Jiang et al. 2015), our group found that personal characteristics associated with loss to follow-up included younger age, male gender, lower household income, lack of family support, and chronic pain. In addition, sites with large or small user populations, younger staff, higher staff ratings of participant disinterest in the intervention program, and higher staff ratings of participant lack of transportation or child/elder care were less likely to retain participants.

Identifying factors that affect retention is an important first step to improve participant retention in clinical trials and translational projects. These factors may further inform the development of a simple risk-prediction scoring system that can help healthcare providers and program administrators to identify participants likely to be lost to follow-up. Strategies then can be targeted and resources allocated to retain those at highest risk of attrition. In recent years, the risk score method has been applied widely to predict the occurrence of chronic diseases, such as cardiovascular disease, diabetes, and dementia (D’Agostino et al. 2008; Exalto et al. 2013; Kahn et al. 2009; Lee et al. 2006; Lindstrom and Tuomilehto 2003; Noble et al. 2011; Wilson et al. 1998). Such risk scores have been commonly used as screening tools to identify individuals at high risk for specific chronic diseases; these people then are targeted for primary or secondary prevention efforts. This method also has been used to predict hospital readmission (Bradley et al. 2013; Donze et al. 2013; Kansagara et al. 2011) thereby focusing resource-intensive transitional care interventions on the patients at greatest risk. Recently, Johnson et al. (2014) used a cumulative risk model to identify families at high risk of withdrawal from a cohort study of children with genetic risk for type 1 diabetes. For each family with a risk score of early withdrawal ≥4, they developed an individually tailored retention plan using a variety of strategies such as increasing communications between study visits and addressing individual family concerns (e.g., childcare, transportation problems). They showed that the withdrawal rates for the high risk groups were substantially lower after implementing the retention plan. 
Yet, to our knowledge, a risk-scoring system has not been applied to quantify the risk of attrition in behavioral modification intervention projects, such as lifestyle interventions to promote weight loss, for which retention is a major challenge (Fitzpatrick et al. 2014; Spring et al. 2014). When implemented in real-world clinical settings, the retention rate of a weight management program may be as low as 26 % at 6 months (Spring et al. 2014).

Obesity and its associated chronic diseases affect disadvantaged populations disproportionately, especially American Indians and Alaska Natives (AI/ANs) (CDC 2011). The prevalence of obesity (29.4 vs. 20.3 %) and overweight (69.6 vs. 59.1 %) is significantly higher among AI/ANs than among the general US population (Barnes et al. 2005). Minority participants typically are also more difficult and costly to recruit as well as retain in longitudinal intervention projects (Garfield et al. 2003; Probstfield and Frye 2011). Building upon our previous research (Jiang et al. 2015), the current study sought to develop and evaluate a simple, low-technology system for identifying individuals who are at increased risk of attrition in a multi-site diabetes prevention translational initiative that implemented a lifestyle intervention among AI/AN communities, the Special Diabetes Program for Indians Diabetes Prevention (SDPI-DP) program. We considered only risk factors that are readily and easily available at the beginning of a project to use as a participant retention tool in implementing future similar intervention efforts.

Methods

Study Design and Population

The SDPI-DP Program is a congressionally mandated demonstration project designed to reduce diabetes incidence among AI/ANs with pre-diabetes by translating the Diabetes Prevention Program (DPP) lifestyle intervention for application in a diverse range of organizational, cultural, and geographic healthcare settings. The details of this effort are described elsewhere (Jiang et al. 2013). Briefly, 36 health care programs serving 80 tribes in 18 states and 11 IHS administrative areas participated in the SDPI-DP. The participating programs implemented the 16-session Lifestyle Balance Curriculum adopted from the DPP (2002) and evaluated the effectiveness of the prevention activities over a 3-year period. After a baseline assessment, participants attended the lifestyle curriculum consisting of diet, exercise, and behavior modification sessions to help each individual reach and maintain a goal of 7 % weight loss. The curriculum was provided in group settings within 16–24 weeks after baseline and typically taught by the program dietitian and/or health educator. The curriculum was supplemented by monthly one-to-one individual lifestyle coaching, to individualize nutrition and physical activity plans as needed and to identify and solve participation barriers.

Participants were recruited locally by each grant program, mainly through community events such as health fairs, but also from local clinics or by provider referral. Interested potential participants were invited for a screening visit to determine their eligibility for SDPI-DP. Eligibility criteria included being AI/AN (based on eligibility to receive IHS services), being at least 18 years of age, having no previous diagnosis of diabetes, and having impaired fasting glucose (a fasting blood glucose level of 100 to 125 mg/dl) and/or impaired glucose tolerance (IGT, i.e., an oral glucose-tolerance test result of 140 to 199 mg/dl 2 h after a 75-g oral glucose load). Eligible participants were then invited to participate in the program. Enrollment began in January 2006 and is ongoing. The present study included baseline and retention data from 2553 participants who completed the baseline assessment and started the intervention by 07/31/2008. Retention data were available for these participants between their baseline assessments and 07/31/2009. The grantee sites were instructed to use a variety of strategies (e.g., gardening classes, walking club, reminder phone calls) to encourage retention and continued maintenance towards lifestyle change goals. Incentives such as pedometers or exercise T-shirts were usually given after the participant completed each assessment.

The lifestyle intervention was generally well accepted by local health programs. Adaptation for local culture and situation was allowed provided the same basic information was presented and the adaptation was documented. Many grantees drew upon their local culture to translate educational concepts and curriculum into tribal languages, and incorporated, for instance, talking circles, indigenous foods, or drumming into intervention sessions. Among the participants included in this study, each person attended 13 classes on average, with two thirds (69.5 %) of them attending all 16 classes of the DPP curriculum. During the implementation of the curriculum, 84 % of the participants used the Keeping Track Booklet to monitor their weekly physical activity and 92 % of them used the booklet to monitor their fat and calorie intake. Additionally, each participant attended an average of six lifestyle coaching visits in the first year.

Measures

At baseline, within a month of completing the last lifestyle curriculum class (usually 4–6 months after baseline), and annually after baseline, participants underwent a comprehensive clinical assessment to evaluate diabetes risk and incidence. At the same time points, each participant completed a questionnaire encompassing sociodemographics, health-related behavior, and a range of psychosocial factors. In this study, consistent with our previous publication (Jiang et al. 2015), program attrition for a participant was defined as not completing all 16 DPP curriculum sessions. Long-term attrition, or loss to follow-up (LTFU), was operationalized as a participant becoming inactive in the project as reported by the site staff for any reason other than diabetes conversion, death, or pregnancy. We examined the association between attrition and the following participant- and site-level characteristics.

Participant Characteristics

Sociodemographics

Participants answered questions related to their sociodemographic characteristics, including age, gender, educational attainment, employment status, marital status, and annual household income.

Psychosocial Factors

Participants were queried about a wide range of psychosocial factors that may be related to retention, including distress, anxiety, pain, family support, smoking, physical activity, diet, and stages of change for exercise, diet, and weight loss. Our previous multivariate analyses indicated the following variables were significantly or marginally related to program and/or long-term retention and are included here:

  1. Pain. A visual analog pain scale (range 1–10) was used to assess each participant’s perception of general pain (Carlsson 1983).

  2. Family support. The availability of a family support person was determined by having a family member complete a brief family questionnaire at baseline.

  3. Comorbidity. Self-reported number of comorbid conditions was assessed using the Self-Administered Comorbidity Questionnaire (Sangha et al. 2003).

Site Characteristics

Site-specific factors included the user population size of the health facility of each grantee site (small [less than 5000 users], medium [5000–9999], and large [≥10,000]). The characteristics of staff members at each grantee site were obtained from a Provider Annual Questionnaire (PAQ) completed by grantee staff members. Based on our previous work, in this study, we examined the relationship between retention and average age of staff members (<40 vs. ≥40 years). The PAQs were completed by site staff at three time points: December 2006, 2007, and 2008. We used the averages of those three time points for each of the measurements collected from the PAQs as potential factors affecting program and long-term retention. Two grantee sites had very low response rates for the PAQs (≤2 per year), thus those two sites were excluded from all analyses in the current study (n = 152, leaving 2401 participants in the final analysis).

Statistical Analysis

We randomly selected two thirds of SDPI-DP participants to form a derivation dataset and used the remaining one third of participants as the validation set. To account for within-site clustering, initially a multivariate generalized estimating equation (GEE) model (Hanley et al. 2003) with a logit link was built for program attrition, while a Cox regression model with robust standard error estimators (Lee et al. 1992) was fit for long-term attrition. However, the multivariate logistic regression models for program and long-term attrition yielded very similar risk-prediction scoring systems and discriminatory abilities, but better calibration results. Therefore, we present the results from the logistic regression models here.

Potential risk factors for attrition were identified based on our previous investigation (Jiang et al. 2015). When constructing the final regression models, factors with a P value >0.2 were removed from the model one at a time using backward elimination. Two site level risk factors, staff rating of participants’ lack of interest in SDPI-DP (4 items, α = 0.81) and staff rating of lack of transportation or child/elder care (2 items, α = 0.85), were obtained as questions related to their retention experience and hence were not available at baseline. Although significantly correlated with participant attrition, they were removed from the final models of this study, because most likely they will not be available when site staff calculates risk prediction scores at baseline.

After the final regression models were identified, the risk scoring system was derived based on the regression coefficients of the final models using data from the derivation cohort. To assign risk scores (points), we divided each coefficient by the smallest coefficient and rounded it up to the nearest integer. Continuous risk factors were categorized into quartiles in order to simplify the process of calculating the attrition risk scores by assigning a point score to each of the quartile categories. Two adjacent categories of a variable were considered to be overlapping if their estimated regression coefficients were within one standard error of each other. Any overlapping categories were collapsed into one combined category.
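The point-assignment rule described above can be sketched as follows. This is a minimal illustration, not the authors' SAS code: it reads "rounded to the nearest integer" as ordinary rounding, and the dictionary keys and the function name `betas_to_points` are hypothetical labels. The inputs are the long-term attrition coefficients as printed in Table 2, for which this reading reproduces the published points. Applied to the two-decimal program-attrition coefficients, the same rule does not reproduce every published point (for example, 0.52/0.24 ≈ 2.2, published as 3), possibly because the printed coefficients are rounded, so Table 2 remains the reference for the actual scores.

```python
import math

# Long-term attrition coefficients (beta) as printed in Table 2.
# Key names are hypothetical labels chosen for this sketch.
LONG_TERM_BETAS = {
    "male": 0.29,
    "age_18_to_39": 0.80,
    "age_40_to_59": 0.49,
    "separated_divorced_widowed": 0.41,
    "pain_gt_4": 0.22,
    "site_small_or_large": 0.41,
    "staff_avg_age_lt_40": 0.61,
}


def betas_to_points(betas):
    """Divide each coefficient by the smallest one and round to the nearest integer."""
    smallest = min(betas.values())
    # floor(x + 0.5) rounds halves upward and avoids Python's round-half-to-even.
    return {name: math.floor(beta / smallest + 0.5) for name, beta in betas.items()}


points = betas_to_points(LONG_TERM_BETAS)
for name, value in points.items():
    print(f"{name}: {value}")
```

The two age categories are mutually exclusive, so the maximum attainable long-term score is the sum of all points minus the smaller age entry.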

We then tested the performance of this risk scoring system in the validation dataset. The discriminatory capability of the risk prediction scores was assessed using the C statistic, or the area under the receiver operating characteristic curve (AROC). The C statistic is defined as the probability that a model can correctly discriminate a pair of participants with different attrition outcomes. It ranges from 0.5 to 1.0, where a value of 0.5 indicates the model is no better than chance at making the discrimination. A model is usually considered to have moderate discriminatory capability when its C statistic is between 0.7 and 0.8, while a C statistic >0.8 implies excellent discriminatory ability (Hosmer and Lemeshow 2000). The model’s goodness of fit was assessed by the Hosmer-Lemeshow χ² test (Hosmer et al. 1997) and calibration (McGeechan et al. 2008), which compared the estimated risk of attrition obtained from the model for participants in a specific risk category with the observed risk, i.e., the actual proportion of program or long-term attrition for participants in that category. A Hosmer-Lemeshow goodness-of-fit P value above 0.05 indicates adequate calibration. All data analyses were conducted using SAS 9.3 software (SAS Institute, Inc., Cary, NC).
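The pairwise definition of the C statistic given above can be written out directly. The sketch below is an illustration only, not the SAS procedure used in the study: the function name `c_statistic` and the toy data are hypothetical, and tied scores are counted as half a concordant pair, which is a common convention assumed here.

```python
from itertools import product


def c_statistic(scores, outcomes):
    """Share of (event, non-event) pairs in which the event has the higher score.

    Tied scores count as half a concordant pair.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for event_score, non_event_score in product(events, non_events):
        if event_score > non_event_score:
            concordant += 1.0
        elif event_score == non_event_score:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data: risk scores and attrition indicators (1 = attrition).
toy_scores = [3, 8, 8, 11]
toy_attrition = [0, 1, 0, 1]
print(c_statistic(toy_scores, toy_attrition))  # 0.875
```

Comparing every pair is quadratic in the number of participants, which is fine for illustration; statistical packages compute the same quantity from ranks.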

Missing data were uncommon for most of the variables included in this analysis (≤5 %) except income and marital status (20 and 16 %, respectively). Still, in the multiple regression models without imputation, about one third of the observations were excluded due to missing data on one or more variables. We fit the final multivariate regression models with and without imputed covariates and found they yielded very similar results. In the current study, we chose to report the results without missing data imputation because, in practice, field staff predicting attrition would not have access to imputation technologies.

Results

Table 1 describes the baseline characteristics and attrition rates of the derivation and validation datasets. The average age of the 1600 participants in the derivation cohort was 47 years. The majority of participants were female (74.8 %), employed (74.8 %), and married or living together with a partner (59.2 %). Thirty-one percent of the derivation cohort did not complete all 16 DPP curriculum sessions, and 43 % were LTFU by July 31, 2009. None of the characteristics were significantly different between the derivation and validation cohorts.

Table 1.

Baseline characteristics of participants in derivation and validation sets

| Characteristics | Derivation set (N = 1600), N (%) | Validation set (N = 801), N (%) | P value |
| --- | --- | --- | --- |
| Gender | | | 0.36 |
| Female | 1196 (74.8) | 585 (73) | |
| Male | 404 (25.3) | 216 (27) | |
| Age group | | | 0.59 |
| 18 to <40 | 448 (28) | 240 (30) | |
| 40 to <60 | 891 (55.7) | 437 (54.6) | |
| ≥60 | 261 (16.3) | 124 (15.5) | |
| Education status | | | 0.95 |
| <High school | 197 (14) | 105 (14.7) | |
| High school grad | 291 (20.7) | 150 (21.1) | |
| Some college | 634 (45.2) | 319 (44.8) | |
| ≥College grad | 282 (20.1) | 138 (19.4) | |
| Annual household income (USD) | | | 0.44 |
| <15,000 | 220 (18.6) | 131 (21.5) | |
| 15,000–<30,000 | 263 (22.2) | 122 (20) | |
| 30,000–<50,000 | 351 (29.6) | 180 (29.5) | |
| ≥50,000 | 351 (29.6) | 177 (29) | |
| Marriage status | | | 0.84 |
| Married or live together | 756 (59.2) | 363 (57.9) | |
| Separated, divorced, or widowed | 321 (25.2) | 161 (25.7) | |
| Never married | 199 (15.6) | 103 (16.4) | |
| Employment status | | | 0.10 |
| Employed | 1055 (74.8) | 495 (71) | |
| Unemployed | 208 (14.7) | 132 (18.9) | |
| Retired | 105 (7.4) | 52 (7.5) | |
| Student | 43 (3) | 18 (2.6) | |
| Presence of family support person | | | 0.06 |
| No family support person | 620 (38.8) | 342 (42.7) | |
| Having family support person | 980 (61.3) | 459 (57.3) | |
| Comorbidity index | | | 0.65 |
| ≤2 | 934 (62.8) | 456 (61.8) | |
| >2 | 554 (37.2) | 282 (38.2) | |
| Pain visual assessment | | | 0.35 |
| ≤4 | 1053 (75.9) | 506 (74) | |
| >4 | 335 (24.1) | 178 (26) | |
| Finishing all 16 classes | | | 0.82 |
| No | 488 (30.5) | 248 (31) | |
| Yes | 1112 (69.5) | 553 (69) | |
| Loss to follow-up | | | 0.73 |
| No | 911 (56.9) | 462 (57.7) | |
| Yes | 689 (43.1) | 339 (42.3) | |

Table 2 presents the final multivariate logistic regression models for program and long-term attrition. We identified seven variables that were independent predictors of program attrition and six that were predictors of long-term attrition. Points were assigned based on the regression coefficients of the final models. For the program attrition risk score, one point was assigned for male gender; three points for participants <40 years old and one point for those aged 40 to <60; two points for an annual household income <15,000 USD and one point for an income of 15,000 to <30,000 USD; two points for two or fewer comorbidities at baseline; and two points for a baseline pain score >4. At the site level, four points were assigned for sites with small or large user populations and three points for sites with younger staff members (average age <40 years). In general, the point assignments for long-term attrition were similar to those for program attrition, except that income and comorbidity were not related to long-term attrition. Instead, two points were added to the long-term attrition risk score for those who were separated, divorced, or widowed. Also, while the predictor with the largest impact on program attrition was a site’s user population size, participant age (<40 years old) had the greatest effect on long-term attrition. An example of how to use these points to calculate the attrition risk scores for a particular participant is shown in Table 3.

Table 2.

Final multivariate logistic regression models for program and long-term attrition

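| Risk factor | Program attrition: β | Program attrition: P value | Program attrition: Points | Long-term attrition: β | Long-term attrition: P value | Long-term attrition: Points |
| --- | --- | --- | --- | --- | --- | --- |
| Participant characteristics | | | | | | |
| Male | 0.28 | 0.08 | 1 | 0.29 | 0.04 | 1 |
| Age group (years) | | | | | | |
| 18 to <40 | 0.52 | 0.03 | 3 | 0.80 | 0.0001 | 4 |
| 40 to <60 | 0.28 | 0.20 | 1 | 0.49 | 0.008 | 2 |
| Annual household income (USD) | | | | | | |
| <15,000 | 0.46 | 0.01 | 2 | | | |
| 15,000 to <30,000 | 0.24 | 0.18 | 1 | | | |
| Separated, divorced, or widowed | | | | 0.41 | 0.007 | 2 |
| Comorbidity index ≤2 | 0.48 | 0.003 | 2 | | | |
| Pain visual assessment >4 | 0.37 | 0.03 | 2 | 0.22 | 0.12 | 1 |
| Site characteristics | | | | | | |
| Small (<5000) or large (≥10,000) user population size | 0.85 | <0.0001 | 4 | 0.41 | 0.004 | 2 |
| Average age of staff members <40 years | 0.53 | 0.0003 | 3 | 0.61 | <0.0001 | 3 |
| Range | | | 0–17 | | | 0–13 |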

Table 3.

Example of calculating participant attrition risk scores

| Attrition risk factors | Points: program attrition | Points: long-term attrition | Example participant (a): program attrition | Example participant (a): long-term attrition |
| --- | --- | --- | --- | --- |
| Male | 1 | 1 | 1 | 1 |
| Age group (years) | | | | |
| 18 to <40 | 3 | 4 | 3 | 4 |
| 40 to <60 | 1 | 2 | | |
| Annual household income (USD) | | | | |
| <15,000 | 2 | | | |
| 15,000 to <30,000 | 1 | | | |
| Separated, divorced, or widowed | | 2 | | 2 |
| Comorbidity index ≤2 | 2 | | | |
| Pain visual assessment >4 | 2 | 1 | 2 | 1 |
| Site characteristics | | | | |
| Small (<5000) or large (≥10,000) user population size | 4 | 2 | | |
| Average age of staff members <40 years | 3 | 3 | 3 | 3 |
| Total risk score | | | 9 | 11 |
| Risk category | | | Intermediate | High |

(a) A 35-year-old male participant who had an annual household income of 35k, was divorced, reported 3 comorbidities and a chronic pain score of 6 at baseline, and participated in SDPI-DP at a grantee site with a medium user population size and an average staff age of 38 years
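The scoring procedure illustrated in Table 3 can also be sketched in code. This is a minimal sketch, not a tool distributed with the study: the `Participant` class, its field names, and the function names are hypothetical, and the site user count of 7000 is an assumed stand-in for "medium". The points come from Table 2 and the category cutoffs from Table 5. Missing income is treated here as contributing zero points, which is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    male: bool
    age: int
    income_usd: Optional[float]  # None when income was not reported
    separated_divorced_widowed: bool
    comorbidities: int
    pain: float
    site_users: int       # user population of the grantee site's health facility
    staff_avg_age: float  # average age of site staff


def risk_scores(p):
    """Return (program, long_term) attrition risk scores using the Table 2 points."""
    program = long_term = 0
    if p.male:
        program, long_term = program + 1, long_term + 1
    if p.age < 40:
        program, long_term = program + 3, long_term + 4
    elif p.age < 60:
        program, long_term = program + 1, long_term + 2
    if p.income_usd is not None:  # income enters the program model only
        if p.income_usd < 15_000:
            program += 2
        elif p.income_usd < 30_000:
            program += 1
    if p.separated_divorced_widowed:  # long-term model only
        long_term += 2
    if p.comorbidities <= 2:  # program model only
        program += 2
    if p.pain > 4:
        program, long_term = program + 2, long_term + 1
    if p.site_users < 5_000 or p.site_users >= 10_000:
        program, long_term = program + 4, long_term + 2
    if p.staff_avg_age < 40:
        program, long_term = program + 3, long_term + 3
    return program, long_term


def risk_category(score, intermediate_from, high_from):
    """Map a score to Low / Intermediate / High given the two lower cutoffs."""
    if score >= high_from:
        return "High"
    return "Intermediate" if score >= intermediate_from else "Low"


# The example participant from the Table 3 footnote.
example = Participant(male=True, age=35, income_usd=35_000,
                      separated_divorced_widowed=True, comorbidities=3,
                      pain=6, site_users=7_000, staff_avg_age=38)
program_score, long_term_score = risk_scores(example)
print(program_score, risk_category(program_score, 8, 11))     # 9 Intermediate
print(long_term_score, risk_category(long_term_score, 6, 8))  # 11 High
```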

The discrimination power of the scores was fair to moderate (Table 4). The C statistics of the final model for program attrition were 0.67 in the derivation set and 0.70 in the validation set. If we use a risk score of 8 or higher to predict program attrition, the sensitivity of the classification will be 0.71 and the specificity will be 0.55. Because income had a relatively high missing rate in our dataset, we also calculated C statistics for the program attrition model without income. The discrimination power decreased to 0.65 in both the derivation and validation set in that scenario. When we excluded site level characteristics from the model, the model C statistic dropped to 0.59 for the derivation set and to 0.63 for the validation set. For long-term attrition, the discrimination power of the scores was only fair. The C statistics of the final model for long-term attrition were 0.63 in the derivation set and 0.66 in the validation set. If we choose 6 as the cutoff for long-term attrition, the sensitivity of the prediction will be 0.63, while the specificity will be 0.57. Because marital status also had a relatively high missing rate in our dataset, we calculated C statistics after excluding marital status from the model; the resulting C statistics were 0.63 and 0.64 in the derivation and validation set, respectively. Dropping site level characteristics from the long-term attrition model decreased the C statistics to less than 0.60.

Table 4.

C statistics for different models predicting participant attrition

Derivation set Validation set
Program attrition
 Final model 0.67 0.70
 Final model without income 0.65 0.65
 Final model without site-level covariates 0.59 0.63
Long-term attrition
 Final model 0.63 0.66
 Final model without marital status 0.63 0.64
 Final model without site-level covariates 0.58 0.58

As shown in Table 5, based on the risk scores described in Table 2, the risk of attrition was stratified into three categories: low, intermediate, and high. For program attrition, low-risk participants had 0–7 points in the risk score (44 % of participants) and a <20 % estimated risk of not completing all 16 DPP curriculum sessions; high-risk participants had 11 or more points (21 % of participants) and a 44 % estimated probability of program attrition. The P values of the Hosmer-Lemeshow goodness-of-fit tests were 0.94 and 0.46 in the derivation and validation sets, respectively, indicating good calibration. For long-term attrition, low-risk participants had 0–5 points in the risk score (49 % of participants) and a 31 % estimated risk of LTFU at the end of the study; high-risk participants had eight or more points (21 % of participants) and a >50 % estimated probability of long-term attrition. The P values of the Hosmer-Lemeshow goodness-of-fit tests were 0.16 and 0.36 in the derivation and validation sets, respectively, again indicating good calibration. (Table 6 reports the observed and predicted attrition risk for each level of risk score.)

Table 5.

Observed vs. predicted risk for program and long-term attrition

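| Points | Risk category | Derivation set: N (%) | Derivation set: Observed risk | Derivation set: Estimated risk | Validation set: N (%) | Validation set: Observed risk | Validation set: Estimated risk |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Program attrition | | | | | | | |
| 0–7 | Low | 494 (44.4) | 0.168 | 0.165 | 267 (47.8) | 0.172 | 0.168 |
| 8–10 | Intermediate | 390 (35.0) | 0.290 | 0.300 | 186 (33.3) | 0.323 | 0.298 |
| 11+ | High | 229 (20.6) | 0.450 | 0.440 | 105 (18.8) | 0.495 | 0.443 |
| Long-term attrition | | | | | | | |
| 0–5 | Low | 567 (49.0) | 0.302 | 0.308 | 276 (48.8) | 0.304 | 0.312 |
| 6–7 | Intermediate | 343 (29.6) | 0.440 | 0.439 | 166 (29.4) | 0.452 | 0.438 |
| 8+ | High | 248 (21.4) | 0.581 | 0.566 | 123 (21.8) | 0.561 | 0.558 |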
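The sensitivity and specificity reported in the Results for the cutoffs of 8 (program attrition) and 6 (long-term attrition) can be approximately recovered from the validation-set columns of Table 5. The sketch below is an illustration under stated assumptions: event counts are back-calculated by rounding N × observed risk, so they are approximations rather than the study's raw counts, and the function name `sens_spec` is hypothetical.

```python
# Validation-set rows from Table 5: (lowest score in category, N, observed risk).
PROGRAM_ROWS = [(0, 267, 0.172), (8, 186, 0.323), (11, 105, 0.495)]
LONG_TERM_ROWS = [(0, 276, 0.304), (6, 166, 0.452), (8, 123, 0.561)]


def sens_spec(rows, cutoff):
    """Sensitivity and specificity when scores >= cutoff are flagged as high risk.

    Event counts are approximated as round(N * observed risk).
    """
    true_pos = false_neg = false_pos = true_neg = 0
    for lowest_score, n, observed_risk in rows:
        events = round(n * observed_risk)
        non_events = n - events
        if lowest_score >= cutoff:
            true_pos += events
            false_pos += non_events
        else:
            false_neg += events
            true_neg += non_events
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


program_sens, program_spec = sens_spec(PROGRAM_ROWS, cutoff=8)
long_sens, long_spec = sens_spec(LONG_TERM_ROWS, cutoff=6)
print(f"program attrition:   {program_sens:.2f} {program_spec:.2f}")
print(f"long-term attrition: {long_sens:.2f} {long_spec:.2f}")
```

Because the cutoffs coincide with category boundaries in Table 5, the category-level rows are sufficient; a cutoff inside a category would need the per-score rows of Table 6.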

Table 6.

Observed vs. predicted risk for each level of risk score

| Risk score | Derivation set: N (%) | Derivation set: Observed risk | Derivation set: Estimated risk | Validation set: N (%) | Validation set: Observed risk | Validation set: Estimated risk |
| --- | --- | --- | --- | --- | --- | --- |
| Program attrition | | | | | | |
| 0 | 5 (0.4) | 0.200 | 0.058 | | | |
| 1 | 23 (2.1) | 0.130 | 0.075 | 6 (1.1) | 0.000 | 0.075 |
| 2 | 16 (1.4) | 0.188 | 0.093 | 7 (1.3) | 0.000 | 0.092 |
| 3 | 70 (6.3) | 0.100 | 0.113 | 39 (7.0) | 0.077 | 0.110 |
| 4 | 49 (4.4) | 0.143 | 0.136 | 26 (4.7) | 0.115 | 0.133 |
| 5 | 102 (9.2) | 0.147 | 0.151 | 62 (11.1) | 0.177 | 0.154 |
| 6 | 87 (7.8) | 0.172 | 0.183 | 46 (8.2) | 0.130 | 0.183 |
| 7 | 142 (12.8) | 0.225 | 0.224 | 81 (14.5) | 0.284 | 0.222 |
| 8 | 122 (11.0) | 0.221 | 0.263 | 67 (12.0) | 0.224 | 0.260 |
| 9 | 115 (10.3) | 0.278 | 0.292 | 53 (9.5) | 0.377 | 0.293 |
| 10 | 153 (13.7) | 0.353 | 0.336 | 66 (11.8) | 0.379 | 0.339 |
| 11 | 69 (6.2) | 0.406 | 0.389 | 34 (6.1) | 0.324 | 0.389 |
| 12 | 83 (7.5) | 0.410 | 0.424 | 35 (6.3) | 0.457 | 0.422 |
| 13 | 46 (4.1) | 0.478 | 0.474 | 12 (2.2) | 0.917 | 0.469 |
| 14 | 21 (1.9) | 0.571 | 0.520 | 16 (2.9) | 0.563 | 0.514 |
| 15 | 8 (0.7) | 0.750 | 0.580 | 6 (1.1) | 0.667 | 0.581 |
| 16 | 1 (0.1) | 0.000 | 0.603 | 2 (0.4) | 0.500 | 0.603 |
| 17 | 1 (0.1) | 1.000 | 0.668 | | | |
| Long-term attrition | | | | | | |
| 0 | 8 (0.7) | 0.125 | 0.162 | 1 (0.2) | 0.000 | 0.162 |
| 1 | 9 (0.8) | 0.333 | 0.200 | 3 (0.5) | 0.000 | 0.205 |
| 2 | 99 (8.5) | 0.273 | 0.234 | 40 (7.1) | 0.125 | 0.235 |
| 3 | 74 (6.4) | 0.365 | 0.283 | 39 (6.9) | 0.179 | 0.282 |
| 4 | 212 (18.3) | 0.264 | 0.315 | 113 (20.0) | 0.336 | 0.314 |
| 5 | 165 (14.2) | 0.345 | 0.370 | 80 (14.2) | 0.425 | 0.369 |
| 6 | 146 (12.6) | 0.390 | 0.410 | 81 (14.3) | 0.383 | 0.412 |
| 7 | 197 (17.0) | 0.477 | 0.461 | 85 (15.0) | 0.518 | 0.462 |
| 8 | 87 (7.5) | 0.552 | 0.523 | 45 (8.0) | 0.400 | 0.525 |
| 9 | 94 (8.1) | 0.553 | 0.558 | 52 (9.2) | 0.596 | 0.555 |
| 10 | 47 (4.1) | 0.681 | 0.619 | 21 (3.7) | 0.762 | 0.611 |
| 11 | 18 (1.6) | 0.556 | 0.662 | 3 (0.5) | 1.000 | 0.642 |
| 12 | 1 (0.1) | 1.000 | 0.692 | 2 (0.4) | 0.500 | 0.699 |
| 13 | 1 (0.1) | 1.000 | 0.750 | | | |
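The per-score counts in Table 6 roll up into the three risk categories of Table 5. As a small consistency sketch (the names `DERIVATION_PROGRAM_N`, `program_category`, and `counts_by_category` are hypothetical), the derivation-set counts for program attrition can be grouped with the 0–7 / 8–10 / 11+ cutoffs:

```python
# Derivation-set participant counts per program attrition risk score (Table 6).
DERIVATION_PROGRAM_N = {
    0: 5, 1: 23, 2: 16, 3: 70, 4: 49, 5: 102, 6: 87, 7: 142,
    8: 122, 9: 115, 10: 153,
    11: 69, 12: 83, 13: 46, 14: 21, 15: 8, 16: 1, 17: 1,
}


def program_category(score):
    """Risk category for a program attrition score, using the Table 5 cutoffs."""
    if score >= 11:
        return "High"
    return "Intermediate" if score >= 8 else "Low"


def counts_by_category(n_by_score):
    """Sum per-score participant counts within each risk category."""
    totals = {"Low": 0, "Intermediate": 0, "High": 0}
    for score, n in n_by_score.items():
        totals[program_category(score)] += n
    return totals


totals = counts_by_category(DERIVATION_PROGRAM_N)
grand_total = sum(totals.values())
for name, n in totals.items():
    print(f"{name}: {n} ({100 * n / grand_total:.1f} %)")
```

The grouped counts agree with the derivation-set N column of Table 5 (494, 390, and 229 participants).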

Discussion

The issue of participant attrition has been recognized as a serious challenge for the success of clinical trials. Attrition is an even greater issue in the large-scale implementation of behavioral lifestyle interventions, which lack the resources of randomized clinical trials. As discussed in our previous study, the relationships between participant characteristics and retention among these AI/AN participants are generally consistent with the existing preventive intervention literature (Jiang et al. 2015). A couple of recent publications reporting factors associated with participant retention in weight management programs were also relatively consistent with our findings (Rothberg et al. 2015; Spring et al. 2014). In particular, older age was consistently reported as being linked with lower attrition risk. Further, in the Veterans Health Administration’s nationwide MOVE! Weight Management Program, female gender and more comorbidities were also found to be associated with a smaller risk of attrition. Both of those studies found that baseline BMI was significantly related to retention, but the directions of the relationships were contradictory. The MOVE! Program also found a number of facility and program factors affecting retention rates, such as lack of a co-payment requirement and geographic proximity to a VA facility. However, these factors were either not applicable or not available in our study.

Quantifying the risk of attrition can facilitate targeted retention strategies to minimize participant attrition. It also can guide health care providers and public health practitioners in allocating scarce resources more appropriately to focus on those at high attrition risk when pursuing disease prevention. Using data from a multi-site lifestyle intervention project, we developed and evaluated a scoring system that predicted the risk of program and long-term participant attrition. This easy-to-use method is moderately discriminatory and exhibits excellent calibration properties. It will enable healthcare providers and program administrators to prospectively identify participants at high risk of potential attrition. Program staff then can focus their limited time and resources on intensively tracking a smaller group of participants, and mobilizing retention techniques such as personalized reminder calls, transportation and child care reimbursement, as well as flexible appointment time for program visits.

The simplicity of this risk score system is notable. Although we derived the system in a large cohort of participants with more than 40 potential risk factors, each of which could reasonably influence the risk of attrition, only a few simple factors explained much of the variance in risk of participant attrition. Therefore, this approach offers a practical means by which to identify participants at high risk of loss to follow-up from a behavioral lifestyle intervention and will facilitate more targeted retention efforts in future similar projects. It is worth noting that excluding income from the program attrition model and excluding marital status from the long-term attrition model only slightly reduced the model’s discriminatory power. Therefore, although both income and marital status had relatively high missing rates, the risk score system developed here can still be applied to those with missing data involving either or both of these two variables.

Consistent with our previous investigation (Jiang et al. 2015), both individual- and site-level characteristics predicted participation status in both the program and long-term attrition models. Because we simplified measures (i.e., categorized the continuous assessment of certain risk factors) and excluded variables not available at baseline, some minor changes in the relationships between risk factors and attrition were observed. For example, the association between comorbidity index and program attrition changed from being marginally significant (P = 0.06) to statistically significant (P = 0.003). This likely reflects the fact that older participants, who usually have more comorbidities, had lower risk of program attrition. When age was entered into the model as a continuous variable, comorbidity was not significantly related to attrition. However, collapsing the age variable into three categories did not fully control for that effect. Additionally, household income and availability of a family support person were not significantly correlated with long-term retention. Instead, marital status appeared to be more important in predicting the risk of participant attrition in the long term. Marital status may share a large amount of common variance with household income and family support, which could partially explain why, after marital status was added, household income and availability of a family support person were no longer significant in the final model for long-term attrition.

Several site-level factors we previously identified as related to participant attrition (Jiang et al. 2015) were excluded from the final regression models in this study because they were unavailable at baseline. The two site-level factors remaining in the final models proved to be pivotal to the discriminatory capabilities of the prediction models. The models with only participant characteristics had markedly reduced C statistics, indicating that participant attrition cannot be adequately predicted without considering site-level characteristics. This observation suggests possible ways for the coordinating center of a large multi-site intervention to identify sites with elevated average participant attrition risk and to enable those sites to focus on participant attrition at an earlier phase.
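To make the comparison of C statistics concrete, the sketch below computes the statistic directly from its definition for a binary outcome: the proportion of (dropout, non-dropout) pairs in which the participant who dropped out received the higher predicted risk, with ties counted as one half. The function name and the two sets of predicted risks are synthetic assumptions for illustration only; they are not SDPI-DP data or results.

```python
from typing import Sequence


def c_statistic(predicted_risk: Sequence[float], dropped_out: Sequence[int]) -> float:
    """Concordance (C) statistic for a binary outcome.

    Compares every participant who dropped out (1) with every participant
    who did not (0); a pair is concordant when the dropout has the higher
    predicted risk. Tied predictions contribute 0.5.
    """
    events = [r for r, y in zip(predicted_risk, dropped_out) if y == 1]
    non_events = [r for r, y in zip(predicted_risk, dropped_out) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Synthetic toy values (not study data): observed dropout and two sets of
# predicted risks, standing in for a participant-only model and a model
# that also uses site-level factors.
outcome = [0, 0, 1, 1]
participant_only = [0.10, 0.40, 0.35, 0.80]
with_site_factors = [0.10, 0.30, 0.35, 0.80]

if __name__ == "__main__":
    print("participant-only C:", c_statistic(participant_only, outcome))
    print("with site factors C:", c_statistic(with_site_factors, outcome))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so a marked drop in C when site-level factors are removed indicates that those factors carry much of the model's ability to rank participants by risk.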

The risk prediction models for program and long-term attrition overlap considerably, indicating that calculating two separate risk scores may be somewhat redundant in practice. Given the importance of completing all 16 classes in achieving maximal SDPI-DP intervention effects (Jiang et al. 2013), we suggest that field staff calculate only the risk scores for program attrition at the beginning of the intervention and initially focus on those at high risk for program attrition. As the intervention progresses to a later stage, and if retention resources are adequate, the staff may choose to calculate the risk scores for long-term attrition in order to more accurately identify those at increased risk for long-term attrition.

This risk score system must be interpreted in light of several limitations. First, the system was developed based on a sample of AI/ANs with pre-diabetes who participated in a lifestyle intervention. Although the method for developing the risk-prediction system is generalizable, the attrition risk scores developed in this study may not be directly applicable to other longitudinal interventions. Differences in the nature of the intervention, the focal population, and factors available for consideration may lead to a somewhat different risk-prediction system. For example, in smoking cessation programs, baseline smoking frequency has been shown to be a strong predictor of intervention adherence and participant retention (Kealey et al. 2007; Snow et al. 2007). Therefore, the risk model for participant attrition in a smoking cessation program would likely need to include baseline smoking frequency as a risk factor. Also, in an intervention project targeting multi-ethnic groups, race/ethnicity is a possible predictor of attrition risk. Future external validation studies will help to elucidate how widely this system can be implemented in other lifestyle and/or weight loss interventions. Yet, even if adaptations are needed when applying the risk score system to another population and/or intervention, our study provides an example for developing risk scoring systems for participant attrition in other types of longitudinal studies.

Second, the performance characteristics of our prediction models are only fair, especially for long-term attrition. A few reasons may explain this observation. Some important predictors may be missing from the SDPI-DP measurements. For example, transportation problems could be a critical risk factor for attrition. However, the SDPI-DP did not have a reliable baseline measure of participant transportation. In future studies, it will be important to measure transportation problems at baseline and evaluate whether adding this factor improves the performance of the prediction models. Furthermore, in contrast to specific chronic diseases that are often strongly correlated with certain disease-specific biomarkers, participant attrition may be less predictable in large multi-site projects implemented among diverse communities. In particular, the reasons for long-term attrition seem to be much more diverse than those for program attrition (Jiang et al. 2015), rendering long-term attrition more difficult to predict. Nevertheless, our prediction models provide an initial tool for program administrators to identify those at elevated risk for attrition in a lifestyle intervention project.

In conclusion, we propose a simple, low-technology predictive scoring system that provides a practical tool to quantify the risk of participant attrition in a lifestyle intervention program. It represents an initial step toward using risk prediction models to optimize the allocation of retention resources in large-scale longitudinal studies. This easy-to-use scoring system may help healthcare providers and program administrators develop targeted retention strategies for participants at high risk of attrition, which may ultimately minimize participant attrition in SDPI-DP and other similar chronic disease prevention projects. This kind of tool will be particularly useful for improving the likely success of large-scale, evidence-based interventions implemented in real-world settings, which usually have limited resources to track and retain participants. As SDPI-DP is an ongoing project, applying the attrition risk-prediction system developed in this study to newly recruited participants represents a critical next step in testing the applicability of this system in practice. Another important line of inquiry is to analyze the retention strategies that have been used by various SDPI-DP grantee sites and identify the strategies that are most effective at retaining participants at high risk of attrition.

Acknowledgments

Funding Funding for the SDPI-DP project was provided by the Indian Health Service (HHSI242200400049C, S. Manson). Manuscript preparation was supported in part by the American Diabetes Association (ADA #7-12-CT-36, L. Jiang) and the National Institute of Diabetes and Digestive and Kidney Diseases (1P30DK092923, S.M. Manson).

Footnotes

Compliance with Ethical Standards All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. The SDPI-DP protocol was approved by the institutional review board (IRB) of the University of Colorado Denver and the IHS IRB. When required, grantees obtained approval from other entities charged with overseeing research in their programs (e.g., tribal review boards).

Conflict of Interest The authors declare that they have no competing interests.

Informed Consent All participants provided written informed consent and Health Insurance Portability and Accountability Act authorization.

References

  1. Barnes PM, Adams PF, Powell-Griner E. Health characteristics of the American Indian and Alaska Native adult population: United States, 1999–2003. Advance data from vital and health statistics. Vol. 356. Hyattsville: US Department of Health and Human Services, National Center for Health Statistics; 2005.
  2. Blanton S, Morris DM, Prettyman MG, McCulloch K, Redmond S, Light KE, et al. Lessons learned in participant recruitment and retention: The EXCITE trial. Physical Therapy. 2006;86:1520–1533. doi: 10.2522/ptj.20060091.
  3. Bradley EH, Yakusheva O, Horwitz LI, Sipsma H, Fletcher J. Identifying patients at increased risk for unplanned readmission. Medical Care. 2013;51:761–766. doi: 10.1097/MLR.0b013e3182a0f492.
  4. Brown DM, Thorne JE, Foster GL, Duncan JL, Brune LM, Munana A, et al. Factors affecting attrition in a longitudinal study of patients with AIDS. AIDS Care. 2006;18:821–829. doi: 10.1080/09540120500466747.
  5. Carlsson AM. Assessment of chronic pain. I. Aspects of the reliability and validity of the visual analogue scale. Pain. 1983;16:87–101. doi: 10.1016/0304-3959(83)90088-X.
  6. CDC. 2011 National Diabetes Fact Sheet. 2011. Retrieved 08 April 2011, from http://www.cdc.gov/diabetes/pibs/estimates11.htm.
  7. Clark MM, Niaura R, King TK, Pera V. Depression, smoking, activity level, and health status: Pretreatment predictors of attrition in obesity treatment. Addictive Behaviors. 1996;21:509–513. doi: 10.1016/0306-4603(95)00081-x.
  8. D’Agostino RB, Sr, Vasan RS, Pencina MJ, Wolf PA, Cobain M, Massaro JM, et al. General cardiovascular risk profile for use in primary care: The Framingham Heart Study. Circulation. 2008;117:743–753. doi: 10.1161/CIRCULATIONAHA.107.699579.
  9. Dalle Grave R, Calugi S, Molinari E, Petroni ML, Bondi M, Compare A, et al. Weight loss expectations in obese patients and treatment attrition: An observational multicenter study. Obesity Research. 2005;13:1961–1969. doi: 10.1038/oby.2005.241.
  10. Donze J, Aujesky D, Williams D, Schnipper JL. Potentially avoidable 30-day hospital readmissions in medical patients: Derivation and validation of a prediction model. JAMA Internal Medicine. 2013;173:632–638. doi: 10.1001/jamainternmed.2013.3023.
  11. Exalto LG, Biessels GJ, Karter AJ, Huang ES, Katon WJ, Minkoff JR, et al. Risk score for prediction of 10 year dementia risk in individuals with type 2 diabetes: A cohort study. Lancet Diabetes Endocrinology. 2013;1:183–190. doi: 10.1016/S2213-8587(13)70048-2.
  12. Fabricatore AN, Wadden TA, Moore RH, Butryn ML, Heymsfield SB, Nguyen AM. Predictors of attrition and weight loss success: Results from a randomized controlled trial. Behavioral Research Therapy. 2009;47:685–691. doi: 10.1016/j.brat.2009.05.004.
  13. Fitzpatrick SL, Jeffery R, Johnson KC, Roche CC, Van Dorsten B, Gee M, et al. Baseline predictors of missed visits in the Look AHEAD study. Obesity. 2014;22:131–140. doi: 10.1002/oby.20613.
  14. Garfield SA, Malozowski S, Chin MH, Narayan KM, Glasgow RE, Green LW, et al. Considerations for diabetes translational research in real-world settings. Diabetes Care. 2003;26:2670–2674. doi: 10.2337/diacare.26.9.2670.
  15. Hanley JA, Negassa A, Edwardes MD, Forrester JE. Statistical analysis of correlated data using generalized estimating equations: An orientation. American Journal of Epidemiology. 2003;157:364–375. doi: 10.1093/aje/kwf215.
  16. Honas JJ, Early JL, Frederickson DD, O’Brien MS. Predictors of attrition in a large clinic-based weight-loss program. Obesity Research. 2003;11:888–894. doi: 10.1038/oby.2003.122.
  17. Hosmer DW, Lemeshow S. Applied logistic regression. 2nd ed. New York: Wiley; 2000.
  18. Hosmer DW, Hosmer T, Le Cessie S, Lemeshow S. A comparison of goodness-of-fit tests for the logistic regression model. Statistics in Medicine. 1997;16:965–980. doi: 10.1002/(sici)1097-0258(19970515)16:9<965::aid-sim509>3.0.co;2-o.
  19. Jiang L, Manson SM, Beals J, Henderson WG, Huang H, Acton KJ, et al. Translating the Diabetes Prevention Program into American Indian and Alaska Native communities: Results from the Special Diabetes Program for Indians Diabetes Prevention demonstration project. Diabetes Care. 2013;36:2027–2034. doi: 10.2337/dc12-1250.
  20. Jiang L, Manson SM, Dill EJ, Beals J, Johnson A, Huang H, et al. Participant and site characteristics related to participant retention in a diabetes prevention translational project. Prevention Science. 2015;16:41–52. doi: 10.1007/s11121-013-0451-1.
  21. Johnson SB, Lynch KF, Lee HS, Smith L, Baxter J, Lernmark B, et al. At high risk for early withdrawal: Using a cumulative risk model to increase retention in the first year of the TEDDY study. Journal of Clinical Epidemiology. 2014;67:609–611. doi: 10.1016/j.jclinepi.2014.01.004.
  22. Kahn HS, Cheng YJ, Thompson TJ, Imperatore G, Gregg EW. Two risk-scoring systems for predicting incident diabetes mellitus in U.S. adults age 45 to 64 years. Annals of Internal Medicine. 2009;150:741–751. doi: 10.7326/0003-4819-150-11-200906020-00002.
  23. Kansagara D, Englander H, Salanitro A, Kagen D, Theobald C, Freeman M, et al. Risk prediction models for hospital readmission: A systematic review. Journal of the American Medical Association. 2011;306:1688–1698. doi: 10.1001/jama.2011.1515.
  24. Kealey KA, Ludman EJ, Mann SL, Marek PM, Phares MM, Riggs KR, et al. Overcoming barriers to recruitment and retention in adolescent smoking cessation. Nicotine & Tobacco Research. 2007;9:257–270. doi: 10.1080/14622200601080315.
  25. Kong W, Langlois MF, Kamga-Ngande C, Gagnon C, Brown C, Baillargeon JP. Predictors of success to weight-loss intervention program in individuals at high risk for type 2 diabetes. Diabetes Research Clinical Practice. 2010;90:147–153. doi: 10.1016/j.diabres.2010.06.031.
  26. Lee EW, Wei LJ, Amato D. Cox-type regression analysis for large numbers of small groups of correlated failure time observations. In: Survival analysis: State of the art. Netherlands: Kluwer; 1992. pp. 237–347.
  27. Lee ET, Howard BV, Wang W, Welty TK, Galloway JM, Best LG, et al. Prediction of coronary heart disease in a population with high prevalence of diabetes and albuminuria: The Strong Heart Study. Circulation. 2006;113:2897–2905. doi: 10.1161/CIRCULATIONAHA.105.593178.
  28. Lindstrom J, Tuomilehto J. The diabetes risk score: A practical tool to predict type 2 diabetes risk. Diabetes Care. 2003;26:725–731. doi: 10.2337/diacare.26.3.725.
  29. Manson SM, Jiang L, Zhang L, Beals J, Acton KJ, Roubideaux Y. Special diabetes program for Indians: Retention in cardiovascular risk reduction. Gerontologist. 2011;51:S21–S32. doi: 10.1093/geront/gnq083.
  30. McGeechan K, Macaskill P, Irwig L, Liew G, Wong TY. Assessing new biomarkers and predictive models for use in clinical practice: A clinician’s guide. Archives of Internal Medicine. 2008;168:2304–2310. doi: 10.1001/archinte.168.21.2304.
  31. McGuigan WM, Katzev AR, Pratt CC. Multi-level determinants of retention in a home-visiting child abuse prevention program. Child Abuse & Neglect. 2003;27:363–380. doi: 10.1016/s0145-2134(03)00024-3.
  32. Noble D, Mathur R, Dent T, Meads C, Greenhalgh T. Risk models and scores for type 2 diabetes: Systematic review. British Medical Journal. 2011;343:d7163. doi: 10.1136/bmj.d7163.
  33. O’Brien RA, Moritz P, Luckey DW, McClatchey MW, Ingoldsby EM, Olds DL. Mixed methods analysis of participant attrition in the nurse-family partnership. Prevention Science. 2012;13:219–228. doi: 10.1007/s11121-012-0287-0.
  34. Probstfield JL, Frye RL. Strategies for recruitment and retention of participants in clinical trials. Journal of the American Medical Association. 2011;306:1798–1799. doi: 10.1001/jama.2011.1544.
  35. Rothberg AE, McEwen LN, Kraftson AT, Ajluni N, Fowler CE, Miller NM, et al. Factors associated with participant retention in a clinical, intensive, behavioral weight management program. BMC Obesity. 2015;2:11. doi: 10.1186/s40608-015-0041-9.
  36. Sangha O, Stucki G, Liang MH, Fossel AH, Katz JN. The self-administered comorbidity questionnaire: A new method to assess comorbidity for clinical and health services research. Arthritis & Rheumatology. 2003;49:156–163. doi: 10.1002/art.10993.
  37. Snow WM, Connett JE, Sharma S, Murray RP. Predictors of attendance and dropout at the Lung Health Study 11-year follow-up. Contemporary Clinical Trials. 2007;28:25–32. doi: 10.1016/j.cct.2006.08.010.
  38. Spring B, Sohn MW, Locatelli SM, Hadi S, Kahwati L, Weaver FM. Individual, facility, and program factors affecting retention in a national weight management program. BMC Public Health. 2014;14:363. doi: 10.1186/1471-2458-14-363.
  39. The DPP Research Group. The diabetes prevention program (DPP): Description of lifestyle intervention. Diabetes Care. 2002;25:2165–2171. doi: 10.2337/diacare.25.12.2165.
  40. Warren-Findlow J, Prohaska TR, Freedman D. Challenges and opportunities in recruiting and retaining underrepresented populations into health promotion research. The Gerontologist. 2003;43:37–46. doi: 10.1093/geront/43.suppl_1.37.
  41. Williams PL, Van Dyke R, Eagle M, Smith D, Vincent C, Ciupak G, et al. Association of site-specific and participant-specific factors with retention of children in a long-term pediatric HIV cohort study. American Journal of Epidemiology. 2008;167:1375–1386. doi: 10.1093/aje/kwn072.
  42. Wilson PW, D’Agostino RB, Levy D, Belanger AM, Silbershatz H, Kannel WB. Prediction of coronary heart disease using risk factor categories. Circulation. 1998;97:1837–1847. doi: 10.1161/01.cir.97.18.1837.
