Abstract
Habits involve regular, cue-triggered routines. In a field experiment, we tested whether incentivizing exercise routines—paying participants each time they visit the gym within a planned, daily two-hour window—leads to more persistent exercise than offering flexible incentives—paying participants each day they visit the gym, regardless of timing. Routine incentives generated fewer gym visits than flexible incentives, both during our intervention and after incentives were removed. Even among sub-groups that were experimentally induced to exercise at similar rates during our intervention, recipients of routine incentives exhibited a larger decrease in exercise after the intervention than recipients of flexible incentives.
1. INTRODUCTION
Small, repeated, everyday decisions can have profound effects on many critical life outcomes. Choices that may seem trivial in the moment, such as how much to exercise, what to eat, how hard to study, and how to spend money, often accumulate over time to have large consequences (e.g., Kuh et al., 2006; Mokdad et al., 2004; Schroeder, 2007). Interventions capable of shifting the habits that govern many everyday behaviors could improve individual welfare tremendously if applied to decisions about health, education, and personal finance (e.g., Beshears et al., 2013; Gertler et al., 2014; Loewenstein, Price, & Volpp, 2016).1 Companies, recognizing the importance of such behaviors for employee well-being and productivity, are increasingly interested in promoting positive employee habits in these domains. For example, more than 90% of employers with at least 200 employees offer workplace wellness programs, and 63% of employers with wellness offerings sponsor a program that encourages exercise habits (Jones, Molitor, & Reif, 2018; Mattke, Schnyer, & Van Busum, 2013).
Psychology research has shown that stable habits tend to be characterized by engagement in behaviors under consistent circumstances or “routine” conditions; they are typically done at the same time, in the same place, and following the same cue to act (Wood & Neal, 2016; Wood & Rünger, 2016). As an individual develops a pattern of repeatedly responding to a given configuration of contextual cues by rehearsing a specific set of behaviors, that configuration of cues gradually begins to trigger a mental representation of the behavioral response without requiring the exertion of executive control. This fluid and effortless mental association makes performance of the habitual behavior largely automatic (Wood & Rünger, 2016). The exact neural mechanisms by which this process occurs are still debated, but the nature of a habit can be conceptualized as a reduction in the cognitive effort associated with engaging in the habitual behavior when routine contextual cues are in place. These patterns suggest an opportunity for organizations: routines are central to habitual behavior, and organizations may be able to capitalize on this fact when attempting to encourage the formation of beneficial habits.
Past research evaluating interventions that encourage routines has shown promising results (Carels et al., 2011, 2014; Judah, Gardner, & Aunger, 2013; Lally, Chipperfield, & Wardle, 2008; for a discussion, see Wood & Rünger, 2016). However, prior interventions designed to facilitate habit formation have done far more than simply encouraging routines. For example, they have provided general lifestyle advice (Carels et al., 2011, 2014; Lally, Chipperfield, & Wardle, 2008) or coupled one behavior (e.g., flossing) with another (e.g., tooth brushing; Judah, Gardner, & Aunger, 2013). Furthermore, previous studies attempting to encourage routines were small in scale (each with sample sizes of ~100 participants or fewer). To address this gap in the literature, we conducted a 2,508-participant field experiment designed to test whether people form longer-lasting exercise habits if they are encouraged to maintain a strict routine rather than encouraged to exercise frequently without necessarily adhering to a particular schedule.
Our field experiment included employees at Google who were interested in exercising more regularly at workplace gyms. At the beginning of the experiment, all participants chose a daily, two-hour window when it would be best for them to exercise, and all participants were informed that they would receive reminders to exercise every weekday at the beginning of that window. Participants were then randomly assigned to one of five experimental conditions. During the four-week intervention, participants in two “flexible” experimental conditions were paid $3 and $7, respectively, for any weekday when they exercised for at least 30 minutes at a workplace gym. Participants in two “routine” experimental conditions were also paid $3 and $7, respectively, for these workouts but only if they entered the gym within their chosen two-hour window. Participants in the control group received no monetary incentives for exercise. We analyzed data on participants’ gym visits both during the four-week intervention period and after the intervention period, when incentive payments were no longer offered.
During the intervention period, the two flexible conditions (pooled together) increased the number of gym visits by 0.19 per week relative to the two routine conditions (pooled together), and they increased the likelihood of having at least one gym visit in a given week by 4 percentage points relative to the routine conditions. Of course, for the purpose of learning about habit formation, we are even more interested in the persistence of treatment effects after the intervention period ended and the financial incentives were removed. During the first four weeks of the post-intervention period, the pooled flexible conditions increased the number of gym visits by 0.10 per week and increased the likelihood of having at least one gym visit in a given week by 6 percentage points relative to the pooled routine conditions, although the former difference is only marginally statistically significant.
While measuring the relative impact of the flexible and routine incentive schemes is relevant for judging the efficacy of these two types of policies, the difference in post-intervention exercise we detect is likely driven by the fact that the flexible conditions produced more exercise than the routine conditions during the intervention (at greater expense to the employer). This finding that more exercise in the past begets more exercise in the future (regardless of the timing of past gym visits) is consistent with past empirical research (e.g., Charness & Gneezy, 2009) and models of habit formation (e.g., Becker & Murphy, 1988). Our study design—which randomized not only whether incentives for exercise were flexible or routine, but also their magnitude—allows us to ask a more interesting and novel question. Specifically, we can examine which type of incentive scheme generated more post-intervention gym visits, holding constant the frequency of intervention-period gym visits. It turns out that the routine condition offering $7 incentive payments and the flexible condition offering $3 incentive payments generated approximately the same number of intervention-period gym visits, with the routine condition generating more in-window gym visits (visits that began during a participant’s chosen two-hour window) and fewer out-of-window gym visits (visits that began outside a participant’s chosen two-hour window). Past research on habits suggests that the routine condition participants who were offered $7 incentive payments should be more likely to develop exercise routines and should therefore sustain more persistent exercise habits after the intervention than the flexible condition participants who were offered $3 incentive payments.2 If anything, however, we find the opposite result in our experimental data. In the transition from the intervention period to the first four weeks of the post-intervention period, the routine condition offering $7 incentive payments exhibited a decrease in average weekly gym visits that was 0.14 larger (i.e., more negative) than the decrease observed in the flexible condition offering $3 incentive payments. A similar pattern emerged when examining the average weekly likelihood of at least one gym visit. Specifically, the decrease in the average weekly likelihood of at least one gym visit was 5 percentage points larger for the routine condition with $7 payments than for the flexible condition with $3 payments. These findings are reinforced by an instrumental variables analysis of our data, which leads to similar conclusions. In short, when people are induced to exercise at an equal frequency but in a more routinized way, we find evidence that they form weaker exercise habits, contrary to past theorizing.
To interpret our experimental results within a broader context, we present a simple model of habit formation, which could apply to visiting the gym, giving feedback to employees, or engaging in any other behavior that might be repeated in a consistent fashion. The model has a single agent and two periods. In each period, the agent has three possible decisions: taking an in-window action (that is, an action at a planned and consistent time), an out-of-window action (that is, an action at any other time within the period), or no action. Mapping the model to our experiment, the agent can have an in-window gym visit, an out-of-window gym visit, or no gym visit in a given period. The intrinsic utility of each type of action (in-window or out-of-window, relative to no action) is randomly drawn at the beginning of each period.3 In the first period, the agent may be offered financial incentives for taking an action. A routine incentive scheme offers a payment for an in-window action but not for an out-of-window action, while a flexible incentive scheme offers a payment for either action. An action of a given type in the first period forms a habit in the sense that it increases the intrinsic utility of that same type of action in the second period (an assumption that is consistent with our experimental results). In our baseline model, we do not assume that an in-window action in the first period has a stronger habit-forming effect than an out-of-window action in the first period. Allowing in-window actions to have a stronger habit-forming effect is easily accommodated within our framework and does not change our qualitative conclusions, but it is not necessary for the model to serve its purpose, which is to highlight other factors that influence the effectiveness of routine incentive schemes relative to flexible incentive schemes. In our analysis of the model, we take the perspective of a manager or policymaker who is fundamentally indifferent between in-window and out-of-window actions and who simply wishes to increase the overall likelihood that the agent takes either an in-window or an out-of-window action in the second period (i.e., in the long run).
With this setup, the model’s predictions depend on the rates of in-window and out-of-window actions in the absence of incentives. For instance, if the likelihood of an in-window action in the absence of incentives is high and the likelihood of an out-of-window action in the absence of incentives is low, a routine incentive scheme leads to a larger increase in the overall likelihood of seeing either an in-window or an out-of-window action in the second period than a flexible incentive scheme offering payments of the same dollar amount. If, on the other hand, the likelihood of an in-window action is similar to or less than the likelihood of an out-of-window action in the absence of incentives, a flexible incentive scheme is more effective at increasing the overall likelihood of seeing either an in-window or an out-of-window action in the second period than a routine incentive scheme offering payments of the same dollar amount.4 Our experimental data appear to match this latter case. Intuitively, if opportunities for routine (in-window) gym visits are often inferior to alternative (out-of-window) gym visit opportunities, an incentive scheme that promotes routine exercise tends to generate in-window gym visits that supplant out-of-window gym visits, instead of in-window gym visits that take place when no gym visit would have otherwise occurred. In our data, an increase in in-window gym visits appears to support habit formation, but a reduction in out-of-window gym visits simultaneously appears to undermine other routines that might have developed. Our field experiment suggests that this problem can arise in dynamic, fast-paced workplaces, where it is difficult to identify a regular time window for exercise that is unlikely to be disrupted or superseded by alternative exercise opportunities. We conclude that while routines have been proven elsewhere to be important to habit formation, it may be challenging for managers to encourage routines in environments with frequently shifting demands on people’s time. Routine incentive schemes may be more effective when applied to behaviors and environments for which the best opportunity to take the desired action is consistent from one time period to the next. In such stable contexts, routine incentive schemes can potentially strengthen incipient habits.
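To make these comparative statics concrete, the following simulation sketch implements a stripped-down version of the two-period model. Everything beyond the model's verbal description is an illustrative assumption rather than a feature of our formal analysis: the agent is myopic, intrinsic utilities are normally distributed, and the payment, habit, and mean-utility parameters are arbitrary.

```python
import numpy as np

def p_action_period2(mu_in, mu_out, scheme, pay=1.0, habit=1.0,
                     sigma=1.0, n=1_000_000, seed=0):
    """Simulated probability that the agent takes any action in period 2,
    after incentives are removed. Distributional form and parameter values
    are illustrative only."""
    rng = np.random.default_rng(seed)          # common draws across schemes
    u_in1 = rng.normal(mu_in, sigma, n)        # period-1 intrinsic utilities
    u_out1 = rng.normal(mu_out, sigma, n)
    u_in2 = rng.normal(mu_in, sigma, n)        # period-2 intrinsic utilities
    u_out2 = rng.normal(mu_out, sigma, n)

    # Period-1 incentives: a routine scheme pays only for in-window actions,
    # while a flexible scheme pays for either type of action.
    v_in = u_in1 + pay
    v_out = u_out1 + (pay if scheme == "flexible" else 0.0)

    # Myopic period-1 choice among in-window action, out-of-window action, nothing.
    took_in = (v_in > v_out) & (v_in > 0)
    took_out = (v_out >= v_in) & (v_out > 0)

    # Habit: a period-1 action raises the intrinsic utility of the same type of
    # action in period 2 (the baseline, type-symmetric assumption). No payments
    # are offered in period 2.
    any_action_2 = (u_in2 + habit * took_in > 0) | (u_out2 + habit * took_out > 0)
    return any_action_2.mean()

# Case 1: in-window opportunities tend to be better than out-of-window ones.
for scheme in ("routine", "flexible"):
    print("case 1", scheme, round(p_action_period2(0.5, -1.0, scheme), 4))
# Case 2: in-window opportunities are no better than out-of-window ones.
for scheme in ("routine", "flexible"):
    print("case 2", scheme, round(p_action_period2(-0.5, -0.5, scheme), 4))
```

In this sketch, the routine scheme comes out slightly ahead in case 1 and the flexible scheme comes out clearly ahead in case 2, mirroring the intuition above; the formal analysis in Section 4 does not rely on these particular functional forms.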
This paper is related to several strands of prior research. First, we build on previous applications of psychological insights to design interventions that change behavior (Benartzi et al., 2017; Johnson & Goldstein, 2003; Larrick & Soll, 2008; Madrian & Shea, 2001; Thaler & Benartzi, 2004; Thaler & Sunstein, 2008). In particular, recent research has shown that interventions rewarding repeated engagement in desirable behaviors like exercise, for as little as a month, can build habits that stay in place after incentives are removed (Acland & Levy, 2015; Charness & Gneezy, 2009; Hussam et al., 2017; Royer, Stehr, & Sydnor, 2015). These findings are consistent with prior work theorizing that habits are formed by repeatedly engaging in a behavior (Becker & Murphy, 1988). Our paper extends this line of work by testing an intervention that leverages psychological insights about the importance of repeated engagement in a behavior in a routine fashion for the purposes of forming a habit.
The idea that routines are important for habit formation (Wood & Neal, 2016; Wood & Rünger, 2016) is based in part on experimental studies of habitual behaviors. For example, individuals with a strong prior habit of eating popcorn in movie theaters ate the same amount of popcorn whether it was fresh or stale if they were sitting in a movie theater, but ate more fresh popcorn than stale popcorn if they were sitting in a meeting room, suggesting that automatic performance of a habitual behavior (eating popcorn, regardless of whether it is fresh or stale) is associated with routine conditions (sitting in a movie theater) but not with non-routine conditions (sitting in a meeting room). Individuals without a strong prior habit of eating popcorn in movie theaters ate more fresh popcorn than stale popcorn both in the movie theater and in the meeting room (Neal et al., 2011). There is also previous research that directly studies individuals who have succeeded in establishing beneficial habits. In the domain of medication regimens, adherence is higher among those with regular pill-taking routines (Brooks et al., 2014). In a sample of regular gym visitors, 75% reported that they tended to exercise at the same time of day, and although exercising at the same time of day did not correlate with exercise frequency within this highly selected sample, the large fraction of individuals in the sample who indicated that time of day was part of their exercise routine suggests that consistent timing may be helpful for forming a habit (Tappe et al., 2013). As described above, routine-building interventions that have been evaluated in previous research have used small sample sizes and have either incorporated (a) more features than the mere encouragement of routines or (b) context-specific design elements that make generalization difficult (Carels et al., 2011, 2014; Judah, Gardner, & Aunger, 2013; Lally, Chipperfield, & Wardle, 2008). This paper reports the results of a larger-scale field experiment focused on daily routines that could be a broadly applicable path to promoting beneficial habits.
The remainder of this paper is organized as follows. Section 2 presents our experimental design and our methods for analyzing the data. Section 3 presents our experimental results. In Section 4, we analyze the simple model that we use to interpret our findings, and we discuss the limitations of our study. Section 5 concludes.
2. EXPERIMENTAL DESIGN AND IMPLEMENTATION
Setting
We collaborated with the technology company Google to conduct a randomized controlled trial with a subset of the company’s employees. To be eligible to participate, an individual was required to be a full-time, part-time, or fixed-term employee or an intern at one of the company’s seven U.S. office locations that partnered on our study, leaving us with roughly 25,000 eligible employees. Each office location where our experiment was implemented had at least one on-site fitness center. Although each fitness center boasts unique features, all offer personal trainers and group fitness classes, and all are equipped with exercise machines and weights. Basic gym access (e.g., use of the exercise machines and weights) is free to all employees, but employees must pay fees for extra services, such as personal training, nutrition counseling, and some special group classes. Upon entering the gym, employees encounter a computer kiosk where they are asked to swipe their employee identification badge and record their gym visit. We rely on these login data to track individual gym attendance. Employees are also asked to swipe their badge as they exit the gym.
Participant Recruitment and Randomization
Figure 1 shows the flow and randomization of study participants, and Appendix Figure 1 illustrates the timeline of the experiment.
Figure 1.
Experimental Flow Chart
Recruitment.
Participant recruitment began on February 3, 2015, through a series of poster and email advertisements (see Appendix B, Figures B1 and B2). These advertisements explained that employees had a chance to be paid for exercising and encouraged employees to visit an internal company website to learn more and register with a friend from their office by February 23, 2015 (a deadline that was subsequently extended by two days to accommodate additional recruiting efforts). The posters and emails informed employees that completing an initial registration survey would enter them into raffles for a Fitbit Surge (a fitness tracker valued at approximately $250) and a $100 entertainment gift card.
Registration Survey.
Google employees who responded to our recruitment campaign were given a web link to complete our registration survey (see Appendix B, Figure B3). Upon starting the survey, employees were told that the program, labeled the Fresh Start Fitness Challenge, was part of a research study being conducted by Google in partnership with academic researchers and was designed to help employees achieve their fitness goals. They were also reminded about the raffles and were told that completion of the survey did not guarantee registration in the study in the event of over-enrollment.
The survey began with a consent form and some background questions (name, email address, office location, typical number of days per week involving exercise for at least 30 minutes, gender, and ethnicity). Next, employees were asked to register their employee identification badge with the Google gym, allowing us to track their gym entrances and exits (see Appendix C for additional details about the gym registration process). After being prompted to register with the gym, participants were asked to select a “workout buddy” (their partner for the program) by providing the name and corporate email address of another employee at the same office location. This employee then received an email with a prompt to complete the registration survey (see Appendix D for more detailed information about the partner pairing process).
After choosing a workout partner, employees were asked to select a two-hour block of time when they preferred to start their weekday workouts (which would last at least 30 minutes) at the company gym.5 Based on informal conversations with Google employees, we made the workout windows two hours long. Our goal was to strike a reasonable balance: windows needed to be (a) long enough to accommodate day-to-day variability in employees’ schedules and (b) narrow enough that a series of gym visits initiated at different times within a window would still constitute a time-based routine. Although employees could coordinate workout windows with their partners (31.6% of the final sample selected a workout window that overlapped perfectly6 with their partner’s), they were not required to do so. Employees were then told that they would receive daily reminders (sent to their corporate email address) Monday through Friday, 15 minutes prior to the start of their workout window. They could also opt in to receive text message reminders at the same time by providing their cell phone number (35.8% of the final sample received text message reminders).
At this point in the registration survey, employees were offered a $10 Amazon gift card to create an (optional) account with AchieveMint, a free app that aggregates data from other apps and fitness trackers, including minute-by-minute step data from Fitbit, which we would collect for this study. Among the employees who were enrolled in the study, 25.9% (650 individuals) created an AchieveMint account and received a $10 gift card, and 4.5% (114 individuals) synched a Fitbit with AchieveMint.
Employees were then told that they were officially registered for the study and received a confirmation email (see Appendix B, Figure B4). At this point, participants could exit the survey or continue to optional demographic questions (e.g., age, height, weight, employment information, and current exercise habits).7 Out of the employees who were enrolled in the study, 54% completed all of these optional questions.
In total, 2,508 employees, or approximately 10% of the eligible population, successfully completed all steps of the registration process for our study.
Experimental Conditions.
Each participating pair of employees was randomly assigned to one of five conditions (four treatment conditions and one control condition). Participants in the control condition did not receive monetary payments for completing workouts. Participants in the treatment conditions received monetary payments when they completed a qualifying workout during the four-week intervention period. Two of the treatment conditions were flexible conditions, in which participants earned a payment for each weekday (Monday-Friday) during which they worked out at the company gym for at least 30 minutes. The other two treatment conditions were routine conditions, in which participants earned a payment for each weekday during which they worked out at the company gym for at least 30 minutes, provided that they started the workout during their preselected workout window. For both the flexible and routine conditions, participants were randomly assigned to receive either $3 per workout or $7 per workout. In summary, the five experimental conditions were the control group, the flexible $3 payment group, the flexible $7 payment group, the routine $3 payment group, and the routine $7 payment group. It is worth noting that we randomized incentive size as well as the presence of routine versus flexible incentives because we anticipated that the routine and flexible conditions would induce different numbers of gym visits during the intervention period if incentives per qualifying gym visit were equal across conditions. By varying incentive size, we hoped to make it possible to compare the effects of routine versus flexible exercise post-intervention given roughly equal exercise levels during the intervention.
Power Calculations.
At the outset of this experiment, it was unclear how many of the tens of thousands of employees recruited to participate in our exercise program would enroll. We used the following method to conduct power calculations and to determine how many experimental conditions it would be possible to include in our study. First, we consulted prior research on encouraging gym attendance in healthy populations to assess the typical size of the effect of financial incentives on an individual’s number of gym visits per week (Acland & Levy, 2015; Charness & Gneezy, 2009; Milkman, Minson, & Volpp, 2014; Royer, Stehr, & Sydnor, 2015). We found that incentives of approximately the same magnitude as ours have increased the number of gym visits per week by 10%−200%, with a typical standard deviation of 1.25 visits per week. Because prior research has reliably shown a large and significant effect of incentives on subsequent exercise habits, we determined that we could replicate this well-established finding by using a holdout control group that was small relative to our treatment groups. To meet our goal of having 80% power to detect a 35% difference between our control group and our flexible $3 payment group, we aimed to assign 135 participants to the control group (we ended up with 132 in the control). In addition, we aimed for 80% power to detect a 15%−20% difference in gym visits between the flexible and routine conditions, which required approximately 750 participants per condition.8
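For illustration, the core of such a calculation can be reproduced with standard tools. The sketch below uses the 1.25-visits-per-week standard deviation cited above, but the assumed baseline visit rate is an illustrative figure (roughly the control-group mean of about 1.1 weekly visits later observed during the intervention; see Table 2), and pair-level clustering is ignored, so the outputs approximate rather than reproduce the targets reported here.

```python
from statsmodels.stats.power import TTestIndPower

sd = 1.25        # typical SD of weekly gym visits reported in prior studies
baseline = 1.1   # assumed baseline visits per week (illustrative)

analysis = TTestIndPower()

# Participants needed per arm for 80% power to detect a 15%-20% difference in
# weekly visits between two equal-sized (pooled) conditions.
for pct in (0.15, 0.20):
    d = pct * baseline / sd
    n = analysis.solve_power(effect_size=d, power=0.80, alpha=0.05, ratio=1.0)
    print(f"{pct:.0%} difference: about {n:.0f} participants per pooled condition")

# Power of a 135-person control group against a 600-person treatment arm
# for detecting a 35% difference in weekly visits.
d_control = 0.35 * baseline / sd
power = analysis.power(effect_size=d_control, nobs1=135, ratio=600 / 135, alpha=0.05)
print(f"power for the control comparison: {power:.2f}")
```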
We hoped to include up to eight treatment arms in our study. In addition to the four treatment arms described previously, we planned to incorporate up to four additional treatment conditions, which would have been identical to the four included treatment conditions except that participants would have been required to coordinate their workout windows with their workout partners. The purpose of these coordinated conditions would have been to assess the effects of social support on the creation of exercise habits. We decided in advance that if fewer than 3,135 (= 750×4 + 135) employees signed up for the study, we would only include the conditions that allowed participants to select their workout windows individually. We implemented this plan when 2,702 employees signed up for the study (2,508 of whom completed all of the steps necessary for registration). This explains why our recruitment materials and intake survey encouraged employees to sign up for the study with a workout partner.
Randomization.
Our registration survey closed on February 25, 2015 (two days later than initially planned, as we extended our registration deadline to allow for additional recruiting efforts), and participants were randomized in pairs into one of the five experimental conditions on three separate dates (February 26, February 27, and March 3) depending on when they fulfilled all requirements for randomization. In order to proceed to randomization, participants must have (a) been partnered successfully and (b) registered online with the company gym.9 On February 26, 1,582 individuals (791 pairs) were randomized, followed by 826 additional individuals (413 pairs) on February 27 and 100 individuals (50 pairs) on March 3. In total, 2,508 participants (1,254 pairs) were randomly assigned to conditions.
For each of the three randomization waves, we used a stratified randomization procedure with four strata based on (a) whether the average of the two partners’ self-reported typical number of workouts per week was above or below the median within their randomization wave (the median for all waves was 2.5 workouts per week) and (b) whether or not the partners had (spontaneously) coordinated their workout windows. The randomization scheme therefore had 12 strata total, four for each of the three randomization waves. All regression results that we report control for strata fixed effects.
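A minimal sketch of this kind of pair-level stratified assignment is shown below. The data frame, column names, and target allocation ratio are hypothetical stand-ins; the features mirrored from the text are that workout-buddy pairs (not individuals) are the unit of randomization, that assignment is balanced within each stratum, and that the control group is a small hold-out relative to the four treatment groups.

```python
import numpy as np
import pandas as pd

# Hypothetical pair-level frame: one row per workout-buddy pair, with a
# "stratum" label built from randomization wave, above/below-median
# self-reported workouts, and coordinated vs. uncoordinated windows.
pairs = pd.read_csv("pairs.csv")   # columns: pair_id, stratum (illustrative names)

conditions = ["control", "flexible_3", "flexible_7", "routine_3", "routine_7"]
weights = np.array([135, 750, 750, 750, 750], dtype=float)  # unequal allocation
weights /= weights.sum()

rng = np.random.default_rng(2015)

def assign_within_stratum(block: pd.DataFrame) -> pd.Series:
    # Build a condition list for this stratum in the target proportions,
    # then shuffle it so every pair in the stratum is assigned at random.
    n = len(block)
    counts = np.floor(weights * n).astype(int)
    leftover = rng.choice(len(conditions), size=n - counts.sum(), p=weights)
    labels = np.repeat(conditions, counts).tolist() + [conditions[i] for i in leftover]
    rng.shuffle(labels)
    return pd.Series(labels, index=block.index)

pairs["condition"] = pairs.groupby("stratum", group_keys=False).apply(assign_within_stratum)
```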
The Intervention
Information Provided to Participants about Their Experimental Conditions.
As soon as a participant was randomized to an experimental condition, he or she received an email containing a link to a website describing the incentive structure for his or her condition and to a comprehension check survey (see Appendix E for details regarding this process). To encourage participants to read the treatment information, they were truthfully told that they would learn the registration raffle results as well as more details about their incentives after they completed the survey. Participants were also asked not to speak to anyone other than their workout buddy about the Fresh Start Fitness Challenge. However, we could not monitor or enforce compliance with this request.
Intervention Period.
The intervention period began on March 2, 2015, for participants who were randomized in February and on March 4, 2015, for participants who were randomized on March 3. The intervention period ended on March 31, 2015, for all participants. Participants in all five conditions received daily workout reminder emails and/or text messages 15 minutes before the start of their self-selected workout window (see Appendix B, Figure B13 for the exact contents of the reminder messages).10
Post-Intervention Period.
To encourage participants to continue to reliably swipe their employee identification badges when entering and exiting the gym, we sent emails to participants on April 1 (the first day after the conclusion of the intervention period) with the following announcement, among others: “On a randomly selected day in the month of April, we will have a lottery to select several members of the Challenge to receive $250 each. Here’s the catch: you can only win if you badged in and out at the gym on the day of the lottery” (see Appendix B, Figure B14 for the full text of the emails). We later announced that we would hold this lottery every month through the end of 2015.
On April 17, 2015 (two weeks after the intervention period ended), participants received an email (see Appendix B, Figure B15) asking them to complete an exit survey (see Appendix B, Figure B16). After the exit survey was completed, participants received study-related payments through an online payment system. During the post-intervention period, we continued to collect gym attendance data. In addition, participants continued to receive daily workout reminders for 10 months post-intervention (until February 1, 2016) unless they opted out.
Statistical Analysis
Dependent Variables.
Our primary outcome was participant gym attendance. To measure gym attendance, we obtained data tracking each time a study participant used his or her employee identification badge to enter or exit a company gym. Consistent with previous studies (Acland & Levy, 2015; Charness & Gneezy, 2009; Milkman, Minson, & Volpp, 2014), we initially planned to obtain and analyze data from two post-intervention follow-up periods: (1) the four-week period following the conclusion of the intervention (a length of time mirroring the length of our intervention) and (2) the ten-week period following the conclusion of the intervention (mapping roughly onto the follow-up periods from Charness and Gneezy, 2009 Study 1; Acland and Levy, 2015; and Milkman, Minson, & Volpp, 2014). However, we learned in the midst of implementing the study that we would also be able to obtain data through the end of the calendar year, which concluded 40 weeks after the end of the intervention period, and we therefore analyze these supplemental data in addition to the data we planned to collect. In the main text of this paper, we focus on analyses of the four-week post-intervention period, but analogous analyses for post-intervention weeks 5–10 and 11–40 can be found in Appendix Tables 2–5 and Appendix Figures 4–5.
Following past research on the impact of incentives on gym attendance habits, we measure gym attendance in two ways. First, we measure the total number of days each of our study participants visited the gym in each week (e.g., Acland & Levy, 2015; Charness & Gneezy, 2009; Milkman, Minson, & Volpp, 2014; Royer, Stehr, & Sydnor, 2015). Second, we measure whether or not a participant visited the gym at least once in a given week (e.g., Royer, Stehr, & Sydnor, 2015). This second dependent variable is a binary variable that is coded as 1 if a participant visited the gym at least once during the week and 0 otherwise. For both dependent variables, we count a gym visit as having occurred as long as we see a study participant badge in at the gym.11
We also separately measure “in-window gym visits” and “out-of-window gym visits.” We calculate the total number of in-window gym visits as the total number of days in a given week on which a participant was recorded as having made a gym visit within their pre-selected two-hour workout window (e.g., between 1:00pm and 3:00pm if the participant chose 1:00pm-3:00pm as their preferred workout window during the registration survey). Analogously, the total number of out-of-window gym visits is defined as the total number of days within a specific week on which a participant made a gym visit outside their pre-specified exercise window, even if they made an in-window visit on the same day. Thus, it is possible that participants’ sum of in-window and out-of-window gym visits exceeds their total weekly gym visits, as the total weekly gym visits variable records at most one visit per day.12
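As an illustration of how these outcome variables can be constructed from raw badge-in records, consider the sketch below. The file layout and column names are hypothetical, windows that wrap past midnight are ignored, and weeks with no badge-ins simply do not appear in the result (they would be filled in with zeros when building the full participant-by-week panel).

```python
import pandas as pd

# Hypothetical inputs: one row per badge-in, plus each participant's chosen
# two-hour window (start hour on a 24-hour clock). Names are illustrative.
swipes = pd.read_csv("badge_ins.csv", parse_dates=["timestamp"])
windows = pd.read_csv("windows.csv")   # participant_id, window_start_hour

df = swipes.merge(windows, on="participant_id")
df["week"] = df["timestamp"].dt.to_period("W")
df["date"] = df["timestamp"].dt.date
start_hour = df["timestamp"].dt.hour + df["timestamp"].dt.minute / 60
df["in_window"] = (start_hour >= df["window_start_hour"]) & (
    start_hour < df["window_start_hour"] + 2
)

# Collapse to one record per participant-day: a day contributes at most one
# visit to the weekly total, but can count toward both the in-window and
# out-of-window tallies if it includes badge-ins of both kinds.
daily = (
    df.groupby(["participant_id", "week", "date"])["in_window"]
      .agg(any_in_window="any", any_out_of_window=lambda s: (~s).any())
      .reset_index()
)

weekly = daily.groupby(["participant_id", "week"]).agg(
    visit_days=("date", "size"),              # days with at least one gym visit
    in_window_days=("any_in_window", "sum"),
    out_of_window_days=("any_out_of_window", "sum"),
)
```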
Regression Specifications.
Our primary regression specification is

$$y_{it} = \alpha + \beta_{1} C_{Flex\$3,i} + \beta_{2} C_{Flex\$7,i} + \beta_{3} C_{Rout\$3,i} + \beta_{4} C_{Rout\$7,i} + X_{i}'\gamma + \varepsilon_{it},$$

where i indexes participants and t indexes weeks. The right-hand-side variables of interest are indicators for experimental conditions (C_{Flex$3,i}, C_{Flex$7,i}, C_{Rout$3,i}, and C_{Rout$7,i}), and X_{i} is a vector of control variables. The control variables in our primary analyses are indicators for the 12 strata in our randomization scheme, with one indicator omitted to avoid collinearity with the constant term. The strata were defined by (a) randomization date (February 26, February 27, or March 3), (b) whether the average of the two partners’ self-reported typical number of workouts per week was above or below the median within their randomization wave, and (c) whether or not the partners had (spontaneously) coordinated their workout windows. We conduct separate regressions for the four weeks of the intervention period and for the first four weeks of the post-intervention period, and we cluster standard errors at the participant pair level. The left-hand-side variable y_{it} is one of six outcomes:
Number of days with a gym visit for participant i during week t
Number of days with an in-window gym visit for participant i during week t
Number of days with an out-of-window gym visit for participant i during week t
Whether participant i visited the gym at all during week t
Whether participant i visited the gym during his/her workout window during week t
Whether participant i visited the gym outside of his/her workout window during week t
We also conduct an analysis that uses the same regression framework but switches the outcome variable to be the change from the four intervention weeks to the first four post-intervention weeks in the mean of one of the six variables above. For this analysis, the regression sample includes one observation per participant. These results complement the results comparing levels of post-intervention gym attendance across experimental conditions.
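A minimal sketch of this estimation in statsmodels is shown below; the data frame and variable names are hypothetical, but the structure follows the specification above: condition indicators, strata fixed effects, and standard errors clustered by workout-buddy pair.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical participant-week panel; column names are illustrative.
panel = pd.read_csv("participant_week_panel.csv")
intervention = panel[panel["period"] == "intervention"]

# Condition indicators plus strata fixed effects (one stratum is absorbed by
# the intercept), with standard errors clustered at the pair level.
fit = smf.ols(
    "visits ~ flex_3 + flex_7 + rout_3 + rout_7 + C(stratum)",
    data=intervention,
).fit(cov_type="cluster", cov_kwds={"groups": intervention["pair_id"]})
print(fit.summary())

# Pairwise tests of differences between condition coefficients, analogous to
# the Wald tests reported in Tables 2 and 3.
print(fit.t_test("flex_3 - rout_3 = 0"))
print(fit.t_test("flex_7 - rout_7 = 0"))
```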
3. RESULTS
Sample Summary
Table 1 presents summary statistics for self-reported variables collected in our pre-intervention registration survey: company tenure, weekly pre-intervention workout frequency13, body mass index (BMI, calculated from self-reported height and weight), gender, job function, and ethnicity. This table shows the mean, standard deviation, and proportion of participants who responded to each question for all participants in our study (Column 1), as well as for participants in our control group (Column 2), flexible groups (Columns 3–5), and routine groups (Columns 6–8). Performing pairwise statistical tests to compare each demographic variable across experimental conditions, we find that 4 out of the 60 possible comparisons feature a difference that is statistically significant at the 5% level, roughly the number that would be expected by chance. Thus, it appears that random assignment successfully achieved balance across conditions.
Table 1.
Summary Statistics Describing Study Participants Overall and By Condition
 | Total | Control | Flexible: Overall | Flexible: $3 | Flexible: $7 | Routine: Overall | Routine: $3 | Routine: $7
---|---|---|---|---|---|---|---|---
Number of Years with Company | 3.08 (2.60) | 3.25 (2.38) | 3.16 (2.69) | 3.31 (2.80) | 3.02 (2.56) | 2.96 (2.54) | 3.16 (2.69) | 2.75 (2.35) |
Proportion that responded | 69% | 77% | 72% | 70% | 73% | 67% | 68% | 65% |
Self-Reported Workouts Per Week (Pre-Intervention) | 2.67 (1.54) | 2.64 (1.63) | 2.69 (1.51) | 2.66 (1.54) | 2.72 (1.49) | 2.65 (1.56) | 2.62 (1.59) | 2.68 (1.54) |
Proportion that responded | 93% | 98% | 94% | 94% | 93% | 93% | 93% | 93% |
Body Mass Index (BMI) | 24.81 (4.34) | 24.36 (4.09) | 24.82 (4.56) | 24.83 (4.73) | 24.82 (4.39) | 24.84 (4.12) | 24.75 (4.23) | 24.94 (4.00) |
Proportion that responded | 67% | 73% | 68% | 68% | 69% | 64% | 66% | 62% |
Proportion of Males | 55% | 55% | 53% | 52% | 54% | 57% | 58% | 56% |
Proportion that responded | 95% | 97% | 95% | 95% | 94% | 94% | 94% | 94% |
Job Function | ||||||||
Tech | 61% | 65% | 60% | 59% | 60% | 61% | 62% | 60% |
Global Business Organization | 21% | 13% | 21% | 24% | 19% | 21% | 22% | 20% |
General & Administrative | 19% | 22% | 19% | 17% | 21% | 18% | 16% | 20% |
Proportion that responded | 69% | 75% | 71% | 70% | 73% | 66% | 68% | 64% |
Ethnicity | ||||||||
White | 49% | 52% | 49% | 49% | 50% | 49% | 50% | 48% |
Black | 3% | 3% | 3% | 2% | 3% | 3% | 3% | 2% |
Asian | 36% | 35% | 34% | 36% | 33% | 37% | 37% | 38% |
Hispanic | 5% | 3% | 6% | 6% | 6% | 4% | 4% | 4% |
Mixed or Other | 7% | 7% | 8% | 7% | 8% | 7% | 6% | 8% |
Proportion that responded | 89% | 91% | 88% | 89% | 88% | 89% | 88% | 89% |
Sample Size | 2,508 | 132 | 1,194 | 600 | 594 | 1,182 | 594 | 588 |
Note: This table summarizes key employee characteristics based on responses to questions included in the registration survey, which participants had the option to skip. Since responding to these questions was voluntary, we report the proportion of participants who responded to each question. Standard deviations for means are reported in parentheses. Percentages may not add up to 100% due to rounding.
Appendix Table 1 summarizes participant engagement with various aspects of the Fresh Start Fitness Challenge. Participants were required to receive workout reminder emails as of the date of randomization, and only 1%−2% opted out of receiving these emails by the end of the intervention period. Reminder text messages were an optional feature of the program, and 42% of participants opted to receive text messages as of the date of randomization, with only 6% subsequently opting out of text reminders during the intervention (leaving 36% still receiving text messages at the end of the intervention period).
As for participants’ chosen workout windows, 22% of participants selected workout windows beginning between 3:00AM and 8:45AM; 29% selected windows beginning between 9:00AM and 2:45PM; 48% selected windows beginning between 3:00PM and 8:45PM; and 1% selected windows beginning between 9:00PM and 2:45AM. Nearly one-third of participants had workout windows that exactly matched their partners’, a fraction that is statistically significantly different from the 4% that would be expected if participants had their chosen workout windows but were randomly assigned to pairs.14 Participants were also imperfect at predicting the workout windows that would correspond to their most regular gym visits. We determined this by looking at the timing of gym visits by participants in the control condition and flexible treatment conditions who had at least one weekday gym visit during our study’s four-week incentive period. (We do not look at participants in the routine treatment conditions because they had monetary incentives encouraging gym visits during their selected workout windows.) We see that the mean fraction of incentive-period weekday gym visits that began during a participant’s chosen workout window was 51%. Further, 64% of individuals could have selected a counterfactual workout window that would have had more incentive-period weekday gym visits in it than the chosen workout window.
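The counterfactual-window comparison can be made concrete with a short sketch like the one below, which restricts attention to participants with at least one incentive-period weekday gym visit, scans candidate two-hour windows on a 15-minute grid, and ignores windows that wrap past midnight. File and column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame of incentive-period weekday badge-ins for control and
# flexible-condition participants who visited the gym at least once.
visits = pd.read_csv("incentive_weekday_badge_ins.csv", parse_dates=["timestamp"])
windows = pd.read_csv("windows.csv")    # participant_id, window_start_hour
df = visits.merge(windows, on="participant_id")

def window_stats(g: pd.DataFrame) -> pd.Series:
    start_hour = g["timestamp"].dt.hour + g["timestamp"].dt.minute / 60
    day = g["timestamp"].dt.date

    def visit_days(w0):
        # Distinct weekdays with a badge-in starting in [w0, w0 + 2 hours).
        return day[(start_hour >= w0) & (start_hour < w0 + 2)].nunique()

    chosen = visit_days(g["window_start_hour"].iloc[0])
    best = max(visit_days(w0) for w0 in np.arange(0, 24, 0.25))
    return pd.Series({
        "share_of_visit_days_in_chosen_window": chosen / day.nunique(),
        "better_window_existed": best > chosen,
    })

stats = df.groupby("participant_id").apply(window_stats)
print(stats["share_of_visit_days_in_chosen_window"].mean())  # compare with the 51% mean above
print(stats["better_window_existed"].mean())                 # compare with the 64% figure above
```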
Approximately one-quarter of participants signed up for an AchieveMint account, although we only track minute-by-minute physical activity data for the 4.5% of participants who linked a Fitbit device to their account.
Treatment Effects During the Intervention Period
Appendix Figure 2 presents means of weekly overall, in-window, and out-of-window gym attendance by experimental condition over the course of our four-week intervention period. The patterns indicate that larger incentive payments yielded more exercise, while routine incentives yielded more in-window workouts but fewer overall workouts. Table 2 presents the results of regressions that confirm these patterns. Note that each of the three outcome variables counts at most one gym visit per day, as explained above. The sum of the control group means for the in-window visits and out-of-window visits variables exceeds the control group mean of the overall visits variable because participants could have recorded both an in-window visit and an out-of-window visit on the same day.
Table 2. Panel A. Regressions Predicting Participants’ Weekly Workouts during the Intervention Period.
This table reports a series of ordinary least squares regressions predicting a study participant’s weekly number of (a) overall workouts, (b) workouts initiated during their workout window, and (c) workouts initiated outside of their workout window during the four-week intervention period. In each column, we report the mean number of workouts completed by the control group within this period. The primary predictors included in these regressions are treatment status indicators, which indicate the size of the incentive offered for exercise ($3 versus $7) and the flexibility of the workout schedule (flexible versus routine). We report pairwise Wald tests to assess whether or not all paired regression coefficients reported differ significantly from each other.
 | (1) Total Workouts | (2) Total In-Window Workouts | (3) Total Out-of-Window Workouts | (4) Total Workouts | (5) Total In-Window Workouts | (6) Total Out-of-Window Workouts | (7) Total Workouts | (8) Total In-Window Workouts | (9) Total Out-of-Window Workouts
---|---|---|---|---|---|---|---|---|---
Flexible Payment $3 | 0.58*** (0.14) | 0.32*** (0.09) | 0.27** (0.10) | ||||||
Flexible Payment $7 | 0.89*** (0.14) | 0.43*** (0.10) | 0.50*** (0.10) | ||||||
Routine Payment $3 | 0.40** (0.14) | 0.57*** (0.10) | −0.15 (0.09) | ||||||
Routine Payment $7 | 0.69*** (0.14) | 0.96*** (0.10) | −0.21* (0.09) | ||||||
$3 Interventions | 0.49*** (0.13) | 0.45*** (0.09) | 0.06 (0.09) | ||||||
$7 Interventions | 0.79*** (0.13) | 0.69*** (0.09) | 0.15+ (0.09) | ||||||
Flexible Interventions | 0.74*** (0.13) | 0.37*** (0.09) | 0.38*** (0.09) | ||||||
Routine Interventions | 0.54*** (0.13) | 0.77*** (0.09) | −0.18* (0.09) | ||||||
| |||||||||
Mean Values of Control Group | 1.11 | 0.59 | 0.59 | 1.11 | 0.59 | 0.59 | 1.11 | 0.59 | 0.59 |
Observations | 10032 | 10032 | 10032 | 10032 | 10032 | 10032 | 10032 | 10032 | 10032 |
R-squared | 0.07 | 0.07 | 0.10 | 0.07 | 0.05 | 0.04 | 0.07 | 0.06 | 0.10 |
| |||||||||
Wald Test ($3 Flexible-$7 Flexible) | |||||||||
Difference in Coefficients | −0.31*** (0.09) | −0.11 (0.07) | −0.24*** (0.07) | ||||||
Wald Test ($3 Flexible-$3 Routine) | |||||||||
Difference in Coefficients | 0.18* (0.09) | −0.25*** (0.08) | 0.41*** (0.05) | ||||||
Wald Test ($3 Flexible-$7 Routine) | |||||||||
Difference in Coefficients | −0.11 (0.09) | −0.64*** (0.08) | 0.47*** (0.05) | ||||||
Wald Test ($7 Flexible-$3 Routine) | |||||||||
Difference in Coefficients | 0.49*** (0.09) | −0.14+ (0.08) | 0.65*** (0.06) | ||||||
Wald Test ($7 Flexible-$7 Routine) | |||||||||
Difference in Coefficients | 0.20* (0.09) | −0.53*** (0.08) | 0.71*** (0.05) | ||||||
Wald Test ($3 Routine-$7 Routine) | |||||||||
Difference in Coefficients | −0.29** (0.09) | −0.39*** (0.09) | 0.06 (0.04) | ||||||
Wald Test ($3-$7) | |||||||||
Difference in Coefficients | −0.30*** (0.06) | −0.25*** (0.06) | −0.09* (0.04) | ||||||
Wald Test (Flexible-Routine) | |||||||||
Difference in Coefficients | 0.19** (0.06) | −0.39*** (0.06) | 0.56*** (0.04) |
Note: Standard errors clustered by workout buddy pair are in parentheses.
+ p<0.10, * p<0.05, ** p<0.01, *** p<0.001.
The control variables in the regressions are indicators for randomization strata (12 strata: three randomization dates, crossed with whether or not workout window perfectly overlapped with partner’s workout window, crossed with whether or not self-reported pair number of workouts per week was above median for randomization date), as well as an indicator for missing workout window.
Incentive Size.
Higher incentive payments led to more exercise during the intervention period. As Table 2A Column 4 shows, participants paid $7 per qualifying gym visit went to the Google gym a regression-estimated 0.30 more times per week than those paid $3 (p<0.001) and 0.79 more times per week than those in the control group (p<0.001). The difference of 0.49 visits per week between participants paid $3 per qualifying gym visit and those in the control group was also statistically significant (p<0.001). As Table 2B Column 4 shows, participants paid $7 per qualifying gym visit went to the Google gym one or more times per week at a regression-estimated six percentage point higher rate than participants paid $3 (p<0.001) and at a regression-estimated 20 percentage point higher rate than those in the control group (p<0.001). The difference of 14 percentage points between participants paid $3 per qualifying gym visit and those in the control group was also statistically significant (p<0.001).
Table 2. Panel B. Regressions Predicting Participants’ Likelihood of Working out Each Week during the Intervention Period.
This table reports a series of ordinary least squares regressions predicting a study participant’s weekly likelihood of completing a (a) workout anytime, (b) workout initiated during their workout window, and (c) workout initiated outside of their workout window during the four-week intervention period. In each column, we report the mean weekly fraction of participants in the control group who completed a workout within this period. The primary predictors included in these regressions are treatment status indicators, which indicate the size of the incentive offered for exercise ($3 versus $7) and the flexibility of the workout schedule (flexible versus routine). We report pairwise Wald tests to assess whether or not all paired regression coefficients reported differ significantly from each other.
 | (1) Any Workouts? (Y/N) | (2) Any In-Window Workouts? (Y/N) | (3) Any Out-of-Window Workouts? (Y/N) | (4) Any Workouts? (Y/N) | (5) Any In-Window Workouts? (Y/N) | (6) Any Out-of-Window Workouts? (Y/N) | (7) Any Workouts? (Y/N) | (8) Any In-Window Workouts? (Y/N) | (9) Any Out-of-Window Workouts? (Y/N)
---|---|---|---|---|---|---|---|---|---
Flexible Payment $3 | 0.15*** (0.04) | 0.13*** (0.04) | 0.12*** (0.04) | ||||||
Flexible Payment $7 | 0.23*** (0.04) | 0.17*** (0.04) | 0.20*** (0.04) | ||||||
Routine Payment $3 | 0.13** (0.04) | 0.20*** (0.04) | −0.04 (0.04) | ||||||
Routine Payment $7 | 0.17*** (0.04) | 0.30*** (0.04) | −0.07* (0.04) | ||||||
$3 Interventions | 0.14*** (0.04) | 0.17*** (0.03) | 0.04 (0.04) | ||||||
$7 Interventions | 0.20*** (0.04) | 0.23*** (0.03) | 0.06+ (0.04) | ||||||
Flexible Interventions | 0.19*** (0.04) | 0.15*** (0.03) | 0.16*** (0.04) | ||||||
Routine Interventions | 0.15*** (0.04) | 0.25*** (0.03) | −0.06 (0.03) | ||||||
| |||||||||
Mean Values of Control Group | 0.50 | 0.31 | 0.32 | 0.50 | 0.31 | 0.32 | 0.50 | 0.31 | 0.32 |
Observations | 10032 | 10032 | 10032 | 10032 | 10032 | 10032 | 10032 | 10032 | 10032 |
R-squared | 0.05 | 0.06 | 0.08 | 0.05 | 0.05 | 0.03 | 0.05 | 0.05 | 0.08 |
| |||||||||
Wald Test ($3 Flexible-$7 Flexible) | |||||||||
Difference in Coefficients | −0.07** (0.02) | −0.04 (0.02) | −0.07** (0.02) | ||||||
Wald Test ($3 Flexible-$3 Routine) | |||||||||
Difference in Coefficients | 0.03 (0.02) | −0.07** (0.03) | 0.16*** (0.02) | ||||||
Wald Test ($3 Flexible-$7 Routine) | |||||||||
Difference in Coefficients | −0.02 (0.02) | −0.17*** (0.02) | 0.20*** (0.02) | ||||||
Wald Test ($7 Flexible-$3 Routine) | |||||||||
Difference in Coefficients | 0.10*** (0.02) | −0.03 (0.03) | 0.24*** (0.02) | ||||||
Wald Test ($7 Flexible-$7 Routine) | |||||||||
Difference in Coefficients | 0.05* (0.02) | −0.13*** (0.02) | 0.27*** (0.02) | ||||||
Wald Test ($3 Routine-$7 Routine) | |||||||||
Difference in Coefficients | −0.04+ (0.02) | −0.09*** (0.03) | 0.03+ (0.02) | ||||||
Wald Test ($3-$7) | |||||||||
Difference in Coefficients | −0.06*** (0.02) | −0.07*** (0.02) | −0.02 (0.02) | ||||||
Wald Test (Flexible-Routine) | |||||||||
Difference in Coefficients | 0.04* (0.02) | −0.10*** (0.02) | 0.22*** (0.02) |
Note: Standard errors clustered by workout buddy pair are in parentheses.
+ p<0.10, * p<0.05, ** p<0.01, *** p<0.001.
The control variables in the regressions are indicators for randomization strata (12 strata: three randomization dates, crossed with whether or not workout window perfectly overlapped with partner’s workout window, crossed with whether or not self-reported pair number of workouts per week was above median for randomization date), as well as an indicator for missing workout window.
Flexible vs. Routine Incentives.
Table 2A Column 7 shows that participants in the flexible conditions visited the gym a regression-estimated 0.19 times more per week during the intervention period than participants in the routine conditions (p<0.01), and as Table 2B Column 7 shows, participants in the flexible conditions visited the gym one or more times during a week at a regression-estimated four percentage point higher rate than participants in the routine conditions (p<0.05). The point estimates for these effects are more than half the size of the point estimates for the effects of a $4 increase in payments for qualifying gym visits (that is, the differences induced by raising payments from $3 to $7).
As expected, participants in the routine conditions exercised significantly more during their workout windows than did participants in the flexible conditions. Participants in the routine conditions completed 1.34 in-window workouts per week on average and at least one in-window workout in a given week 56.2% of the time, whereas participants in the flexible conditions completed 0.95 in-window workouts per week on average and at least one in-window workout in a given week 46.2% of the time. Table 2A Column 8 and Table 2B Column 8 show that the regression-estimated differences comparing these two outcome variables for the routine conditions versus the flexible conditions are statistically significant (p<0.001). Conversely, those in the flexible conditions exercised significantly more outside of their workout windows than did participants in the routine conditions. Participants in the flexible conditions completed 0.98 out-of-window workouts per week on average and at least one out-of-window workout in a given week 48.5% of the time, whereas participants in the routine conditions completed 0.42 out-of-window workouts per week on average and at least one out-of-window workout in a given week 26.8% of the time. Table 2A Column 9 and Table 2B Column 9 show that these comparisons are also statistically significant (p<0.001).
Another way of analyzing the mix of in-window and out-of-window visits is to examine the fraction of participants’ workouts that took place during their workout windows by experimental condition. For each participant, we calculate the number of weekdays during the incentive period that featured a gym visit during their workout window, and we divide that number by the number of weekdays during the incentive period that featured any gym visit. Dropping individuals for whom the denominator of the fraction is zero (i.e., individuals who did not visit the gym during the incentive period), we find that the mean of the fraction is 77.7% in the routine conditions. This is significantly higher than the 50.8% mean in the flexible conditions (p<0.001) and the 53.2% mean in the control condition (p<0.001).
Post-Intervention Results
Results for Levels of Exercise Activity.
Patterns of post-intervention gym attendance over our four-week follow-up period are depicted in Appendix Figure 3. Specifically, the three plots in Appendix Figure 3A present means of overall, in-window, and out-of-window gym visits, respectively, for each week by experimental condition, while the three plots in Appendix Figure 3B present the fraction of participants with at least one overall, in-window, and out-of-window gym visit, respectively, for each week by experimental condition.15 Table 3 presents the regression analogs of Appendix Figures 3A and 3B.
Table 3. Panel A. Regressions Predicting Participants’ Weekly Workouts during Post-Intervention Weeks 1–4.
This table reports a series of ordinary least squares regressions predicting a study participant’s weekly number of (a) overall workouts, (b) workouts initiated during their workout window, and (c) workouts initiated outside of their workout window during the four weeks following the intervention period. In each column, we report the mean number of workouts completed by the control group within this period. The primary predictors included in these regressions are treatment status indicators, which indicate the size of the incentive offered for exercise ($3 versus $7) and the flexibility of the workout schedule (flexible versus routine). We report pairwise Wald tests to assess whether or not all paired regression coefficients reported differ significantly from each other.
 | (1) Total Workouts | (2) Total In-Window Workouts | (3) Total Out-of-Window Workouts | (4) Total Workouts | (5) Total In-Window Workouts | (6) Total Out-of-Window Workouts | (7) Total Workouts | (8) Total In-Window Workouts | (9) Total Out-of-Window Workouts
---|---|---|---|---|---|---|---|---|---
Flexible Payment $3 | 0.21+ (0.11) | 0.07 (0.08) | 0.13* (0.07) | ||||||
Flexible Payment $7 | 0.28* (0.11) | 0.09 (0.08) | 0.20** (0.07) | ||||||
Routine Payment $3 | 0.12 (0.11) | 0.07 (0.08) | 0.04 (0.07) | ||||||
Routine Payment $7 | 0.18 (0.11) | 0.15+ (0.08) | 0.01 (0.07) | ||||||
$3 Interventions | 0.17 (0.11) | 0.07 (0.07) | 0.09 (0.06) | ||||||
$7 Interventions | 0.23* (0.11) | 0.12+ (0.07) | 0.10+ (0.06) | ||||||
Flexible Interventions | 0.25* (0.11) | 0.08 (0.07) | 0.16** (0.06) | ||||||
Routine Interventions | 0.15 (0.11) | 0.11 (0.07) | 0.03 (0.06) | ||||||
| |||||||||
Mean Values of Control Group | 0.76 | 0.42 | 0.39 | 0.76 | 0.42 | 0.39 | 0.76 | 0.42 | 0.39 |
Observations | 10032 | 10032 | 10032 | 10032 | 10032 | 10032 | 10032 | 10032 | 10032 |
R-squared | 0.08 | 0.05 | 0.05 | 0.08 | 0.05 | 0.04 | 0.08 | 0.05 | 0.05 |
| |||||||||
Wald Test ($3 Flexible-$7 Flexible) | |||||||||
Difference in Coefficients | −0.07 (0.07) | −0.02 (0.05) | −0.06 (0.05) | ||||||
Wald Test ($3 Flexible-$3 Routine) | |||||||||
Difference in Coefficients | 0.09 (0.07) | 0.00 (0.05) | 0.09+ (0.05) | ||||||
Wald Test ($3 Flexible-$7 Routine) | |||||||||
Difference in Coefficients | 0.04 (0.07) | −0.08 (0.06) | 0.12** (0.04) | ||||||
Wald Test ($7 Flexible-$3 Routine) | |||||||||
Difference in Coefficients | 0.16* (0.07) | 0.02 (0.05) | 0.15** (0.05) | ||||||
Wald Test ($7 Flexible-$7 Routine) | |||||||||
Difference in Coefficients | 0.10 (0.07) | −0.06 (0.06) | 0.18*** (0.05) | ||||||
Wald Test ($3 Routine-$7 Routine) | |||||||||
Difference in Coefficients | −0.06 (0.07) | −0.09 (0.06) | 0.03 (0.04) | ||||||
Wald Test ($3-$7) | |||||||||
Difference in Coefficients | −0.06 (0.05) | −0.05 (0.04) | −0.02 (0.03) | ||||||
Wald Test (Flexible-Routine) | |||||||||
Difference in Coefficients | 0.10+ (0.05) | −0.03 (0.04) | 0.14*** (0.03) |
Note: Standard errors clustered by workout buddy pair are in parentheses.
+ p<0.10, * p<0.05, ** p<0.01, *** p<0.001.
The control variables in the regressions are indicators for randomization strata (12 strata: three randomization dates, crossed with whether or not workout window perfectly overlapped with partner’s workout window, crossed with whether or not self-reported pair number of workouts per week was above median for randomization date), as well as an indicator for missing workout window.
We replicate the well-established finding from Charness and Gneezy (2009), Acland and Levy (2015), and Royer, Stehr, and Sydnor (2015) that participants who are repeatedly paid to exercise continue to exercise significantly more than a never-paid control group even after payments cease. As Table 3A Column 7 reports, participants in the flexible conditions made 0.25 more overall gym visits per week in the 4-week post-intervention period than participants in the control condition (p<0.05). As Table 3B Column 7 shows, a similar pattern emerges when we consider a participant’s likelihood of working out at least once in a given week: participants in the flexible conditions were 12 percentage points more likely to visit the gym at least once in a given week during our follow-up period than those in the control condition (p<0.001).
Table 3. Panel B. Regressions Predicting Participants’ Likelihood of Working out Each Week during Post-Intervention Weeks 1–4.
This table reports a series of ordinary least squares regressions predicting a study participant’s weekly likelihood of completing (a) a workout at any time, (b) a workout initiated during their workout window, and (c) a workout initiated outside of their workout window during the four weeks following the intervention period. In each column, we report the mean weekly fraction of participants in the control group who completed a workout within this period. The primary predictors are treatment status indicators capturing the size of the incentive offered for exercise ($3 versus $7) and the flexibility of the workout schedule (flexible versus routine). We report pairwise Wald tests assessing whether the reported regression coefficients differ significantly from one another.
| | (1) Any Workouts? (Y/N) | (2) Any In-Window Workouts? (Y/N) | (3) Any Out-of-Window Workouts? (Y/N) | (4) Any Workouts? (Y/N) | (5) Any In-Window Workouts? (Y/N) | (6) Any Out-of-Window Workouts? (Y/N) | (7) Any Workouts? (Y/N) | (8) Any In-Window Workouts? (Y/N) | (9) Any Out-of-Window Workouts? (Y/N) |
|---|---|---|---|---|---|---|---|---|---|
| Flexible Payment $3 | 0.10** (0.04) | 0.04 (0.03) | 0.07* (0.03) | | | | | | |
| Flexible Payment $7 | 0.14*** (0.04) | 0.07* (0.03) | 0.10** (0.03) | | | | | | |
| Routine Payment $3 | 0.05 (0.04) | 0.04 (0.03) | 0.01 (0.03) | | | | | | |
| Routine Payment $7 | 0.07+ (0.04) | 0.06* (0.03) | 0.01 (0.03) | | | | | | |
| $3 Interventions | | | | 0.08* (0.03) | 0.04 (0.03) | 0.04 (0.03) | | | |
| $7 Interventions | | | | 0.10** (0.03) | 0.07* (0.03) | 0.05+ (0.03) | | | |
| Flexible Interventions | | | | | | | 0.12*** (0.03) | 0.05+ (0.03) | 0.08* (0.03) |
| Routine Interventions | | | | | | | 0.06+ (0.03) | 0.05+ (0.03) | 0.01 (0.03) |
| Mean Values of Control Group | 0.34 | 0.22 | 0.23 | 0.34 | 0.22 | 0.23 | 0.34 | 0.22 | 0.23 |
| Observations | 10032 | 10032 | 10032 | 10032 | 10032 | 10032 | 10032 | 10032 | 10032 |
| R-squared | 0.07 | 0.05 | 0.05 | 0.07 | 0.05 | 0.05 | 0.07 | 0.05 | 0.05 |
| Wald Test ($3 Flexible - $7 Flexible): Difference in Coefficients | −0.04 (0.02) | −0.03 (0.02) | −0.03 (0.02) | | | | | | |
| Wald Test ($3 Flexible - $3 Routine): Difference in Coefficients | 0.05* (0.02) | 0.00 (0.02) | 0.06** (0.02) | | | | | | |
| Wald Test ($3 Flexible - $7 Routine): Difference in Coefficients | 0.03 (0.02) | −0.03 (0.02) | 0.06** (0.02) | | | | | | |
| Wald Test ($7 Flexible - $3 Routine): Difference in Coefficients | 0.08*** (0.02) | 0.03 (0.02) | 0.08*** (0.02) | | | | | | |
| Wald Test ($7 Flexible - $7 Routine): Difference in Coefficients | 0.07** (0.02) | 0.01 (0.02) | 0.09*** (0.02) | | | | | | |
| Wald Test ($3 Routine - $7 Routine): Difference in Coefficients | −0.02 (0.02) | −0.02 (0.02) | 0.00 (0.02) | | | | | | |
| Wald Test ($3 - $7): Difference in Coefficients | | | | −0.03 (0.02) | −0.03+ (0.02) | −0.01 (0.01) | | | |
| Wald Test (Flexible - Routine): Difference in Coefficients | | | | | | | 0.06*** (0.02) | 0.00 (0.02) | 0.07*** (0.01) |
Note: Standard errors clustered by workout buddy pair are in parentheses.
+ p<0.10, * p<0.05, ** p<0.01, *** p<0.001.
The control variables in the regressions are indicators for randomization strata (12 strata: three randomization dates, crossed with whether or not workout window perfectly overlapped with partner’s workout window, crossed with whether or not self-reported pair number of workouts per week was above median for randomization date), as well as an indicator for missing workout window.
We can also compare our flexible and routine conditions. Table 3A Column 7 shows that participants in the flexible conditions made 0.10 more overall gym visits per week than participants in the routine conditions, a marginally significant difference (p<0.10), and Table 3B Column 7 shows that participants in the flexible conditions were 6 percentage points more likely to visit the gym at least once in a given week than participants in the routine conditions (p<0.001). These differences in post-intervention gym attendance are approximately twice the size of the (statistically insignificant) differences induced by a $4 increase in the incentive offered for qualifying gym visits during our intervention period (that is, the difference between our $3 and $7 incentive conditions).
Although participants in the routine conditions worked out less frequently overall post-intervention than those in the flexible conditions, they did not exhibit a statistically significantly different number of in-window gym visits (see Table 3A Column 8). Similarly, although participants in the routine conditions were less likely to make at least one gym visit in a given week post-intervention than participants in the flexible conditions, Table 3B Column 8 shows that participants in the routine and flexible conditions worked out at least once during their workout windows in a given week at similar rates.16 To further explore this pattern, we count the number of weekdays during the 4-week post-intervention period on which a participant visited the gym during their workout window, and we divide by the number of weekdays on which the participant visited the gym at all. We call this the participant’s fraction of in-window gym visits. Among participants who ever visited the gym during the 4-week post-intervention period, the mean fraction of in-window gym visits is highest in the routine conditions (55.3%). The mean fraction of in-window gym visits is significantly lower in the flexible conditions (47.0%). For participants in the control condition, this statistic falls between the previous two at 51.3%. Overall, these patterns are consistent with the hypothesis that the routine conditions encouraged the formation of routines such that participants in these conditions developed a sustained habit of visiting the gym during their workout windows.
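As an illustration of how the fraction of in-window gym visits described above could be computed, consider the following sketch; the table layout and column names are hypothetical.

```python
# Hypothetical layout: one row per participant-weekday in the 4-week
# post-intervention period, with 0/1 indicators for any gym visit and for a
# visit initiated during the participant's workout window.
import pandas as pd

def in_window_fraction(visits: pd.DataFrame) -> pd.Series:
    counts = visits.groupby("participant_id")[["visited", "visited_in_window"]].sum()
    # Keep only participants who ever visited the gym post-intervention.
    ever = counts[counts["visited"] > 0]
    return ever["visited_in_window"] / ever["visited"]
```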
Participants in the flexible conditions completed 0.14 more out-of-window workouts per week post-intervention than participants in the routine conditions (p<0.001) and 0.16 more out-of-window workouts per week post-intervention than participants in the control condition (p<0.01). Similarly, participants in the flexible conditions were 7 percentage points more likely to make at least one out-of-window gym visit in a given week post-intervention than participants in the routine conditions (p<0.001) and 8 percentage points more likely to make at least one out-of-window gym visit in a given week post-intervention than participants in the control condition (p<0.05).
Results for Changes in Exercise Activity.
To complement our analysis of levels of exercise activity during the 4-week post-intervention period, we also examine changes in exercise activity from the intervention period to the 4-week post-intervention period. In Table 4A, the outcome variable is the change in a participant’s mean weekly number of overall, in-window, or out-of-window gym visits from the intervention period to the 4-week post-intervention period. In Table 4B, the outcome variable is the change in a participant’s mean of the indicator for having at least one overall, in-window, or out-of-window gym visit in a given week. Relative to the $3 incentive conditions, the $7 incentive conditions exhibited larger decreases in exercise activity after incentives were removed. Of course, this result may not be surprising, as the $7 incentive conditions exhibited more exercise activity during the intervention period (see Tables 2A and 2B), implying that a return to baseline would represent a larger decrease.
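As a sketch of how the change-score outcomes in Table 4 could be constructed, one might compute, for each participant, the post-intervention mean minus the intervention-period mean of the relevant weekly measure; the data layout and column names below are hypothetical.

```python
# Hypothetical weekly panel: one row per participant-week with a 'period'
# flag ('intervention' or 'post') and the weekly workout count.
import pandas as pd

def weekly_change(panel: pd.DataFrame) -> pd.Series:
    means = (panel.groupby(["participant_id", "period"])["weekly_workouts"]
                  .mean()
                  .unstack("period"))
    return means["post"] - means["intervention"]
```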
Table 4. Panel A. Regressions Predicting the Change in Participants’ Weekly Workouts from the Four-Week Intervention Period to Post-Intervention Weeks 1–4.
This table reports a series of ordinary least squares regressions predicting a study participant’s change from the four-week intervention period to the four weeks following the intervention period in the average weekly number of (a) overall workouts, (b) workouts initiated during their workout window, and (c) workouts initiated outside of their workout window. In each column, we report the mean change for the control group. The primary predictors are treatment status indicators capturing the size of the incentive offered for exercise ($3 versus $7) and the flexibility of the workout schedule (flexible versus routine). We report pairwise Wald tests assessing whether the reported regression coefficients differ significantly from one another.
| | (1) Change in Total Workouts | (2) Change in Total In-Window Workouts | (3) Change in Total Out-of-Window Workouts | (4) Change in Total Workouts | (5) Change in Total In-Window Workouts | (6) Change in Total Out-of-Window Workouts | (7) Change in Total Workouts | (8) Change in Total In-Window Workouts | (9) Change in Total Out-of-Window Workouts |
|---|---|---|---|---|---|---|---|---|---|
| Flexible Payment $3 | −0.37*** (0.10) | −0.25*** (0.07) | −0.13 (0.08) | | | | | | |
| Flexible Payment $7 | −0.61*** (0.10) | −0.34*** (0.07) | −0.31*** (0.08) | | | | | | |
| Routine Payment $3 | −0.28** (0.10) | −0.51*** (0.07) | 0.19* (0.08) | | | | | | |
| Routine Payment $7 | −0.51*** (0.10) | −0.81*** (0.08) | 0.22** (0.08) | | | | | | |
| $3 Interventions | | | | −0.33*** (0.09) | −0.38*** (0.06) | 0.03 (0.08) | | | |
| $7 Interventions | | | | −0.56*** (0.09) | −0.57*** (0.07) | −0.05 (0.08) | | | |
| Flexible Interventions | | | | | | | −0.49*** (0.09) | −0.29*** (0.06) | −0.22** (0.08) |
| Routine Interventions | | | | | | | −0.40*** (0.09) | −0.66*** (0.07) | 0.20** (0.08) |
| Mean Values of Control Group | −0.35 | −0.17 | −0.21 | −0.35 | −0.17 | −0.21 | −0.35 | −0.17 | −0.21 |
| Observations | 2508 | 2508 | 2508 | 2508 | 2508 | 2508 | 2508 | 2508 | 2508 |
| R-squared | 0.03 | 0.07 | 0.08 | 0.03 | 0.04 | 0.01 | 0.02 | 0.06 | 0.08 |
| Wald Test ($3 Flexible - $7 Flexible): Difference in Coefficients | 0.24*** (0.07) | 0.09+ (0.05) | 0.18*** (0.05) | | | | | | |
| Wald Test ($3 Flexible - $3 Routine): Difference in Coefficients | −0.09 (0.06) | 0.26*** (0.06) | −0.32*** (0.04) | | | | | | |
| Wald Test ($3 Flexible - $7 Routine): Difference in Coefficients | 0.14* (0.07) | 0.56*** (0.06) | −0.35*** (0.05) | | | | | | |
| Wald Test ($7 Flexible - $3 Routine): Difference in Coefficients | −0.33*** (0.07) | 0.17** (0.06) | −0.50*** (0.05) | | | | | | |
| Wald Test ($7 Flexible - $7 Routine): Difference in Coefficients | −0.10 (0.07) | 0.47*** (0.07) | −0.53*** (0.05) | | | | | | |
| Wald Test ($3 Routine - $7 Routine): Difference in Coefficients | 0.23*** (0.07) | 0.30*** (0.07) | −0.03 (0.04) | | | | | | |
| Wald Test ($3 - $7): Difference in Coefficients | | | | 0.24*** (0.05) | 0.20*** (0.04) | 0.07* (0.03) | | | |
| Wald Test (Flexible - Routine): Difference in Coefficients | | | | | | | −0.10* (0.05) | 0.36*** (0.04) | −0.42*** (0.03) |
Note: Standard errors clustered by workout buddy pair are in parentheses.
+ p<0.10, * p<0.05, ** p<0.01, *** p<0.001.
The control variables in the regressions are indicators for randomization strata (12 strata: three randomization dates, crossed with whether or not workout window perfectly overlapped with partner’s workout window, crossed with whether or not self-reported pair number of workouts per week was above median for randomization date), as well as an indicator for missing workout window.
Table 4. Panel B. Regressions Predicting the Change in Participants’ Likelihood of Working out Each Week from the Four-Week Intervention Period to Post-Intervention Weeks 1–4.
This table reports a series of ordinary least squares regressions predicting a study participant’s change from the four-week intervention period to the four weeks following the intervention period in the average weekly likelihood of completing (a) a workout at any time, (b) a workout initiated during their workout window, and (c) a workout initiated outside of their workout window. In each column, we report the mean change for the control group. The primary predictors are treatment status indicators capturing the size of the incentive offered for exercise ($3 versus $7) and the flexibility of the workout schedule (flexible versus routine). We report pairwise Wald tests assessing whether the reported regression coefficients differ significantly from one another.
| | (1) Change in Mean of Any Workouts Indicator | (2) Change in Mean of Any In-Window Workouts Indicator | (3) Change in Mean of Any Out-of-Window Workouts Indicator | (4) Change in Mean of Any Workouts Indicator | (5) Change in Mean of Any In-Window Workouts Indicator | (6) Change in Mean of Any Out-of-Window Workouts Indicator | (7) Change in Mean of Any Workouts Indicator | (8) Change in Mean of Any In-Window Workouts Indicator | (9) Change in Mean of Any Out-of-Window Workouts Indicator |
|---|---|---|---|---|---|---|---|---|---|
| Flexible Payment $3 | −0.06+ (0.03) | −0.10*** (0.02) | −0.06+ (0.03) | | | | | | |
| Flexible Payment $7 | −0.09** (0.03) | −0.10*** (0.02) | −0.10** (0.03) | | | | | | |
| Routine Payment $3 | −0.08* (0.03) | −0.16*** (0.03) | 0.05+ (0.03) | | | | | | |
| Routine Payment $7 | −0.10*** (0.03) | −0.24*** (0.03) | 0.08** (0.03) | | | | | | |
| $3 Interventions | | | | −0.07* (0.03) | −0.13*** (0.02) | −0.00 (0.03) | | | |
| $7 Interventions | | | | −0.10*** (0.03) | −0.17*** (0.02) | −0.01 (0.03) | | | |
| Flexible Interventions | | | | | | | −0.07** (0.03) | −0.10*** (0.02) | −0.08** (0.03) |
| Routine Interventions | | | | | | | −0.09** (0.03) | −0.20*** (0.02) | 0.07* (0.03) |
| Mean Values of Control Group | −0.16 | −0.09 | −0.09 | −0.16 | −0.09 | −0.09 | −0.16 | −0.09 | −0.09 |
| Observations | 2508 | 2508 | 2508 | 2508 | 2508 | 2508 | 2508 | 2508 | 2508 |
| R-squared | 0.02 | 0.05 | 0.05 | 0.02 | 0.03 | 0.00 | 0.02 | 0.04 | 0.05 |
| Wald Test ($3 Flexible - $7 Flexible): Difference in Coefficients | 0.03 (0.02) | 0.01 (0.02) | 0.04* (0.02) | | | | | | |
| Wald Test ($3 Flexible - $3 Routine): Difference in Coefficients | 0.02 (0.02) | 0.07** (0.02) | −0.11*** (0.02) | | | | | | |
| Wald Test ($3 Flexible - $7 Routine): Difference in Coefficients | 0.05* (0.02) | 0.14*** (0.02) | −0.14*** (0.02) | | | | | | |
| Wald Test ($7 Flexible - $3 Routine): Difference in Coefficients | −0.01 (0.02) | 0.06** (0.02) | −0.15*** (0.02) | | | | | | |
| Wald Test ($7 Flexible - $7 Routine): Difference in Coefficients | 0.01 (0.02) | 0.13*** (0.02) | −0.19*** (0.02) | | | | | | |
| Wald Test ($3 Routine - $7 Routine): Difference in Coefficients | 0.03 (0.02) | 0.07** (0.02) | −0.03+ (0.02) | | | | | | |
| Wald Test ($3 - $7): Difference in Coefficients | | | | 0.03* (0.02) | 0.04* (0.02) | 0.01 (0.01) | | | |
| Wald Test (Flexible - Routine): Difference in Coefficients | | | | | | | 0.02 (0.02) | 0.10*** (0.01) | −0.15*** (0.01) |
Note: Standard errors clustered by workout buddy pair are in parentheses.
+ p<0.10, * p<0.05, ** p<0.01, *** p<0.001.
The control variables in the regressions are indicators for randomization strata (12 strata: three randomization dates, crossed with whether or not workout window perfectly overlapped with partner’s workout window, crossed with whether or not self-reported pair number of workouts per week was above median for randomization date), as well as an indicator for missing workout window.
It is more interesting to focus on a comparison of the routine $7 condition and the flexible $3 condition. These two conditions induced similar numbers of total workouts during the intervention period, but the workouts induced by these conditions during the intervention period were distributed differently: the routine $7 condition produced more in-window workouts and fewer out-of-window workouts than the flexible $3 condition. Past research on the psychology of habit formation suggests that the routine behavior induced by the routine $7 condition should help to establish exercise habits and therefore lead to less of a decrease in exercise activity after the end of the intervention period. However, Figure 2 shows that if anything, the opposite pattern emerged in the data, with the routine $7 condition exhibiting a larger decrease in the number of overall workouts per week than the flexible $3 condition. Table 4A Column 1 indicates that this difference is statistically significant: the routine $7 condition had a decrease in the number of overall workouts per week that was 0.14 larger than the one we observe in the flexible $3 condition (p<0.05). As Table 4B Column 1 reports, the routine $7 condition had a decrease in the likelihood of having at least one overall workout in a given week that was 5 percentage points larger than the decrease observed in the flexible $3 condition (p<0.05).
Figure 2. Comparing Exercise in the Routine $7 and Flexible $3 Conditions During the Intervention and Post-Intervention.
This graph focuses on two experimental conditions that induced similar numbers of overall workouts during the intervention period, but with different distributions of in-window versus out-of-window workouts. The routine $7 condition produced more in-window workouts and fewer out-of-window workouts than the flexible $3 condition.
Results Using an Instrumental Variables Framework.
We also implemented an instrumental variables strategy that predicts exercise activity in the post-intervention period using exercise activity during the intervention. Specifically, the outcome variable in these regressions is the total number of gym visits per week during the four weeks immediately following the intervention period, or an indicator for having at least one gym visit during a given week, again limiting the sample to the four weeks immediately following the intervention period. We also conducted versions of the instrumental variables analysis that focus only on in-window visits or only on out-of-window visits during the four weeks following the intervention period. The right-hand-side variables of interest are the number of in-window gym visits and the number of out-of-window gym visits per week during the intervention period. We instrument for these two variables using four treatment group indicators, omitting an indicator for the control group. Appendix Table 19 shows the regression results. An incremental in-window gym visit per week during the intervention period leads to 0.22 extra total gym visits per week and a 9 percentage point increase in the likelihood of visiting the gym at least once in a given week during the four weeks following the intervention period. An incremental out-of-window gym visit per week during the intervention period leads to 0.32 extra total gym visits per week and a 16 percentage point increase in the likelihood of visiting the gym at least once in a given week during the four weeks following the intervention period. The difference between these coefficients is statistically significant when the outcome variable is the indicator for visiting the gym at least once in a given week. These results bolster our main finding that the flexible conditions, which generated more out-of-window and fewer in-window gym visits during the intervention when compared to the routine conditions, generated more exercise activity during the four weeks following the intervention period.
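A minimal sketch of this type of two-stage least squares specification appears below. The use of the linearmodels package, the variable names, and the omission of the strata controls are simplifying assumptions made for illustration; they are not the authors' implementation.

```python
# Illustrative 2SLS sketch: post-intervention weekly visits regressed on
# intervention-period in-window and out-of-window visits per week,
# instrumented by the four treatment indicators (control group omitted).
# Strata fixed effects and the missing-window indicator would be added as
# exogenous controls in a fuller specification.
import pandas as pd
from linearmodels.iv import IV2SLS

def fit_iv(df: pd.DataFrame):
    formula = (
        "post_visits_per_week ~ 1"
        " + [in_window_per_week + out_window_per_week"
        " ~ flex3 + flex7 + routine3 + routine7]"
    )
    return IV2SLS.from_formula(formula, data=df).fit(
        cov_type="clustered", clusters=df["pair_id"]
    )
```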
Heterogeneity Analyses
We conducted a number of heterogeneity analyses to determine whether our results in Tables 2, 3, and 4 varied as a function of individual characteristics. We did not find statistically significant heterogeneity as a function of the time of day when a participant scheduled her workout window, whether a participant had the same workout window as her partner, or whether a participant was an above versus below median exerciser pre-intervention (based on the self-reported typical number of pre-intervention workouts completed weekly). We also did not find heterogeneity as a function of a participant’s gender, BMI (based on self-reported weight and height), or job function (which might be related to the degree of flexibility in a participant’s work schedule). We did find some statistically significant treatment effect heterogeneity by participant job level. However, further inspection revealed that this heterogeneity was driven by thinly populated job levels, and we therefore suspect the finding is attributable to multiple hypothesis testing.
In addition to searching for heterogeneous treatment effects (statistically significant differences in treatment effects across subgroups), we searched for subgroups for which there was even suggestive evidence that the routine conditions led to more gym visits during the 4-week post-intervention period than the flexible conditions. Such a difference within a subgroup would be in the opposite direction of our results for the full sample, and our model of habit formation indicates that such subgroups might exist. However, when we examined subgroups defined by the individual characteristics studied in the previous paragraph (e.g., the time of day when a participant scheduled her workout window), only one subgroup—participants whose job function was “general and administrative”—offered evidence suggesting that the routine conditions led to more gym visits than the flexible conditions. In this case, the difference was nearly zero and was not statistically significant. Thus, we found essentially no evidence for situations in which routine incentives were more impactful than flexible incentives, but we cannot rule out the possibility that such situations exist.
Longevity of Effects
We also examined the longevity of the treatment effects by repeating our analyses using data on participants’ gym visits during post-intervention weeks 5–10 and post-intervention weeks 11–40 (Appendix Tables 2–5 and Appendix Figures 4–5). In addition, we tested the effect of having any incentives for exercise on the number of total gym visits per week and the likelihood of working out in a given week during these same periods by regressing the total weekly workouts outcome variable and the indicator for having at least one workout in a given week on an indicator that equals 1 if a participant was randomly assigned to one of the incentive conditions and 0 if a participant was in the control group (Appendix Table 6).
As shown in Appendix Table 2, during post-intervention weeks 5–10, participants in the flexible conditions visited the gym 0.11 more times per week (p<0.01) and had a 6 percentage point higher likelihood of a gym visit in a given week (p<0.001) than participants in the routine conditions. However, as shown in Appendix Table 4, the differences in coefficients are no longer statistically significant during post-intervention weeks 11–40. Appendix Table 6 shows that participants in the four incentive conditions pooled, relative to participants in the control condition, were 7 percentage points more likely to make a gym visit in a given week during post-intervention weeks 5–10 (p<0.05) and a marginally significant 5 percentage points more likely during post-intervention weeks 11–40 (p<0.10). Appendix Figure 6 displays the mean number of overall gym visits during each week for the four incentive conditions pooled and for the control condition; for this outcome measure, the difference was not statistically significant when pooling post-intervention weeks 5–10 (Appendix Table 6 Column 1) or post-intervention weeks 11–40 (Appendix Table 6 Column 3).
Robustness Checks
We conducted a number of robustness checks. We found that our key results were qualitatively similar when we (a) introduced additional control variables into our regressions, (b) limited the sample to Google’s largest office location, (c) only counted gym visits if a participant badged out at least 30 minutes after badging in, and (d) examined outcome variables capturing the minutes a participant spent at the gym per week. We also found that our key results were qualitatively similar when we limited the sample to participants who chose workout windows during the typical workday (that is, starting at 9:00am or later and ending at 5:00pm or earlier). One possible concern with our main results is that participants in the routine conditions may only appear to have exercised less frequently during the 4-week post-intervention period than participants in the flexible conditions because participants in the routine conditions developed the habit of exercising during their workout windows and then sustained the habit by exercising at home during their workout windows, leading to fewer observed visits to workplace gyms. Participants in the routine conditions who chose workout windows during the typical workday probably did not exercise at home during their workout windows, so the finding that the results were similar when we limited the sample to participants with workout windows during the workday suggests that this alternative explanation does not drive our main results. Finally, we examined self-reported data on exercise outside of Google gyms. We did not find evidence that patterns in exercise outside of Google gyms offset experimental treatment effects on the frequency of visits to Google gyms during the intervention period or on the frequency of visits to Google gyms during the four weeks following the intervention period. However, when we compare the routine $7 condition and the flexible $3 condition, we cannot rule out the possibility that the difference in the change in Google gym visits from the intervention period to the four weeks following the intervention period is offset by the difference in the change in exercise outside of Google gyms. For further details, see Appendix F.
4. DISCUSSION
A Model of the Tradeoff between Flexibility and Routinization
We present a simple model of habit formation that can help explain the patterns in our experimental data while also offering insight into the conditions under which flexible incentives may be more versus less effective than routine incentives at promoting habits.
Model Setup.
In the model, there is one agent who makes decisions in two periods. In each period, the agent faces two opportunities to take an action, the in-window opportunity and the out-of-window opportunity.17 For concreteness, the action might be visiting the gym, flossing one’s teeth, or giving feedback to employees. The agent can take the action at most once per period, so there are three possible decisions, denoted at, in period t: taking an in-window action (at = I), taking an out-of-window action (at = O), or taking no action (at = N). Period 1 is the intervention period, during which financial incentives for taking the action might be offered. Period 2 is the post-intervention period, during which financial incentives do not apply. An in-window or out-of-window action in period 1 forms a habit, increasing the utility of that same type of action in period 2.
We normalize the utility of not taking the action (N) to zero in both periods. Relative to not taking the action, taking an in-window action and taking an out-of-window action are each associated with an intrinsic utility, defined as the net money-metric utility benefit that the agent receives from taking the action at that time, without accounting for any benefit from receiving financial incentives. We can think of the intrinsic utility as representing the personal enjoyment that the agent derives from taking the action at that time minus the opportunity cost of not engaging in some other activity at that time, but the intrinsic utility can, of course, capture many additional factors.
The intrinsic utility of an in-window action and the intrinsic utility of an out-of-window action in a given period are random variables, and they are drawn from known distributions and observed by the agent at the beginning of that period. The intrinsic utility of an in-window action in period 1 is vin,1; the intrinsic utility of an out-of-window action in period 1 is vout,1; the intrinsic utility of an in-window action in period 2 is vin,2 + hin(a1); and the intrinsic utility of an out-of-window action in period 2 is vout,2 + hout(a1). We assume that vin,1, vout,1, vin,2 and vout,2 are independent random variables, with vin,1 and vin,2 drawn from the uniform distribution on , and with vout,1 and vout,2 drawn from the uniform distribution on . We impose the restriction to ensure that the supports of the distributions always contain both strictly positive and strictly negative values. The terms hin(a1) and hout(a1) represent habit formation. We assume that taking an action of a certain type (in-window versus out-of-window) in period 1 increases the intrinsic utility of taking an action of that same type in period 2 by ℎ, i.e., hin(I) = hout(O) = h and hin(O) = hin(N) = hout(I) = hout(N) = 0.18 We impose the restriction , again to ensure that the supports of the intrinsic utility distributions for in-window and out-of-window actions contain strictly negative values.
We compare the flexible incentive scheme and the routine incentive scheme to each other and to the control condition. Let iin denote the incentive payment that the agent receives for an in-window action in period 1, iout denote the incentive payment that the agent receives for an out-of-window action in period 1, and ino denote the incentive payment that the agent receives for not taking the action in period 1. The flexible incentive scheme offers the agent iin = iout = if and ino = 0. The routine incentive scheme offers the agent iin = ir and iout = ino = 0. In the control condition, we have iin = iout = ino = 0. We impose the restriction to ensure that the intrinsic utility of an in-window or out-of-window action in period 1 plus the incentive that may be associated with such an action is sometimes strictly negative.
We assume that the agent is myopic in the sense that when he or she chooses an action in period 1, he or she does not consider the effect of habit formation on his or her expected utility in period 2.19 Thus, in period 1, the agent compares vin,1 + iin, vout,1 + iout, and zero, and he or she chooses the option corresponding to the greatest among these (I, O, or N, respectively). In period 2, the agent compares vin,2 + hin(a1), vout,2 + hout(a1), and zero, and he or she chooses the option corresponding to the greatest among these (I, O, or N, respectively). Appendix G provides a complete characterization of this model. Here, we discuss the key conclusions from the model.
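The decision rule can be summarized in a few lines of code. The sketch below is purely illustrative: the uniform supports are parameterized by their assumed centers min and mout with an assumed half-width of 1/2, which is a stand-in for the exact parameterization and restrictions given in Appendix G.

```python
# Illustrative two-period decision rule for the myopic agent.
import numpy as np

rng = np.random.default_rng(0)

def draw_v(m, half_width=0.5):
    # Intrinsic utility draw; the uniform support is an assumption for illustration.
    return rng.uniform(m - half_width, m + half_width)

def choose(v_in, v_out, i_in=0.0, i_out=0.0):
    """Return 'I', 'O', or 'N', whichever yields the highest payoff."""
    payoffs = {"I": v_in + i_in, "O": v_out + i_out, "N": 0.0}
    return max(payoffs, key=payoffs.get)

def simulate_agent(m_in, m_out, h, i_in, i_out):
    # Period 1: myopic choice given the incentive scheme.
    a1 = choose(draw_v(m_in), draw_v(m_out), i_in, i_out)
    # Period 2: no incentives; habit bonus h accrues to the period-1 action type.
    a2 = choose(draw_v(m_in) + (h if a1 == "I" else 0.0),
                draw_v(m_out) + (h if a1 == "O" else 0.0))
    return a1, a2
```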
Predictions for Period 1.
We first consider the model’s predictions regarding the effect of the incentive schemes on decisions during the intervention (period 1), holding the dollar amount of the incentive offers constant across the flexible and routine schemes (if = ir = i). Relative to the control condition, the flexible incentive scheme increases both the likelihood of an in-window action and the likelihood of an out-of-window action. The routine incentive scheme increases the likelihood of an in-window action by more than the flexible incentive scheme does, but it decreases the likelihood of an out-of-window action. On net, the flexible incentive scheme increases the likelihood of taking any action (in-window or out-of-window) by more than the routine incentive scheme does. Intuitively, the routine incentive scheme promotes the in-window action both (a) in certain cases where the agent, in the absence of incentives, would have chosen to take neither the in-window action nor the out-of-window action and (b) in certain cases where the agent, in the absence of incentives, would have chosen the out-of-window action. The former effect represents an increase in the likelihood of taking any action (in-window or out-of-window), but the latter effect merely represents a shift from an out-of-window action to an in-window action. The flexible incentive scheme, on the other hand, promotes the in-window action in certain cases where the agent, in the absence of incentives, would have chosen to take neither the in-window action nor the out-of-window action, and it also promotes the out-of-window action in certain cases where the agent, in the absence of incentives, would have chosen to take neither the in-window action nor the out-of-window action. The former effect corresponds to the first effect (a) of the routine incentive scheme, but the latter effect also represents an increase in the likelihood of taking any action (in-window or out-of-window), thereby accounting for the greater impact of the flexible incentive scheme relative to the routine incentive scheme on the likelihood of taking any action. These predictions from the model are borne out in the experimental data.
Predictions for Period 2, Holding Incentive Dollar Amounts Constant.
We now turn to the model’s predictions regarding the effect of the incentive schemes on decisions post-intervention (period 2), again holding the dollar amount of the incentive offers constant across the flexible and routine schemes (if = ir = i). We focus on the likelihood of taking any action (in-window or out-of-window) as the outcome of interest, as this outcome reveals the relevant tradeoffs associated with using flexible versus routine incentive schemes to create habits.
It is ambiguous whether the flexible incentive scheme or the routine incentive scheme will cause a larger increase in the likelihood of taking any action.20 The sign of the comparison depends on the values for the parameters min, mout, and i, but not ℎ. In Figure 3, we fix the value for the parameter i at 0.1 and display all combinations of values for min and mout that satisfy our parameter restrictions. The combinations of min and mout for which the flexible incentive scheme causes a larger increase in the likelihood of taking any action in period 2 than the routine incentive scheme are shaded in black, and the combinations for which the opposite is true are shaded in grey. The white areas denote combinations that do not satisfy our parameter restrictions.21
Figure 3. The Relative Effect of Routine and Flexible Incentives in Our Simple Model of Habit Formation.
The value of min varies along the vertical axis, and the value of mout varies along the horizontal axis. The value of i is fixed at 0.1. Routine incentives increase the likelihood of taking any action (in-window or out-of-window) post-intervention by more than flexible incentives in the grey region, while flexible incentives have a greater effect in the black region. In the white region, the parameter restrictions are violated because the incentive size is too large relative to min and mout, so those combinations are not valid.
Figure 3 shows that the flexible incentive scheme causes a larger increase in the likelihood of taking any action in period 2 than the routine incentive scheme if mout is greater than min—the entire area below the 45-degree line is shaded black. However, min > mout is not sufficient for the opposite to be true. For most possible values of mout, min must be substantially higher than mout in order for the routine incentive scheme to cause a larger increase in the likelihood of taking any action.
To develop intuition for this pattern, first consider how an in-window action or an out-of-window action in period 1 changes the likelihood of taking any action in period 2. An in-window action in period 1 increases the likelihood of an in-window action in period 2 due to habit formation. If mout is low, the incremental in-window action in period 2 is likely to occur when the agent would not otherwise have taken any action, but if mout is high, the incremental in-window action in period 2 is likely to replace an out-of-window action that would otherwise have occurred. The former effect represents an increase in the likelihood of taking any action, while the latter does not. Symmetric statements hold for the effect of an out-of-window action in period 1 on the likelihood of an out-of-window action in period 2. An incremental in-window action in period 1 therefore increases the likelihood of taking any action in period 2 more than an incremental out-of-window action in period 1 if and only if min > mout.
Now consider the effect of the incentive schemes on the likelihood of an in-window or an out-of-window action in period 1. The flexible incentive scheme increases both the likelihood of an in-window action and the likelihood of an out-of-window action. The routine incentive scheme, on the other hand, increases the likelihood of an in-window action by more than the flexible incentive scheme does, while decreasing the likelihood of an out-of-window action. Thus, for the routine incentive scheme to cause a larger increase in the likelihood of taking any action in period 2 than the flexible incentive scheme, it must be the case that the effect of a period 1 in-window action on the likelihood of taking any action in period 2 is much greater than the effect of a period 1 out-of-window action. Based on the argument in the previous paragraph, satisfying this condition requires that min be substantially greater than mout. See Appendix G for further details.
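The region comparison underlying Figure 3 can also be approximated numerically. The Monte Carlo sketch below is an illustration only: the uniform supports (assumed half-width 1/2) and the habit bonus h are assumed values rather than the paper's calibration, and the boundary in Figure 3 itself is characterized analytically in Appendix G.

```python
# For a given (m_in, m_out), compare the increase in the probability of taking
# any action in period 2 under flexible incentives (i paid for I or O) versus
# routine incentives (i paid for I only), relative to no incentives.
import numpy as np

def p_any_period2(m_in, m_out, i_in, i_out, h=0.2, n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    v_in1 = rng.uniform(m_in - 0.5, m_in + 0.5, n)
    v_out1 = rng.uniform(m_out - 0.5, m_out + 0.5, n)
    # Period 1: myopic choice among I (0), O (1), and N (2).
    a1 = np.stack([v_in1 + i_in, v_out1 + i_out, np.zeros(n)]).argmax(axis=0)
    # Period 2: habit bonus h applies to the period-1 action type; no incentives.
    v_in2 = rng.uniform(m_in - 0.5, m_in + 0.5, n) + h * (a1 == 0)
    v_out2 = rng.uniform(m_out - 0.5, m_out + 0.5, n) + h * (a1 == 1)
    return np.mean((v_in2 > 0) | (v_out2 > 0))

def flexible_beats_routine(m_in, m_out, i=0.1, **kw):
    base = p_any_period2(m_in, m_out, 0.0, 0.0, **kw)
    flexible_gain = p_any_period2(m_in, m_out, i, i, **kw) - base
    routine_gain = p_any_period2(m_in, m_out, i, 0.0, **kw) - base
    return flexible_gain > routine_gain  # True in the black region of Figure 3
```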
In the experiment, the flexible scheme generated more post-intervention exercise than the routine scheme with the same dollar amount of incentives offered. Mapping the data to the model, the experimental setting was not a case in which min was substantially greater than mout.
Predictions for Period 2, Holding Period 1 Activity Constant.
Instead of holding the dollar amount of the incentive offers constant across the flexible and routine schemes, we can hold constant the likelihood of taking any action in period 1 and compare the likelihood of taking any action in period 2 for the flexible and routine incentive schemes. This comparison requires the dollar amount of the routine incentive offer to be larger than the dollar amount of the flexible incentive offer. In this case, the likelihood of taking any action in period 2 is greater for the flexible incentive scheme than for the routine incentive scheme if and only if min < mout. Intuitively, relative to the flexible incentive scheme, the routine incentive scheme simply increases the likelihood of an in-window action in period 1 while decreasing the likelihood of an out-of-window action in period 1 by the same amount. The effect of the routine incentive scheme relative to the flexible incentive scheme on the likelihood of taking any action in period 2 therefore hinges on whether a period 1 in-window action or a period 1 out-of-window action exerts a stronger influence on the likelihood of taking any action in period 2. By the argument given in the previous subsection, this comparison is driven by the relative sizes of min and mout.
In the experiment, the gym visit frequencies in the flexible $3 and routine $7 experimental conditions were approximately equal during the intervention. Table 4 and Figure 2 indicate that, if anything, the routine $7 condition exhibited a larger decrease in gym visit frequency post-intervention than the flexible $3 condition. Mapping this finding to the model, the experimental results suggest that min is slightly less than mout.
Interpretation and Implications of the Model.
The model offers guidance as to the types of activities for which flexible incentives versus routine incentives might be more effective for promoting habits. For some activities, there are regularly occurring opportunities for taking action that are frequently the most convenient or the most rewarding. For example, many individuals find it convenient to floss their teeth right before going to sleep for the night. When we apply the model to such activities, we would use parameters such that min > mout, and we would predict that routine incentives would be more effective than flexible incentives for promoting habits. For other activities, the best opportunity for taking action occurs on an irregular schedule and under inconsistent circumstances. For example, the best opportunity for a manager to give developmental feedback to an employee may be when there is a temporary drop in the team’s workload, but such drops may be the result of unpredictable decreases in the number of requests from clients. When we apply the model to these types of activities, we would use parameters such that min < mout, and we would predict that flexible incentives would be more effective than routine incentives for promoting habits.
If we focus on a specific activity for which habit formation is desirable, the model also offers guidance as to the types of decision-making environments in which flexible incentives versus routine incentives might be more effective for promoting habits. Consider the case of promoting exercise habits in our experiment. Experimental participants were required to select a daily two-hour window to be the in-window gym visit opportunity, so the intrinsic utility of the in-window action is interpreted as the intrinsic utility of visiting the gym during that window, while the intrinsic utility of the out-of-window action is interpreted as the intrinsic utility of visiting the gym at whichever time outside that window is most desirable. It is plausible to anticipate min > mout in some environments and to anticipate min < mout in other environments. If an individual’s day-to-day schedule is predictable and stable, there may be a two-hour window that is very frequently the best time to visit the gym, suggesting that min > mout. On the other hand, if an individual’s schedule varies significantly from one day to the next, the two-hour window that is most frequently the best time to visit the gym may still quite regularly be inferior to another time on a given day (it is simply a different time each day that is superior), suggesting that min < mout. This latter description seems to apply to the participants in our experiment, whose workplace environment is dynamic and fast-paced. In the model, the routine incentive scheme causes a larger increase in the likelihood of taking any action in period 2 than the flexible incentive scheme only when min is substantially greater than mout, so the model implies that the routine scheme is better for habit formation than the flexible scheme when an individual’s schedule is predictable and stable. This implication is important for managers and policy makers. Incentives for routines may be most impactful when they are applied in predictable and stable environments or when they are accompanied by a restructuring of the environment that creates opportune moments for taking action on a regular basis.
It would be possible to extend the model in several ways to illuminate other factors that may influence the comparison between flexible and routine incentive schemes. First, the interpretation of the in-window opportunity as a small window of time and the out-of-window opportunity as the most desirable among many alternative windows of time suggests that the parameter governing habit strength should be higher for the in-window action than for the out-of-window action, i.e., hin > hout. Such an assumption would reflect past research findings that successful habits are built on stable cues. This assumption would increase the effect of the routine incentive scheme on the likelihood of taking any action in period 2, but provided that hin is not too much greater than hout, the model’s implications based on the relative sizes of min and mout would not change.
Second, the model could be extended by endogenizing the decision of which opportunity to label the in-window opportunity. If an individual is uncertain as to which opportunity is most frequently the best for taking the action, the flexible incentive scheme encourages more exploration than the routine incentive scheme and may therefore be more likely to help the individual discover a regular time that is particularly good for taking the action. Such a discovery may lead to more consistent post-intervention engagement in the desired behavior. See Larcom, Rauch, & Willems (2017) for evidence that forcing individuals to experiment with different routines can lead them to switch to more beneficial routines.22
Third, it would be natural to extend the model to endogenize the length of the time window associated with the in-window opportunity. An interesting tradeoff arises in this extension. On one hand, increasing the length of the time window associated with the in-window opportunity increases min because a longer time window creates more opportunities for a high realization of the intrinsic utility of the in-window action. An increase in min increases the effectiveness of the routine incentive scheme relative to the flexible incentive scheme. On the other hand, if we allow the parameter governing habit strength to be higher for the in-window action than for the out-of-window action (i.e., hin > hout), increasing the length of the time window associated with the in-window opportunity is likely to decrease hin because the in-window action becomes less strongly connected to a narrowly defined routine. A decrease in hin decreases the effectiveness of the routine incentive scheme relative to the flexible incentive scheme.
Finally, it would be interesting to extend the model to consider different types of habit formation. For example, in the context of our experiment, if an individual is unable to visit the gym during the preselected workout window because of a scheduling conflict, having the commitment to figure out another time to go to the gym may be a habit-forming activity. The flexible incentive scheme encourages this behavior by rewarding out-of-window gym visits, so the flexible scheme may have the advantage that it promotes the resilience to find an alternative time to exercise in the face of scheduling conflicts.
While our experiment is not designed to disentangle the exact mechanisms by which the flexible and routine incentive schemes exert influence on post-intervention exercise, the results offer an important lesson for managers and policy makers who wish to help individuals form beneficial habits. Despite research indicating that successful habits are often characterized by engagement in a behavior under routine conditions, interventions designed to take advantage of this pattern face countervailing forces that may render them ineffective. The model provides insight into the types of activities and decision-making environments for which flexible versus routine incentive schemes are likely to be more impactful.
Limitations
In spite of its scale and scope, our study has a number of important limitations. First, we cannot perfectly measure participant exercise. In particular, we did not directly observe participants’ exercise habits outside of Google’s gyms, and some visits to Google gyms were unobserved because participants failed to badge in. We asked participants about these issues in our exit survey (see the subsection “Robustness Checks”) and found that their responses generally did not undermine our main conclusions, but the self-reported information may not be reliable. Furthermore, a potential concern is that participants might collude with their workout partners or others in order to game the incentive system, for example by bringing another employee’s identification badge to the gym and recording a gym visit for that employee even when that employee did not visit the gym. Such behavior is unlikely to have occurred, however, as employees use their identification badges many times during the day and must keep them on hand in order to access each of the many different physical spaces within a Google campus.23
Second, our empirical results would likely have been different if we had made different decisions regarding the details of our experimental design. For example, all participants in the experiment, including those in the control condition, were asked to select a two-hour workout window that applied to every weekday. The routine incentive schemes may have been more effective if the window were longer or shorter, if the window were allowed to vary across days, if the window could be adjusted according to work schedules or exercise class schedules,24 or if the window could be adjusted after participants had a chance to learn about the exercise times that worked best for them. Our experiment also did not attempt to “piggyback” exercise habits on top of existing routines, which may have been more effective (Judah, Gardner, & Aunger, 2013), although it is not clear that existing routines could have been practicably harnessed for this purpose. Instead of using “piggybacking,” our experiment involved sending an email reminder associated with each of the workout windows to cue exercise behavior. When many of these reminders failed to trigger a gym visit, participants may have felt discouraged, undermining the habit-forming potential of the intervention (though notably, our treatment conditions did produce lasting behavior change relative to our control condition).25
Furthermore, the intervention period only lasted for four weeks. Although this length of time matches the intervention duration in several previous experiments studying exercise habits (Acland & Levy, 2015; Charness & Gneezy, 2009; Royer, Stehr, & Sydnor, 2015), a longer intervention may be necessary to establish in-window exercise routines. In addition, all participants in our experiment were paired with a workout partner, and the effects of the intervention might have differed had participants instead signed up alone. For instance, while there was no obligation to coordinate with the workout partner, participants may have nonetheless tried to coordinate, perhaps in ways that made gym visits less convenient and thereby undermined the formation of in-window exercise routines.26 All of these experimental details are practical design considerations that a manager or policy maker who is seeking to promote exercise habits among employees or other populations must confront, so it would be valuable for future research to explore adjustments to these design features.
A third limitation of our study is that it was conducted at a single company (Google) with an employee population that is not representative of the U.S. workforce. Google has workplace gyms, and our findings might have differed if we had conducted the study at a gym that was not located at participants’ place of work. While we found no evidence of heterogeneous treatment effects by job function, the impact of routine versus flexible incentives might be different in organizations that structure work more or less flexibly. In general, employees at Google have higher levels of education and higher incomes than the average U.S. worker. Perhaps routine incentives did not generate persistent exercise habits because these high achievers had already established exercise routines prior to participating in the experiment.
Finally, the finding that routine incentives generated weaker exercise habits than flexible incentives might have been driven by participants’ inferences regarding informational signals sent by the employer that were embedded in the design of the intervention. The informed consent form for our study explained that the experiment was conducted by outside academic researchers and that individual-level data collected during the course of the experiment would not be shared with Google, but participants might also have exhibited experimenter demand effects based on their understanding of the researchers’ desired outcomes. Whether because of perceptions of the employer’s desired outcomes or because of perceptions of the researchers’ desired outcomes, perhaps participants in the routine conditions responded by visiting the gym during their workout windows even when doing so was highly inconvenient, undermining the success of habit formation. We view this possibility as a legitimate component of the routine conditions for judging their efficacy. After all, the routine conditions were intended to increase in-window gym visits in some situations where those gym visits would not have occurred in the absence of the intervention. The interpretation of the results is slightly different, but the results still speak to the likely effects of similar employer-sponsored programs to promote exercise routines, as the introduction of any such programs may be accompanied by changes in perceptions regarding the employer’s desired outcomes.27
5. CONCLUSION
In a large field experiment, we found that routine incentives, which offered monetary rewards for visiting the gym during a two-hour window, generated more gym visits during that window but fewer gym visits overall than flexible incentives, which offered monetary rewards for visiting the gym at any time. After the incentives were no longer offered, the participants who had received routine incentives exhibited less exercise activity than the participants who had received flexible incentives, consistent with past research showing that more repetition creates stronger habits. Our more important and novel contribution to the literature on habits focused on comparing participants who received large routine incentives ($7 per qualifying gym visit) and participants who received small flexible incentives ($3 per qualifying gym visit). These two groups visited the gym at a similar frequency during our intervention, but those in the routine group visited the gym at more consistent times. Comparing these two groups, we surprisingly find that participants who received large routine incentives subsequently exhibited larger post-intervention decreases in exercise. Thus, despite past research suggesting that repeatedly rewarding beneficial behaviors under routine conditions might promote more lasting habits than repeatedly rewarding such behaviors on a flexible schedule, we find evidence for the other side of the tradeoff: an incentive program that promotes rigid routines can be counterproductive to habit formation. Our simple model of habit formation suggests that routine incentives are unlikely to be more effective than flexible incentives in dynamic, fast-paced work environments, but may be more successful in stable environments, where they can reinforce the development of routines that are less prone to disruption.
Our study raises a number of important questions for future research. We examined flexible incentive schemes and routine incentive schemes, but there may be a middle ground that is more effective than either of these options. For instance, an incentive scheme that pays participants for all workouts but pays more for in-window workouts might help individuals build an exercise routine while still encouraging participants who miss their workout window to exercise at another time. It would be valuable to explore this possibility further. In addition, we defined routines at a daily (rather than weekly or monthly) interval and defined workout windows as two-hour periods. Altering some of these definitions might have yielded different results. Finally, a routine incentive may be more or less effective in a social context than in an individual context. Workout partners who must stay on the same schedule to earn incentives may provide extra support and accountability to each other and make workouts more enjoyable, thereby making routines more persistent than they would otherwise be. On the other hand, a workout partner’s failure to exercise may also license an individual to skip a scheduled gym visit, so a social routine could be less persistent. Future research should examine this issue and related questions to identify effective approaches for promoting long-term habit formation.
Supplementary Material
Acknowledgments
We thank Quoc Dang Hung Ho, Andrew Joung, David Mao, Predrag Pandiloski, Byron Perpetua, and Kartikeya Vira for outstanding research assistance. We also thank Google for integral support on the design and execution of this experiment. We acknowledge financial support from the National Institutes of Health (grants P01AG005842, P30AG034532, and P30AG034546), the Marketing Science Institute (grant 4-1916), Harvard Business School, the Wharton School, and Google. Finally, we are grateful for outstanding feedback from Susanna Gallani, Matthew Levy, Yan Chen (Department Editor), an Associate Editor, and two anonymous reviewers, from seminar audiences at Carnegie Mellon University, Columbia University, Cornell University, Harvard University, Northwestern University, Stanford University, University of Michigan, University of Pennsylvania, and Yale University, and from participants in the Behavioral Decision Research in Management Conference, the Advances in the Science of Habits Conference, the Center for Health Incentives and Behavioral Economics Conference, the Society for Judgment and Decision Making Conference, and the Behavior Change for Good Conference. This study is pre-registered at ClinicalTrials.gov (NCT02346799) and is registered at the AEA RCT Registry (AEARCTR-0003471). This research was approved by the IRBs of Harvard University, the National Bureau of Economic Research, and the University of Pennsylvania.
Footnotes
For further references, see Milkman, Minson, & Volpp, 2014; Patel et al., 2016; Sen et al., 2014; Staats et al., 2017; and Thaler & Benartzi, 2004.
Indeed, in a survey of 69 psychology professors at top 40 universities as ranked by U.S. News and World Report (2016), 77% of respondents predicted that an individual who was induced to exercise at a regular time of day over the course of a month would form a more persistent habit than an individual who was induced to exercise the same amount over the course of a month but not necessarily at a regular time of day. However, this evidence is only suggestive because the survey asked about a hypothetical scenario that did not reflect all of the features of our field experiment. See Appendix A for details.
We use the term “intrinsic utility” to denote utility from all sources other than financial incentive payments. We are not attempting to draw a connection to the distinction between intrinsic and extrinsic motivation.
It may at first seem puzzling that the likelihood of an in-window action could be less than the likelihood of an out-of-window action. We interpret the opportunity for an in-window action as a narrow window of time (e.g., two hours during a day), whereas the opportunity for an out-of-window action encompasses several such windows (e.g., all other hours during the day). Thus, the in-window opportunity may be the single most preferred narrow window for taking the action, but the likelihood of an in-window action may nonetheless be less than the likelihood of taking the action in one of the several alternative windows.
Employees were given a list of 96 two-hour time windows (one window starting every 15 minutes) and were told to select one. They were encouraged to discuss this time window with their work group to confirm that exercising during the time window would not be disruptive to their work.
In our initial survey, 117 individuals had missing observations for their window selections. For the purposes of our stratified randomization procedure, we created a stratifying variable indicating perfect overlap with the partner’s window, coded as 1 if workout partners had identical workout window selections or if both partners had a missing selection in the initial survey. We use this variable when constructing the strata fixed effects that serve as control variables in our regression analyses. However, we manually entered workout windows for 116 of the 117 individuals after the randomization procedure. Workout windows including these updates are summarized in Table 2 and are used to determine whether a given gym visit is an in-window or out-of-window visit. Our regressions control for an indicator for a missing workout window, so the one individual with a missing window is effectively excluded from the analysis.
At the request of our corporate partner, we also included four questions about overall well-being. Prior to the initiation of data collection, our research team committed to excluding these questions from our eventual analysis, as they were not variables of interest to our team.
We performed our power calculations using the online tool available at http://www.sample-size.net/means-effect-sizeclustered/. This calculator accounts for the effect of intracluster correlation on statistical power. Prior to collecting data, we assumed an intracluster correlation of 0.05 (a typical assumption) and a mean outcome of 1 gym visit per week, which gave us 80% power to detect a 33% difference in weekly exercise between the control group and the flexible $3 payment group and an 18.5% difference between treatment conditions. When we updated our power calculations post-experiment using the observed intracluster correlation in our sample of 0.26, the observed outcome standard deviation in the control group of 1.45, and the actual sample sizes, we determined that the detectable effect sizes in our study were 44% and 27%, respectively.
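To make the role of the intracluster correlation concrete, the minimal sketch below inflates the outcome variance by the standard design effect before computing a two-arm minimum detectable effect. It is an illustration only: the standard deviation and intracluster correlation are taken from this footnote, but the per-arm sample size, the two-person cluster size, and the significance and power levels are our own assumptions, not inputs reported by the study.

```python
from math import sqrt
from scipy.stats import norm

def min_detectable_effect(sd, n_per_arm, cluster_size, icc, alpha=0.05, power=0.80):
    """Two-arm minimum detectable effect, inflating variance by the design effect."""
    deff = 1 + (cluster_size - 1) * icc            # design effect from intracluster correlation
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)  # critical values for significance and power
    return z * sqrt(2 * deff * sd**2 / n_per_arm)

# Illustrative post-hoc inputs: sd and icc come from the footnote above;
# n_per_arm and the two-person cluster size are assumptions for this sketch.
print(min_detectable_effect(sd=1.45, n_per_arm=500, cluster_size=2, icc=0.26))
```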
Although participants were told that they would be required to finish both steps of the gym registration process (online and in-person registration) to be included in the study, randomization occurred as long as both partners had completed the online registration process. The rationale behind this decision was that upon first visiting the gym after online registration, participants would be automatically prompted to complete in-person registration, thus ensuring we would be able to track all gym visits. Of the 2,508 participants who were randomized to experimental conditions, 1,111 had not yet completed the in-person registration process by the date of their randomization (704 for the first randomization wave, 375 for the second, and 32 for the third). Participants who had not completed in-person registration received multiple reminder emails encouraging them to do so as soon as possible (see Appendix B, Figure B10).
At the bottom of the daily reminder emails, participants were given links that would allow them to unsubscribe from the email and text message reminders.
Note that to earn incentive payments for workouts, participants were required to badge out of the gym at least 30 minutes after badging in, so we use a more inclusive definition of a gym visit in our analysis than in our rewards scheme. We believe that the inclusive definition of a gym visit better reflects an individual’s exercise behavior. However, Appendix Tables 13–15 show that the results are similar if we use the less inclusive definition, which only counts a gym visit as having occurred if we see a study participant badge out of the gym at least 30 minutes after badging in.
This occurs in 2.56% of our weekly observations. Although our decision to code variables in this way means that our statistical results regarding in-window and out-of-window gym visits do not “add up” to our statistical results regarding total gym visits, we believe that our variable definitions provide the best representation of the experimental results. When a participant has multiple employee identification badge swipes at the gym on the same day, the amount of time between swipes is less than two hours in the majority of cases, suggesting that two adjacent swipes are in fact associated with the same gym visit (perhaps with a break outside the gym in the middle of the visit). Thus, if a participant has both an in-window badge swipe and an out-of-window badge swipe on the same day, we record one gym visit when counting total gym visits, but because the gym visit straddles the exercise window boundary, we record one in-window gym visit when counting in-window gym visits and one out-of-window gym visit when counting out-of-window gym visits.
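As a concrete illustration of this counting rule, the sketch below merges same-day badge swipes into visits and then tallies total, in-window, and out-of-window visits, letting a straddling visit count once in each of the latter two tallies. The function name, the data structures, and the two-hour merge threshold are our own illustrative choices (motivated by the observation above about adjacent swipes); the paper's exact merging rule is not spelled out here.

```python
from datetime import datetime, timedelta

def count_visits(swipes, window_start, window_end):
    """Merge same-day swipes within two hours into one visit, then tally visits.

    A visit containing both in-window and out-of-window swipes counts once toward
    total visits but once each toward the in-window and out-of-window tallies.
    """
    visits = []
    for swipe in sorted(swipes):
        if visits and swipe - visits[-1][-1] <= timedelta(hours=2):
            visits[-1].append(swipe)      # treat as part of the previous visit
        else:
            visits.append([swipe])        # start a new visit
    in_win = lambda t: window_start <= t <= window_end
    total = len(visits)
    in_window = sum(any(in_win(t) for t in v) for v in visits)
    out_window = sum(any(not in_win(t) for t in v) for v in visits)
    return total, in_window, out_window

# Example: a swipe inside a 7-9am window and a swipe shortly after it merge into a
# single visit that is counted as both in-window and out-of-window.
day = datetime(2015, 3, 2)
print(count_visits(
    swipes=[day.replace(hour=8, minute=45), day.replace(hour=9, minute=30)],
    window_start=day.replace(hour=7),
    window_end=day.replace(hour=9),
))  # (1, 1, 1)
```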
The mean self-reported typical number of workouts per week is more than double the mean number of observed gym visits per week in the control group during the incentive period (we do not have data on badging in and badging out at the gym prior to the incentive period). Perhaps individuals have inflated perceptions of their own workout frequency or are reporting their ideal workout frequency. It is also possible that their responses incorporate workouts that do not take place at the gym.
We conducted 10,000 simulations in which we randomly assigned individuals to pairs, holding fixed each individual’s chosen workout window. Across the simulations, the mean fraction of pairs with exactly overlapping workout windows was 4%, and the range from the 2.5th percentile to the 97.5th percentile of the distribution of the fraction across simulations did not contain the observed fraction using the real pairings.
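A stripped-down version of this placebo exercise might look like the following sketch, which repeatedly re-pairs individuals at random, records the fraction of simulated pairs with identical windows, and returns the 2.5th–97.5th percentile range for comparison with the observed fraction. The function name, random seed, and toy window encoding are ours, not the study's.

```python
import numpy as np

def simulated_overlap_range(windows, n_sims=10_000, seed=0):
    """Percentile range of the exact-overlap fraction under random re-pairing.

    `windows` holds one chosen workout window per individual (any comparable
    encoding, e.g., the window's start time in minutes); an even number of
    individuals is assumed so they can be paired off.
    """
    rng = np.random.default_rng(seed)
    windows = np.asarray(windows)
    fractions = np.empty(n_sims)
    for i in range(n_sims):
        shuffled = rng.permutation(windows)
        fractions[i] = np.mean(shuffled[0::2] == shuffled[1::2])
    return np.percentile(fractions, [2.5, 97.5])

# Toy example: six individuals' window start times (in minutes after midnight).
# An observed overlap fraction above the returned range would indicate that real
# partners coordinated their window choices more than chance would predict.
print(simulated_overlap_range([480, 480, 510, 720, 1020, 1020]))
```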
It is interesting to note that the frequency of exercise declines from week to week during the post-intervention period even in the control group. The experiment is not designed to explain this pattern, but perhaps the decline is due to a Hawthorne effect fading away.
Participants in the control condition did not have a statistically significantly different mean number of post-intervention in-window gym visits compared to participants in the flexible conditions or participants in the routine conditions. They were marginally significantly less likely to have at least one in-window gym visit in a given week post-intervention (p<0.10).
The decision of which opportunity to label the in-window opportunity and which opportunity to label the out-of-window opportunity is outside the model. When mapping the model to the experiment, we think of the in-window opportunity as the two-hour window that the agent expects to be the best window for visiting the gym, and we think of the out-of-window opportunity as the best opportunity among all other windows. After presenting the baseline version of the model, we discuss an extension that endogenizes the determination of the in-window and out-of-window opportunities.
Instead of assuming that h_in(I) = h_out(O) = h, we could have assumed that h_in(I) ≠ h_out(O). It would also be possible to assume that taking an action of one type (in-window or out-of-window) has a habit-forming effect on subsequently taking an action of the opposite type (out-of-window or in-window, respectively), i.e., h_in(O) = h_out(I) = h′ > 0. We decided not to pursue these approaches because they would add complexity to the model and would yield only incremental insights. If we were to make these alternative assumptions with h_in(I) not too much greater than h_out(O) and h′ not too much greater than zero, we would draw qualitatively similar conclusions from the model.
We have also analyzed the model with a sophisticated agent, who anticipates the impact of his or her period 1 action on his or her expected utility in period 2. The results are qualitatively similar.
For all parameter values that we consider, the flexible incentive scheme causes an increase in the likelihood of taking any action in period 2, relative to the control group. For certain parameter values, the routine incentive scheme causes a decrease in the likelihood of taking any action. See Appendix G.
In Appendix G, we recreate Figure 3 but also show analogous figures with the value for the parameter i changed to 0.05 or 0.2. Together, these three figures demonstrate that varying the value for the parameter i only slightly changes the comparison between the flexible and routine incentive schemes.
To explore this possibility in the data from our experiment, we first identify the weekdays on which a given participant had an out-of-window gym visit during the 4-week post-intervention period. For each participant, we then calculate the fraction of those days on which the out-of-window gym visit could be matched to an intervention-period gym visit by the same participant that met two criteria: (1) it occurred on the same day of the week, and (2) it had the same starting time of day, plus or minus 15 minutes. Among participants who had at least one out-of-window gym visit during the 4-week post-intervention period, the mean fraction in the flexible conditions was 34.7%, which was larger than the 26.3% mean fraction in the routine conditions (p<0.001) but not statistically significantly different from the 31.6% mean fraction in the control condition. The differences across conditions are similar if we use a window of plus or minus 5 minutes or plus or minus 30 minutes around the starting time of day.
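For concreteness, the matching rule can be sketched as follows; the record format, function name, and the treatment of each qualifying day as a single visit record are our own simplifications. A post-intervention out-of-window visit is "matched" if the same participant had any intervention-period visit on the same weekday whose start time differs by at most 15 minutes.

```python
from datetime import datetime

def matched_fraction(post_visits, intervention_visits, tolerance_min=15):
    """Fraction of post-intervention out-of-window visit starts that echo an
    intervention-period visit: same weekday, start time within the tolerance."""
    def key(dt):
        return dt.weekday(), dt.hour * 60 + dt.minute  # (weekday, minutes after midnight)
    prior = [key(v) for v in intervention_visits]
    matched = sum(
        any(pw == iw and abs(pm - im) <= tolerance_min for iw, im in prior)
        for pw, pm in map(key, post_visits)
    )
    return matched / len(post_visits) if post_visits else 0.0

# Hypothetical usage: a Monday 6:10pm post-intervention visit matches a Monday
# 6:00pm intervention-period visit, since the start times are 10 minutes apart.
post = [datetime(2015, 4, 6, 18, 10)]
prior = [datetime(2015, 3, 2, 18, 0)]
print(matched_fraction(post, prior))  # 1.0
```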
We also empirically examine whether this form of collusion might have occurred. For each participant, we calculate the fraction of intervention-period gym visits that started within five minutes of a gym visit by the workout partner. If collusion between workout partners were frequent, we would expect this fraction to be higher in the experimental conditions that make such collusion more financially beneficial. However, the mean of this fraction does not significantly vary across experimental conditions in an F-test of joint equality.
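A minimal version of this check for a single participant might look like the sketch below (the function and argument names are ours): it computes the share of the participant's intervention-period visit start times that fall within five minutes of any of the partner's start times, which could then be averaged within each experimental condition.

```python
def near_partner_fraction(own_starts, partner_starts, tolerance_min=5):
    """Share of a participant's gym-visit start times (datetimes) that fall
    within the tolerance of any of the workout partner's start times."""
    if not own_starts:
        return 0.0
    near = sum(
        any(abs((own - other).total_seconds()) <= tolerance_min * 60
            for other in partner_starts)
        for own in own_starts
    )
    return near / len(own_starts)
```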
We do not have data on work schedules or exercise class schedules.
Even among participants in the routine $7 experimental condition, who had the most in-window gym visits during the intervention period, 69% of weekdays during the intervention were not associated with an in-window gym visit.
To explore this possibility empirically, we separate each pair of participants into the member with more in-window gym visits and the member with fewer in-window gym visits during the intervention period. The individuals in the first category are less likely to have made inconvenient schedule adjustments to coordinate with their partners. However, when we conduct the analysis in Tables 2–4 using only this subset of the sample, the results are similar. Separating each participant pair based on the fraction of intervention-period gym visits that were in-window also delivers similar results. These patterns do not support the hypothesis that our main results are driven by participants’ decisions to coordinate with their partners at the expense of convenience, but we do not rule out the hypothesis because the empirical tests are imperfect.
Another concern is that experimenter demand effects might have been particularly strong during the first week of the intervention period, and data from that first week might be driving the results. However, when we conduct the same analysis as in Table 2 but drop data from the first week of the intervention period, the results are similar.
Contributor Information
John Beshears, Harvard Business School.
Hae Nim Lee, The Wharton School.
Katherine L. Milkman, The Wharton School.
Robert Mislavsky, Carey Business School.
Jessica Wisdom, Google.
REFERENCES
- Acland D, & Levy MR (2015). Naiveté, Projection Bias, and Habit Formation in Gym Attendance. Management Science, 61(1), 146–160.
- Becker GS, & Murphy KM (1988). A Theory of Rational Addiction. Journal of Political Economy, 96(4), 675–700.
- Benartzi S, Beshears J, Milkman KL, Sunstein CR, Thaler RH, Shankar M, Tucker W, Congdon WJ, & Galing S (2017). Should Governments Invest More in Nudging? Psychological Science, 28(8), 1041–1055.
- Beshears J, Choi JJ, Laibson D, & Madrian BC (2013). Simplification and Saving. Journal of Economic Behavior & Organization, 95, 130–145.
- Brooks TL, Leventhal H, Wolf MS, O’Conor R, Morillo J, Martynenko M, Wisnivesky JP, & Federman AD (2014). Strategies Used by Older Adults with Asthma for Adherence to Inhaled Corticosteroids. Journal of General Internal Medicine, 29(11), 1506–1512.
- Carels RA, Burmeister JM, Koball AM, Oehlhof MW, Hinman N, LeRoy M, Bannon E, Ashrafioun L, Storfer-Isser A, Darby LA, & Gumble A (2014). A Randomized Trial Comparing Two Approaches to Weight Loss: Differences in Weight Loss Maintenance. Journal of Health Psychology, 19(2), 296–311.
- Carels RA, Young KM, Koball A, Gumble A, Darby LA, Oehlhof MW, Wott CB, & Hinman N (2011). Transforming Your Life: An Environmental Modification Approach to Weight Loss. Journal of Health Psychology, 16(3), 430–438.
- Charness G, & Gneezy U (2009). Incentives to Exercise. Econometrica, 77(3), 909–931.
- Gertler P, Heckman J, Pinto R, Zanolini A, Vermeersch C, Walker S, Chang SM, & Grantham-McGregor S (2014). Labor Market Returns to an Early Childhood Stimulation Intervention in Jamaica. Science, 344(6187), 998–1001.
- Hussam R, Rabbani A, Reggiani G, & Rigol N (2017). Habit Formation and Rational Addiction: A Field Experiment in Handwashing. Harvard Business School Working Paper.
- Johnson EJ, & Goldstein D (2003). Do Defaults Save Lives? Science, 302(5649), 1338–1339.
- Jones D, Molitor D, & Reif J (2018). What Do Workplace Wellness Programs Do? Evidence from the Illinois Workplace Wellness Study. NBER Working Paper 24229.
- Judah G, Gardner B, & Aunger R (2013). Forming a Flossing Habit: An Exploratory Study of the Psychological Determinants of Habit Formation. British Journal of Health Psychology, 18(2), 338–353.
- Kuh GD, Kinzie JL, Buckley JA, Bridges BK, & Hayek JC (2006). What Matters to Student Success: A Review of the Literature. Vol. 8. Washington, DC: National Postsecondary Education Cooperative.
- Lally P, Chipperfield A, & Wardle J (2008). Healthy Habits: Efficacy of Simple Advice on Weight Control Based on a Habit-formation Model. International Journal of Obesity, 32(4), 700–707.
- Larcom S, Rauch F, & Willems T (2017). The Benefits of Forced Experimentation: Striking Evidence from the London Underground Network. Quarterly Journal of Economics, 132(4), 2019–2055.
- Larrick RP, & Soll JB (2008). The MPG Illusion. Science, 320(5883), 1593–1594.
- Loewenstein G, Price J, & Volpp K (2016). Habit Formation in Children: Evidence from Incentives for Healthy Eating. Journal of Health Economics, 45, 47–54.
- Madrian BC, & Shea DF (2001). The Power of Suggestion: Inertia in 401(k) Participation and Savings Behavior. The Quarterly Journal of Economics, 116(4), 1149–1187.
- Mattke S, Schnyer C, & Van Busum KR (2013). A Review of the U.S. Workplace Wellness Market. RAND Health Quarterly, 2(4), 7.
- Milkman KL, Minson JA, & Volpp KGM (2014). Holding the Hunger Games Hostage at the Gym: An Evaluation of Temptation Bundling. Management Science, 60(2), 283–299.
- Mokdad AH, Marks JS, Stroup DF, & Gerberding JL (2004). Actual Causes of Death in the United States, 2000. JAMA, 291(10), 1238–1245.
- Neal DT, Wood W, Wu M, & Kurlander D (2011). The Pull of the Past: When Do Habits Persist Despite Conflict With Motives? Personality and Social Psychology Bulletin, 37(11), 1428–1437.
- Patel MS, Asch DA, Rosin R, Small DS, Bellamy SL, Heuer J, Sproat S, Hyson C, Haff N, Lee SM, Wesby L, Hoffer K, Shuttleworth D, Taylor DH, Hilbert V, Zhu J, Yang L, Wang X, & Volpp KG (2016). Framing Financial Incentives to Increase Physical Activity among Overweight and Obese Adults: A Randomized, Controlled Trial. Annals of Internal Medicine, 164(6), 385–394.
- Royer H, Stehr M, & Sydnor J (2015). Incentives, Commitments, and Habit Formation in Exercise: Evidence from a Field Experiment with Workers at a Fortune-500 Company. American Economic Journal: Applied Economics, 7(3), 51–84.
- Schroeder SA (2007). We Can Do Better — Improving the Health of the American People. New England Journal of Medicine, 357(12), 1221–1228.
- Sen AP, Sewell TB, Riley EB, Stearman B, Bellamy SL, Hu MF, Tao Y, Zhu J, Park JD, Loewenstein G, Asch DA, & Volpp KG (2014). Financial Incentives for Home-Based Health Monitoring: A Randomized Controlled Trial. Journal of General Internal Medicine, 29(5), 770–777.
- Staats BR, Dai H, Hofmann D, & Milkman KL (2017). Motivating Process Compliance Through Individual Electronic Monitoring: An Empirical Examination of Hand Hygiene in Healthcare. Management Science, 63(5), 1563–1585.
- Tappe K, Tarves E, Oltarzewski J, & Frum D (2013). Habit Formation among Regular Exercisers at Fitness Centers: An Exploratory Study. Journal of Physical Activity and Health, 10(4), 607–613.
- Thaler RH, & Benartzi S (2004). Save More Tomorrow: Using Behavioral Economics to Increase Employee Saving. Journal of Political Economy, 112(S1), S164–S187.
- Thaler RH, & Sunstein C (2008). Nudge: Improving Decisions about Health, Wealth, and Happiness. New Haven, CT: Yale University Press.
- U.S. News & World Report (2016). Retrieved 20 May, 2017, from https://www.usnews.com/.
- Wood W, & Neal DT (2016). Healthy through Habit: Interventions for Initiating & Maintaining Health Behavior Change. Behavioral Science & Policy, 2(1), 71–83.
- Wood W, & Rünger D (2016). Psychology of Habit. Annual Review of Psychology, 67, 289–314.