Abstract
Objective:
HabitWorks (HW) is a personalized, transdiagnostic, smartphone-based interpretation bias intervention for depression and anxiety that has demonstrated feasibility and acceptability in prior pilot studies. This preregistered randomized controlled trial (https://doi.org/10.17605/OSF.IO/EJ89T; Silverman et al., 2025) tested the effectiveness of HW compared to a symptom tracking (ST) control condition.
Method:
A nonclinical community sample of U.S. adults with at least mild anxiety and/or depression symptoms (N=340; Mage=33.04 years, 57.4% women, 60.6% White, 14.7% Black, 11.2% Asian, 10.9% Multiracial, 22.1% Hispanic/Latine) was randomly assigned to complete three weekly interpretation bias exercises plus a once-weekly depression and anxiety symptom survey (HW), or three weekly depression and anxiety symptom surveys (ST), for four weeks.
Results:
A priori benchmarks for retention, adherence, and satisfaction were achieved: 77.8% of HW participants were still using the app in Week 4, 43.7% achieved perfect adherence, and app usability and acceptability were rated positively. As hypothesized, HW was superior to ST at improving negative and benign interpretation biases (Word-Sentence Association Paradigm) and functional impairment (Work and Social Adjustment Scale); and HW (vs. ST) participants reported significantly greater global improvement (Clinical Global Impressions Scale-Improvement Self-Report) and subjective engagement (Twente Engagement with eHealth Technologies Scale) at post-intervention. Unexpectedly, while depression and anxiety symptoms (Patient Health Questionnaire-8 and Generalized Anxiety Disorder Scale-7) improved significantly, these changes were not unique to HW.
Conclusions:
HW is an engaging and scalable intervention that may be effective for improving overall severity and functioning. Further validation of effectiveness for specific symptom domains is needed.
Keywords: interpretation bias, anxiety, depression, smartphone-based, mobile app
Depression and anxiety symptoms are highly prevalent, affecting an estimated 21.4% and 18.2% of U.S. adults at any given time (Terlizzi & Zablotsky, 2024). Yet, these disorders are vastly undertreated, with less than a quarter of U.S. adults receiving minimally adequate treatment (Alonso et al., 2018; Moitra et al., 2022). To reduce this treatment gap, consistent with the National Institute of Mental Health’s experimental therapeutics approach, researchers have taken an interest in identifying evidence-based mechanisms of change that can be directly targeted through brief digital mental health interventions (DMHIs), which can then be scaled up and offered to many people at once. One such intervention is HabitWorks, a personalized, transdiagnostic, smartphone-based intervention that targets interpretation bias, or the tendency to resolve ambiguous situations in a negative or threatening manner (Hirsch et al., 2016). This study tests the effectiveness of HabitWorks in a demographically representative sample of U.S. adults with anxiety and depression symptoms.
Interpretation Bias Interventions
Theories of depression and anxiety disorders propose that interpretation bias maintains a vicious cycle in which a person experiences the world as more threatening, which heightens negative affect, increases behavioral avoidance, and leads to more biased cognition (Beck & Clark, 1997; Mathews & MacLeod, 2005). In support of these theories, a robust body of research indicates that interpretation bias plays a key role in the development and maintenance of depression and anxiety disorders (Hirsch et al., 2016; Vos et al., 2025). Cognitive models also suggest that by reducing the tendency to interpret ambiguous situations as negative or threatening, depression and anxiety symptoms will improve (MacLeod & Mathews, 2012). Consistent with these models, cognitive behavioral therapy (CBT) aims to teach people to modify distorted interpretations of ambiguity (e.g., jumping to conclusions) via cognitive restructuring. With CBT, a full therapy session might be used to reframe a person’s negative interpretation of a single ambiguous situation. Thus, while effective, CBT can be time-intensive and inaccessible due to common treatment barriers (e.g., time, cost, provider shortages).
In contrast, interpretation bias modification interventions directly target interpretation bias via exercises that offer quick, repeated practice resolving ambiguous material in a benign way (e.g., 50 ambiguous situations per 5-minute session) via computer or smartphone application (Hallion & Ruscio, 2011; Jones & Sharpe, 2017). These interventions reliably shift interpretation bias (e.g., Hallion & Ruscio, 2011; Jones & Sharpe, 2017). However, improvements in interpretation bias (the theorized mechanism of change and proximal outcome) do not always translate to significant improvements in psychological symptoms (depression symptoms: Fodor et al., 2020; Jones & Sharpe, 2017; anxiety symptoms: Daniel et al., 2020; Salemink et al., 2022). Further, only a small number of studies have evaluated the real-world effectiveness of interpretation bias interventions (i.e., see research on the web-based MindTrails intervention; Eberle et al., 2024; Ji et al., 2021; Larrazabal et al., 2024); and effects tend to be larger when these interventions are administered in the laboratory compared to when self-administered in real-world (non-laboratory) contexts (Cristea et al., 2015; Hallion & Ruscio, 2011). Thus, while these interventions show promise as brief, targeted, and scalable approaches, the mixed evidence for symptom improvements (particularly for disorders other than anxiety) and limited number of trials conducted in real-world contexts underscore the need for further research into their real-world effectiveness (Vrijsen et al., 2024).
There are many potential reasons why interpretation bias interventions may not always demonstrate clinical utility across all users and contexts when tested outside of tightly controlled laboratory conditions. First, completion rates are low when self-guided interpretation bias interventions are tested in real-world contexts (e.g., 13.5% to 42.2%; Eberle et al., 2024; Ji et al., 2021; Larrazabal et al., 2024). This parallels low rates of completion for other self-guided DMHIs for depression and anxiety (see Fleming et al., 2018). As a result, people who begin interpretation bias interventions may not use the intervention sufficiently (i.e., as intended) to experience therapeutic benefits (Menne-Lothmann et al., 2014). Suboptimal usage may occur because some users find interpretation bias exercises repetitive (Beard et al., 2012; Livermon et al., 2025) and thus may be less immersed while using the intervention or lose interest in continuing to use the intervention (Perski et al., 2017). Adding content and features to interpretation bias interventions to promote objective engagement (i.e., usage) and subjective engagement (i.e., usefulness, satisfaction, and ease of use)—such as the ability to track progress, varied content, and fun elements—may help to improve user experience, perceived effectiveness, and motivation to complete the intervention (Perski et al., 2017; Salemink et al., 2022).
Moreover, ambiguous scenario content is not always relevant to users’ targets of concern, which may contribute to worse engagement and outcomes (Beard et al., 2019; Daniel et al., 2025). To this end, research suggests that the ability to personalize DMHI content is associated with better engagement and outcomes (Borghouts et al., 2021; Perski et al., 2017). Consistent with this, during pilot testing of an early version of HabitWorks, participants commented that ambiguous scenario content should be personalized to ensure that the situations are meaningful and interesting to each person (Beard et al., 2019). Further, one recent meta-analysis found that the relationship between interpretation bias and anxiety symptoms was stronger when the stimuli used to assess interpretation bias were matched to people’s specific anxiety concerns (e.g., social situations, bodily symptoms; Würtz et al., 2023). Accordingly, allowing users to personalize ambiguous scenario content to their specific areas of concern may enhance engagement and outcomes for interpretation bias interventions by focusing on personally relevant situations where interpretation bias is especially problematic or influential. Further, involving people with lived experience in design decisions can enhance DMHI engagement and user experiences, which may help to improve their overall effectiveness (Livermon et al., 2025; McCurdie et al., 2012). Reflecting this, a recent expert opinion from the Association for Cognitive Bias Modification called for researchers to incorporate human-centered design principles and personalized approaches into the design of interpretation bias interventions (Vrijsen et al., 2024).
HabitWorks: A Personalized, Smartphone-Based Interpretation Bias Intervention
In response to these identified areas for improvement, HabitWorks is a smartphone-based, personalized, transdiagnostic, interpretation bias intervention that was iteratively developed and refined with input from an advisory board of people with lived experience of anxiety and depression, as well as clinicians and experts in the field (the development process is described in detail elsewhere; see Beard et al., 2021). HabitWorks directly targets interpretation bias using the Word-Sentence Association Paradigm (WSAP; see Gonsalves et al., 2019 for review). The WSAP presents an ambiguous situation (e.g., “Your boss wants to meet with you”) paired with a word representing either a negative (e.g., “criticize”), neutral (e.g., “appointment”), or positive (e.g., “praise”) interpretation. After reading the sentence and paired word, the person presses a button to indicate whether they think the word is related to the sentence. The task presents corrective feedback (“Correct!”) if the person endorses a neutral or positive interpretation, or rejects a negative interpretation, which in turn is theorized to reinforce a more adaptive interpretation style and ultimately improve psychological symptoms.
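The feedback contingency just described amounts to a simple rule: reinforce endorsement of benign interpretations and rejection of negative ones. The sketch below illustrates this rule in Python; it is an illustrative sketch only, and the function name and trial representation are hypothetical, not taken from the HabitWorks codebase (the feedback strings are those reported in this article).

```python
def wsap_feedback(word_valence: str, endorsed: bool) -> str:
    """Feedback for one WSAP trial, per the contingency described above.

    word_valence: "negative", "neutral", or "positive"
    endorsed: True if the participant judged the word related to the sentence
    """
    benign = word_valence in ("neutral", "positive")
    # Endorsing a benign interpretation, or rejecting a negative one,
    # is the reinforced (adaptive) response.
    if (benign and endorsed) or (word_valence == "negative" and not endorsed):
        return "Correct!"
    return "Let's try another!"
```

For the example above ("Your boss wants to meet with you" paired with "criticize"), endorsing the pairing would prompt corrective feedback, whereas rejecting it would be reinforced.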
HabitWorks delivers WSAP interpretation bias exercises alongside features that are expected to enhance engagement, such as personally scheduled notifications, performance feedback, level progression, mood monitoring, and a diary feature. In addition, while most interpretation bias interventions to date have used the same ambiguous scenario content for all users, the HabitWorks app enables users to personalize ambiguous scenario content based on life circumstances, worry domains, and demographic variables, which enhances the app’s personal relevance. Further, the interpretation bias exercises in the app are brief and game-like, which differentiates HabitWorks from previous interpretation bias intervention trials that have used longer scenarios that require more reading (e.g., Eberle et al., 2024; Ji et al., 2021; Larrazabal et al., 2024). HabitWorks has established preliminary feasibility, acceptability, and evidence of improvements in interpretation bias, anxiety symptoms, and depression symptoms in pilot open trials among acute psychiatric patients (Beard et al., 2021), Black and Hispanic adults with anxiety or depression symptoms (Ferguson et al., 2024), and anxious parents (Beard et al., 2022). Taken together, these results offer support for evaluating the effectiveness of HabitWorks in a larger randomized controlled trial (RCT).
Overview of Present Study and Hypotheses
The present RCT evaluated the effectiveness of the HabitWorks app in a community sample of adults with at least mild anxiety and/or depression symptoms. Participants were randomized to one of two conditions: (1) the interpretation bias intervention condition in which participants were asked to complete three interpretation bias exercises in the HabitWorks app per week for four weeks (HabitWorks; HW); or (2) a credible control condition in which participants were asked to complete three online anxiety and depression symptom surveys per week for four weeks (Symptom Tracking; ST). After the four-week intervention period, participants in both conditions were given the option to use the HW app as desired. Both conditions completed assessment measures at baseline, every week for 4 weeks, and at one-month follow-up.
All hypotheses were preregistered at Open Science Framework before analyses were undertaken (https://doi.org/10.17605/OSF.IO/EJ89T; Silverman et al., 2025). First, we hypothesized that the HW condition would meet the following benchmark targets: (a) ≥ 25% of participants would still be using the HW app in the fourth week (app retention); (b) ≥ 25% of participants would complete the post-intervention assessment (study retention); (c) ≥ 25% of participants would use the HW app as intended (i.e., completing three HW exercises per week for four weeks; app adherence); (d) the mean across all Exit Questionnaire items would be ≥ 5, indicating at least minimal intervention acceptability; and (e) the average score on the System Usability Scale would be ≥ 60, indicating at least acceptable perceived app usability. Benchmarks were selected based on prior HW studies (Beard et al., 2021; Beard et al., 2022; Ferguson et al., 2024), and clinical judgment based on what would be clinically useful for a low-intensity smartphone-based intervention. Second, we hypothesized that participants randomized to receive HW (vs. ST) would report greater improvements in interpretation bias (indicating target engagement) and clinical outcomes (primary: anxiety and depression symptoms; secondary: functional impairment; clinical global improvement). Third, we hypothesized that HW (vs. ST) participants would report significantly greater subjective engagement with their respective intervention. We also conducted exploratory analyses to evaluate sustainment of clinical outcomes at one-month follow-up, and to compare ST participants who opted in (vs. opted out) of using the HW app during the one-month follow-up period. To our knowledge, this is the first RCT of a personalized, smartphone-based interpretation bias intervention in a representative sample of U.S. adults.
Method
Design and Participants
Study procedures were approved by the Institutional Review Board at the researchers’ institution. Participants were recruited between March 13, 2024, and February 11, 2025, from study fliers posted in the community, on social media, and on the affiliated healthcare system’s web-based platform that advertises research studies to patients. Interested individuals completed an online eligibility screener via REDCap (Research Electronic Data Capture; Harris et al., 2009), a secure, web-based application designed to capture data for research studies. Eligible participants were automatically invited to provide written informed consent (electronically) and complete a baseline assessment in REDCap within 14 days of screening before re-screening was required. Following completion of the baseline assessment, participants were randomized to either the HW or ST condition according to a 50:50 allocation ratio, using a random number generator in Excel, by a research assistant who was not masked to condition assignment. Randomized participants were then provided with an information sheet and instructional videos for their assigned condition via REDCap. Participants were asked to complete either three HW interpretation bias exercises per week, or three ST surveys per week, for four weeks.
Participants were eligible to participate if they met the following criteria: (a) valid U.S. mailing address; (b) age ≥ 18 at the time of consent (age 19 in Nebraska and age 21 in Puerto Rico, based on the age of consent); (c) total score ≥ 3 on the Patient Health Questionnaire-2 (Kroenke et al., 2003) or Generalized Anxiety Disorder-2 (Spitzer et al., 2006), indicating at least mild depression and/or anxiety symptoms; (d) ability to read English to provide informed consent and complete the intervention; and (e) access to an Android or Apple smartphone to complete the intervention. Individuals were excluded if they self-reported a diagnosis of bipolar disorder or schizophrenia, or active symptoms of mania or psychosis.
Figure 1 presents the CONSORT Flow Diagram. Of 694 people screened, 402 were eligible and offered participation. Of these, 349 provided informed consent, completed the baseline assessment, and were randomized. Of these, 340 received their allocated intervention (HW: n=167; ST: n=173), and form the intent-to-treat (ITT) sample. A total of 187 participants completed the intervention protocol (HW: n=73; ST: n=114) and form the completer sample. Table 1 presents demographic and baseline clinical characteristics for the ITT sample.
Figure 1.

CONSORT Flow Diagram
Table 1.
Demographic and Clinical Characteristics for the Intent-to-Treat Sample
| Characteristic | HabitWorks (n=167) | Symptom Tracking (n=173) |
|---|---|---|
| Age: M (SD) | 32.98 (12.37) | 33.10 (13.61) |
| Gender: n (%) | ||
| Woman | 90 (53.9) | 105 (60.7) |
| Man | 59 (35.3) | 54 (31.2) |
| Another gender (e.g., trans, nonbinary) | 16 (9.6) | 11 (6.4) |
| Prefer not to answer | 2 (1.2) | 3 (1.7) |
| Sex assigned at birth: n (%) | ||
| Female | 103 (61.7) | 115 (66.5) |
| Male | 64 (38.3) | 57 (32.9) |
| Unknown | 0 (0.0) | 1 (0.6) |
| Race: n (%) | ||
| White | 107 (64.1) | 99 (57.2) |
| Black | 20 (12.0) | 30 (17.3) |
| Asian | 15 (9.0) | 23 (13.3) |
| American Indian or Alaska Native | 2 (1.2) | 1 (0.6) |
| Multiracial | 21 (12.5) | 16 (9.3) |
| Unknown | 2 (1.2) | 4 (2.3) |
| Hispanic/Latine ethnicity: n (%) | 38 (22.8) | 37 (21.4) |
| Education: n (%) | ||
| High school graduate or less | 19 (11.4) | 13 (7.5) |
| Some college | 39 (23.4) | 41 (23.7) |
| Four-year college graduate | 59 (35.3) | 63 (36.4) |
| Post-college education | 49 (29.3) | 56 (32.4) |
| Unknown | 1 (0.6) | 0 (0.0) |
| Sexual orientation: n (%) | ||
| Heterosexual | 106 (63.5) | 103 (59.5) |
| Gay/lesbian | 15 (9.0) | 21 (12.1) |
| Bisexual | 17 (10.2) | 21 (12.1) |
| Another sexual orientation (e.g., queer, asexual) | 23 (13.7) | 23 (13.3) |
| Unknown | 6 (3.6) | 5 (3.0) |
| Income: n (%) | ||
| Less than $50,000 | 99 (59.3) | 96 (55.5) |
| $50,000 – $75,000 | 21 (12.5) | 30 (17.3) |
| $75,000 – $100,000 | 18 (10.8) | 18 (10.4) |
| $100,000 – $200,000 | 13 (7.8) | 15 (8.7) |
| Greater than $200,000 | 5 (3.0) | 4 (2.3) |
| Unknown | 11 (6.6) | 10 (5.8) |
| Baseline Depression Symptoms: M (SD) | 10.25 (5.08) | 10.08 (5.21) |
| Baseline Anxiety Symptoms: M (SD) | 9.46 (4.96) | 9.46 (5.02) |
| Baseline Functional Impairment: M (SD) | 20.19 (8.65) | 19.13 (8.73) |
HabitWorks
The HW app development process and specific features are described in detail elsewhere (Beard et al., 2021). In brief, the HW app includes instructional videos that share the app’s rationale and how to complete WSAP exercises; personally scheduled smartphone notifications to complete WSAP exercises three times per week; symptom tracking surveys at the end of each week and also available to be completed as desired; performance feedback; and a “habit diary” that prompts users to write about their progress in a free-response diary entry. Participants were asked to complete three WSAP exercises per week at their own pace for four weeks. Bonus WSAP exercises were available to be completed as desired. WSAP exercises present participants with relevant ambiguous situations from a pool of 745 potential word-sentence pairs based on their responses to a personalization checklist. Specifically, participants select the types of situations they worry about from a list of concerns (e.g., social interactions, perfectionism, heart racing), as well as relevant demographic characteristics (e.g., employment, romantic relationship status, children; see Beard et al., 2021 for complete checklists). A personalized subset of word-sentence pairs is then generated to match the person’s list of situational and demographic responses. For example, if the person reports worrying about social interactions and finances, and indicates that they have children, a custom subset of word-sentence pairs is generated to include all word-sentence pairs with combinations of these concerns and demographics. Each time the person completes WSAP exercises going forward, word-sentence pairs are randomly chosen from the personalized subset. Each WSAP exercise takes approximately five minutes and consists of either 50 trials (for scheduled WSAP exercises) or 30 trials (for user-initiated bonus exercises).
All sessions, regardless of trial number, consist of 50% negative interpretations and 50% benign (neutral or positive) interpretations. Following each trial, participants are provided with immediate feedback (i.e., “Correct!” or “Let’s try another!”) to reinforce benign interpretations.
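The personalization and session-assembly logic described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the miniature pool, tag names, function names, and sampling-with-replacement simplification are hypothetical stand-ins for the app's 745-pair pool and checklists, not its actual implementation.

```python
import random

# Hypothetical miniature pool; the real app draws from 745 word-sentence pairs.
POOL = [
    {"sentence": "Your boss wants to meet with you", "word": "criticize",
     "valence": "negative", "tags": {"social interactions"}},
    {"sentence": "Your friend reads your text without replying", "word": "busy",
     "valence": "neutral", "tags": {"social interactions"}},
    {"sentence": "You check your bank balance", "word": "fine",
     "valence": "positive", "tags": {"finances"}},
    {"sentence": "Your child's school calls", "word": "emergency",
     "valence": "negative", "tags": {"children"}},
]

def personalized_subset(pool, selections):
    """Keep every pair whose tags overlap the user's checklist selections."""
    return [p for p in pool if p["tags"] & selections]

def build_session(subset, n_trials=50, rng=random):
    """Randomly sample a session that is 50% negative, 50% benign trials.

    Sampling with replacement is an illustrative simplification, since the
    personalized subset may be smaller than a 50-trial session.
    """
    negative = [p for p in subset if p["valence"] == "negative"]
    benign = [p for p in subset if p["valence"] in ("neutral", "positive")]
    half = n_trials // 2
    trials = ([rng.choice(negative) for _ in range(half)]
              + [rng.choice(benign) for _ in range(n_trials - half)])
    rng.shuffle(trials)
    return trials
```

A user who checked "social interactions" and "finances" and reported having children would receive all four pairs above in their personalized subset; each session then draws randomly from that subset with the 50/50 valence split.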
Symptom Tracking
Similar to the HW condition, ST participants first watched videos describing the treatment rationale (i.e., “Symptom Tracking is designed to help you practice self-monitoring depression and anxiety symptoms”). During the four-week intervention period, ST participants were emailed a link to complete anxiety and depression symptom measures in REDCap three times per week. An email reminder was sent after 24 hours if the survey had not yet been completed. Survey links automatically expired after 48 hours. ST participants also received an email at the end of each week detailing the number of surveys they completed, and a summary of their symptom severity (i.e., minimal, mild, moderate, severe) for each completed survey. After four weeks, ST participants were invited to use the HW app as much as desired during the one-month follow-up period, which n=111 ST participants chose to do.
Assessment Schedule
Both conditions completed online assessments of target engagement and clinical outcomes in REDCap at baseline, post-intervention (i.e., after four weeks) and one-month follow-up. Participants received a reminder phone call from a research assistant, as well as up to six reminder emails (sent every two days) to complete REDCap assessments. In addition to the assessment protocol, participants in both conditions completed anxiety and depression symptom surveys (i.e., symptom tracking) as part of their assigned intervention condition during the intervention period. Specifically, HW participants were asked to complete symptom tracking surveys at the end of Weeks 1 through 4 (i.e., days 7, 14, 21, 28); and ST participants were asked to complete three symptom tracking surveys per week, corresponding with the beginning, middle, and end of Weeks 1 through 4. Participants in both conditions received an email at the end of each week summarizing their number of completed symptom tracking surveys and, for ST participants, their symptom severity (whereas HW participants could see their scores in the app after completing the surveys). Those who did not complete any surveys that week received a check-in email listing common challenges and solutions to participation (e.g., low motivation, technological issues). Participants received $20 for cellular data plus an additional $10 for completing the baseline, post-intervention, and one-month follow-up assessments, for up to $50 total. Participants were not compensated for using the HW app or completing symptom tracking.
Measures
HabitWorks Retention, Adherence, and Satisfaction Benchmarks
Retention and Adherence.
Retention was calculated based on whether HW participants: (a) used any of the HW app features (e.g., WSAP exercises, habit diaries, user-initiated symptom surveys) during Week 4 of the intervention period (HW app retention), and (b) completed the post-intervention assessment (study retention). Consistent with previous HW studies (Beard et al., 2022; Ferguson et al., 2024), adherence was defined based on whether HW participants completed at least three interpretation bias exercises per week during the intervention period. App use metrics were automatically recorded in the app and saved on a secure server.
Acceptability.
The Exit Questionnaire is a 5-item self-report measure of intervention acceptability used in prior HW studies (Beard et al., 2021; Beard et al., 2022; Ferguson et al., 2024). The Exit Questionnaire was administered at post-intervention to HW participants only. Participants were asked to rate their satisfaction, perceived helpfulness, relevance, user-friendliness of HW, and likelihood of recommending HW using a 7-point scale from 1 (completely disagree) to 7 (completely agree). The mean across the five items was calculated, with higher scores indicating greater intervention acceptability. Internal consistency was good in the current sample (α=0.87 [0.84, 0.91]).
Usability.
HW participants completed the System Usability Scale (SUS; Brooke, 1996) at post-intervention. The SUS is a 10-item self-report measure that assesses how usable a person found a given technology (i.e., the HW app) on a 5-point scale from 1 (strongly disagree) to 5 (strongly agree). A standard SUS score was calculated from the raw items following existing scoring recommendations (Sauro & Lewis, 2016; see S1.1 for details), with higher scores indicating greater usability. Technologies scoring greater than 68 are classified as having above average usability (Sauro & Lewis, 2016). The SUS has demonstrated excellent reliability across a variety of products and interfaces (αs at 0.90 and above; see Sauro & Lewis, 2016 for review; present sample α=0.94 [0.93, 0.95]).
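The standard SUS scoring rule (Brooke, 1996) works as follows: odd-numbered (positively worded) items contribute their response minus 1, even-numbered (negatively worded) items contribute 5 minus their response, and the summed contributions (0 to 40) are rescaled to 0 to 100. The sketch below is a generic Python illustration of this published formula, not the study's own code (see S1.1 for the study's exact procedure).

```python
def sus_score(responses):
    """Standard SUS score from ten 1-5 responses, in the original item order.

    Odd-numbered items (index 0, 2, ...) are positively worded and contribute
    response - 1; even-numbered items contribute 5 - response. The sum (0-40)
    is rescaled to 0-100 by multiplying by 2.5.
    """
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5
```

For example, strongly agreeing with every positive item and strongly disagreeing with every negative item yields the maximum score of 100.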
Target Engagement
Participants completed an assessment version of the WSAP (Beard & Amir, 2008, 2009), a commonly used measure of interpretation bias, at baseline, post-intervention, and one-month follow-up. Word-sentence pairs representing benign or negative interpretations of ambiguous situations are presented, and participants decide whether the word is related to the sentence. In the assessment version of the task, no feedback is given about the accuracy of participants’ responses and stimuli are not personalized (i.e., all participants see the same set of 50 word-sentence pairs). The task records participants’ responses (yes/no) for each trial type (negative vs. benign) to allow for separate calculations of the percentage of negative and benign interpretations endorsed. Responses were coded as accurate when participants endorsed “yes-related” to benign trials and “no-not related” to negative trials. Extreme outliers for benign and negative scores were identified using median absolute deviation (Leys et al., 2013), resulting in the removal of 4 (0.4%) benign accuracy scores across all measured timepoints (see S1.2). Benign and negative WSAP scores have demonstrated good internal consistency in adults with depression and anxiety (αs from 0.83 to 0.85; see Gonsalves et al., 2019 for review), as well as acceptable split-half reliability across ethnoracial groups (Ferguson et al., 2025). In the current sample, baseline internal consistency for negative WSAP scores was acceptable across conditions (HW: α=0.79 [0.74, 0.83]; ST: α=0.77 [0.72, 0.82]), and baseline internal consistency for benign WSAP scores was poor across conditions (HW: α=0.51 [0.39, 0.61]; ST: α=0.55 [0.45, 0.64]).
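The two scoring steps just described can be sketched as follows: separate endorsement percentages are computed for negative and benign trials, and extreme scores are flagged with the median absolute deviation. This is an illustrative Python sketch; the cutoff of 3 and the 1.4826 scaling constant are common recommendations from Leys et al. (2013), while the study's exact criterion is described in S1.2.

```python
import statistics

def endorsement_rates(trials):
    """Percentage of negative and of benign interpretations endorsed.

    trials: list of (trial_type, endorsed) tuples, where trial_type is
    "negative" or "benign" and endorsed is True if the participant said
    the word was related to the sentence.
    """
    rates = {}
    for kind in ("negative", "benign"):
        answers = [endorsed for t, endorsed in trials if t == kind]
        rates[kind] = 100 * sum(answers) / len(answers)
    return rates

def mad_outliers(scores, threshold=3.0):
    """Flag extreme scores via the median absolute deviation (Leys et al., 2013).

    The deviation of each score from the median is compared against
    threshold * (1.4826 * median absolute deviation).
    """
    med = statistics.median(scores)
    mad = 1.4826 * statistics.median(abs(x - med) for x in scores)
    if mad == 0:
        return [False] * len(scores)
    return [abs(x - med) / mad > threshold for x in scores]
```

Unlike cutoffs based on the mean and standard deviation, the MAD-based rule is robust: a single extreme score barely shifts the median, so it cannot mask itself.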
Primary Clinical Outcomes
Anxiety and Depression Symptoms.
The Generalized Anxiety Disorder-7 (GAD-7; Spitzer et al., 2006) is a seven-item self-report questionnaire that assesses anxiety symptoms, and the Patient Health Questionnaire-8 (PHQ-8; Kroenke et al., 2009) is an eight-item version of the PHQ-9 (Kroenke et al., 2001) that assesses depression symptoms, but omits the item assessing suicidal ideation. The PHQ-8 has been found to be similar to the PHQ-9 for assessing depression symptom severity and potential diagnosis (Kroenke et al., 2009; Wu et al., 2021) and is commonly used in fully remote research studies when it is not possible to closely monitor and follow up on suicide risk (Wu et al., 2021). Both the GAD-7 and PHQ-8 use a 4-point scale from 0 (not at all) to 3 (nearly every day). A total score is calculated, with higher scores indicating greater symptom severity. The GAD-7 and PHQ-8 were administered at baseline, post-intervention, and one-month follow-up. Both conditions also completed these measures as part of their intervention condition (at the end of each week and as desired for HW participants; three times per week for ST participants). To allow for more robust modeling of change over time between conditions, scores at baseline, the ends of Weeks 1 through 3, post-intervention, and one-month follow-up were used for analyses. Other DMHI trials with adults have found good internal consistency for the GAD-7 (αs of 0.85 to 0.90; Silverman et al., 2025; Terides et al., 2018) and PHQ-8 (α=0.87 and ω=0.88; Kuhn et al., 2017; Lorenzo-Luaces & Howard, 2023). In the current sample, baseline internal consistency was good across conditions: for the GAD-7, α=0.87 [0.84, 0.90] in both conditions; for the PHQ-8, α=0.84 [0.80, 0.87] in HW and α=0.85 [0.81, 0.88] in ST.
Secondary Clinical Outcomes
Functional Impairment.
The Work and Social Adjustment Scale (WSAS; Mundt, 2002) is a five-item self-report questionnaire that assesses a person’s impairment and experiential impact across five domains (work, social life, home life, private life, and close relations) using a nine-point scale from 0 (no impairment at all) to 8 (very severe impairment). Participants completed the WSAS at baseline, post-intervention, and one-month follow-up. A total score was calculated, with higher scores indicating greater functional impairment. Other DMHI studies conducted among adults with anxiety or depression symptoms have found good internal consistency for this score (αs from 0.87 to 0.90; Butler et al., 2015; Silverman et al., 2025). Internal consistency for the HW and ST conditions was good at baseline (αs of 0.85 [0.81, 0.89] and 0.86 [0.82, 0.89]).
Clinical Global Improvement.
The Clinical Global Impressions Scale-Improvement Self-Report (CGIS; Guy, 1976) is a single-item measure that assesses respondents’ impressions of their overall improvement on a scale from 1 (very much improved) to 7 (very much worse). Both conditions completed the CGIS at post-intervention and one-month follow-up. This score has demonstrated good concurrent validity and sensitivity to change in adults with symptoms of anxiety and depression (Zaider et al., 2003).
Subjective Engagement
Both conditions completed the Twente Engagement with eHealth Technologies Scale (TWEETS; Kelders et al., 2020) at post-intervention. The TWEETS is a nine-item self-report measure that assesses cognitive, affective, and behavioral engagement with eHealth technologies using a five-point scale from 0 (strongly disagree) to 4 (strongly agree). The TWEETS allows for adaptation to the studied technology by adding the technology, goal, and behavior related to the goal to the items. This was implemented for the present study as “HabitWorks” or “Symptom Tracking”, “to develop healthier mental habits” and “to get more insight into my jumping to conclusions.” The mean was calculated, with higher scores indicating greater subjective engagement. This score demonstrated good internal consistency in a recent DMHI effectiveness study among adults with anxiety and depression (αs ranging from 0.88 to 0.92; Silverman et al., 2025). In the present sample, internal consistency was good for the HW condition and excellent for the ST condition (αs of 0.88 [0.86, 0.91] and 0.92 [0.91, 0.94]).
Data Analysis
Power analyses were conducted using Power Analysis and Sample Size Software (NCSS Statistical Software, 2024). All other analyses were conducted in R (Version 4.4.0; R Core Team, 2024). All significance tests were two-tailed, and the alpha level was .05.
Power Analyses
Assuming 80% power and an alpha of .05, we expected to be able to detect a mean difference of 2 points (d=0.37) in primary and secondary outcomes during the intervention period with 232 (116 per condition) and 290 (145 per condition) participants, respectively (see S1.3 for details). Fifty additional participants enrolled before recruitment procedures closed.
Retention, Adherence and Satisfaction Benchmarks (Hypothesis 1)
Descriptive statistics were calculated and compared to a priori benchmarks.
Between-Condition Effect of Time for the Intervention Period (Hypothesis 2)
Missing Data.
Missing data occurred at the scale level due to participant dropout or noncompliance with surveys. The proportions of scale-level missing data across the five longitudinal outcomes ranged from 10.5% to 20.7% (see Table S2 for the number of missing observations for each outcome). To identify measured variables other than time that may be related to this pattern, we evaluated whether any demographic variables were associated with missingness (see S1.4 for results). Ethnicity was the only significant predictor. As a result, multiple imputation was used to account for missing data, and ethnicity was included as an auxiliary variable in the multiple imputation model to correct for any systematic bias resulting from this variable’s relationship with missingness. Following Grund et al. (2018), a joint multivariate linear mixed model was used to impute missing scale scores for the five incomplete outcome variables (i.e., interpretation bias accuracy for benign trials, interpretation bias accuracy for negative trials, anxiety symptom severity, depression symptom severity, and functional impairment; see S1.5 for additional details).
Multilevel Modeling.
We conducted multilevel models in R using the lme4 package (Bates et al., 2025) to test our hypothesis regarding changes in benign interpretation bias, negative interpretation bias, anxiety symptoms, depression symptoms, and functional impairment during the intervention period, assuming a linear trajectory from baseline to post-intervention. The mitml R package was used to pool results following Rubin’s rules (Grund et al., 2023). In each model, we simultaneously entered fixed effects for condition, time (coded as 0 for Baseline, 1 for end of Week 1, 2 for end of Week 2, 3 for end of Week 3, and 4 for post-intervention at the end of Week 4), and the condition × time interaction, and random effects for intercept and time (see S1.6 for formula). Because the WSAP and WSAS were not assessed during Weeks 1 through 3, imputed data at these time points were removed before analyses of interpretation bias and functional impairment. Further, in these models a random effect for time was not included, because these measures were assessed at only two timepoints during the intervention period. In all models, condition (HW vs. ST) was dummy-coded, with ST serving as the reference group. In addition, we assessed the simple effect of time at the two condition levels, using separate models with a fixed effect for time and random effects for intercept and time, to understand the overall change over time in outcomes during the intervention period, regardless of the interaction’s significance. We did this instead of testing the main effect of time across the two conditions because main effects are misleading when interactions are significant (Maxwell et al., 2018).
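Concretely, the specification described above corresponds to a standard two-level growth model (a sketch of our reading of the design, with measurement occasions i nested within participants j; the study's exact formula is in S1.6):

```latex
Y_{ij} = \gamma_{00} + \gamma_{01}\,\text{Cond}_j + \gamma_{10}\,\text{Time}_{ij}
       + \gamma_{11}\,(\text{Cond}_j \times \text{Time}_{ij})
       + u_{0j} + u_{1j}\,\text{Time}_{ij} + e_{ij}
```

Here the condition × time coefficient carries the hypothesis test: it estimates how much faster outcomes change per unit of time in HW than in ST, while the random terms allow each participant their own intercept and slope.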
All multilevel models were conducted with the ITT sample (reported below) and completer sample (secondary; reported in Table S3). Additionally, sensitivity analyses were conducted to examine change over time in depression and anxiety within and between conditions when all depression and anxiety measurement occasions were included (i.e., after including the surveys completed by the ST condition at the beginning and middle of each week; see S2.1 for analysis plan and Table S4 for results).
Effect Sizes.
The size of condition differences for model-estimated means at post-intervention between the HW and ST conditions was computed as growth-modeling analysis d (GMA d; Feingold, 2019), which has the same metric as Cohen’s d. We computed time-specific GMA d and the 95% CI of GMA d (adapted from Feingold, 2019) at post-intervention to describe condition differences at the end of the intervention period. We also computed within-condition GMA ds and 95% CIs at post-intervention to describe changes in outcomes within each condition.
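The GMA d computation can be sketched as follows. This is an illustration of one common formulation of Feingold's approach, not the study's analysis code, and all numeric values in the example are hypothetical:

```python
def gma_d(b_interaction: float, duration: float, sd_baseline: float) -> float:
    """Growth-modeling-analysis d (Feingold): the model-implied
    between-condition difference at the end of the study (slope
    difference x study duration), standardized by the raw baseline SD,
    putting the estimate on the same metric as Cohen's d."""
    return (b_interaction * duration) / sd_baseline

# Hypothetical values: a slope difference of -0.5 points per week over
# a 4-week intervention, with a baseline SD of 8 points
print(gma_d(-0.5, 4, 8.0))  # -0.25
```

The 95% CI follows the same logic, with the interval bounds for the interaction coefficient scaled by duration and the baseline SD.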
Clinical Significance.
Clinical significance was assessed using Percentage of Maximum Possible (POMP) scores, which enable comparison of effect sizes across measures (Cohen et al., 1999). POMP scores were calculated by dividing the model-estimated difference between conditions in pre- to post-intervention change by the maximum possible total score for the measure and multiplying this value by 100 to yield a percentage.
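The POMP calculation can be sketched as follows (a minimal illustration, not the study's analysis code; it assumes the scale minimum is 0, which holds for the WSAS, PHQ-8, and GAD-7 totals used here):

```python
def pomp(score_diff: float, scale_min: float, scale_max: float) -> float:
    """Percentage of Maximum Possible (Cohen et al., 1999): express a
    score difference as a percentage of the scale's full range."""
    return 100.0 * score_diff / (scale_max - scale_min)

# WSAS totals range from 0 to 40, so a 2.1-point greater improvement
# corresponds to a POMP difference of 5.25%
print(pomp(2.1, 0, 40))  # 5.25
```

Because POMP rescales every measure to a common 0–100 metric, the same differential change can be compared directly across the WSAS, PHQ-8, and GAD-7 despite their different raw ranges.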
Clinical Global Improvement (Hypothesis 3) and Subjective Engagement (Hypothesis 4)
To examine the effect of condition on clinical global improvement and subjective engagement, we conducted independent-samples t-tests with condition (ST vs. HW) as the predictor and, in separate models, clinical global improvement (CGIS scores) and subjective engagement (TWEETS mean scores) as the dependent variable.
Exploratory Analyses
Analyses of clinical outcomes were repeated to evaluate whether outcomes were sustained within- and between-conditions at one-month follow-up. ST participants who used the HW app during the follow-up period (n=111 of 173; 64.2%) were excluded from these analyses. These analyses were considered exploratory due to reduced power given the smaller sample and to potentially biased estimates given nonrandom exclusion of participants from the ITT sample. See S2.2 and Table S5 for results. Separately, based on a reviewer suggestion, and to better understand clinical context for future real-world implementation, exploratory analyses were conducted to examine differential change in outcomes from baseline to post-intervention, and from post-intervention to one-month follow-up, between ST participants who chose to use the HW app during the follow-up period (n=111) and ST participants who did not (n=62). See S2.3 and Table S6 for results.
Transparency and Openness
The trial’s design, primary outcomes and hypotheses, study measures, and analysis plan were preregistered prior to data collection (https://doi.org/10.17605/OSF.IO/7WX9D; Beard et al., 2024) and registered at ClinicalTrials.gov (Identifier: NCT07025486). Hypotheses for secondary outcomes were added and the analysis plan was revised during data cleaning (after data collection but before data analysis; https://doi.org/10.17605/OSF.IO/EJ89T; Silverman et al., 2025). Deviations from the preregistered analysis plan are described in S3 of the supplement. Video scripts for both interventions, data, and analysis code are available through the Open Science Framework (https://osf.io/x3ysh/overview). A detailed description of the intervention features is available (Beard et al., 2021). WSAP information and materials can be accessed at https://www.courtneybeardphd.com/wsap. We report how we determined our sample size, all data exclusions, all manipulations, and all study measures (see S1.7 for details on two measures not analyzed for the present study).
Results
Retention, Adherence, and Satisfaction Benchmarks (Hypothesis 1)
The a priori benchmarks for HW app retention (≥ 25% of HW participants still using HW in the fourth week), study retention (≥ 25% of HW participants completing the post-intervention assessment), and app adherence (≥ 25% of HW participants completing at least three HW exercises per week for four weeks) were achieved. Specifically, 77.8% (n=130) of HW participants were still using HW in Week 4, 84.4% (n=141) completed the post-intervention assessment, and 43.7% (n=73) completed the HW intervention protocol as intended. The a priori benchmark for acceptability (average rating ≥ 5 across Exit Questionnaire items) was also achieved (M=5.5, SD=1.1), as was the benchmark for usability (SUS score ≥ 60), with HW participants reporting an average usability score of 69.91 (SD=26.19).
Between-Condition Effect of Time for the Intervention Period (Hypothesis 2)
ITT results for multilevel models testing condition × time interaction effects during the intervention period are reported in Table 2. Completer analyses revealed similar results (for full completer results, see Table S3).
Table 2.
Multilevel Models for Fixed Condition × Time Interaction Effects and Simple Effects of Time in Each Condition During the Intervention Period for the Intent-to-Treat Sample
| Outcome | Fixed Effect | b (SE) | df | t-value | p | d | 95% CI for d |
|---|---|---|---|---|---|---|---|
| Interpretation Bias Accuracy for Benign Trials (WSAP) | Condition × Time | 0.15 (0.02) | 181.90 | 6.74 | <.001 | 1.12 | [0.80, 1.45] |
| | Time (HW) | 0.15 (0.02) | 107.70 | 9.08 | <.001 | 1.17 | [0.92, 1.43] |
| | Time (ST) | 0.00 (0.01) | 112.59 | 0.07 | .943 | 0.01 | [−0.20, 0.22] |
| Interpretation Bias Accuracy for Negative Trials (WSAP) | Condition × Time | 0.33 (0.03) | 243.08 | 13.00 | <.001 | 1.67 | [1.42, 1.93] |
| | Time (HW) | 0.38 (0.02) | 135.29 | 18.49 | <.001 | 1.84 | [1.64, 2.03] |
| | Time (ST) | 0.04 (0.02) | 127.21 | 2.64 | .009 | 0.22 | [0.05, 0.38] |
| Depression Symptoms (PHQ-8) | Condition × Time | −0.19 (0.11) | 575.08 | −1.67 | .094 | −0.14 | [−0.31, 0.03] |
| | Time (HW) | −0.59 (0.08) | 287.08 | −7.30 | <.001 | −0.46 | [−0.59, −0.34] |
| | Time (ST) | −0.40 (0.07) | 524.35 | −5.42 | <.001 | −0.31 | [−0.42, −0.20] |
| Anxiety Symptoms (GAD-7) | Condition × Time | −0.19 (0.11) | 477.59 | −1.69 | .093 | −0.15 | [−0.35, 0.04] |
| | Time (HW) | −0.55 (0.08) | 207.14 | −6.84 | <.001 | −0.45 | [−0.57, −0.32] |
| | Time (ST) | −0.37 (0.08) | 431.55 | −4.79 | <.001 | −0.29 | [−0.41, −0.17] |
| Functional Impairment (WSAS) | Condition × Time | −2.05 (0.91) | 194.86 | −2.27 | .024 | −0.24 | [−0.44, −0.03] |
| | Time (HW) | −3.14 (0.67) | 114.47 | −4.69 | <.001 | −0.36 | [−0.52, −0.21] |
| | Time (ST) | −1.09 (0.60) | 118.29 | −1.81 | .074 | −0.13 | [−0.26, 0.01] |
Note. Separate models were fit for the analyses of between-condition interactions and within-condition effects of time. HW = HabitWorks; ST = Symptom Tracking; WSAP = Word-Sentence Association Paradigm; PHQ-8 = Patient Health Questionnaire-8; GAD-7 = Generalized Anxiety Disorder-7; WSAS = Work and Social Adjustment Scale.
As hypothesized, during the intervention period, HW (vs. ST) ITT participants demonstrated significantly greater improvement in interpretation bias accuracy for benign trials (b=0.15, p<.001, d and 95% CI=1.12[0.80, 1.45]), as well as interpretation bias accuracy for negative trials (b=0.33, p<.001, d and 95% CI=1.67[1.42, 1.93]; Figure 2). In terms of POMP scores, HW improved more than ST on benign accuracy scores by an average of 14.6% on a 0–100 scale, and on negative accuracy scores by an average of 33.4% on a 0–100 scale. Further, as hypothesized, HW (vs. ST) ITT participants demonstrated significantly more improvement in functional impairment (b=−2.05, p=.024, d and 95% CI=−0.24[−0.44, −0.03]; Figure 2). Reflected as POMP scores, HW improved by an average of 5.25% more than ST on functional impairment (equal to improving 2.1 points more on the total score ranging from 0 to 40).
Figure 2. Estimated Longitudinal Outcome Means for Intent-to-Treat Sample.

Note. Means (± 1 SE) estimated from the linear multilevel models of between-group effects with Symptom Tracking as the reference group are shown. Estimates were computed from each imputed dataset and then pooled following Rubin’s rules. ST=Symptom Tracking; HW=HabitWorks.
However, contrary to hypotheses, no significant slope differences emerged between the ST and HW conditions for depression symptoms (b=−0.19, p=.094) or anxiety symptoms (b=−0.19, p=.093; see Figure 2), and between-condition effect sizes were negligible (ds and 95% CIs of −0.14[−0.31, 0.03] and −0.15[−0.35, 0.04]). Instead, ITT participants in both HW and ST improved significantly (ps<.001) during the intervention period in depression (bs=−0.59 and −0.40, ds=−0.46 and −0.31) and anxiety (bs=−0.55 and −0.37, ds=−0.45 and −0.29). In terms of POMP scores, HW improved more than ST on depression symptoms by an average of 3.1% (0.74 more points on a total score ranging from 0 to 24), and on anxiety symptoms by an average of 3.6% (0.76 more points on a total score ranging from 0 to 21). Sensitivity analyses of depression and anxiety outcomes (with all measurement occasions included in the models) revealed a similar pattern of results between and within conditions (see Table S4).
Clinical Global Improvement (Hypothesis 3) and Subjective Engagement (Hypothesis 4)
As hypothesized, HW participants reported significantly greater clinical global improvement at post-intervention (M=3.05, SD=0.83) compared to ST participants (M=3.44, SD=0.98), t(288.57)=−3.66, p<.001, d and 95% CI=0.43[0.21, 0.64]. Further, HW participants reported significantly greater subjective engagement (M=2.74, SD=0.65) compared to ST participants (M=2.53, SD=0.84), t(282.7)=2.41, p=.017, d and 95% CI =0.28[0.07, 0.49].
Discussion
This study is the first, to our knowledge, to examine the real-world effectiveness of a personalized, transdiagnostic Smartphone-based interpretation bias intervention for depression and anxiety among U.S. adults. The HW app met a priori benchmarks for retention, adherence, and satisfaction. Further, as hypothesized, HW (vs. ST) participants reported significantly greater improvements in interpretation bias and functional impairment during the intervention period, as well as greater clinical global improvement and subjective engagement with their respective intervention at post-intervention. However, contrary to hypotheses, there were no condition differences in depression and anxiety improvements during the intervention period.
Retention, Adherence and Satisfaction
Interpretation bias interventions have been championed as one approach to address the need for low-cost and accessible clinical tools (Vrijsen et al., 2024). However, suboptimal user engagement has been a consistent challenge. For example, in two previous real-world trials of a web-based interpretation bias intervention that did not offer compensation for using the intervention, only 13.5% and 26.7% of participants completed the intervention (Ji et al., 2021; Larrazabal et al., 2024). Similarly low adherence rates have been found for other real-world trials of self-guided DMHIs (completion rates ranging from 0.5% to 28.6%; Fleming et al., 2018). In the present study, the HW completion rate was considerably higher, with 77.8% of HW participants still using the app in the fourth week, and 43.7% achieving perfect adherence (i.e., completing at least three interpretation bias exercises per week for four weeks). This may be because the HW app’s design mirrors how people traditionally use phone apps: interpretation bias exercises can be completed in short (i.e., 5-minute) bursts throughout the week, and people can use or ignore other app features as desired. In contrast, interpretation bias interventions have historically been designed to have participants complete longer exercises following a pre-determined schedule (e.g., eight 20-minute sessions spaced at least two days apart, Ji et al., 2021; five 15-minute sessions spaced at least five days apart, Larrazabal et al., 2024), which may make it difficult for people to find time to use the intervention as intended.
The higher rate of adherence in the current study could also be due to the HW app’s inclusion of features designed to enhance objective and subjective engagement (e.g., personalization, game-like elements; Borghouts et al., 2021; Perski et al., 2017). To this end, the HW app demonstrated strong evidence of acceptability and usability, which is also consistent with prior pilot trials of HW (Beard et al., 2021; Beard et al., 2022; Ferguson et al., 2024). Further, as expected, participants reported significantly greater subjective engagement with the HW app compared to the ST condition. These results are especially encouraging, given that the present sample consisted of demographically diverse community members who are rarely represented in interpretation bias intervention research (Vrijsen et al., 2024). Taken together, these findings highlight the HW app’s promise as an engaging intervention for adults with anxiety and depression symptoms.
Mixed Findings for Intervention Effectiveness
HabitWorks aims to improve clinical outcomes by teaching people to interpret ambiguous information less negatively. In line with this, HW (vs. ST) participants demonstrated greater improvements in interpretation bias accuracy and functional impairment, as well as greater self-reported clinical global improvement at post-intervention. To our knowledge, no prior interpretation bias intervention trials with adults have evaluated these important outcomes. Notably, the effect sizes of −0.24 (functional impairment) and 0.43 (clinical global improvement) in this study are comparable to effect sizes for anxiety and depression found in a meta-analysis of interpretation bias interventions (−0.30 and −0.26, respectively; Fodor et al., 2020). They are also in the range of effect sizes for functional impairment in trials of internet-based CBT (e.g., ds from 0.21 to 0.35; Richards et al., 2020; Titov et al., 2010).
Reflected as POMP scores (using a 0–100 scale), HW (vs. ST) participants improved by an average of 14.6% more in benign bias, 33.4% more in negative bias, and 5.25% more in functional impairment. To put this differential change in context, at baseline, the estimated benign bias score was 67.6% across conditions, and the estimated negative bias score was 47.5% across conditions (where benign and negative bias scores under 70% indicate the presence of at least a mild interpretation bias; Beard & Amir, 2009). Within-group estimates indicate that HW participants increased by an average of 14.7% and 37.6% in their benign and negative bias scores by post-intervention, reaching average benign and negative bias scores of 82.3% and 85.1%, respectively, in the healthy range for interpretation bias accuracy. By contrast, ST participants increased by an average of 0.1% and 4.2% in their benign and negative bias scores by post-intervention, reaching average benign and negative bias scores of 67.7% and 51.7%, respectively, remaining in the clinical range for interpretation bias accuracy. These results suggest that on average, the estimated between-group differences in interpretation bias accuracy are potentially meaningful (Cohen et al., 1999).
This contrasts with the interpretation of the POMP score for differential change in functional impairment. At baseline, the model-estimated functional impairment total score was 19.70 across conditions, indicating moderate functional impairment (where scores below 10 indicate low impairment, 10–19 moderate impairment, and 20 or greater severe impairment and moderately severe or worse psychopathology; Mundt et al., 2002). HW participants decreased by an average of 3.20 points in their total WSAS score (or 8% on a 0–100 scale), reaching an estimated mean total score of 16.50 at post-intervention (in the moderate range of functional impairment), while ST participants decreased by an average of 1.10 points in their total WSAS score (or 2.75% on a 0–100 scale), reaching an estimated mean total score of 18.60 at post-intervention (also in the moderate range of functional impairment). Thus, although the between-group difference in functional impairment was statistically significant (and the small effect size is consistent with prior research, including internet CBT trials), the interpretation of the POMP scores does not provide clear evidence of a meaningful difference, as both conditions remained in the moderate range of impairment at post-intervention. Together, these findings highlight key questions facing the field: What effect sizes can be reasonably expected from a brief, highly targeted intervention such as HabitWorks? And what minimum necessary benefit should be required to justify their use, especially in cases where other established treatments are not available or accessible (Muñoz et al., 2022; Vrijsen et al., 2024)?
Future HW studies should use adaptive trial designs (e.g., sequential multiple-assignment randomized trial designs, micro-randomized trials; see Nahum-Shani et al., 2012; Klasnja et al., 2015) to dynamically test intervention components (e.g., dosage, level of guidance, when to prompt individuals to use app features) and optimize the intervention’s effects on functional and clinical outcomes.
Unexpectedly, both conditions experienced comparable improvements in depression and anxiety symptoms. Previous studies have also found significant improvements in anxiety and depression across both active interpretation bias interventions and various comparison conditions (e.g., Daniel et al., 2020; de Voogd et al., 2018), raising questions about whether such improvements reflect the natural course of clinical symptoms, regression to the mean, expectancy effects, or active effects of the comparison conditions. Indeed, we selected an active and credible control condition to provide a rigorous test of HW and expected that symptom tracking might improve symptom severity. Many factors in the present study may have contributed to the observed improvements in symptoms across conditions. For example, it may be that symptom improvement across conditions was driven by increased self-reflection and self-awareness prompted by symptom tracking (see Boswell et al., 2015; Schueller et al., 2021). However, we cannot evaluate this possibility without a no-symptom-tracking comparison group.
Notably, depression and anxiety symptom measures were completed one or three times per week as part of the intervention protocol for both conditions (i.e., symptom tracking) as well as for the assessment protocol (six total occasions for HW during the intervention period vs. 14 total occasions for ST), whereas secondary outcome measures were only completed as part of the assessment protocol (two total occasions for both conditions). Thus, it is also possible that the use of different assessment schedules to measure primary and secondary outcomes might have contributed to the study’s mixed findings. Future HW trials should include different symptom outcome measures from those that are included as part of the intervention.
The mixed findings across symptom-specific versus global outcomes could also be related to our choice of primary outcome measures for this study and their fit with this specific intervention and sample. HabitWorks is a transdiagnostic intervention that delivers personalized interpretation bias exercises for a variety of target concerns spanning depression and multiple anxiety domains (e.g., social anxiety, anxiety sensitivity). In contrast, the primary outcomes we selected are symptom-focused outcomes. As a result, analyses of symptom outcomes may not adequately capture HW’s actual clinical impact, while analyses of secondary outcomes (which are more global) may provide a more representative picture of HW’s far transfer effects (Vrijsen et al., 2024). The CGIS in particular enables participants to rate their impressions of the impact (or lack thereof) of a given intervention on their own lives in whatever areas are most meaningful to them rather than constraining them to report on domains that may not be relevant (Alpert et al., 2025). Future studies may wish to include assessments of global ratings of improvement across domains, in addition to standardized symptom measures, as both types of assessments capture meaningful and distinct information about symptom change (Alpert et al., 2025). Relatedly, it is also possible that HW would yield stronger effects on primary outcomes if measures that consider a wider range of symptoms were used (e.g., severity and impairment associated with various anxiety disorders rather than generalized anxiety disorder specifically). Alternatively, given that participants only needed to report mild depression or anxiety symptom severity at screening for eligibility, it may be that more robust effects would be observed in a sample with more pronounced clinical symptoms. 
This is consistent with a recent trial of a web-based interpretation bias intervention, which found that people with greater baseline anxiety experienced greater improvements in the intervention (vs. control) condition (Larrazabal et al., 2024). Future studies should investigate clinically relevant moderators of HW’s effects and examine HW’s effectiveness in real-world clinical settings to better understand HW’s clinical utility and the conditions under which these effects occur.
Finally, while research implicates interpretation bias in the development and maintenance of depression and anxiety (e.g., Hirsch et al., 2016; Vos et al., 2025), it is likely not the key driver of depression and anxiety symptoms for all people. Importantly, eligibility criteria for this study were not based on the targeted mechanism. Thus, interpretation bias may have been a less relevant or influential intervention target for some participants, and this may help explain why HW did not yield stronger effects on symptoms than ST. Future work is needed to determine optimal methods to identify individuals for whom interpretation bias is particularly strong and relevant to their symptoms, as these people presumably stand to benefit most from HW.
Limitations
This study has several methodological strengths, including the randomized controlled design and demographically representative sample. Findings should also be interpreted in light of several limitations. First, both conditions were offered the option to use the HW app during the one-month follow-up period, which limits our ability to draw conclusions about the durability of HW’s effects. Conducting a more rigorous evaluation of HW’s long-term effects represents an important direction for future work. Second, participants were community members with at least mild depression and/or anxiety symptoms who agreed to participate in a research study. They did not have to be diagnosed with an anxiety or depressive disorder (although baseline means surpassed the clinical cut-off for depression and approached it for anxiety, and functional impairment approached the cut-off for severe), nor were they seeking treatment. Thus, the generalizability of the present study’s findings to clinical populations, especially those with more severe symptoms, remains unclear. Further, the sample’s modest baseline depression and anxiety symptom severity may have contributed to the muted effects and lack of between-condition differences for depression and anxiety symptom change. An important future direction will be to examine the HW app’s effectiveness in clinical settings. Third, the internal consistency of the benign interpretation accuracy score was poor. Although the WSAP has shown adequate psychometric properties across many studies (Gonsalves et al., 2019), other research has similarly encountered low internal consistency for different behavioral measures of interpretation bias, particularly benign bias scores (e.g., Ferguson et al., 2025; Eberle et al., 2023). This may reflect genuine within-session variability in individuals’ benign interpretations of ambiguous situations (MacLeod et al., 2019).
Nonetheless, even behavioral tasks with lower reliability can still consistently detect average differences between groups (MacLeod et al., 2019). Finally, as already noted, measures of symptom improvement overlapped with the measures used to track symptoms as part of each intervention condition.
Conclusion
To our knowledge, this is the first RCT testing the real-world effectiveness of a personalized, transdiagnostic, app-based interpretation bias intervention for U.S. adults. Relative to a symptom tracking control condition, the HW app was effective at improving interpretation bias, functioning, and clinical global improvement, but not specific symptoms. This study represents an important initial real-world test of HW. Future research is needed to increase our understanding of the HW app’s clinical utility (e.g., moderators of response, durability of effects).
Supplementary Material
Public Health Significance Statement:
HabitWorks is a personalized Smartphone app that trains people (via interpretation bias exercises) to think less negatively about ambiguous situations, which in turn may help improve psychological symptoms. This study indicates that HabitWorks is an engaging and satisfactory app among U.S. adults with depression and anxiety symptoms. Results also suggest that HabitWorks may be more effective for clinical outcomes that are more global rather than symptom-specific.
Funding & Acknowledgements:
This work was supported by the National Institute of Mental Health (R01MH12937) awarded to C. Beard, and by Harvard Medical School’s Livingston Fellowship and McLean Hospital’s Pope-Hintz Endowed Fellowship awarded to A.L. Silverman. We thank the members of the HabitWorks Community Advisory Board and the Harvard Catalyst Coalition for Equity in Research for their guidance on this project. We thank Frances Grace Hart for assistance with data collection.
Appendix 1. Data Transparency
The data reported in this manuscript have not been previously submitted or published as part of separate manuscripts, and no studies using this data are currently in press.
Footnotes
Although multilevel models are robust to unbalanced designs (Maas and Hox, 2005), the inclusion of all measurement occasions for the ST condition would have resulted in HW participants missing more than half of their primary outcome data (because the HW condition had fewer measurement occasions for anxiety and depression symptom measures by study design). Thus, we chose to exclude observations for symptom surveys administered to the ST condition at the beginning and middle of each week. This decision was made to balance participant numbers across conditions to allow for more robust statistical modeling. See Table S1 for the complete measurement schedule, as well as specific measurement occasions used for analyses of depression and anxiety symptoms. Sensitivity analyses were conducted with all measurement occasions for both conditions included and results did not differ (see S2.1 and Table S4).
References
- Alonso J, Liu Z, Evans‐Lacko S, Sadikova E, Sampson N, Chatterji S, Abdulmalik J, Aguilar-Gaxiola S, Al-Hamzawi A, Helena Andrade L, Bruffaerts R, Cardoso G, Cia A, Florescu S, …& WHO World Mental Health Survey Collaborators (2018). Treatment gap for anxiety disorders is global: Results of the World Mental Health Surveys in 21 countries. Depression and Anxiety, 35, 195–208. doi: 10.1002/da.22711
- Alpert E, Fox AB, & Galovski TE (2025). Who defines improvement? Patients’ global reports of improvement compared to standardized measures of improvement in cognitive processing therapy for posttraumatic stress disorder. Journal of Anxiety Disorders, 103027. doi: 10.1016/j.janxdis.2025.103027
- Bates D, Maechler M, Bolker B, Walker S, Christensen RHB, Singmann H, Dai B, Scheipl F, …& Boylan RD (2025). lme4: Linear Mixed-Effects Models Using ‘Eigen’ and S4. https://cran.r-project.org/web/packages/lme4/index.html
- Beard C, & Amir N (2008). A multi-session interpretation modification program: Changes in interpretation and social anxiety symptoms. Behaviour Research and Therapy, 46, 1135–1141. doi: 10.1016/j.brat.2008.05.012
- Beard C, & Amir N (2009). Interpretation in social anxiety: When meaning precedes ambiguity. Cognitive Therapy and Research, 33, 406–415. doi: 10.1007/s10608-009-9235-0
- Beard C, Beckham E, Solomon A, Fenley AR, & Pincus DB (2022). A pilot feasibility open trial of an interpretation bias intervention for parents of anxious children. Cognitive and Behavioral Practice, 29, 860–873. doi: 10.1016/j.cbpra.2021.09.005
- Beard C, Ferguson I, Hart FG, & Silverman AL (2024). Engagement and effectiveness of HabitWorks and symptom tracking for anxiety and depression [Preregistration]. doi: 10.17605/OSF.IO/7WX9D
- Beard C, Ramadurai R, McHugh RK, Pollak JP, & Björgvinsson T (2021). HabitWorks: Development of a CBM-I smartphone app to augment and extend acute treatment. Behavior Therapy, 52, 365–378. doi: 10.1016/j.beth.2020.04.013
- Beard C, Rifkin LS, Silverman AL, & Björgvinsson T (2019). Translating CBM-I into real-world settings: Augmenting a CBT-based psychiatric hospital program. Behavior Therapy, 50, 515–530. doi: 10.1016/j.beth.2018.09.002
- Beard C, Weisberg RB, & Primack J (2012). Socially anxious primary care patients’ attitudes toward cognitive bias modification (CBM): A qualitative study. Behavioural and Cognitive Psychotherapy, 40, 618–633. doi: 10.1017/S1352465811000671
- Beck AT, & Clark DA (1997). An information processing model of anxiety: Automatic and strategic processes. Behaviour Research and Therapy, 35, 49–58. doi: 10.1016/S0005-7967(96)00069-1
- Borghouts J, Eikey E, Mark G, De Leon C, Schueller SM, Schneider M, Stadnick N, Zheng K, Mukamel D, & Sorkin DH (2021). Barriers to and facilitators of user engagement with digital mental health interventions: Systematic review. Journal of Medical Internet Research, 23, e24387. doi: 10.2196/24387
- Boswell JF, Kraus DR, Miller SD, & Lambert MJ (2015). Implementing routine outcome monitoring in clinical practice: Benefits, challenges, and solutions. Psychotherapy Research, 25, 6–19. doi: 10.1080/10503307.2013.817696
- Brooke J (1996). SUS: A ‘quick and dirty’ usability scale. In Jordan P, Thomas B, & Weerdmeester B (Eds.), Usability Evaluation in Industry. London, UK: Taylor & Francis.
- Cohen P, Cohen J, Aiken LS, & West SG (1999). The problem of units and the circumstance for POMP. Multivariate Behavioral Research, 34, 315–346. doi: 10.1207/S15327906MBR3403_2
- Cristea IA, Kok RN, & Cuijpers P (2015). Efficacy of cognitive bias modification interventions in anxiety and depression: Meta-analysis. The British Journal of Psychiatry, 206, 7–16. doi: 10.1192/bjp.bp.114.146761
- Daniel KE, Daros AR, Beltzer ML, Boukhechba M, Barnes LE, & Teachman BA (2020). How anxious are you right now? Using ecological momentary assessment to evaluate the effects of cognitive bias modification for social threat interpretations. Cognitive Therapy and Research, 44, 538–556. doi: 10.1007/s10608-020-10088-2
- Daniel EK, Johnco C, & Sicouri G (2025). Content-specificity of a single-session cognitive bias modification of interpretations for social anxiety. Journal of Affective Disorders Reports, 100937. doi: 10.1016/j.jadr.2025.100937
- de Voogd L, Wiers RW, de Jong PJ, Zwitser RJ, & Salemink E (2018). A randomized controlled trial of multi-session online interpretation bias modification training: Short- and long-term effects on anxiety and depression in unselected adolescents. PLoS One, 13, e0194274. doi: 10.1371/journal.pone.0194274
- Eberle JW, Boukhechba M, Sun J, Zhang D, Funk DH, Barnes LE, & Teachman BA (2023). Shifting episodic prediction with online cognitive bias modification: A randomized controlled trial. Clinical Psychological Science, 11, 819–840. doi: 10.1177/21677026221103128
- Eberle JW, Daniel KE, Baee S, Silverman AL, Lewis E, Baglione AN, Werntz A, French N, Ji J, Hohensee N, Tong Z, Huband J, Boukhechba M, Funk D, Barnes B, & Teachman BA (2024). Web-based interpretation bias training to reduce anxiety: A sequential, multiple-assignment randomized trial. Journal of Consulting and Clinical Psychology, 92, 367–384.
- Everaert J, Podina IR, & Koster EH (2017). A comprehensive meta-analysis of interpretation biases in depression. Clinical Psychology Review, 58, 33–48. doi: 10.1016/j.cpr.2017.09.005
- Feingold A (2019). New approaches for estimation of effect sizes and their confidence intervals for treatment effects from randomized controlled trials. The Quantitative Methods for Psychology, 15, 96–111. doi: 10.20982/tqmp.15.2.p096
- Ferguson I, George G, Narine KO, Turner A, McGhee Z, Bajwa H, Hart FG, Carter S, & Beard C (2024). Acceptability and engagement of a smartphone-delivered interpretation bias intervention in a sample of Black and Latinx adults: Open trial. JMIR Mental Health, 11, e56758. doi: 10.2196/56758
- Ferguson I, George G, Wu C, Xu I, Passel E, Germine LT, & Beard C (2025). Evaluating the reliability of the Word-Sentence Association Paradigm (WSAP) as an interpretation bias assessment across ethnoracial groups. Cognitive Therapy and Research, 49, 425–432. doi: 10.1007/s10608-024-10523-8
- Fleming T, Bavin L, Lucassen M, Stasiak K, Hopkins S, & Merry S (2018). Beyond the trial: Systematic review of real-world uptake and engagement with digital self-help interventions for depression, low mood, or anxiety. Journal of Medical Internet Research, 20, e199. doi: 10.2196/jmir.9275
- Fodor LA, Georgescu R, Cuijpers P, Szamoskozi Ş, David D, Furukawa TA, & Cristea IA (2020). Efficacy of cognitive bias modification interventions in anxiety and depressive disorders: A systematic review and network meta-analysis. The Lancet Psychiatry, 7, 506–514. doi: 10.1016/S2215-0366(20)30130-9
- Gonsalves M, Whittles RL, Weisberg RB, & Beard C (2019). A systematic review of the word sentence association paradigm. Journal of Behavior Therapy and Experimental Psychiatry, 64, 133–148. doi: 10.1016/j.jbtep.2019.04.003
- Grund S, Robitzsch A, & Luedtke O (2023). mitml: Tools for Multiple Imputation in Multilevel Modeling. https://cran.r-project.org/web/packages/mitml/index.html
- Guy W (1976). ECDEU Assessment Manual for Psychopharmacology (Rev.; DHEW Publ. No. ADM 76–338). Rockville, MD: U.S. Department of Health, Education, and Welfare.
- Hallion LS, & Ruscio AM (2011). A meta-analysis of the effect of cognitive bias modification on anxiety and depression. Psychological Bulletin, 137, 940–958. doi: 10.1037/a0024355
- Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, & Conde JG (2009). Research electronic data capture (REDCap)—a metadata-driven methodology and workflow process for providing translational research informatics support. Journal of Biomedical Informatics, 42, 377–381. doi: 10.1016/j.jbi.2008.08.010
- Hirsch CR, Meeten F, Krahé C, & Reeder C (2016). Resolving ambiguity in emotional disorders: The nature and role of interpretation biases. Annual Review of Clinical Psychology, 12, 281–305. doi: 10.1146/annurev-clinpsy-021815-093436
- Ji JL, Baee S, Zhang D, Calicho-Mamani CP, Meyer MJ, Funk D, Portnow S, Barnes L, & Teachman BA (2021). Multi-session online interpretation bias training for anxiety in a community sample. Behaviour Research and Therapy, 142, 103864. doi: 10.1016/j.brat.2021.103864
- Jones EB, & Sharpe L (2017). Cognitive bias modification: A review of meta-analyses. Journal of Affective Disorders, 223, 175–183. doi: 10.1016/j.jad.2017.07.034
- Kelders SM, Kip H, & Greeff J (2020). Psychometric evaluation of the TWente Engagement with Ehealth Technologies Scale (TWEETS): Evaluation study. Journal of Medical Internet Research, 22, e17757. doi: 10.2196/17757
- Klasnja P, Hekler EB, Shiffman S, Boruvka A, Almirall D, Tewari A, & Murphy SA (2015). Microrandomized trials: An experimental design for developing just-in-time adaptive interventions. Health Psychology, 34, 1220–1228. doi: 10.1037/hea0000305
- Kroenke K, Spitzer RL, & Williams JB (2003). The Patient Health Questionnaire-2: Validity of a two-item depression screener. Medical Care, 41, 1284–1292. doi: 10.1097/01.MLR.0000093487.78664.3C
- Kroenke K, Strine TW, Spitzer RL, Williams JB, Berry JT, & Mokdad AH (2009). The PHQ-8 as a measure of current depression in the general population. Journal of Affective Disorders, 114, 163–173. doi: 10.1016/j.jad.2008.06.026
- Kuhn E, Kanuri N, Hoffman JE, Garvert DW, Ruzek JI, & Taylor CB (2017). A randomized controlled trial of a smartphone app for posttraumatic stress disorder symptoms. Journal of Consulting and Clinical Psychology, 85, 267–273. doi: 10.1037/ccp0000163
- Larrazabal MA, Eberle JW, de la Garza Evia AV, Boukhechba M, Funk DH, Barnes LE, Boker SM, & Teachman BA (2024). Online cognitive bias modification for interpretation to reduce anxious thinking during the COVID-19 pandemic. Behaviour Research and Therapy, 173, 104463. doi: 10.1016/j.brat.2023.104463
- Livermon S, Michel A, Zhang Y, Petz K, Toner E, Rucker M, Boukhechba M, Barnes LE, & Teachman BA (2025). A mobile intervention to reduce anxiety among university students, faculty, and staff: Mixed methods study on users’ experiences. PLOS Digital Health, 4, e0000601. doi: 10.1371/journal.pdig.0000601
- Lorenzo-Luaces L, & Howard J (2023). Efficacy of an unguided, digital single-session intervention for internalizing symptoms in web-based workers: Randomized controlled trial. Journal of Medical Internet Research, 25, e45411. doi: 10.2196/45411
- Maas CJ, & Hox JJ (2005). Sufficient sample sizes for multilevel modeling. Methodology, 1, 86–92. doi: 10.1027/1614-2241.1.3.86
- MacLeod C, Grafton B, & Notebaert L (2019). Anxiety-linked attentional bias: Is it reliable? Annual Review of Clinical Psychology, 15, 529–554. doi: 10.1146/annurev-clinpsy-050718-095505
- MacLeod C, & Mathews A (2012). Cognitive bias modification approaches to anxiety. Annual Review of Clinical Psychology, 8, 189–217. doi: 10.1146/annurev-clinpsy-032511-143052
- Mathews A, & MacLeod C (2005). Cognitive vulnerability to emotional disorders. Annual Review of Clinical Psychology, 1, 167–195. doi: 10.1146/annurev.clinpsy.1.102803.143916
- Maxwell SE, Delaney HD, & Kelley K (2018). Designing Experiments and Analyzing Data, Third Edition. New York, NY: Routledge.
- McCurdie T, Taneva S, Casselman M, Yeung M, McDaniel C, Ho W, & Cafazzo J (2012). mHealth consumer apps: The case for user-centered design. Biomedical Instrumentation & Technology, 46, 49–56. doi: 10.2345/0899-8205-46.s2.49
- Menne-Lothmann C, Viechtbauer W, Höhn P, Kasanova Z, Haller SP, Drukker M, …& Lau JY (2014). How to boost positive interpretations? A meta-analysis of the effectiveness of cognitive bias modification for interpretation. PLoS One, 9, e100925. doi: 10.1371/journal.pone.0100925
- Moitra M, Santomauro D, Collins PY, Vos T, Whiteford H, Saxena S, & Ferrari AJ (2022). The global gap in treatment coverage for major depressive disorder in 84 countries from 2000–2019: A systematic review and Bayesian meta-regression analysis. PLoS Medicine, 19, e1003901. doi: 10.1371/journal.pmed.1003901
- Mundt JC, Marks IM, Shear MK, & Greist JM (2002). The Work and Social Adjustment Scale: A simple measure of impairment in functioning. The British Journal of Psychiatry, 180, 461–464. doi: 10.1192/bjp.180.5.461
- Muñoz RF (2022). Harnessing psychology and technology to contribute to making health care a universal human right. Cognitive and Behavioral Practice, 29, 4–14. doi: 10.1016/j.cbpra.2019.07.003
- Nahum-Shani I, Qian M, Almirall D, Pelham WE, Gnagy B, Fabiano GA, Waxmonsky J, Yu J, & Murphy SA (2012). Experimental design and primary data analysis methods for comparing adaptive interventions. Psychological Methods, 17, 457–477. doi: 10.1037/a0029372
- NCSS Statistical Software (2024). Power Analysis and Sample Size Software.
- Perski O, Blandford A, West R, & Michie S (2017). Conceptualizing engagement with digital behaviour change interventions: A systematic review using principles from critical interpretive synthesis. Translational Behavioral Medicine, 7, 254–267. doi: 10.1007/s13142-016-0453-1
- R Core Team (2024). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
- Richards D, Enrique A, Eilert N, Franklin M, Palacios J, Duffy D, Earley C, Chapman J, Jell G, Sollesse S, & Timulak L (2020). A pragmatic randomized waitlist-controlled effectiveness and cost-effectiveness trial of digital interventions for depression and anxiety. NPJ Digital Medicine, 3, 85. doi: 10.1038/s41746-020-0293-8
- Salemink E, de Jong SR, Notebaert L, MacLeod C, & Van Bockstaele B (2022). Gamification of cognitive bias modification for interpretations in anxiety increases training engagement and enjoyment. Journal of Behavior Therapy and Experimental Psychiatry, 76, 101727. doi: 10.1016/j.jbtep.2022.101727
- Sauro J, & Lewis JR (2016). Quantifying the User Experience: Practical Statistics for User Research (2nd Ed.). Burlington, MA: Morgan Kaufmann Publishers.
- Schueller SM, Neary M, Lai J, & Epstein DA (2021). Understanding people’s use of and perspectives on mood-tracking apps: Interview study. JMIR Mental Health, 8, e29368.
- Silverman AL, Ferguson I, Bullis JR, Bajwa H, Mei S, & Beard C (2025). Program evaluation of internet-delivered cognitive behavioral treatments for anxiety and depression in a digital clinic. Journal of Mood & Anxiety Disorders, 9, 100106. doi: 10.1016/j.xjmad.2025.100106
- Silverman AL, Kovarsky Rotta G, Shin D, Ferguson I, & Beard C (2025). Randomized controlled trial of a personalized, smartphone-based interpretation bias intervention for anxiety and depression [Preregistration]. doi: 10.17605/OSF.IO/EJ89T
- Spitzer RL, Kroenke K, Williams JB, & Löwe B (2006). A brief measure for assessing generalized anxiety disorder: The GAD-7. Archives of Internal Medicine, 166, 1092–1097. doi: 10.1001/archinte.166.10.1092
- Terides MD, Dear BF, Fogliati VJ, Gandy M, Karin E, Jones MP, & Titov N (2018). Increased skills usage statistically mediates symptom reduction in self-guided internet-delivered cognitive–behavioural therapy for depression and anxiety: A randomised controlled trial. Cognitive Behaviour Therapy, 47, 43–61.
- Terlizzi EP, & Zablotsky B (2024). Symptoms of anxiety and depression among adults: United States, 2019 and 2022. National Health Statistics Reports, 213. Hyattsville, MD: National Center for Health Statistics. doi: 10.15620/cdc/64018
- Titov N, Andrews G, Johnston L, Robinson E, & Spence J (2010). Transdiagnostic Internet treatment for anxiety disorders: A randomized controlled trial. Behaviour Research and Therapy, 48, 890–899. doi: 10.1016/j.brat.2010.05.014
- Vos LM, Nieto I, Amanvermez Y, Smeets T, & Everaert J (2025). Do cognitive biases prospectively predict anxiety and depression? A multi-level meta-analysis of longitudinal studies. Clinical Psychology Review, 102552. doi: 10.1016/j.cpr.2025.102552
- Vrijsen JN, Grafton B, Koster EH, Lau J, Wittekind CE, Bar-Haim Y, Becker ES, Brotman MA, Joorman J, Lazarov A, MacLeod C, Manning V, Pettit JE, Rinck, …& Wiers RW (2024). Towards implementation of cognitive bias modification in mental health care: State of the science, best practices, and ways forward. Behaviour Research and Therapy, 179, 104557. doi: 10.1016/j.brat.2024.104557
- Wu Y, Levis B, Riehm KE, Saadat N, Levis AW, Azar M, Rice DB, Boruff J, Cuijpers P, Gilbody S, …& Thombs BD (2020). Equivalency of the diagnostic accuracy of the PHQ-8 and PHQ-9: A systematic review and individual participant data meta-analysis. Psychological Medicine, 50, 1368–1380. doi: 10.1017/S0033291719001314
- Würtz F, Kunna M, Lindgraf C, Blackwell S, Margraf J, Everaert J, & Woud ML (2023). Interpretation biases in anxiety—A three-level meta-analysis. PsyArXiv. doi: 10.31234/osf.io/7zkvr