Abstract
Although engagement in self-directed wellness exercises typically declines over time, integrating social support such as coaching can sustain it. However, traditional forms of support are often inaccessible due to high costs and complex coordination. Large Language Models (LLMs) show promise in providing human-like dialogues that could emulate social support. Yet, in-depth, in situ investigations of LLMs for supporting behavior change remain scarce. We conducted two randomized experiments to assess the impact of LLM agents on user engagement with mindfulness exercises: first, a single-session study involving 502 crowdworkers; second, a three-week study with 54 participants. We explored two types of LLM agents: one providing information and another facilitating self-reflection. Both agents enhanced users’ intentions to practice mindfulness. However, only the information-providing LLM agent, featuring a friendly persona, significantly improved engagement with the exercises. Our findings suggest that specific LLM agents may bridge the social support gap in digital health interventions.
Additional Key Words and Phrases: large language models, mindfulness, behavior change, social support, supportive accountability, well-being, habit formation
1. Introduction
Changing behavior to improve wellness is challenging, even when the potential benefits are obvious [21, 130, 131]. For example, people are often motivated to participate in traditional yoga classes, try meditation routines discovered in books or magazines, participate in community-led mindfulness sessions, or explore wellness exercises on digital platforms such as Instagram and YouTube. However, forming habits and following through with these exercises proves to be a significant hurdle [135, 201]. This lack of perseverance can be attributed to several factors, including lack of accountability and the struggle to overcome ambivalence [21, 82, 83]. Nevertheless, there is evidence that social support can contribute to the success of behavior change initiatives by increasing accountability, motivation, and emotional sustenance [5, 19, 105, 136, 144].
The ‘Supportive Accountability’ model proposed by Mohr et al. [134] suggests that digital health promotion tools are more likely to be effective when complemented by human support. This support goes beyond simple companionship, incorporating benevolence, relevant expertise, and tailored guidance. An example could be a coach that provides information, scaffolding the learning process with factual and procedural knowledge [134, 161]. On the other hand, there can be support that operates through inquiry and reflection, which encourages users to engage in self-exploration and critical thinking about their behaviors and underlying motivations [87, 180]. Integrating social and expertise-based support into technology may facilitate long-term behavioral change.
However, traditional forms of support are inaccessible to many due to the costs and complexities of providing support [3]. Human support is often the most expensive and complex part of a digital health intervention [10, 102, 133]. Some people also have preferences not to interact directly with others or lack existing supportive relationships that can be leveraged to support new practices.
Recently, advances in deep learning have led to the development of LLMs such as GPT-4 [2, 25]. These models are large neural networks trained on vast collections of textual data, enabling them to process inputs in natural language and generate human-like text. Conversations, essential for providing social support among humans, can now be convincingly mimicked by LLM-based conversational agents, simulating the personalized and open-ended dimensions of human-to-human interactions that, until now, have been challenging to achieve in automated conversation systems [128, 164, 174]. LLMs have potential to play a multifaceted role in helping individuals achieve behavior change. Although significant research has explored social support in various contexts, less is known about the effectiveness of LLM-based agents for long-term behavior change in ecologically valid settings [174].
Our research draws from existing CSCW and social support literature to understand the role of LLM-based agents in providing social support to adopt wellness practices [3, 105, 132]. Specifically, we focus on evaluating the impact of LLM agents on two key aspects: the users’ intention to participate in wellness exercises and their actual engagement with these exercises. We also gather user perspectives on the support provided by LLM agents to understand how people think about collaborating with LLM agents around behavior change. We address the following research questions.
RQ1 (Intention to Practice Exercises): How do LLM agents influence users’ intention to engage with wellness exercises?
RQ2 (Engagement with Exercises): What impact do LLM agents have on the actual engagement with wellness exercises, as indicated by the initiation and completion rates of exercises?
RQ3 (Measures of Well-being): What impact do LLM agents have on immediate and long-term well-being measures (such as stress and mindfulness scores) related to wellness exercises?
We focus on mindfulness as a case study: a practice increasingly recognized for its extensive benefits in improving mental health, reducing stress, enhancing emotional regulation, and fostering overall well-being [103, 110, 141, 177]. A popular method of learning mindfulness is through tutorial videos. This approach allows people to learn about mindfulness and practice it at their own pace and convenience. However, despite the accessibility and flexibility of these self-guided mindfulness programs, they often face challenges with attrition. Studies have shown that consistent engagement in such self-directed mindfulness interventions can be challenging, and many participants do not complete the programs [12, 162, 190]. Estimates suggest that 23–39% of people drop out within 10 weeks of the program [113].
There is an opportunity to enhance engagement with self-directed wellness practices through additional social support. The nature of this social support can vary to meet individual needs at any point in the behavior change process [129]. Informational support and validation can play crucial roles in helping people understand the benefits of change and motivate them to begin practicing. This type of support can provide the necessary clarity and encouragement that individuals might need at various times during their wellness journey. In addition, planning support and decisional reinforcement can help individuals stay on track and effectively integrate practices into their daily routines. Self-reflection can allow participants to introspect and evaluate their progress [59, 109, 150]. Reflective support is particularly valuable as it can deepen the understanding of personal experiences and promote sustained behavior change [13, 14, 16]. In this paper, we use LLM agents to provide informational and reflective support to investigate whether these agents can meet varying needs for support throughout the mindfulness journey of an individual.
We conducted a formative study involving online crowdworkers (N = 502) to capture user perspectives and intentions about mindfulness practice when supplemented with different forms of LLM agent support (RQ1). As design probes, we developed two types of LLM agents using GPT-3: the Information Chatbot, designed to provide detailed information about mindfulness practices, and the Reflection Chatbot, designed to help participants reflect on their mindfulness experiences and plan their future mindfulness sessions. Additionally, our study incorporated a tutorial video as a fundamental element of many digital interventions promoting mindfulness and similar wellness behaviors. Findings suggested that while both chatbots were perceived to be useful, users preferred more relaxed and friendly conversational interactions (small talk), which they found particularly lacking in the Information Chatbot. Self-reported intentions to practice mindfulness were higher among participants who first interacted with the Information Chatbot before watching the video and those who engaged with the Reflection Chatbot after the video, compared to the control group that did not use the chatbots. Participants mentioned repetitiveness in the responses as a point of discontent for both chatbots.
As highlighted in previous research [154, 167], such preliminary insights require validation through real-world deployments to gauge the alignment between expressed intentions and actual user behavior. To this end, we conducted a 3-week deployment study with 54 participants interested in learning about mindfulness (RQ2). The core intervention included instructional videos and email reminders every two days to practice mindfulness. In a 2 (Sociable Information LLM agent: present vs. absent) × 2 (Reflection LLM agent post Exercise: present vs. absent) between-subjects experiment, participants were randomized to have access to the two LLM chatbots. Based on the participants’ feedback in the formative study, the Information Chatbot was developed into a sociable informational companion chatbot, Mindy, characterized by a friendly persona. Half of the participants were randomly assigned to receive access to Mindy in their reminder emails. We continued using the Reflection LLM agent, which was enhanced based on initial feedback and delivered immediately after each instructional video. Again, half of the participants were randomly given access to this reflective component after each session. We upgraded from GPT-3 to GPT-4 to address issues like repetitiveness from the formative study. Participants who had access to the Sociable Information LLM, Mindy, showed notably higher engagement, with an 18% higher initiation rate and a 12% higher completion rate of daily exercises compared to those without access to the agent (p < 0.01). Interestingly, incorporating the Reflection LLM agent following video exercises did not significantly affect initiation or completion rates. This finding was somewhat unexpected, as the Reflection LLM, with its specific objective of aiding in planning and reflection, was expected to play a more active role in enhancing exercise adherence.
These findings suggest that LLM agents can potentially improve people’s intentions to participate in self-guided wellness exercises and their engagement with them. However, the findings also highlight that these LLM agents should be tested in a longitudinal setting to gauge the actual benefits before deployment. Given the increasing need for social support in our society, our research underscores the potential benefits of using LLMs to offer the necessary support, thus contributing to improved well-being and healthier lifestyles.
2. Related Work
Our work builds on long-standing research on influencing engagement in behavior change interventions, the role of social support in maintaining this engagement, technology-enabled mindfulness, conversational agents, and the growing literature on the interaction between humans and LLMs.
2.1. User Engagement in Behavior Change Interventions
Behavior change is a broad concept that can capture changes in how individuals prevent or manage health conditions and their daily practices and habits, which have implications for general well-being. Here, we focus on the latter, considering the specific context of initiating and sustaining mindfulness practice.
Most digital behavior change tools are intended for ongoing use across the behavior change process and require repeated use to achieve effects [81, 201]. Most behavior change interventions, especially those delivered through the digital medium without human support, suffer from high dropout and attrition [51, 88, 201]. Engagement alone may not directly result in behavior change, but it serves as an essential precondition [62, 81, 201]. In an era marked by fragmented attention, distractions, and shifting priorities, a key challenge is designing technologies that maintain user interest without contributing to the noise of daily life [43, 65, 178]. Karapanos [81] proposed the use of ‘creating checking habits’ powered by instant information rewards, similar to those in social media updates and emails. Although there is a risk that these habits could lead to addictive behaviors [61, 81], they also promise to foster beneficial engagement, such as encouraging more frequent exercise through interactions with an app.
2.2. Social Support in Wellness Behavior Change Journeys
Social support is essential in influencing health behaviors and outcomes, acting as a cornerstone of physical and mental well-being [105, 179]. Studies, such as those conducted by Tay et al. [176], have highlighted the multifaceted benefits of social relationships, including their importance in chronic illness self-management and mitigating suicidal tendencies. The expanding research in this field has led to diverse conceptualizations and operational frameworks for social support, reflecting its complexity and the variety of forms it can take [68, 115].
Positive social interaction may be especially critical in well-being interventions, offering fun and relaxation opportunities essential for a balanced lifestyle. Supportive peer relationships, as found by Roach [156], significantly improved adolescents’ mental health, indicating the broader implications of positive social interactions in different demographics. Although several self-directed programs and apps are available in the particular case of mindfulness, a common challenge remains in the lack of sustained use of these tools or strategies in daily life [113, 190]. Mohr et al. [134] proposed a model for “Supportive Accountability”, highlighting the critical role of social support in enhancing adherence and engagement in behavior change programs. The model suggests that incorporating social support elements in technology-enabled services significantly increases their effectiveness in promoting long-term behavioral changes [175, 189]. Recent HCI research underscores the importance of interpersonal relationships in the management of mental health concerns and the crucial role of technology in providing ongoing support [27, 138].
However, the scalability of providing high-quality social support presents a significant challenge, as traditional methods, such as individual, in-person instruction, are not feasible on a large scale. Our research looks into how LLM-based agents can provide different forms of support, focusing on one support style that is informational versus one that encourages self-reflection. Health ‘coaching’ is often defined as a collaborative, client-centered process focused on facilitating self-discovery, goal setting, and accountability, while maintaining professional boundaries rather than functioning as a friendship [145, 191]. Our LLM agents therefore aligned with coaching frameworks in their focus on supporting self-discovery and accountability, although less so in the professional boundaries typically maintained in coaching.
2.3. Technology-Enabled Mindfulness
The HCI literature has extensively explored the role of technology in supporting meditation to practice mindfulness [177]. Technology serves as a valuable tool, allowing users to practice in their own time and space and aiding users in achieving and sustaining focus on inner or outer experiences, such as breathing [146, 148], tactile movements [73, 158], or even monitoring brain activity [90, 91, 159]. These techniques encompass a spectrum of applications, including Virtual Reality (VR), wearables, and online interventions.
Although wearable devices, including VR head-mounted units, show promise in various applications, their reach among diverse users in daily contexts remains limited, particularly for those who are new to mindfulness and seeking accessible entry points. In contrast, numerous online programs and applications have been developed to promote mindfulness practices, leveraging the convenience of smartphones, tablets, and desktop systems for personal use. These apps offer various features, including guided audio meditations, daily mindfulness activities, and training programs. For instance, apps offering guided audio meditations were associated with reduced blood pressure and significant enhancements in overall well-being, distress reduction, and the alleviation of job-related stress [20]. Additionally, smartphone-based interventions provide users with simple daily mindfulness activities, further contributing to improved mental well-being [70]. Walsh et al. [185] highlighted that mindfulness training through smartphone apps can enhance subjective well-being, attentional control, and interoceptive integration. An intervention using a mindfulness meditation app, Calm, was associated with improved sleep quality, reduced alcohol consumption (particularly binge drinking), increased physical activity, and healthier dietary choices, including increased fruit and vegetable consumption [72]. However, these apps suffer from high attrition rates [113].
2.4. Conversational Agents for Behavioral Interventions
Conversational agents, or chatbots, have been increasingly deployed in various domains to positively influence user behavior, including improving sleep habits [155], offering psychiatric counseling [143], promoting healthier diets and exercise routines [71], and supporting smoking cessation [29]. Lukoff et al. [117] introduced a chat-based food journaling tool to help families achieve their healthy eating goals, exemplifying the utility of chatbots in nurturing healthier lifestyle choices. In addition, chatbots have been used in professional settings to help users reflect on experiences [87] or manage psychological transitions related to work [188], ultimately strengthening recovery and overall well-being. Educational initiatives have also seen the integration of pedagogically focused chatbots designed to improve learning outcomes [95]. For example, Zvereva et al. [207] proposed a dialogue-based approach to assess and improve student motivation.
The role of chatbots extends into the mental health domain, where they have been used to manage symptoms and improve quality of life [1, 118]. In particular, Woebot [54], a chatbot based on Cognitive Behavioral Therapy (CBT), demonstrated effectiveness in a randomized controlled trial, highlighting the potential of chatbots to provide mental health support. In addition, chatbots have facilitated self-disclosure [106, 107] and promoted self-compassion [104], crucial components of personal well-being and behavioral change interventions.
Our investigation is based on existing research surrounding the design and implementation of conversational agents for behavior change and well-being [30]. Using LLMs for chatbots presents promising prospects and novel challenges [75]. LLMs also introduce ethical and safety concerns that surpass those associated with traditional chatbots, necessitating rigorous evaluations to understand their implications fully [44, 58, 199]. There are also questions regarding how users feel about social support delivered by an LLM.
2.5. Situating Work in Broader HCI & CSCW Literature
As discussed in previous sections, the current literature emphasizes the pivotal role of human support in facilitating wellness practices, particularly in self-directed exercises (Sections 2.1 and 2.2). Within the CSCW community, there is a growing emphasis on understanding the needs and preferences of people who pursue self-directed wellness and the development of supportive technologies [89, 101, 129, 132]. Human facilitators in various formats, from personal coaches available in popular mindfulness apps to instructors leading group sessions, have significantly improved engagement with interventions and outcomes [60, 105, 134]. These facilitators offer personalized guidance, emotional support, and accountability, aspects crucial to the success of such interventions [60]. Despite their effectiveness, human-led support faces limitations in cost, scalability, and accessibility [3, 10, 102, 133]. In contrast, LLM-based agents, which can be made available everywhere and at a lower cost, emerge as a promising alternative. These agents can provide continuous judgment-free social support without time and location constraints and maintain user anonymity and privacy, making them a practical solution for wide-scale implementations [85, 118, 127, 196, 200].
Trained on extensive datasets, LLMs have the potential to adapt their guidance to various user contexts and challenges, a capability that is still being explored and refined [33]. They can also support people with social anxiety or who, for various reasons, are uncomfortable interacting with human coaches about their well-being [122]. LLMs can serve as informative guides [92], help users acquire new skills by offering precise and detailed explanations that reflect the user’s specific contexts and concerns [204], and resolve uncertainties about effectively carrying out wellness practices [119, 199]. They can provide proactive check-ins, inquire about progress, and assist in planning future wellness activities, thus promoting accountability. Furthermore, their ability to engage in empathetic and supportive dialogues positions them as a ‘friendly presence,’ offering encouragement and reflection [122].
We investigate the capabilities of LLM agents within the context of behavior change interventions through a large-scale single-session study and a longitudinal field deployment. Our investigation focuses on two forms of LLM agents: an informational agent, designed to provide relevant knowledge and guidance, and a reflection-promoting agent, designed to facilitate self-reflection. These functions mirror the crucial roles traditionally fulfilled by human coaches in guiding people through behavior change processes [15, 134].
3. Formative Study Design
We conducted our formative study with 502 participants recruited from Mechanical Turk [86, 165] to understand user intentions regarding use of LLM agents for mindfulness. We asked users about their perceptions of the strengths and weaknesses of the tutorial video and the two LLM agents. The study was conducted in January 2023 and was approved by the ethics board of the local university. Figure 1 summarizes the overall design of our formative study. The methodologies and rationale behind the study design are elaborated in the following sections.
Fig. 1.

Design of the formative study. All participants received the core mindfulness message. Participants were then randomly assigned to engage or not, in this order, with (1) the Information Chatbot, which provided information about mindfulness; (2) the Instructional Video, which guided participants through mindful breathing; and (3) the Reflection Chatbot, which engaged participants in a self-reflection exercise on their understanding of mindfulness. The 2 (Information Chatbot: Present vs. Absent) × 2 (Instructional Video: Present vs. Absent) × 2 (Reflection Chatbot: Present vs. Absent) factorial design allowed us to explore user perspectives and intentions for different combinations of interaction techniques to enhance the core mindfulness message.
3.1. Information Chatbot
(Using an LLM to Provide Information on Mindfulness)
Since the release of ChatGPT, there has been a surge in the use of LLMs for information retrieval. LLMs have been increasingly integrated into web search platforms [173], such as You.com and PerplexityAI. However, exploration of the implications and capabilities of LLM-based search engines in facilitating behavior change remains limited. In our study, we employed the Information Chatbot as a design probe to provide mindfulness-related information to participants via dialogues. Drawing from existing literature on LLM prompts [39, 96], educational theories [37], and extensive preliminary testing, we designed a specific system prompt for the GPT-3 model:
The following is a conversation with a Mindfulness instructor. The instructor teaches and provides information about different mindfulness activities to the Human. The instructor explains different activities clearly and provides examples wherever possible. The instructor has a sense of humor, is fair, and is empathetic.
In addition to the system prompt, the model began with an initial greeting to establish the conversational context: “Hello. I am an AI agent designed to act as your Mindfulness instructor. I can answer any questions you might have related to Mindfulness. How can I help you?”. To guide participants in engaging with the chatbot, we provided example queries such as “Can you help me learn about Mindfulness?”. A typical interaction (taken from this study) between a participant and the chatbot is depicted in Figure 1.
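The two pieces above, the system prompt and the scripted greeting, can be combined with the running dialogue into a single completion-style prompt. The sketch below is our illustration, not the authors' code: it only assembles the text that would be sent to a GPT-3 completions endpoint (the API call itself is omitted), and the `build_prompt` helper and the `Instructor:`/`Human:` turn labels are assumptions for illustration.

```python
# Illustrative sketch (not the study's implementation) of assembling a
# completion-style prompt from the system prompt, the opening greeting,
# and the dialogue history described above.

SYSTEM_PROMPT = (
    "The following is a conversation with a Mindfulness instructor. "
    "The instructor teaches and provides information about different "
    "mindfulness activities to the Human. The instructor explains "
    "different activities clearly and provides examples wherever possible. "
    "The instructor has a sense of humor, is fair, and is empathetic."
)

GREETING = (
    "Hello. I am an AI agent designed to act as your Mindfulness "
    "instructor. I can answer any questions you might have related to "
    "Mindfulness. How can I help you?"
)


def build_prompt(history):
    """Assemble the text sent to a completion-style model.

    history: list of (speaker, text) pairs, speaker in {"Human", "Instructor"}.
    The trailing "Instructor:" cue prompts the model to reply in role;
    generation would be stopped at the next "Human:" line.
    """
    lines = [SYSTEM_PROMPT, "", f"Instructor: {GREETING}"]
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append("Instructor:")
    return "\n".join(lines)
```

Under this sketch, passing the example query from the study, e.g. `build_prompt([("Human", "Can you help me learn about Mindfulness?")])`, yields the full prompt text; the same pattern would apply to the Reflection Chatbot with its own system prompt (Section 3.3).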
3.2. Instructional/Tutorial Video for Mindful Breathing
The instructional video we used was taken from a previous study validated by experts [186], focusing on mindfulness-based practices. It introduced participants to mindfulness meditation and included exercises emphasizing the internal awareness of bodily sensations or breath in a curious and non-evaluative manner. We specifically selected mindful breathing for its foundational role in mindfulness practices.
3.3. Reflection Chatbot
(Using an LLM to Aid in Mindfulness-Related Self-Reflection)
In addition to providing informational support, LLMs can facilitate self-reflection and planning for post-learning activity practice. Research indicates that students’ learning outcomes improve when they reflect on their metacognitive strategies and beliefs [152, 195]. Similarly, self-reflection has been recognized as an effective mechanism to encourage behavior change [63, 100, 150]. However, the role of LLMs in enhancing self-reflection to change behavior remains relatively unexplored. To investigate this, we used a second LLM agent, the Reflection Chatbot, as a probe to assist participants in reflecting on their understanding of mindfulness and scheduling their daily mindfulness exercises. With GPT-3, we configured this chatbot with a system prompt analogous to that used with the Information Chatbot:
The following is a conversation with a Mindfulness instructor. The instructor asks open-ended reflection questions to the Human to solidify the Human’s understanding of Mindfulness and helps them plan when they can practice Mindfulness in their daily lives. The instructor has a sense of humour, is fair, and empathetic.
Additionally, we incorporated an initial greeting into the bot’s prompt, “Hello. I am an AI agent designed to act as your Mindfulness instructor. I am here to help you reflect on your learnings. How can I help you?”. To facilitate the start of the conversation, we provided sample questions to participants such as “Can you help me plan when I can practice Mindfulness?”. An illustrative dialogue between a study participant and the bot is presented in Figure 1.
3.4. Participants
A total of 866 individuals responded to our Qualtrics survey. After attention-check filtering, 502 legitimate responses remained. Table 1 displays the diverse demographic profile of the participants, with a significant majority (93.12%) having prior experience in mindfulness. Participants were randomly assigned to one of eight experimental conditions (2×2×2). They dedicated up to 20 minutes to designated activities and another 10 minutes to subsequent follow-up questions.
Table 1.
Demographic information of the participants in the Formative study.
| Total Participants = 502 | |
|---|---|
| Age | 18–25 (8.17%), 25–34 (58.56%), 35–44 (18.92%), 45–54 (10.56%), 55–64 (3.19%), 65+ (0.60%) |
| Gender | Male (57.57%), Female (43.43%) |
| Employment | Full-time (95.82%), Part-time (1.79%), Unemployed and looking for work (0.40%), A homemaker or stay-at-home parent (0.40%), Student (0.40%), Retired (1.19%) |
| Marital Status | Married/cohabitant (79.89%), Partner but not living together (0.40%), Single/divorced (19.12%), Other (0.60%) |
| Education | Some high school, no diploma (1.39%), High School graduate (2.99%), Some college credit with no degree (2.59%), Associate degree (1.59%), Bachelor’s degree (68.92%), Master’s degree (20.12%), Professional degree (1.79%), Doctorate degree (0.60%) |
| Annual Income | Less than $10K (2.19%), $10–49K (33.07%), $50–99K (55.58%), $100K+ (9.16%) |
| Race | White/Caucasian (86.45%), Black/African American (1.79%), American Indian/Native American or Alaska Native (2.39%), Asian (8.17%), Native Hawaiian or Pacific Islander (0.20%), Prefer not to say (0.40%) |
| Mindfulness Experience | Yes (93.12%), No (6.88%) |
3.5. Study Procedure
Participants navigated to a Qualtrics survey, where they were briefed on the study objectives: exploring and evaluating various mindfulness introduction techniques to foster well-being. Upon understanding the purpose of the study, the participants gave their informed consent. Participants reported their experience with mindfulness and were then introduced to the Core Mindfulness Message (Figure 1). The message described the concept of mindfulness and its benefits. This message served as a proxy for the type of mindfulness descriptions that one might typically find on social media platforms or news articles.
Subsequently, the participants were randomly assigned to one of the 2 (Information Chatbot: Present vs. Absent) × 2 (Tutorial Video: Present vs. Absent) × 2 (Reflection Chatbot: Present vs. Absent) experimental conditions (depicted in Figure 1). The order of the bots was not randomized because their nature was linked to their role in mindfulness practice (preparation vs. integration/reflection). Participants interacted with and provided qualitative feedback on each assigned component, one after the other. After interacting with their designated conditions, participants were asked to express their likelihood of engaging in mindfulness practices, rated on a scale from 1 (not likely at all) to 7 (extremely likely), with the prompt “How likely are you to practice Mindfulness after undergoing these learning exercises?”. Furthermore, to deepen our understanding of their experience, we posed several open-ended questions after each bot/video and after the entire task, such as:
In what contexts would you like to interact with the bot again (or watch the video) when practicing Mindfulness?
What did you like or dislike about the interaction with the bot (or about the contents of the video)?
What else could we have shown or said to you to increase your chances of practicing Mindfulness?
The questions were conditioned on the participant’s assigned condition. We concluded with demographic questions captured in Table 1. To ensure the quality of our data, we used several attention checks throughout the survey to identify and exclude inattentive respondents.
3.6. Ethical Considerations
Conducting research in the domain of mental health inherently presents various ethical considerations. A primary concern is that experimental interventions might cause negative emotional responses, either as a direct result of introspective processes, inadvertently through misunderstandings of the material presented, or through inaccurate responses generated by LLMs. To minimize these risks, we proactively cautioned crowdworkers about potentially encountering unsettling content and communicated their option to discontinue participation at any point. Additionally, we provided mental health support resources, such as contact information for the National Suicide Prevention Lifeline, at the beginning of the study.
3.7. Data Analysis
The qualitative data analysis was carried out by two researchers employing the open coding method [31, 32], which involved a detailed examination of the responses to identify significant patterns and concepts. Subsequently, thematic analysis was applied to distill these patterns into clear, recurring themes [22]. The process included familiarization with the data, generating initial codes, identifying recurring themes, and defining and naming these themes. Example themes included the benefits, challenges, and use cases of practicing mindfulness with chatbots and videos. The researchers engaged in two rounds of collaborative discussions to ensure an unbiased analysis, focusing on validating the coding process and the identified themes.
The composition of our research team reflects a broad spectrum of perspectives, which we acknowledge may influence the interpretation and direction of our findings. The gender representation within the team is approximately balanced. Team members hail from various cultural backgrounds, including North America, Eastern Europe, the Global South, and Asia. Educational levels among the team range from undergraduate students to doctoral students and professors. The areas of research represented include computer science, psychology, education studies, communication, and behavior change.
To analyze the Likert-style ratings, we conducted a factorial ANOVA with interactions to evaluate the impact of the information chatbot, tutorial video, and reflection chatbot on participants’ self-reported intentions to practice mindfulness. Further, we conducted exploratory post-hoc comparisons of different conditions using the Tukey method, adjusting for multiple comparisons.
4. Formative Study Findings
4.1. Intentions for Practicing Mindfulness Across Different Conditions (RQ1)
Figure 2 compares the intentions of the participants to practice mindfulness in various conditions, including the control group. The results of the ANOVA indicated a significant main effect of Information Chatbot (F (1, 494) = 5.489, p < .05), suggesting that the interaction with the information chatbot affected the participants’ intention to practice mindfulness. Similarly, Tutorial Video (F (1, 494) = 5.506, p < .05) and Reflection Chatbot (F (1, 494) = 7.603, p < .01) also had significant main effects.
Fig. 2.

Plot depicting the mean intention to practice mindfulness again after engaging in the given exercise(s) on a scale of 1 (not likely at all) to 7 (extremely likely) for all conditions. Each data point represents the mean intention to practice mindfulness again among individuals assigned to the condition, and error bars represent ± one standard error of the mean. Contrasts are calculated using estimated marginal means with ANOVA, and significant relationships (p < 0.05) are displayed. *: p ≤ 0.05, ***: p ≤ 0.001 (p-values adjusted using the Tukey method to compare a family of 8 estimates).
No significant interaction effects were found between Information Chatbot and Tutorial Video (F (1, 494) = 0.458, p = .50), Information Chatbot and Reflection Chatbot (F (1, 494) = 0.524, p = .47), or Tutorial Video and Reflection Chatbot (F (1, 494) = 2.252, p = .13). Additionally, the three-way interaction between Information Chatbot, Tutorial Video, and Reflection Chatbot was not significant (F (1, 494) = 1.509, p = .22).
4.2. User Perspectives on Different Modes of Interaction
4.2.1. Instructional Video.
Most participants shared their willingness to watch the video to relieve stress, concentrate, and practice mindfulness. Participants mentioned social contexts (such as before a work presentation) and specific times in their daily routines when the video would be helpful. This includes contexts throughout the day, such as in the morning, before starting work, after finishing work, and before bedtime.
Participants expressed a positive response to the video. They appreciated the calm and quiet atmosphere in the video, devoid of distracting music, and with soothing vocal tones. Many found the video instrumental for focusing and relaxation.
“I liked the way she used her voice to give me instructions on what I should be doing. I also liked how she gave me things to look for. Also giving me tips on how to fall into mindfulness. I wish the video was longer!”
(P16)
However, a few participants (N=3) provided negative feedback regarding their willingness to rewatch a specific video on mindful breathing, citing reasons such as a lack of perceived need for repetition due to already experiencing the benefits of deep breathing, dissatisfaction with the video’s effectiveness in alleviating stress, and a sense of familiarity with the mindfulness technique, making further viewings unnecessary.
4.2.2. Information Chatbot.
Participants found the information provided by the chatbot beneficial for practicing mindfulness, especially in stressful situations, and when seeking relaxation, focus, and comfort. The willingness to use chatbots for acquiring knowledge, assistance, and guidance in enhancing mindfulness practice was also discussed.
Participants favored the informational chatbot for its clear, straightforward, and detailed responses, appreciating the fact-based and unbiased nature of the information. The bot’s non-judgmental tone was also highly valued. However, some participants noted occasional irregularities in the information chatbot’s responses. Specifically, repetition of the same definition or response could lead participants to perceive the interaction as mechanical and artificial. P393 and P114 illustrated these mixed sentiments:
“I liked that it had fast responses that totally made sense, and were open and vague enough that it was easy to understand, even if I didn’t know anything about mindfulness. I didn’t like that it felt like perhaps the bot’s knowledge was limited, and that it might have repeated the phrase ‘whatever works for you’ more than twice.”
(P393)
“The instructions from the bot were quite straightforward. However, I don’t see the need to interact with it again. I’m uncertain if repeating the interaction would offer more mindfulness insights; it might only serve to redirect my focus to breathing exercises during stressful moments. Interacting with the bot in such instances could be beneficial for guiding my breathing exercise.”
(P114)
In summary, while the informational chatbot received praise for its informative and non-judgmental demeanor, occasional repetitions and perceived limitations in its responses left some participants questioning its effectiveness in providing ongoing mindfulness support. Section A.1.1 shows an example conversation from the study.
4.2.3. Reflection Chatbot.
The reflection chatbot was discussed similarly to the informational chatbot, catering to the desire for more information, understanding, improvement, and measurement of mindfulness, while also addressing doubts and setting intentions.
The reflection chatbot mirrored the detailed responses of the information chatbot but provided a more human-like experience. Participants appreciated when the interaction created a sense of “conversing with a human,” characterized by clear explanations and sensible responses. For example, P172 described:
“I liked the conversation very much. It provided me with a clear understanding of mindfulness, addressing all my doubts and solidifying my perspective on mindfulness. Overall, the bot was pleasant to chat with, resembling a human-like interaction.”
(P172)
Moreover, participants appreciated the chatbot’s ability to grasp specific situations and contexts:
“It seemed to quickly and effectively answer my questions without some of the typical aggravations of interacting with chatbots…I can’t think of many negatives other than talking to an actual person would probably be more ideal.”
(P17)
“The chatbot answered my questions and tried to guide me through Mindfulness breathing exercises. The (reflection) chatbot felt less repetitive than the earlier (informational chatbot). I didn’t like that I couldn’t follow along without keeping my eyes open to read the chatbot’s messages.”
(P450)
In conclusion, participants appreciated the chatbot’s ability to tailor responses to their needs, although some noted limitations in the interactive experience. Section A.1.2 shows a conversation with Reflection Chatbot from the study.
4.3. Implications for Deployment Study
To enhance the experience with the Information Chatbot, participants emphasized the importance of friendliness and desired more casual conversation.
“If the bot could engage with me about my thoughts and feelings regarding mindfulness, like a friend or counselor. Additionally, if I encountered problems with mindfulness, expressing them to the chatbot could lead to potential solutions.”
(P393)
“The bot’s impressive level of responsiveness and friendliness stood out. It was polite, and provided prompt, precise responses, but I wished for more small talk or casual conversation.”
(P76)
Moreover, participants valued the chatbot’s non-judgmental approach when discussing various topics beyond mindfulness.
“I appreciated the comprehensive and helpful answers. The responses felt professional and thorough, delivered swiftly without judgment, even for what I perceived as ‘stupid questions.’”
(P461)
The findings of the formative study have crucial implications for our subsequent deployment study. First, while LLMs alone may not sufficiently motivate regular mindfulness practice, they have the potential to enhance the benefits derived from instructional videos. Our findings suggest benefits for both chatbots as well as the instructional video, which is consistent with the existing literature [14, 66]. However, the Information Bot was perceived as limited in its casual conversation, and the participants proposed integrating a friendly demeanor and improving its casual interaction. They also wanted to interact with the chatbot at different times of the day. These findings informed the design of two distinct types of LLM-based conversational agents for our next study:
One aimed at facilitating self-reflection following the mindfulness video.
Another offering information in a friendly and supportive manner that could be accessed throughout the day.
Our formative study shed light on user intentions for employing LLMs in mindfulness practices. However, as previous research indicates, such insights must extend beyond the theoretical to include real-world applications to understand how intentions align with actual user behavior [154, 167]. This holds even greater weight for interventions involving LLMs, where existing empirical data often stem from small-scale user studies or controlled laboratory settings. The field lacks extensive, real-world longitudinal research in Human-LLM collaboration, marking a crucial area for the HCI community to explore and expand upon.
5. Deployment Study Design
Building on the insights from the formative study, we conducted a 3-week deployment study where participants received email reminders to practice mindfulness. The design of the study was 2 (Sociable Information LLM: present vs. absent) × 2 (Reflection LLM post-video: present vs. absent). Figure 3 shows a concise summary of the design of the intervention.
Fig. 3.

Design of the deployment study. Participants were reminded via email every other day, over a period of three weeks, to engage in mindfulness exercises, totaling 10 reminders. (A) Each reminder email contained a link to the mindfulness exercise interface. Additionally, half of the participants, selected randomly, were given information about and a distinct link to interact with Mindy, the sociable information chatbot (shown in C). (B) Clicking on the mindfulness exercise interface link led participants to the study’s initial stage (see Figure 4), where they were randomly assigned to either receive only the tutorial video for exercises or to experience the tutorial video followed by interaction with the reflection chatbot. The study used a 2 × 2 factorial design, varying the presence of the Sociable Information LLM (Present vs. Absent) and the Reflection LLM post-video (Present vs. Absent).
5.1. Design of Email Reminders and Sociable Information LLM Agent for Mindfulness Exercises
We opted for email as the delivery method for mindfulness exercises and resources due to its proven effectiveness in behavior change [24, 80, 139]. Email interventions offer several advantages: they are widely accessible, allow personalized content, support various formatting options, and can be systematically scheduled [202]. Additional benefits include their ability to link directly to resources, easy tracking, and the potential for interactive elements [203]. In this study, participants received email reminders every other day for three weeks.
5.1.1. Email Reminders.
We designed our reminders to include a personalized link to the mindfulness exercise for each participant (Figure 3A). When participants clicked on this link, they were directed to a preliminary questionnaire to gauge their current stress and mindfulness levels before commencing each session (Figure 3B).
5.1.2. Sociable Information LLM Agent (“Mindy the Friendly Dinosaur”).
Our formative study indicated that participants may prefer to access the information chatbot at various times throughout the day to support their mindfulness journey. We provided access to this chatbot as a standalone web interface linked in the reminder emails (Figure 3C). Previous research has highlighted the positive impact of integrating a social component into behavior change interventions, suggesting that such additions can enhance effectiveness [134]. Motivated by this evidence, we explored whether an LLM could serve a similar role in providing support, potentially increasing user engagement, and thus the overall efficacy of the intervention (Figure 3C). To assess this, we randomly assigned half of our study’s participants access to the sociable information chatbot. This enabled us to compare engagement levels and effectiveness between those with and without chatbot assistance.
To create the sociable information chatbot, we used GPT-4, configured with a specific system prompt (see appendix, Figure 11 for details). Existing literature suggests that overly human-like agents can evoke feelings of discomfort [163]. Hence, the bot was personified as a dinosaur named “Mindy” with sentences in the system prompt such as “You are Mindy, a mindfulness instructor represented as a friendly and wise Microceratus dinosaur. Mindy specializes in guiding individuals through mindfulness practices…”. The design and characteristics of Mindy were informed by insights from our formative study in which participants expressed a desire to engage in “small talk” and casual and friendly conversations with the Information Chatbot. We also incorporated a prompt to make Mindy check whether participants had completed their assigned Mindfulness exercise for the day with “She inquires if the participant has completed today’s mindfulness exercise in the provided interface, encouraging them with playful nudges if not.”
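The persona configuration described above can be illustrated with a short sketch. The system-prompt text is abridged from the excerpts quoted in this section; `build_messages` is a hypothetical helper of ours, and the commented-out completion call shows how the prompt would be passed to GPT-4 via a chat-completion API.

```python
# Abridged from the system-prompt excerpts quoted in the text.
MINDY_SYSTEM_PROMPT = (
    "You are Mindy, a mindfulness instructor represented as a friendly and "
    "wise Microceratus dinosaur. Mindy specializes in guiding individuals "
    "through mindfulness practices. She inquires if the participant has "
    "completed today's mindfulness exercise in the provided interface, "
    "encouraging them with playful nudges if not."
)

def build_messages(history, user_turn):
    """Prepend the persona prompt so it conditions every model response."""
    return ([{"role": "system", "content": MINDY_SYSTEM_PROMPT}]
            + history
            + [{"role": "user", "content": user_turn}])

# The actual completion call (requires an API key) would look like:
#   from openai import OpenAI
#   client = OpenAI()
#   reply = client.chat.completions.create(
#       model="gpt-4", messages=build_messages(history, user_turn))
messages = build_messages([], "Hi Mindy! I haven't done today's exercise yet.")
```

Keeping the persona in the system role (rather than the first user turn) ensures it shapes every reply without appearing in the visible chat history.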
Formative study feedback emphasized the need to improve the chatbot’s casual interaction capabilities and make it feel friendlier. To address this, we explored prior research on persona design in AI coaching and digital mental health. Studies suggest that social presence and relatability are key predictors of engagement, and that a carefully crafted persona can enhance user experience and adherence to wellness interventions [111, 112]. Given concerns raised in the literature about the uncanny valley, where highly human-like virtual agents can evoke discomfort and reduced engagement, we opted for a playful, non-human persona instead [64, 163]. Research indicates that fictional or animal-based personas are perceived as more approachable and less unsettling than near-human designs [124, 137]. Furthermore, studies on mental health chatbots (e.g., Wysa) show that users feel more comfortable discussing emotions with non-human, playful avatars, perceiving them as less judgmental and more supportive than human-like alternatives [18, 79, 183].
5.2. Design of Mindfulness Exercise Interface and Post-Exercise Reflection LLM
The importance of participating in self-reflection after mindfulness exercises was a prominent finding in our formative study, underscored by self-reported intentions and qualitative feedback from participants. Unlike the Information Chatbot, where participants suggested improvements to enhance casual interaction, feedback on the Reflection LLM was generally positive, and no areas of major revision were identified. Given this strong endorsement, we retained its design with minimal changes in the deployment study. We divided the participants into two groups to empirically evaluate the benefits of this self-reflection guided by LLMs. One group was assigned a Reflection LLM designed to facilitate a reflective process. The other group proceeded directly to post-exercise questions without this reflective step. Figure 4.1 illustrates the instructional video all participants viewed. Following the video, half were directed to the Reflection LLM interface (Figure 4.2), while the rest moved straight to the concluding questions. The configuration details of the Reflection LLM and the system prompt are described in the Appendix (Figure 10). We included instructions in the system prompt such as “Your role is to reinforce their understanding of mindfulness concepts presented in the video and encourage them to plan their own mindfulness practice…” and “Celebrate their plans and intentions for future mindfulness practice, and offer support for any challenges they anticipate.”
Fig. 4.

Mindfulness Exercise Interface. Participants were taken to this interface by clicking the exercise link in the email. Initially, they responded to questions about their current stress levels and mental state. (1) They were then directed to a 10-minute instructional video on mindfulness. Each participant was shown a video randomly selected from a pool of six, each focused on teaching a proven mindfulness exercise. (2) After watching the video, half of the participants, randomly selected at the beginning of the study, were given the option to engage in a dialogue with the Reflection LLM agent.
5.3. Participants
We recruited 54 participants through a combination of word-of-mouth and snowball sampling; we refer to them as D1-D54. Participants did not have to meet any inclusion criteria other than being 18+ and having a general interest in learning about mindfulness to improve their psychological well-being. Participants were told that we were looking for their feedback on our overall design to inform the future implementation of mindfulness programs. Table 2 shows a summary description of the participants in our deployment study.
Table 2.
Demographic information of the participants in the deployment study.
| Total Participants = 54 | |
|---|---|
| Age | 18–24 (24.53%), 25–34 (49.05%), 35–44 (9.43%), 45–54 (3.77%), 55–64 (1.88%), 65+ (9.43%) |
| Gender | Male (47.16%), Female (43.39%) |
| Employment | Full-time (22.64%), Part-time (1.88%), Unemployed and looking for work (1.88%), Student (62.26%), Retired (1.88%) |
| Education | Some high school, no diploma (1.88%), High School graduate (1.88%), Some college credit with no degree (1.88%), Bachelor’s degree (30.18%), Graduate or professional degree (54.71%) |
| Prior Mindfulness Skill | 1 (Beginner) (33.96%), 2 (24.52%), 3 (18.86%), 4 (5.66%), 5 (5.66%), 6 (1.88%), 7 (Master) (1.88%) |
5.4. Study Procedure
Participants were recruited during the first two weeks of December 2023. They were provided with a link to an onboarding form, which included a consent form, questions about their demographics and prior experience with mindfulness, and a series of psychological measurement scales extensively used in prior mindfulness research.
Mental state measures (pre- and post-intervention).
We collected measures related to well-being that were often measured in previous mindfulness studies. These were collected at the beginning and end of the study. The measurement scales included the Perceived Stress Scale (PSS) [36], the Positive and Negative Affect Scale (PANAS) [187], and the Five-Facet Mindfulness Questionnaire (FFMQ-15) [8]. Each scale offered insight into different aspects of mental state and mindfulness effectiveness. Precisely, PSS measures perceived stress, helping to gauge the level of stress reduction after mindfulness practices. PANAS assesses the participant’s mood states, providing a view of emotional changes. In contrast, FFMQ-15 assesses the multidimensional aspects of mindfulness, which is critical to understanding how mindfulness facets develop and change through the intervention.
At the beginning of the study, participants were randomly assigned to one of the four conditions of the 2 (Reflection LLM after Video: Present vs. Absent) × 2 (Sociable Information LLM: Present vs. Absent) design. Each participant’s condition remained the same throughout the study; i.e., if a participant was assigned to the Tutorial Video Only with a Sociable Information Chatbot condition, they received links to both in every reminder. The email reminders included a link to the exercise platform and a link to Mindy (if participants were assigned to the Mindy condition). Figure 3 shows the schematic of the introductory email and the follow-up email reminders sent to participants. We scheduled email reminders to be sent every other day for three weeks. This resulted in 10 reminders sent to each of the 54 participants, for a total of 540 reminders across the study.
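The assignment and reminder cadence can be sketched as follows; the participant IDs, start date, and helper names are illustrative, not drawn from the study materials.

```python
import itertools
import random
from datetime import date, timedelta

# 2 x 2 factorial: each participant receives one fixed combination of the
# two LLM components for the whole study.
CONDITIONS = list(itertools.product(["reflection", "no_reflection"],
                                    ["mindy", "no_mindy"]))

def assign_conditions(participant_ids, rng):
    """Randomly assign each participant to one of the four cells."""
    return {pid: CONDITIONS[rng.randrange(len(CONDITIONS))]
            for pid in participant_ids}

def reminder_dates(start, n_reminders=10, every_days=2):
    """Every-other-day reminders: 10 per participant over three weeks."""
    return [start + timedelta(days=i * every_days) for i in range(n_reminders)]

rng = random.Random(42)
assignments = assign_conditions([f"D{i}" for i in range(1, 55)], rng)
schedule = reminder_dates(date(2023, 12, 4))  # illustrative start date
```

With 54 participants and 10 reminders each, this schedule reproduces the 540 total reminders reported above.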
Session-level measures (collected before and after each session).
Upon accessing the main interface via the provided link, participants were directed to a questionnaire. Participants reported two session-level measures before and after each session to track changes in their mental state. These measures were based on existing empirically validated measures used in prior mindfulness interventions [53, 57]:
Stress: Assessed using the stress slider [53] rating from 0 (no stress) to 10 (extreme stress). Participants were asked to slide to the number that accurately reflected the level of stress they experienced at that moment.
Decentering: Measured using two elements of the Mindfulness Experience Questionnaire (EQ) [57]: “I have the sense that I am fully aware of what is going on around me and inside me” and “I view things from a wider perspective”. Respondents were asked to indicate their agreement on a scale from 1 (strongly disagree) to 5 (strongly agree), reflecting their ability to maintain a distanced perspective on their thoughts and feelings. Decentering is defined as the ability to observe one’s thoughts and feelings, and is often used to gauge the depth of mindfulness and awareness in participants.
We did not administer more comprehensive mindfulness scales before and after each session, as the added burden would likely have caused discomfort or dissatisfaction and lowered the engagement rate with the exercises, which is the focus of this study.
After the 3-week study, participants completed a post-intervention survey that included the same mindfulness-related measures that they had completed at the start of the study, along with open-ended questions to gather feedback, which included, but were not limited to:
What did you like or did not like about your interaction with [Mindy/Reflection Chatbot]? What could have been improved?
How frequently did you engage with the program’s materials, and were there specific moments or contexts where you felt more inclined to use them?
What enhancements or additional elements do you think could further improve our program’s effectiveness in facilitating mindfulness? For example, you may comment on the quality of resources provided, content, and frequency of reminders.
5.5. Data Analysis
We pre-registered5 the quantitative analysis for the deployment study. For the study-level measures (with baseline and post-intervention measures), we performed an Analysis of Covariance (ANCOVA) with baseline scores as covariates for the PSS, PANAS and FFMQ-15 scores to assess the main effects of the two modes of LLM interaction (reflection chatbot after video and a sociable information chatbot). Due to the limited sample size, we did not look at possible interactions between the experimental conditions. Analysis was performed in R using the aov function for each outcome measure. 20 of 54 participants completed the off-boarding survey. We include only these 20 participants in the pre-post intervention measures analysis.
We performed the Mann-Whitney U test with a significance level of 0.05 for pairwise comparisons, such as for engagement rates and session-level measures. Data from the 54 participants were included in this analysis. We share the complete analysis code and anonymized data for replication in the supplementary materials. We also make the interface code open-source to allow the development of future video-based LLM wellness interventions6.
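The pairwise comparisons can be sketched with SciPy; the per-participant initiation rates below are made-up values standing in for the engagement data.

```python
from scipy.stats import mannwhitneyu

# Illustrative per-participant exercise-initiation rates (fraction of the
# 10 reminders acted on) for two groups; values are invented for the sketch.
mindy_rates = [0.9, 0.8, 1.0, 0.7, 0.9, 0.8, 0.6, 1.0]
no_mindy_rates = [0.6, 0.5, 0.7, 0.4, 0.8, 0.5, 0.6, 0.3]

# Two-sided Mann-Whitney U test at alpha = 0.05, as used for the
# engagement-rate and session-level comparisons.
stat, p = mannwhitneyu(mindy_rates, no_mindy_rates, alternative="two-sided")
significant = p < 0.05
```

The nonparametric U test is appropriate here because rates bounded in [0, 1] from a small sample cannot be assumed normally distributed.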
We used the same method described in Section 3.7 for qualitative analysis. Still, we focused on the following topics: how the tool contributed to reaching a mindful state, suggestions to improve the chatbot’s effectiveness in facilitating mindfulness, satisfaction, and limitations encountered with the program.
6. Deployment Study Findings
6.1. Engagement
Engagement with our mindfulness program was quantified by tracking whether participants initiated and completed their assigned exercises, which were provided in their reminder emails. Differences in engagement, influenced by the availability of the sociable information LLM, Mindy, are shown in Figure 5a. Additionally, Figure 5b displays engagement trends relative to the assignment of the reflection LLM agent after each session.
Fig. 5.

Participant Engagement Rates in Deployment Study Across Different Intervention Conditions. Engagement is defined as whether or not a participant was engaged in a particular activity for a particular reminder. Error bars are ± one standard error of the mean.
As shown in Figure 5a, those who had access to Mindy demonstrated an 18% higher average rate of initiating exercises compared to those without Mindy (UMann-Whitney = 90, p < 0.01). This trend was particularly notable in the middle phase of the intervention. In contrast, initiation rates did not differ significantly between participants who were provided the Reflection LLM and those in the No Reflection LLM condition, as shown in Figure 5b. Similarly, the completion rates of daily exercises were 12% higher in the Mindy condition compared to the No Mindy group (UMann-Whitney = 96, p < 0.001). No significant differences were observed in the initiation and completion rates between the Reflection LLM and No Reflection LLM conditions.
6.2. Mental State and Mindfulness
6.2.1. Session-level Measures.
Figure 6 shows the trends in the session-level measures (collected before and after each mindfulness session). Figure 6a compares participants in the Mindy vs. No Mindy conditions. On average, participants in both conditions reported reduced stress and increased mental awareness after each session. Participants with access to Mindy showed statistically significant pre-to-post reductions in stress (UMann-Whitney = 2028.5, p < 0.05) and increases in awareness (UMann-Whitney = 1137.5, p < 0.05). They also experienced a significant enhancement in perspective (UMann-Whitney = 1155.0, p < 0.001). In contrast, participants without Mindy did not show a significant change in stress (UMann-Whitney = 289.0, p = 0.12) but did report a significant increase in awareness (UMann-Whitney = 138.0, p < 0.05) and perspective (UMann-Whitney = 105.0, p < 0.001).
Fig. 6.

Self-reported stress, awareness, and perspective ratings among participants in the different conditions, measured at the start and end of the mindfulness session.
Figure 6b compares the session-level measures between the participants in the Reflection LLM and the No Reflection LLM condition. Participants in the Reflection LLM condition did not show significant changes in stress (UMann-Whitney = 834.5, p = 0.14) or awareness (UMann-Whitney = 546.5, p = 0.10) and did not see improvement in perspective (UMann-Whitney = 648.0, p = 0.428). On the other hand, participants in the No Reflection LLM condition demonstrated significant improvements in all measured areas: stress (UMann-Whitney = 1122.0, p < 0.05), awareness (UMann-Whitney = 442.0, p < 0.001), and perspective (UMann-Whitney = 338.0, p < 0.001).
6.2.2. Pre- and Post-Intervention Measures.
FFMQ-15.
The ANCOVA measured the impact of baseline scores as covariates and two modes of LLM interaction on the FFMQ-15 mindfulness subscales: Observing, Describing, Acting with Awareness, Non-judgement, and Non-reactivity. Figure 7 shows the summary of measures obtained before and after the study. Baseline scores were significantly related to post-study scores for all subscales (p < 0.001). Presence of Mindy had a significant positive effect on the ‘observing’ and ‘describing’ aspects of mindfulness (p < 0.05). Reflection LLM did not show a significant effect on any aspects of the mindfulness scales. See Section B.1 for detailed test-statistics.
Perceived Stress Score (PSS).
Figure 8 shows the summary of the perceived stress score for both modes of LLM interactions before and after the intervention. Baseline stress levels were found to predict post-intervention stress significantly (F (1, 16) = 10.18, p < .01). The Reflection LLM condition showed a trend towards significance in reducing perceived stress levels (F (1, 16) = 3.74, p = .07), suggesting a potential beneficial effect that merits further exploration. In contrast, Mindy’s presence did not significantly affect post-intervention PSS scores (F (1, 16) = 0.002, p = .96).
PANAS.
Figure 9 shows the summary of positive and negative affect schedules obtained before and after the 3-week intervention. Neither Mindy nor Reflection LLM showed a significant effect on the positive and negative levels of the PANAS scale. Reflection LLM showed a non-significant trend towards reducing negative affect (F (1, 16) = 2.93, p = 0.10).
In summary, we did not find significant differences in well-being measures between conditions, except for the effects of Mindy on increased ‘Observing’ and ‘Describing’ of one’s feelings, thoughts, and experiences. One caveat is that these self-reported measures were obtained from a minority of participants, mainly those who were actively engaged; the results should be interpreted accordingly.
6.3. Participants’ Feedback on Sociable Information and Reflection LLM Agents
6.3.1. Mindy: Sociable Information LLM Agent.
Participants had mixed feelings about their interactions with Mindy. Many appreciated its intelligence and how it facilitated a deeper understanding of mindfulness by encouraging users to form a coherent chain of thoughts. The open-minded responses and positive feedback provided by Mindy were highlighted as strengths, particularly its non-judgmental stance towards users’ opinions and practices.
“Very open minded response, lots of positive feedback which was nice, it never said that your opinions or practices were wrong but also provided reasons for your actions and gave you suggestions.”
(D11)
However, some areas for improvement were noted. Technical issues arose when users did not respond to its questions, leading to the chatbot generating strange messages. Furthermore, the lack of memory capability to recall previous conversations was seen as a drawback. Additionally, a sense of tedium crept into interactions due to repeated content and similar chat experiences across multiple sessions, mainly when the initial prompts suggested by the chatbot were used. This repetition limited the utility of the chatbot for some users.
Enhancements such as incorporating a wider variety of content and enabling the chatbot to remember past interactions could significantly improve the user experience with Mindy: “Remembering conversations will be a good improvement. Rest was good.” (D3). Section A.3.2 shows a sample conversation with Mindy that one of the participants had in the study.
6.3.2. Reflection Chatbot.
Some participants found the questions posed by the Reflection LLM helpful. They said, “The questions at the end of the exercise were good at making me reflect on the experience and what I felt during the exercise. Otherwise, I’d just go on with my day without noticing if doing mindfulness was beneficial or not.” (D31).
However, most participants shared negative experiences with the reflection chatbot. Issues ranged from its tendency to ask repetitive questions, which made the experience tedious, to merely echoing users’ responses without deepening the conversation.
“I found engaging with the chatbot to be interesting, but not all that helpful. It mainly mirrored what I had said.”
(D16)
The chatbot was sometimes perceived to have a ‘patronizing self-help tone,’ and users reported that repetitive responses made interactions feel predictable and less engaging. Additionally, the lack of open-ended conversations and the chatbot’s inability to remember previous interactions were significant drawbacks, preventing the development of a continuous, evolving dialogue and limiting the potential for meaningful reflection and engagement.
Participants shared insights on areas for improvement regarding the reflection chatbot. To enhance user engagement, Participant D2 suggested introducing a broader array of conversation starters to improve the overall user experience.
“I would have preferred a more dynamic chatbot since it felt a little tedious to interact with, especially near the end of the study. I think having more suggested conversation starters (I used the recommended one each time) would also help improve the experience, since I usually didn’t know what I wanted to discuss with the bot. Alternatively, having the bot start the conversation could also help (though that could be perceived as annoying to some people).”
(D2)
Section A.2.2 shows complete sample conversations that participants had with this Reflection LLM chatbot.
6.4. Suggestions for Program Improvement
Participants offered insights into potential enhancements to improve the mindfulness program. Suggestions primarily focused on improving the accessibility and usability of the program. Participants highlighted the need for a more mobile-friendly user interface (D16), shorter video durations to enhance manageability (D26), and adaptive tools that can tailor the program to individual user habits and preferences (D1, D23).
Some participants stressed the importance of user-centric design and customization to improve engagement and effectiveness in mindfulness practice. For example, some participants (N = 6) suggested incorporating features like reminders aligned with users’ preferred practice times and customizable notification frequencies. Furthermore, participants (N = 5) emphasized the importance of improved customization, such as personalized reminders and resources tailored to their stress levels and preferences.
“I think it might be useful to learn about when the users would like to practice mindfulness within a day. For example, I tend to do the exercise once off work, after 5 pm. The email reminder usually came in the morning, and sometimes when I was off work, I forgot about the email as it is no longer at the top of my email list. If the reminder could be sent at the right time, I could start the practice right away…”
(D13)
This shows the importance of tailoring support tools to individual schedules and preferences, with timely reminders potentially enhancing user engagement and adherence.
7. Discussion
We sought to understand how support from LLM agents can enable the formation of mindfulness habits. In this section, we discuss critical insights synthesized from our studies in the context of previous research, provide implications for LLM-augmented behavior change interventions, and highlight ethical considerations for providing social support through LLM-enabled systems.
7.1. User Perceptions of LLM Agents for Support
Throughout the formative study, users reported that the Information Chatbot was helpful in their mindfulness practice, especially in moments of stress and when seeking relaxation or focus. However, the users desired more friendly and supportive communication. In the deployment study, we saw that a sociable information LLM agent was indeed effective in improving engagement and well-being. This indicates that chat LLMs (such as ChatGPT, Gemini, and Claude) can be perceived more favorably for do-it-yourself wellness practices by incorporating social skills (such as the ability to engage in small talk).
Across all LLM chatbots, users preferred a non-judgmental tone, consistent with prior research showing that this tone can promote self-disclosure [7, 77], especially in the context of chatbots [104]. Our findings suggest that perceptions of LLM chatbots formed in single sessions can be misleading. The Reflection chatbot was perceived favorably in the formative study, with participants recognizing its ability to understand and adapt to specific situations. As we transitioned to the deployment study, where interactions with LLMs were more prolonged, we observed a shift in user perceptions. Participants often had less favorable experiences with the reflection chatbot, criticizing its tendency to ask repetitive questions and provide echoing responses. They expressed a desire for the chatbot to have memory of past interactions, as has been done in other LLM health interventions [76, 206].
7.2. Impact of LLM Agents on User Intention to Practice (RQ1)
The findings of our formative study indicate that participants interacting with the video and both types of LLM agents—one offering information and the other facilitating reflection—reported a greater intention to engage in mindfulness practices than those in the control condition. This suggests that LLM agents can potentially elevate participants’ initial motivation levels regardless of function. These agents can effectively influence user intentions, a critical step in the behavior change process [142]. Our work contributes to the long line of research in improving participants’ intentions to practice wellness exercises [93, 116, 167].
Participants preferred different types of agent interaction depending on their immediate needs. For example, users found the information-providing chatbot more beneficial when they sought relaxation, focus, and comfort, highlighting the contextual flexibility of LLM agents. Although the accuracy of the information provided by LLMs has been a significant concern within the health community, our study suggests that focusing solely on accuracy may overlook the broader potential of LLMs. These agents can serve as a medium for users to “talk things out.” The act of reflection and articulation itself can be therapeutic and supportive, reducing the pressure on LLMs to deliver perfectly accurate content [45, 52]. This aligns with the idea that engaging with an LLM agent can be intrinsically valuable, providing a sounding board for users to clarify their thoughts and reinforce their intentions to adopt healthier behaviors.
7.3. Effect of LLM Agents on Engagement (RQ2)
The engagement-enhancing effects observed with Mindy, the sociable information LLM chatbot, which offered a fun and casual form of social interaction, could be attributed to its potential to increase intrinsic motivation, a core concept in Self-Determination Theory [46, 140, 168]. According to this theory, when people engage in inherently enjoyable activities that align with their interests and values, they experience increased intrinsic motivation. Mindy’s informal and engaging dialogue mimics the positive dynamics of peer support, which has been extensively documented to increase engagement in health interventions [19, 48, 134]. Mindy’s role could have expanded from being a mere informational tool to becoming akin to a peer-like companion, contributing to the participants’ willingness to engage with the mindfulness exercises.
Furthermore, the timing of Mindy’s interaction, offered before participants began the mindfulness exercise, and its constant availability may have served as an immediate reward when they launched the exercise. This immediacy is critical to fostering a habit loop, in which the prompt of a Mindy interaction leads to the routine of starting the exercise, followed by the reward of a pleasant chatbot experience. This instant gratification can effectively nudge participants towards the desired action, which in this context is the initiation of mindfulness practice [192–194]. This mechanism aligns with behavioral theories that emphasize the power of immediate rewards in shaping and reinforcing new habits [49, 184], suggesting that placing an enjoyable element such as Mindy before a task can enhance the likelihood of task initiation.
Contrary to our initial hypotheses and the existing literature, it was surprising that Reflection LLM did not improve user engagement with the exercise interface. The literature posits that structured reflection post-activity can facilitate the habit formation process [123, 150]. However, our findings suggest that applying LLMs for this purpose is not straightforward. Participants in our formative study expressed a firm intention to continue practicing mindfulness after engaging in reflective conversations with an LLM (Section 4.1). Yet, during the deployment study, they perceived the Reflection LLM’s responses as repetitive and tedious (Section 6.3.2). A noted deficiency was the LLM’s lack of memory, which impeded the sense of continuity that users desired, disrupting the reflective process that is believed to be critical for consolidating new habits.
Improving the reflection process might require equipping the LLM with access to users’ prior interactions and behavioral data. Research in personal informatics suggests that communicating this interaction and behavioral data can effectively promote behavior change [43, 84, 109]. However, this raises significant privacy concerns, particularly when using closed LLMs such as GPT-4 [197], and presents challenges in ensuring that the emergent behaviors of the LLM are appropriate and beneficial to users [147]. It is also possible that reflections might benefit people farther into their behavior change journey [59, 109, 150]. Considering that many participants in our deployment study were novices in mindfulness with minimal prior experience, this could explain the ineffectiveness of Reflection LLM in driving engagement. Novice users likely need support focusing more on initial guidance and orientation, similar to what the Sociable Information LLM, Mindy, provided, rather than reflection on their limited experiences.
Reflection chatbots might be useful and perceived favorably due to dose/novelty effects if the users have never reflected on their mindfulness practice before. But it may not be something people need to do habitually. In contrast, information/friend bots might always be useful when questions arise or the user needs encouragement. In line with coaching frameworks that emphasize goal-oriented dialogue and accountability [145, 191], the LLM agents in our deployment study supported participants’ self-reflection and adherence to mindfulness exercises. However, professional coaching typically involves well-defined boundaries and structured guidelines: coaches are not ‘friends’ [145]. Our LLM chatbots (in particular, Mindy) blurred these lines, potentially enhancing user comfort but diverging from formal coaching standards, which may have implications for how users perceive and engage with the intervention.
7.4. Effect on Well-being Measures (RQ3)
Across all experimental conditions in our study, there was a notable improvement in stress and related measures, indicating that the baseline intervention—comprising reminders and videos—may already be quite effective. Consequently, this high baseline efficacy could mask additional advantages that come from integrating LLM agents. The impact of LLM agents might be more discernible in a setting where the baseline intervention is less robust [120, 153, 182]. A significant limitation of our findings about well-being is that they pertain only to actively engaged participants. Due to the voluntary nature of participation, we lacked self-reported measures from less engaged participants, resulting in a pronounced survivor bias in our results.
For the effect of Reflection LLM, the lack of significant disparities in outcomes across different groups may be attributable to the phenomenon of “response shift bias” [23, 69]. Specifically, participants who engaged with the Reflection LLM might have offered a more nuanced and precise evaluation of their stress levels and well-being after the intervention. This suggests that the act of reflection prompts a recalibration of internal benchmarks, leading to more accurate self-assessment, which can appear as a lack of change when, in fact, there is a deepened self-awareness. An additional consideration is the potential mismatch between the contemplative nature of the conversations prompted by the Reflection LLM and the fundamental principles of mindfulness, which prioritize present-moment awareness over reflective thinking or future planning [17]. This conceptual divergence could suggest that, while reflection is valuable, its timing and context within the mindfulness practice framework require careful calibration to avoid disrupting the immediacy of the mindfulness experience.
The group with Mindy, the sociable information LLM, reported significantly enhanced ‘observing’ and ‘describing’ facets of mindfulness as measured by the FFMQ-15 scale. These improvements may stem from Mindy’s engagement with users in articulating their experiences, thereby refining their ability to observe and describe subtle aspects of these experiences. Research suggests that verbalizing thoughts can improve cognitive processing and perception, improving observational skills [9, 55, 160]. Furthermore, supportive and non-judgmental interaction with Mindy likely provides a conducive environment for users to express and explore their thoughts more freely [104], which is crucial for developing mindfulness skills [8, 78, 157, 169].
Consistent with prior studies in digital mindfulness and mental health, our findings show that chatbots can lead to measurable improvements in well-being outcomes. For example, an RCT of the rule-based Woebot chatbot demonstrated significantly greater reductions in depression two weeks later compared to an information-only control [54], highlighting that even short-term automated interventions can positively affect mental health outcomes [54]. Although most of the previous studies of mental health chatbots have focused on retrieval-based or rule-based approaches [34, 108], our study suggests that the benefits of chatbots can extend to LLM chatbots. Future work is needed to understand whether LLM or rule-based chatbots have similar effects and mechanisms of benefit, or how best to combine these approaches.
7.5. Implications for LLM-Augmented Interventions
Engagement is essential for the efficacy of AI-enhanced interventions in various domains, especially where user participation is crucial but typically low. For example, in educational technology, interaction with AI tutors has historically faced challenges maintaining learner engagement, as exemplified by IBM’s struggles to develop an engaging AI tutor with Watson [181]. This issue goes beyond the accuracy of AI responses or the comprehensiveness of the information provided. For example, a recent analysis highlighted engagement with popular LLM applications like ChatGPT as ‘lackluster,’ with only about 14% of monthly active users engaging daily, a stark contrast to the 60–65% daily engagement rates seen with social media platforms [71]. This suggests that AI agents lack the “stickiness” or ongoing user appeal that other platforms manage to sustain.
Our findings suggest that human-like qualities in LLM agents, such as the ability to engage in small talk, can significantly enhance user interaction and engagement, as demonstrated by increased involvement with video tutorials in our studies (for the group having access to Mindy). This insight has implications for other fields (e.g., mental health [94, 114], environmental campaigns [26], financial management apps [40], and training programs [74]), suggesting that AI “coaches” designed to help with challenging tasks or learning could benefit from adopting more human-like behaviors. Apps such as Character.ai, which boasts a stickiness score of 41% by allowing users to interact with historical or fictional figures, underscore the appeal of AI agents with human traits [71, 122]. Developers can align the LLM with human behavior to increase engagement, as has been done for other AI systems (e.g., chess engines [126]). However, while imbuing AI agents with human-like qualities may enhance engagement, it also raises potential risks and ethical concerns related to over-anthropomorphizing these agents [38, 125], which can lead to inappropriate reliance on these agents.
One such risk is users becoming overly reliant on the chatbot for social support, potentially undermining their self-sufficiency in behavior change interventions. Notably, Prochaska and Velicer [151] hypothesize that when individuals depend heavily on counselors for social support and monitoring, they may struggle to sustain behaviors once this external influence is withdrawn. Interestingly, this dependency effect was less pronounced with an expert dialogue system compared to a human coach, raising questions about how more human-like or anthropomorphized LLM-based interventions might inadvertently foster reliance and reduce long-term autonomy. These residual effects of LLM-based interactions could be studied further by examining user behavior post-intervention, as done in studies in other domains involving human–LLM interactions [11, 97].
7.6. Ethical and Safety Considerations for Collaborating with LLMs for Behavior Change
Incorporating LLM-based agents into behavior change interventions introduces new ethical and safety considerations [121]. These considerations require careful design and research attention, distinct from earlier technologies. Based on the ethical principles of the Belmont [56] and Menlo [50] reports—principles derived from centuries of philosophical exploration of Consequentialism [171] and Deontology [4]—our design process prioritized the autonomy and welfare of individuals. In the specific context of LLMs that offer support, users must be fully aware that they are interacting with an LLM and consent to this engagement. Recently, a company provided mental health support through GPT-3 to thousands of users without their awareness or consent, drawing significant online condemnation [47, 149].
However, transparency alone does not address all ethical concerns. For example, the persuasive nature of LLMs could lead users to form emotional dependencies [99] or experience negative consequences, such as impacts on real-life relationships or being led towards self-harm [198]. A risk/benefit analysis is imperative for researchers and designers to maximize benefits and minimize potential harms when deploying LLMs for social support. Mitigating these risks involves strategic design choices and rigorous testing, potentially in simulated environments [147], with the involvement of domain experts, before wide-scale deployment. In our study, we chose not to endow the LLM agent with memory to reduce the risk of dependency. Although some participants viewed memory as a desirable feature for continuity, we prioritized risk reduction. Furthermore, the integrity of the information provided by health-related chatbots is critical, especially given the potential for LLMs to disseminate misinformation or “hallucinate” responses [41, 205]. Our approach involved setting clear expectations with participants about the experimental nature of the tool and advising them to verify any information provided by the LLMs. Furthermore, proper guardrails should be developed to mitigate harm to users [35, 170], and appropriate mechanisms should be in place to involve experts (such as clinical psychologists) whenever these guardrails fail [42, 67].
Despite the challenges, the potential benefits of successfully deploying LLMs are immense, given the widespread need for social support. Our findings demonstrate the ability of a simple LLM agent to encourage mindfulness practice, suggesting broader applicability in domains such as mental health where engagement is crucial [28, 134]. As we move forward, it is essential to selectively deploy these systems, ensuring they are not introduced to extremely vulnerable populations prematurely, while still providing access to those who could benefit most from the advancements.
7.7. Limitations & Future Work
One primary limitation of our study is the lack of validation of the responses generated by the LLM agents. In the case of information chatbots, it will be important to assess whether these agents present low-quality, misleading, or even fabricated information related to mindfulness. While these issues are less likely to occur in the case of the Reflection LLM, other potential issues could reflect the ‘sycophantic’ nature of models like GPT-4 [166], which could lead to inappropriate behaviors, such as validating or affirming users’ negative thoughts. Therefore, future work should include mechanisms to verify the accuracy of LLM outputs to prevent the dissemination of incorrect information.
Another limitation is the presence of order effects in the formative study. We did not randomize the bots’ order because their nature was linked to their role in mindfulness practice (preparation vs. integration/reflection). Future studies should examine the effects of varying the order of engagement (e.g., prompting reflection before the mindfulness exercises) to better understand how LLM agents can influence user engagement and outcomes.

There is an additional possible bias in the deployment study: Mindy could be used regardless of whether users completed the exercise that day, whereas the Reflection LLM would only be used by those who completed the exercise. Our sample for the Reflection LLM agent may therefore over-represent those more adherent to the exercise.

The sequential design of our experiments (formative followed by deployment) facilitated iterative refinements of the study design. Due to these refinements, there are significant differences between the two studies, including variations in participant populations, the models used, and the frequency of interactions. These discrepancies introduce complexities in comparing the findings of each study. Finally, our formative study did not yield extensive design improvements for reflection-inducing agents, limiting our ability to optimize their interaction style. More careful studies should be conducted to systematically explore user preferences and refine the design of reflection-based LLM chatbots, ensuring they better align with diverse reflection needs [6, 98, 172].
Our research focused on two types of LLM agents: one that provided information and the other that asked questions. Other interaction modalities (such as motivational or emotional support) could also positively impact user engagement and warrant exploration in future studies. The limited sample size and voluntary participation in our deployment study restrict the robustness of our conclusions with respect to measures of mental well-being such as the Perceived Stress Scale (PSS). The reliance on data from only actively engaged participants introduces a survival bias, highlighting the need for larger, more diverse samples in subsequent research. Additionally, participants reported latency issues with LLM responses due to unstable API connections, potentially affecting their initial impressions of the agents.
8. Conclusion
Forming habits to engage in wellness exercises is inherently challenging, often due to inadequate social support. In this work, we used mindfulness as a case study to explore the potential of LLM agents to support and collaborate in behavior change journeys. Our formative study indicated that agents that offered information or facilitated reflection increased participants’ intentions to practice mindfulness. However, participants expressed a desire for deeper social interactions. In the subsequent longitudinal deployment study, we found that a sociable information LLM agent significantly improved engagement with mindfulness exercises. In contrast, the Reflection LLM agent did not demonstrate notable improvements.
Integrating LLMs into well-being technologies represents a promising direction for supporting successful behavior change, offering more engaging and supportive interactions. Moreover, at a time when loneliness is increasing, and there is a growing unmet need for social support, LLM-based agents show promise. However, relying on AI agents for social support is not without risks. Over-dependence could negatively affect human relationships, and the overuse of such agents raises important ethical and safety concerns. These technologies should be designed to augment rather than replace human connections.
CCS Concepts: • Human-centered computing → Empirical studies in HCI; Empirical studies in collaborative and social computing; • Applied computing → Health care information systems.
Acknowledgments
We thank Dr. Norman Farb, Yiyi Wang, Ilya Musabirov, Snehal Prabhudesai, Lillio Mok, and Jessica Bo for thoughtful discussion and feedback. We also thank Xinyuan Wang for help with data visualizations. This work was supported by grants from the National Institute of Mental Health (K01MH125172, P50MH119029), the Office of Naval Research (N00014-18-1-2755, N00014-21-1-2576), National Science Foundation (2209819) and the Natural Sciences and Engineering Research Council of Canada (RGPIN-2019-06968). In addition, we acknowledge a gift from the Microsoft AI for Accessibility program to the Center for Behavioral Intervention Technologies that, in part, supported this work (http://aka.ms/ai4a). We also acknowledge Azure OpenAI credits support through Microsoft Foundation Models Research Program which enabled the deployment of LLM chatbots in this work.
Biography
HARSH KUMAR, University of Toronto, Toronto, ON, Canada
computational social science

A. Appendix
A.1. Example Conversations from Formative Study
A.1.1. Information Chatbot.
Hello - can you explain to me what mindfulness is please?
Mindfulness is the quality or state of being conscious or aware of something. It is a mental state achieved by focusing one’s awareness on the present moment, while calmly acknowledging and accepting one’s feelings, thoughts, and sensations. Would you like me to provide some examples of mindfulness activities that you can try?
yes please
There are many different mindfulness activities that you can try in order to help you focus your awareness on the present moment. Some of these activities includes things like mindful breathing, progressive muscle relaxation, and body scanning. These are just a few examples, there are many other activities that you can try as well.
what is body scanning?
Body scanning is an activity where you focus your awareness on each different part of your body, starting from your toes and working your way up to your head. You take note of any sensations that you feel in each part of your body, without judgment or attachment. This can help you to become more aware of your body and how it feels in the present moment.
how is this helpful?
One of the main goals of mindfulness is to focus your attention on the present moment, without judgment or attachment. Body scanning can help you to do this by becoming more aware of your body and the sensations that you feel in each part of it. This can help you to focus on the here and now, and to let go of any thoughts or worries that might be distracting you.
which is more beneficial body scanning or progressive muscle relaxation?
There is no right or wrong answer to this question, as both body scanning and progressive muscle relaxation can be beneficial in different ways. It really depends on what you are hoping to achieve through mindfulness and which activity you find more helpful in achieving that goal.
can mindfulness reduce blood pressure?
There is some evidence to suggest that mindfulness can help to reduce blood pressure. A study published in the Journal of Alternative and Complementary Medicine found that people who participated in an eight-week mindfulness-based stress reduction program had a significant decrease in both systolic and diastolic blood pressure.
can mindfulness also help to remove toxins from the body?
There is some evidence to suggest that mindfulness can help to reduce levels of toxins in the body. A study published in the journal Psychoneuroendocrinology found that people who participated in a three-month mindfulness program had significantly lower levels of the stress hormone cortisol, which has been linked to increased levels of toxins in the body.
how do you suggest a beginner start practicing?
If you’re a beginner, there are a few things that you can do in order to start practicing mindfulness. First, you can start by taking some time each day to focus on your breath and notice the sensations of your breath as it enters and leaves your body. You can also try progressive muscle relaxation, which involves tensing and relaxing different muscle groups in your body. Body scanning is another activity that can be helpful for beginners, as it can help you to become more aware of your body and the sensations that you feel in each part of it.
do I need an app or music to aid with this or just a quiet place?
There is no right or wrong answer to this question, as everyone is different and will find different things helpful in practicing mindfulness. Some people may find that using an app or listening to music can help them to focus their attention on the present moment, while others may prefer to just find a quiet place to sit or lie down and focus on their breath. ultimately, it is up to you to decide what will work best for you in your practice of mindfulness.
A.1.2. Reflection Chatbot.
What situations in life could I use mindfulness?
You can use mindfulness in any situation where you want to be more present and aware. For example, you might use mindfulness when you’re eating, walking, or interacting with others. When would you like to try using mindfulness?
I think it could be good in stressful situations. I think it could help ease the stress and calm my nerves.
That sounds like a great idea! Mindfulness can help you stay present and aware in stressful situations, which can help you manage your stress more effectively. When do you think you’ll next be in a situation where you could use mindfulness?
I have to take my husband to a physical therapy session, which is hard on him. Do you think it would help in that situation?
Mindfulness can definitely help in that situation! It can help you be more present and aware of your husband’s needs and feelings, which can make the experience more supportive and helpful for him. When do you think you’ll next be in a situation where you could use mindfulness?
Perhaps when I have a meeting with my boss.
That’s a great idea! Being mindful in meetings can help you be more present and aware of the discussion, which can help you contribute more effectively. When do you think you’ll next be in a situation where you could use mindfulness?
Well, what about when I am around loud children? Is that a situation that it wouldn’t necessarily be helpful?
There is no right or wrong answer when it comes to using mindfulness – it is entirely up to you and what you feel would be helpful in any given situation. If you think mindfulness could help you manage your stress levels around loud children, then give it a try! There is no harm in doing so.
I wonder if it could help ease a headache when I have one from stress?
There is no right or wrong answer when it comes to using mindfulness – it is entirely up to you and what you feel would be helpful in any given situation. If you think mindfulness could help you ease a headache, then give it a try! There is no harm in doing so.
When do you use mindfulness?
I use mindfulness any time I want to be more present and aware. For example, when I’m interacting with others or working on a task. When would you like to try using mindfulness?
A.2. Reflection LLM Chatbot in Deployment Study
A.2.1. LLM Model Specification.
model name: gpt-4
model version: 0613
date of use: December 2023 - January 2024
Configuration Settings:
temperature: 0
max tokens: 3925
top-p: 0
frequency penalty: 0.05
presence penalty: 0.1
Figure 10 shows the system prompt used for the Reflection LLM chatbot in the deployment study.
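For concreteness, the settings above can be read as a sampling configuration for a chat-completion request. The following is an illustrative sketch only, not the authors' code: `SYSTEM_PROMPT` stands in for the actual prompt shown in Figure 10, and `build_messages` is a hypothetical helper for assembling the conversation history.

```python
# Illustrative sketch (not the authors' code): mapping the A.2.1 settings
# onto an OpenAI Chat Completions request. SYSTEM_PROMPT is a placeholder
# for the prompt in Figure 10.

SYSTEM_PROMPT = "You are an AI mindfulness instructor helping users reflect."  # placeholder

request_params = {
    "model": "gpt-4-0613",       # model name + version from A.2.1
    "temperature": 0,            # deterministic sampling
    "max_tokens": 3925,
    "top_p": 0,
    "frequency_penalty": 0.05,   # mildly discourage token repetition
    "presence_penalty": 0.1,     # mildly encourage new topics
}

def build_messages(history, user_turn):
    """Prepend the system prompt and append the latest user message."""
    return ([{"role": "system", "content": SYSTEM_PROMPT}]
            + list(history)
            + [{"role": "user", "content": user_turn}])

# Issuing the request would look roughly like this (requires the `openai`
# package and an API key):
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(
#     messages=build_messages([], "Can you help me reflect on mindfulness?"),
#     **request_params,
# )
```

With `temperature` and `top_p` both at 0, responses are effectively deterministic for a given conversation history, which aids reproducibility across participants.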
A.2.2. Sample Conversations From Deployment Study.
Participant found the video exercise helpful and then engaged with the bot:
Hello. I am an AI agent designed to act as your Mindfulness instructor. I am here to help you reflect on your learnings. How can I help you?
Can you help me reflect on my understanding of mindfulness?
Absolutely, I’d be thrilled to help you reflect on your understanding of mindfulness! Let’s start with the video you just watched. What were some key points or concepts that really stood out to you? And how did you feel about the content? Remember, there are no wrong answers here, just like there’s no wrong way to eat a Reese’s.
I quite appreciated the approach towards negative thoughts where I can simply acknowledge them and let them be, since I find that directly trying to address them doesn’t seem to work.
That’s a fantastic takeaway! It’s like trying to tell a cat not to knock things off the table - it just makes them want to do it more, right? But in all seriousness, acknowledging negative thoughts without judgment is a key aspect of mindfulness. It allows us to observe our thoughts and feelings without getting swept away by them.
I last did it 2 days ago.
That’s great! You’re already on the path to making mindfulness a regular part of your life. It’s like brushing your teeth, but for your mind. How long did you spend on your mindfulness practice during that session?
I spent 10 minutes.
That’s a solid start! Ten minutes of mindfulness is like a mini-vacation for your brain. Now, let’s think about the future. When do you plan to practice mindfulness next? And what kind of mindfulness activity do you plan to do?
Either tomorrow or in a few days - I’d like to get into the habit of doing it regularly but I’m going to try to get used to the exercises slowly.
That sounds like a wise plan! It’s like starting a new workout routine, you don’t want to jump straight into running a marathon, right? Starting slow and gradually increasing your mindfulness practice can help make it a sustainable part of your routine.
Fig. 7. Mindfulness subscales as measured by the FFMQ-15, pre- and post-intervention across conditions (scale of 1 to 5). Error bars represent ± one standard error.
Fig. 8. Average Perceived Stress Scale (PSS) scores measured pre- and post-intervention. Scores range from 0 to 40, where 0–13 indicates low stress, 14–26 moderate stress, and 27+ high stress. Error bars represent ± one standard error.
Fig. 9. Average Positive and Negative Affect Schedule (PANAS) scores measured pre- and post-intervention. Scores range from 10 to 50. Error bars represent ± one standard error of the mean.
Participant did not like the video and later expressed concerns regarding it to the chatbot:
Hello. I am an AI agent designed to act as your Mindfulness instructor. I am here to help you reflect on your learnings. How can I help you?
can you help me reflect on my understanding of mindfulness?
Absolutely, I’d be thrilled to help you reflect on your understanding of mindfulness! Let’s start with the video you just watched. What were some key points or concepts that really stood out to you? And how did you feel about the content overall?
it felt a little too long and the audio was slightly to sharp sounding
Oh, I see! It sounds like the video was a bit like a meditation session with a gong that’s just a tad too loud, huh? But hey, kudos to you for sticking with it! Despite the length and the sharp audio, were there any key points or concepts that you managed to pick up?
it prompted to imaging a flowing river, but i’ve always had a hard time with imagining images in my head so it was harder for me to focus on trying to imagine
Ah, the old “imagine a flowing river” trick. It’s like trying to imagine a purple elephant when you’ve only ever seen grey ones, right? But don’t worry, not everyone finds visualization easy. Mindfulness is all about finding what works best for you.
i think having calming sounds help, but also being in the right headspace to do it
Absolutely, being in the right headspace is like finding the perfect spot on the couch for a Netflix marathon - it can make all the difference!
i usually like to think for 30 mins before bed or during a 30 min run
Ah, a thinker and a runner! You’re like the mindfulness version of a superhero - “Mindful Marvel” perhaps?
i plan on trying to go on a 1 mile run everyday and reflect during that time
That sounds like a fantastic plan! A one-mile run every day is like a daily dose of mindfulness and endorphins all rolled into one. It’s like a healthy smoothie for the mind and body!
Fig. 10. System Prompt for the Mindfulness Reflection Chatbot (not visible to participants).
A.3. Sociable Information Chatbot (‘Mindy’) from Deployment Study
A.3.1. LLM Model Specification.
model name: gpt-4
model version: 0613
date of use: December 2023 - January 2024
Configuration Settings:
temperature: 0
max tokens: 3925
top-p: 0
frequency penalty: 0.05
presence penalty: 0.1
Figure 11 shows the system prompt used for Mindy in the deployment study.
A.3.2. Sample Conversations From Deployment Study.
Hello, who are you?
Hello there, fellow earthling! I’m Mindy, your friendly Microceratus dinosaur and mindfulness instructor. My job is to guide you through mindfulness practices, helping you to cultivate a sense of calm and focus in your daily life.
Sure let’s get started
Fantastic! Let’s get this Jurassic journey started. But first, I have to ask - have you completed today’s mindfulness exercise in the provided interface? If not, no worries at all. Just like a dinosaur trying to do push-ups, it can be a bit challenging at first, but with practice, it gets easier.
B. Detailed Statistical Analysis
B.1. Deployment Study
B.1.1. FFMQ-15.
Results from ANCOVA:
Observing: Baseline scores showed a significant effect (F (1, 16) = 30.15, p < .001). The presence of Mindy also demonstrated a significant effect (F (1, 16) = 4.55, p < .05). Reflection LLM did not have a significant effect (F (1, 16) = 0.13, p = .73).
Describing: Significant effect of the baseline scores (F (1, 16) = 19.82, p < .001) and the presence of Mindy (F (1, 16) = 6.6, p < .05). Reflection LLM did not have a significant effect (F (1, 16) = 2.15, p = .16).
Acting with Awareness: Significant effect of baseline scores (F (1, 16) = 21.05, p < .001). Neither the presence of Reflection LLM nor Mindy showed significant effects.
Non-judgement: Baseline scores had a significant effect (F (1, 16) = 40.16, p < .001). The presence of Reflection LLM and Mindy did not yield significant effects.
Non-reactivity: Baseline scores significantly affected the Non-reactivity subscale (F (1, 16) = 12.33, p < .01). Neither Reflection LLM nor Mindy’s presence had significant effects.
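The analysis pattern above (baseline score as a covariate, with the two agent conditions as between-subjects factors) can be sketched on synthetic data. All variable names and values below are illustrative, not the study data; the reported F(1, 16) corresponds to 16 residual degrees of freedom, as would arise from, e.g., 20 observations minus four fitted parameters (intercept, baseline covariate, and the two treatment indicators).

```python
# Synthetic-data sketch of the ANCOVA reported in B.1.1 (illustrative only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 20
baseline = rng.normal(3.0, 0.5, n)   # pre-intervention subscale score
mindy = rng.integers(0, 2, n)        # 1 if the participant had Mindy
reflect = rng.integers(0, 2, n)      # 1 if the participant had Reflection LLM
post = 0.8 * baseline + 0.3 * mindy + rng.normal(0.0, 0.3, n)

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

full = np.column_stack([np.ones(n), baseline, mindy, reflect])
df_resid = n - full.shape[1]         # 16 residual degrees of freedom

def f_test(drop_col):
    """F statistic and p-value for dropping one predictor from the full model."""
    reduced = np.delete(full, drop_col, axis=1)
    f = (rss(reduced, post) - rss(full, post)) / (rss(full, post) / df_resid)
    return f, float(stats.f.sf(f, 1, df_resid))

f_mindy, p_mindy = f_test(2)         # effect of Mindy, adjusted for baseline
```

Each F test compares the full model against a nested model with one predictor removed, which is how adjusting for baseline scores isolates the contribution of each agent condition.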
Fig. 11. System Prompt for Mindy, the Mindfulness Instructor Dinosaur (not visible to participants).
B.1.2. PANAS.
Results from ANCOVA are summarized as follows:
Positive Affect: The effect of baseline positive affect scores on post-intervention outcomes approached significance (F (1, 16) = 4.07, p = .06), suggesting that initial positive affect levels may predict intervention effectiveness. Neither the Reflection LLM (F (1, 16) = 2.16, p = .16) nor Mindy’s presence (F (1, 16) = 0.40, p = .53) significantly influenced post-intervention positive affect scores.
Negative Affect: Baseline negative affect scores had a significant effect on post-intervention outcomes (F (1, 16) = 6.77, p < .05), indicating that initial levels of negative affect strongly predict intervention outcomes. The Reflection LLM showed a non-significant trend toward reducing negative affect (F (1, 16) = 2.93, p = .10), while Mindy’s presence did not significantly affect post-intervention negative affect scores (F (1, 16) = 0.03, p = .85).
Footnotes
Note on Terminology: Throughout this paper, we use the terms “Reflection LLM,” “Reflection Chatbot,” and “Reflection Agent” interchangeably to refer to our LLM-based chatbot that asks reflective questions to support user self-reflection, rather than merely providing direct answers.
Contributor Information
HARSH KUMAR, University of Toronto, Canada.
SUHYEON YOO, University of Toronto, Canada.
ANGELA ZAVALETA BERNUY, McMaster University, Canada.
JIAKAI SHI, University of Toronto, Canada.
HUAYIN LUO, University of Toronto, Canada.
JOSEPH JAY WILLIAMS, University of Toronto, Canada.
ANASTASIA KUZMINYKH, University of Toronto, Canada.
ASHTON ANDERSON, University of Toronto, Canada.
RACHEL KORNFIELD, Northwestern University, USA.
References
- [1].Abd-Alrazaq Alaa Ali, Rababeh Asma, Alajlani Mohannad, Bewick Bridgette M, and Househ Mowafa. 2020. Effectiveness and safety of using chatbots to improve mental health: systematic review and meta-analysis. Journal of medical Internet research 22, 7 (2020), e16021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [2].Achiam Josh, Adler Steven, Agarwal Sandhini, Ahmad Lama, Akkaya Ilge, Aleman Florencia Leoni, Almeida Diogo, Altenschmidt Janko, Altman Sam, Anadkat Shyamal, et al. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023). [Google Scholar]
- [3].Agapie Elena, Colusso Lucas, Munson Sean A, and Hsieh Gary. 2016. Plansourcing: Generating behavior change plans with friends and crowds. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing. 119–133. [Google Scholar]
- [4].Alexander Larry and Moore Michael. 2007. Deontological ethics. (2007). [Google Scholar]
- [5].Alrobai Amen, Dogan Huseyin, Phalp Keith, and Ali Raian. 2018. Building online platforms for peer support groups as a persuasive behavior change technique. In Persuasive Technology: 13th International Conference, PERSUASIVE 2018, Waterloo, ON, Canada, April 18–19, 2018, Proceedings 13. Springer, 70–83. [Google Scholar]
- [6].Arakawa Riku and Yakura Hiromu. 2024. Coaching copilot: blended form of an LLM-powered chatbot and a human coach to effectively support self-reflection for leadership growth. In Proceedings of the 6th ACM Conference on Conversational User Interfaces. 1–14. [Google Scholar]
- [7].Audet Cristelle T and Everall Robin D. 2010. Therapist self-disclosure and the therapeutic relationship: A phenomenological study from the client perspective. British Journal of Guidance & Counselling 38, 3 (2010), 327–342. [Google Scholar]
- [8].Baer RA, Smith GT, Lykins E, Button D, Krietemeyer J, Sauer S, Walsh E, Duggan D, Williams JMG, Gu J, et al. 2008. FFMQ-15: 15-item five-facet mindfulness questionnaire. Assessment 15 (2008), 329–342. [DOI] [PubMed] [Google Scholar]
- [9].Bandura Albert. 2012. Cultivate self-efficacy for personal and organizational effectiveness. Handbook of principles of organizational behavior: indispensable knowledge for evidence-based management (2012), 179–200. [Google Scholar]
- [10].Barras Colin. 2019. Mental health apps lean on bots and unlicensed therapists. Nature Medicine (2019). [Google Scholar]
- [11].Bastani Hamsa, Bastani Osbert, Sungu Alp, Ge Haosen, Kabakcı Ozge, and Mariman Rei. 2024. Generative ai can harm learning. Available at SSRN 4895486 (2024). [Google Scholar]
- [12].Baumel Amit, Muench Frederick, Edan Stav, and Kane John M. 2019. Objective user engagement with mental health apps: systematic search and panel-based usage analysis. Journal of medical Internet research 21, 9 (2019), e14567. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [13].Baumer Eric PS. 2015. Reflective informatics: conceptual dimensions for designing technologies of reflection. In Proceedings of the 33rd annual ACM conference on human factors in computing systems. 585–594. [Google Scholar]
- [14].Baumer Eric PS, Khovanskaya Vera, Matthews Mark, Reynolds Lindsay, Sosik Victoria Schwanda, and Gay Geri. 2014. Reviewing reflection: on the use of reflection in interactive system design. In Proceedings of the 2014 conference on Designing interactive systems. 93–102. [Google Scholar]
- [15].Behrendt Peter, Mühlberger Christina, Göritz Anja S, and Jonas Eva. 2021. Relationship, purpose, and change—An integrative model of coach behavior. Consulting Psychology Journal: Practice and Research 73, 2 (2021), 103. [Google Scholar]
- [16].Bentvelzen Marit, Woźniak Paweł W, Herbes Pia SF, Stefanidi Evropi, and Niess Jasmin. 2022. Revisiting reflection in hci: Four design resources for technologies that support reflection. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 1 (2022), 1–27. [Google Scholar]
- [17].Blanke Elisabeth S, Schmidt Mirjam J, Riediger Michaela, and Brose Annette. 2020. Thinking mindfully: How mindfulness relates to rumination and reflection in daily life. Emotion 20, 8 (2020), 1369. [DOI] [PubMed] [Google Scholar]
- [18].Blut Markus, Wang Cheng, Wünderlich Nancy V, and Brock Christian. 2021. Understanding anthropomorphism in service provision: a meta-analysis of physical robots, chatbots, and other AI. Journal of the Academy of Marketing Science 49 (2021), 632–658. [Google Scholar]
- [19].Boothroyd Renée I and Fisher Edwin B. 2010. Peers for progress: promoting peer support for health around the world. Family practice 27, suppl_1 (2010), i62–i68. [DOI] [PubMed] [Google Scholar]
- [20].Bostock Sophie, Crosswell Alexandra D, Prather Aric A, and Steptoe Andrew. 2019. Mindfulness on-the-go: Effects of a mindfulness meditation app on work stress and well-being. Journal of occupational health psychology 24, 1 (2019), 127. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [21].Bouton Mark E. 2014. Why behavior change is difficult to sustain. Preventive medicine 68 (2014), 29–36. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [22].Braun Virginia and Clarke Victoria. 2006. Using thematic analysis in psychology. Qualitative research in psychology 3, 2 (2006), 77–101. [Google Scholar]
- [23].Bray James H, Maxwell Scott E, and Howard George S. 1984. Methods of analysis with response-shift bias. Educational and Psychological measurement 44, 4 (1984), 781–804. [Google Scholar]
- [24].Brown Menna, Hooper Nic, James Phillip, Scott Darren, Bodger Owen, John Ann, et al. 2020. A web-delivered acceptance and commitment therapy intervention with email reminders to enhance subjective well-being and encourage engagement with lifestyle behavior change in health care staff: randomized cluster feasibility stud. JMIR formative research 4, 8 (2020), e18586. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [25].Brown Tom, Mann Benjamin, Ryder Nick, Subbiah Melanie, Kaplan Jared D, Dhariwal Prafulla, Neelakantan Arvind, Shyam Pranav, Sastry Girish, Askell Amanda, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877–1901. [Google Scholar]
- [26].Brulle Robert J. 2010. From environmental campaigns to advancing the public dialog: Environmental communication for civic engagement. Environmental Communication 4, 1 (2010), 82–98. [Google Scholar]
- [27].Burgess Eleanor R, Ringland Kathryn E, Nicholas Jennifer, Knapp Ashley A, Eschler Jordan, Mohr David C, and Reddy Madhu C. 2019. “ I think people are powerful” The Sociality of Individuals Managing Depression. Proceedings of the ACM on Human-computer Interaction 3, CSCW (2019), 1–29. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [28].Burns Jane and Birrell Emma. 2014. Enhancing early engagement with mental health services by young people. Psychology research and behavior management (2014), 303–312. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [29].Calvaresi Davide, Calbimonte Jean-Paul, Dubosson Fabien, Najjar Amro, and Schumacher Michael. 2019. Social network chatbots for smoking cessation: agent and multi-agent frameworks. In IEEE/WIC/ACM International Conference on Web Intelligence. 286–292. [Google Scholar]
- [30].Cameron Gillian, Cameron David, Megaw Gavin, Bond Raymond, Mulvenna Maurice, Siobhan O’Neill Cherie Armour, and McTear Michael. 2018. Best practices for designing chatbots in mental healthcare–A case study on iHelpr. In Proceedings of the 32nd International BCS Human Computer Interaction Conference 32. 1–5. [Google Scholar]
- [31].Charmaz Kathy. 2006. Constructing grounded theory: A practical guide through qualitative analysis. sage. [Google Scholar]
- [32].Charmaz Kathy, Belgrave Liska, et al. 2012. Qualitative interviewing and grounded theory analysis. The SAGE handbook of interview research: The complexity of the craft 2 (2012), 347–365. [Google Scholar]
- [33].Chaszczewicz Alicja, Shah Raj Sanjay, Louie Ryan, Arnow Bruce A, Kraut Robert, and Yang Diyi. 2024. Multi-Level Feedback Generation with Large Language Models for Empowering Novice Peer Counselors. arXiv preprint arXiv:2403.15482 (2024). [Google Scholar]
- [34].Cho Young-Min, Rai Sunny, Ungar Lyle, Sedoc João, and Guntuku Sharath Chandra. 2023. An integrative survey on mental health conversational agents to bridge computer science and medical perspectives. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing, Vol. 2023. 11346. [Google Scholar]
- [35].Clusmann Jan, Kolbinger Fiona R, Muti Hannah Sophie, Carrero Zunamys I, Eckardt Jan-Niklas, Laleh Narmin Ghaffari, Löffler Chiara Maria Lavinia, Schwarzkopf Sophie-Caroline, Unger Michaela, Veldhuizen Gregory P, et al. 2023. The future landscape of large language models in medicine. Communications Medicine 3, 1 (2023), 141. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [36].Cohen Sheldon, Kamarck Tom, Mermelstein Robin, et al. 1994. Perceived stress scale. Measuring stress: A guide for health and social scientists 10, 2 (1994), 1–2. [Google Scholar]
- [37].Cooper Bridget, Brna Paul, and Martins Alex. 2000. Effective affective in intelligent systems–building on evidence of empathy in teaching and learning. In International Workshop on Affective Interactions. Springer, 21–34. [Google Scholar]
- [38].Dacey Mike. 2017. Anthropomorphism as cognitive bias. Philosophy of Science 84, 5 (2017), 1152–1164. [Google Scholar]
- [39].Dang Hai, Mecke Lukas, Lehmann Florian, Goller Sven, and Buschek Daniel. 2022. How to Prompt? Opportunities and Challenges of Zero- and Few-Shot Learning for Human-AI Interaction in Creative Applications of Generative Models. doi: 10.48550/ARXIV.2209.01390 [DOI] [Google Scholar]
- [40].Davtyan Arman. 2010. College students and personal finance: Exploring the relationships among financial well-being, money management practices, and engagement in personal finance education. University of California, Los Angeles. [Google Scholar]
- [41].De Angelis Luigi, Baglivo Francesco, Arzilli Guglielmo, Privitera Gaetano Pierpaolo, Ferragina Paolo, Tozzi Alberto Eugenio, and Rizzo Caterina. 2023. ChatGPT and the rise of large language models: the new AI-driven infodemic threat in public health. Frontiers in Public Health 11 (2023), 1166120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [42].De Choudhury Munmun and Kiciman Emre. 2018. Integrating artificial and human intelligence in complex, sensitive problem domains: experiences from mental health. Ai Magazine 39, 3 (2018), 69–80. [Google Scholar]
- [43].De Choudhury Munmun, Kumar Mrinal, and Weber Ingmar. 2017. Computational approaches toward integrating quantified self sensing and social media. In Proceedings of the 2017 ACM conference on computer supported cooperative work and social computing. 1334–1349. [Google Scholar]
- [44].De Choudhury Munmun, Pendse Sachin R, and Kumar Neha. 2023. Benefits and harms of large language models in digital mental health. arXiv preprint arXiv:2311.14693 (2023). [Google Scholar]
- [45].de Jong Steven, Hasson-Ohayon Ilanit, van Donkersgoed Rozanne, Aleman André, and Pijnenborg Gerdina Hendrika Maria. 2020. A qualitative evaluation of the effects of Metacognitive Reflection and Insight Therapy:’Living more consciously’. Psychology and Psychotherapy: Theory, Research and Practice 93, 2 (2020), 223–240. [Google Scholar]
- [46].Deci Edward L and Ryan Richard M. 2012. Self-determination theory. Handbook of theories of social psychology 1, 20 (2012), 416–436. [Google Scholar]
- [47].DeepLearning.AI. 2023. Bot therapy and informed consent discord’s Kokobot triggers an ethics controversy. https://www.deeplearning.ai/the-batch/discords-kokobot-triggers-an-ethics-controversy/
- [48].Dennis Cindy-Lee. 2003. Peer support within a health care context: a concept analysis. International journal of nursing studies 40, 3 (2003), 321–332. [DOI] [PubMed] [Google Scholar]
- [49].Dickinson Anthony and Pérez Omar D. 2018. Actions and habits: Psychological issues in dual-system theory. In Goal-directed decision making. Elsevier, 1–25. [Google Scholar]
- [50].Dittrich David, Kenneally Erin, et al. 2012. The Menlo Report: Ethical principles guiding information and communication technology research. Technical Report. US Department of Homeland Security. [Google Scholar]
- [51].Eysenbach Gunther et al. 2005. The law of attrition. Journal of medical Internet research 7, 1 (2005), e402. [Google Scholar]
- [52].Fang Hao, Cheng Hao, Sap Maarten, Clark Elizabeth, Holtzman Ari, Choi Yejin, Smith Noah A, and Ostendorf Mari. 2018. Sounding board: A user-centric and content-driven social chatbot. arXiv preprint arXiv:1804.10202 (2018). [Google Scholar]
- [53].Fiodorova Alexandra and Farb Norman. 2022. Brief daily self-care reflection for undergraduate well-being: a randomized control trial of an online intervention. Anxiety, Stress, & Coping 35, 2 (2022), 158–170. [DOI] [PubMed] [Google Scholar]
- [54].Fitzpatrick Kathleen Kara, Darcy Alison, and Vierhile Molly. 2017. Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (Woebot): a randomized controlled trial. JMIR mental health 4, 2 (2017), e7785. [Google Scholar]
- [55].Foa Edna B, Huppert Jonathan D, and Cahill Shawn P. 2006. Emotional processing theory: An update. (2006).
- [56].United States. National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. 1978. The Belmont report: ethical principles and guidelines for the protection of human subjects of research. Vol. 1. Department of Health, Education, and Welfare, National Commission for the …. [Google Scholar]
- [57].Fresco David M, Moore Michael T, van Dulmen Manfred HM, Segal Zindel V, Ma S Helen, Teasdale John D, and Williams J Mark G. 2007. Initial psychometric properties of the experiences questionnaire: validation of a self-report measure of decentering. Behavior therapy 38, 3 (2007), 234–246. [DOI] [PubMed] [Google Scholar]
- [58].Gabriel Iason, Manzini Arianna, Keeling Geoff, Hendricks Lisa Anne, Rieser Verena, Iqbal Hasan, Tomašev Nenad, Ktena Ira, Kenton Zachary, Rodriguez Mikel, et al. 2024. The Ethics of Advanced AI Assistants. arXiv preprint arXiv:2404.16244 (2024). [Google Scholar]
- [59].Gao Feng. 2012. Design for reflection on health behavior change. In Proceedings of the 2012 ACM international conference on Intelligent User Interfaces. 379–382. [Google Scholar]
- [60].Geerts Julia, de Wit Jan, and de Rooij Alwin. 2021. Brainstorming with a social robot facilitator: better than human facilitation due to reduced evaluation apprehension? Frontiers in Robotics and AI 8 (2021), 657291. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [61].Gouveia Rúben, Pereira Fábio, Karapanos Evangelos, Munson Sean A, and Hassenzahl Marc. 2016. Exploring the design space of glanceable feedback for physical activity trackers. In Proceedings of the 2016 ACM international joint conference on pervasive and ubiquitous computing. 144–155. [Google Scholar]
- [62].Graham Andrea K, Lattie Emily G, and Mohr David C. 2019. Experimental therapeutics for digital mental health. JAMA psychiatry 76, 12 (2019), 1223–1224. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [63].Grant Anthony M. 2001. Rethinking psychological mindedness: Metacognition, self-reflection, and insight. Behaviour Change 18, 1 (2001), 8–17. [Google Scholar]
- [64].Gray Kurt and Wegner Daniel M. 2012. Feeling robots and human zombies: Mind perception and the uncanny valley. Cognition 125, 1 (2012), 125–130. [DOI] [PubMed] [Google Scholar]
- [65].Greenfield Adam. 2017. Radical technologies: The design of everyday life. Verso Books. [Google Scholar]
- [66].Hieftje Kimberly, Edelman E Jennifer, Camenga Deepa R, and Fiellin Lynn E. 2013. Electronic media–based health interventions promoting behavior change in youth: a systematic review. JAMA pediatrics 167, 6 (2013), 574–580. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [67].Holzinger Andreas. 2016. Interactive machine learning for health informatics: when do we need the human-in-the-loop? Brain Informatics 3, 2 (2016), 119–131. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [68].House James S, Umberson Debra, and Landis Karl R. 1988. Structures and processes of social support. Annual review of sociology 14, 1 (1988), 293–318. [Google Scholar]
- [69].Howard George S and Dailey Patrick R. 1979. Response-shift bias: A source of contamination of self-report measures. Journal of Applied Psychology 64, 2 (1979), 144. [Google Scholar]
- [70].Howells Annika, Ivtzan Itai, and Eiroa-Orosa Francisco Jose. 2016. Putting the ‘app’in happiness: a randomised controlled trial of a smartphone-based mindfulness intervention to enhance wellbeing. Journal of happiness studies 17 (2016), 163–185. [Google Scholar]
- [71].Huang Chin-Yuan, Yang Ming-Chin, Huang Chin-Yu, Chen Yu-Jui, Wu Meng-Lin, and Chen Kai-Wen. 2018. A chatbot-supported smart wireless interactive healthcare system for weight control and health promotion. In 2018 IEEE international conference on industrial engineering and engineering management (IEEM). IEEE, 1791–1795. [Google Scholar]
- [72].Huberty Jennifer, Green Jeni, Glissmann Christine, Larkey Linda, Puzia Megan, Lee Chong, et al. 2019. Efficacy of the mindfulness meditation mobile app “calm” to reduce stress among college students: Randomized controlled trial. JMIR mHealth and uHealth 7, 6 (2019), e14273. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [73].Ahmed Mahmoud Mohamed Hussien, Silpasuwanchai Chaklam, Niksirat Kavous Salehzadeh, and Ren Xiangshi. 2017. Understanding the role of human senses in interactive meditation. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. 4960–4965. [Google Scholar]
- [74].Iacono Saverio, Vallarino Mario, and Vercelli Gianni. 2020. Gamification in corporate training to enhance engagement: An approach. International Journal of Emerging Technologies in Learning (iJET) 15, 17 (2020), 69–84. [Google Scholar]
- [75].Jo Eunkyung, Epstein Daniel A, Jung Hyunhoon, and Kim Young-Ho. 2023. Understanding the benefits and challenges of deploying conversational AI leveraging large language models for public health intervention. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–16. [Google Scholar]
- [76].Jo Eunkyung, Jeong Yuin, Park SoHyun, Epstein Daniel A, and Kim Young-Ho. 2024. Understanding the Impact of Long-Term Memory on Self-Disclosure with Large Language Model-Driven Chatbots for Public Health Intervention. arXiv preprint arXiv:2402.11353 (2024). [Google Scholar]
- [77].Jolley Helen Kathryn. 2019. I’m human too: Person-centred counsellors’ lived experiences of therapist self-disclosure. European Journal for Qualitative Research in Psychotherapy 9 (2019), 12–26. [Google Scholar]
- [78].Kabat-Zinn Jon. 2003. Mindfulness-based interventions in context: past, present, and future. (2003).
- [79].Kang Jin and Wei Lewen. 2020. Promises of anthropomorphism in virtual coaches: current research and future directions. In Proceedings of the 14th EAI International Conference on Pervasive Computing Technologies for Healthcare. 243–246. [Google Scholar]
- [80].Kaptein Maurits and Van Halteren Aart. 2013. Adaptive persuasive messaging to increase service retention: using persuasion profiles to increase the effectiveness of email reminders. Personal and Ubiquitous Computing 17 (2013), 1173–1185. [Google Scholar]
- [81].Karapanos Evangelos. 2015. Sustaining user engagement with behavior-change tools. Interactions 22, 4 (2015), 48–52. [Google Scholar]
- [82].Kelly Michael P and Barker Mary. 2016. Why is changing health-related behaviour so difficult? Public health 136 (2016), 109–116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [83].Kelly Robert B, Zyzanski Stephen J, and Alemagno Sonia A. 1991. Prediction of motivation and behavior change following health promotion: Role of health beliefs, social support, and self-efficacy. Social science & medicine 32, 3 (1991), 311–320. [DOI] [PubMed] [Google Scholar]
- [84].Kersten-van Dijk Elisabeth T, Westerink Joyce HDM, Beute Femke, and IJsselsteijn Wijnand A. 2017. Personal informatics, self-insight, and behavior change: A critical review of current literature. Human–Computer Interaction 32, 5–6 (2017), 268–296. [Google Scholar]
- [85].Kim Taewan, Bae Seolyeong, Hyun Ah Kim Su-woo Lee, Hong Hwajung, Yang Chanmo, and Kim Young-Ho. 2024. MindfulDiary: Harnessing Large Language Model to Support Psychiatric Patients’ Journaling. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 1–20. [Google Scholar]
- [86].Kittur Aniket, Chi Ed H, and Suh Bongwon. 2008. Crowdsourcing user studies with Mechanical Turk. In Proceedings of the SIGCHI conference on human factors in computing systems. 453–456. [Google Scholar]
- [87].Kocielnik Rafal, Avrahami Daniel, Marlow Jennifer, Lu Di, and Hsieh Gary. 2018. Designing for workplace reflection: a chat and voice-based conversational agent. In Proceedings of the 2018 Designing Interactive Systems Conference. 881–894.
- [88].Kohl Leonie FM, Crutzen Rik, and de Vries Nanne K. 2013. Online prevention aimed at lifestyle behaviors: a systematic review of reviews. Journal of Medical Internet Research 15, 7 (2013), e146.
- [89].Kornfield Rachel, Zhang Renwen, Nicholas Jennifer, Schueller Stephen M, Cambo Scott A, Mohr David C, and Reddy Madhu. 2020. “Energy is a Finite Resource”: Designing Technology to Support Individuals across Fluctuating Symptoms of Depression. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–17.
- [90].Kosunen Ilkka, Ruonala Antti, Salminen Mikko, Järvelä Simo, Ravaja Niklas, and Jacucci Giulio. 2017. Neuroadaptive meditation in the real world. In Proceedings of the 2017 ACM Workshop on An Application-oriented Approach to BCI out of the Laboratory. 29–33.
- [91].Kosunen Ilkka, Salminen Mikko, Järvelä Simo, Ruonala Antti, Ravaja Niklas, and Jacucci Giulio. 2016. RelaWorld: neuroadaptive and immersive virtual reality meditation system. In Proceedings of the 21st International Conference on Intelligent User Interfaces. 208–217.
- [92].Krause Markus, Garncarz Tom, Song JiaoJiao, Gerber Elizabeth M, Bailey Brian P, and Dow Steven P. 2017. Critique style guide: Improving crowdsourced design feedback with a natural language model. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. 4627–4639.
- [93].Krinski Kleverton, Machado Daniel GS, Lirani Luciana S, DaSilva Sergio G, Costa Eduardo C, Hardcastle Sarah J, and Elsangedy Hassan M. 2017. Let’s walk outdoors! Self-paced walking outdoors improves future intention to exercise in women with obesity. Journal of Sport and Exercise Psychology 39, 2 (2017), 145–157.
- [94].Kumar Harsh, Li Tong, Shi Jiakai, Musabirov Ilya, Kornfield Rachel, Meyerhoff Jonah, Bhattacharjee Ananya, Karr Chris, Nguyen Theresa, Mohr David, et al. 2024. Using Adaptive Bandit Experiments to Increase and Investigate Engagement in Mental Health. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 22906–22912.
- [95].Kumar Harsh, Musabirov Ilya, Reza Mohi, Shi Jiakai, Kuzminykh Anastasia, Williams Joseph Jay, and Liut Michael. 2023. Impact of guidance and interaction strategies for LLM use on learner performance and perception. arXiv preprint arXiv:2310.13712 (2023).
- [96].Kumar Harsh, Musabirov Ilya, Shi Jiakai, Lauzon Adele, Choy Kwan Kiu, Gross Ofek, Kulzhabayeva Dana, and Williams Joseph Jay. 2022. Exploring The Design of Prompts For Applying GPT-3 based Chatbots: A Mental Wellbeing Case Study on Mechanical Turk. doi: 10.48550/ARXIV.2209.11344
- [97].Kumar Harsh, Vincentius Jonathan, Jordan Ewan, and Anderson Ashton. 2024. Human Creativity in the Age of LLMs: Randomized Experiments on Divergent and Convergent Thinking. arXiv preprint arXiv:2410.03703 (2024).
- [98].Kumar Harsh, Xiao Ruiwei, Lawson Benjamin, Musabirov Ilya, Shi Jiakai, Wang Xinyuan, Luo Huayin, Williams Joseph Jay, Rafferty Anna N, Stamper John, et al. 2024. Supporting self-reflection at scale with large language models: Insights from randomized field experiments in classrooms. In Proceedings of the Eleventh ACM Conference on Learning @ Scale. 86–97.
- [99].Laestadius Linnea, Bishop Andrea, Gonzalez Michael, Illenčík Diana, and Campos-Castillo Celeste. 2022. Too human and not human enough: A grounded theory analysis of mental health harms from emotional dependence on the social chatbot Replika. New Media & Society (2022), 14614448221142007.
- [100].Laireiter Anton-Rupert and Willutzki Ulrike. 2003. Self-reflection and self-practice in training of cognitive behaviour therapy: An overview. Clinical Psychology & Psychotherapy: An International Journal of Theory & Practice 10, 1 (2003), 19–30.
- [101].Lattie Emily G, Kornfield Rachel, Ringland Kathryn E, Zhang Renwen, Winquist Nathan, and Reddy Madhu. 2020. Designing mental health technologies that support the social ecosystem of college students. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–15.
- [102].Lattie Emily G, Stiles-Shields Colleen, and Graham Andrea K. 2022. An overview of and recommendations for more accessible digital mental health services. Nature Reviews Psychology 1, 2 (2022), 87–100.
- [103].Laurie James and Blandford Ann. 2016. Making time for mindfulness. International Journal of Medical Informatics 96 (2016), 38–50.
- [104].Lee Minha, Ackermans Sander, Van As Nena, Chang Hanwen, Lucas Enzo, and IJsselsteijn Wijnand. 2019. Caring for Vincent: a chatbot for self-compassion. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–13.
- [105].Lee Yeoreum and Lim Youn-kyung. 2015. Understanding the roles and influences of mediators from multiple social channels for health behavior change. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. 1070–1079.
- [106].Lee Yi-Chieh, Yamashita Naomi, and Huang Yun. 2020. Designing a chatbot as a mediator for promoting deep self-disclosure to a real mental health professional. Proceedings of the ACM on Human-Computer Interaction 4, CSCW1 (2020), 1–27.
- [107].Lee Yi-Chieh, Yamashita Naomi, Huang Yun, and Fu Wai. 2020. “I Hear You, I Feel You”: Encouraging deep self-disclosure through a chatbot. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–12.
- [108].Li Han, Zhang Renwen, Lee Yi-Chieh, Kraut Robert E, and Mohr David C. 2023. Systematic review and meta-analysis of AI-based conversational agents for promoting mental health and well-being. NPJ Digital Medicine 6, 1 (2023), 236.
- [109].Li Ian, Dey Anind K, and Forlizzi Jodi. 2011. Understanding my data, myself: supporting self-reflection with ubicomp technologies. In Proceedings of the 13th International Conference on Ubiquitous Computing. 405–414.
- [110].Li Jingjin and Leshed Gilly. 2022. Beyond Meditation: Everyday Mindfulness and Technology Use. In CHI Conference on Human Factors in Computing Systems Extended Abstracts. 1–6.
- [111].Li Qingchuan, Luximon Yan, and Zhang Jiaxin. 2023. The influence of anthropomorphic cues on patients’ perceived anthropomorphism, social presence, trust building, and acceptance of health care conversational agents: within-subject web-based experiment. Journal of Medical Internet Research 25 (2023), e44479.
- [112].Limpanopparat Sucharat, Gibson Erin, and Harris Andrew. 2024. User engagement, attitudes, and the effectiveness of chatbots as a mental health intervention: A systematic review. Computers in Human Behavior: Artificial Humans (2024), 100081.
- [113].Linardon Jake. 2023. Rates of attrition and engagement in randomized controlled trials of mindfulness apps: Systematic review and meta-analysis. Behaviour Research and Therapy (2023), 104421.
- [114].Lipschitz Jessica M, Pike Chelsea K, Hogan Timothy P, Murphy Susan A, and Burdick Katherine E. 2023. The engagement problem: A review of engagement with digital mental health interventions and recommendations for a path forward. Current Treatment Options in Psychiatry 10, 3 (2023), 119–135.
- [115].Liu Jiaxin, Weitzman Elissa R, and Chunara Rumi. 2017. Assessing behavior stage progression from social media data. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. 1320–1333.
- [116].Lock Jean Q and Wister Andrew V. 1992. Intentions and changes in exercise and behaviour: a life-style perspective. Health Promotion International 7, 3 (1992), 195–208.
- [117].Lukoff Kai, Li Taoxi, Zhuang Yuan, and Lim Brian Y. 2018. TableChat: mobile food journaling to facilitate family support for healthy eating. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–28.
- [118].Ma Zilin, Mei Yiyang, Long Yinru, Su Zhaoyuan, and Gajos Krzysztof Z. 2024. Evaluating the Experience of LGBTQ+ People Using Large Language Model Based Chatbots for Mental Health Support. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 1–15.
- [119].Ma Zilin, Mei Yiyang, and Su Zhaoyuan. 2023. Understanding the benefits and challenges of using large language model-based conversational agents for mental well-being support. In AMIA Annual Symposium Proceedings, Vol. 2023. American Medical Informatics Association, 1105.
- [120].Manthey Leonie, Vehreschild Viktor, and Renner Karl-Heinz. 2016. Effectiveness of two cognitive interventions promoting happiness with video-based online instructions. Journal of Happiness Studies 17 (2016), 319–339.
- [121].Manzini Arianna and Gabriel Iason. 2024. The ethics of advanced AI assistants. https://deepmind.google/discover/blog/the-ethics-of-advanced-ai-assistants/
- [122].Maples Bethanie, Cerit Merve, Vishwanath Aditya, and Pea Roy. 2024. Loneliness and suicide mitigation for students using GPT3-enabled chatbots. npj Mental Health Research 3, 1 (2024), 4.
- [123].Marcovitch Stuart, Jacques Sophie, Boseovski Janet J, and Zelazo Philip David. 2008. Self-reflection and the cognitive control of behavior: Implications for learning. Mind, Brain, and Education 2, 3 (2008), 136–141.
- [124].Mathur Maya B and Reichling David B. 2016. Navigating a social world with robot partners: A quantitative cartography of the Uncanny Valley. Cognition 146 (2016), 22–32.
- [125].McIlroy-Young Reid, Kleinberg Jon, Sen Siddhartha, Barocas Solon, and Anderson Ashton. 2022. Mimetic models: Ethical implications of AI that acts like you. In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society. 479–490.
- [126].McIlroy-Young Reid, Sen Siddhartha, Kleinberg Jon, and Anderson Ashton. 2020. Aligning superhuman AI with human behavior: Chess as a model system. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1677–1687.
- [127].Memmert Lucas and Tavanapour Navid. 2023. Towards human-AI-collaboration in brainstorming: empirical insights into the perception of working with a generative AI. (2023).
- [128].Meyerhoff Jonah, Beltzer Miranda, Popowski Sarah, Karr Chris J, Nguyen Theresa, Williams Joseph J, Krause Charles J, Kumar Harsh, Bhattacharjee Ananya, Mohr David C, et al. 2024. Small Steps over time: A longitudinal usability test of an automated interactive text messaging intervention to support self-management of depression and anxiety symptoms. Journal of Affective Disorders 345 (2024), 122–130.
- [129].Meyerhoff Jonah, Kornfield Rachel, Mohr David C., and Reddy Madhu. 2022. Meeting Young Adults’ Social Support Needs across the Health Behavior Change Journey: Implications for Digital Mental Health Tools. Proc. ACM Hum.-Comput. Interact. 6, CSCW2, Article 312 (Nov 2022), 33 pages. doi: 10.1145/3555203
- [130].Middleton Kathryn R, Anton Stephen D, and Perri Michal G. 2013. Long-term adherence to health behavior change. American Journal of Lifestyle Medicine 7, 6 (2013), 395–404.
- [131].Milkman Katherine L, Gromet Dena, Ho Hung, Kay Joseph S, Lee Timothy W, Pandiloski Pepi, Park Yeji, Rai Aneesh, Bazerman Max, Beshears John, et al. 2021. Megastudies improve the impact of applied behavioural science. Nature 600, 7889 (2021), 478–483.
- [132].Miller Andrew, Poole Erika, Xu Yan, Eiriksdottir Elsa, Kestranek Daniel, Catrambone Richard, and Mynatt Elizabeth. 2012. The work of play: supporting a pervasive health behavior change intervention for US middle school students. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work. 897–900.
- [133].Mohr David C, Azocar Francisca, Bertagnolli Andrew, Choudhury Tanzeem, Chrisp Paul, Frank Richard, Harbin Henry, Histon Trina, Kaysen Debra, Nebeker Camille, et al. 2021. Banbury forum consensus statement on the path forward for digital mental health treatment. Psychiatric Services 72, 6 (2021), 677–683.
- [134].Mohr David C, Cuijpers Pim, and Lehman Kenneth. 2011. Supportive accountability: a model for providing human support to enhance adherence to eHealth interventions. Journal of Medical Internet Research 13, 1 (2011), e30.
- [135].Monge Roffarello Alberto and De Russis Luigi. 2019. The race towards digital wellbeing: Issues and opportunities. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–14.
- [136].Moore Sarah E, McMullan Megan, McEvoy Claire T, McKinley Michelle C, and Woodside Jayne V. 2019. The effectiveness of peer-supported interventions for encouraging dietary behaviour change in adults: a systematic review. Public Health Nutrition 22, 4 (2019), 624–644.
- [137].Mori Masahiro, MacDorman Karl F, and Kageki Norri. 2012. The uncanny valley [from the field]. IEEE Robotics & Automation Magazine 19, 2 (2012), 98–100.
- [138].Murnane Elizabeth L, Walker Tara G, Tench Beck, Voida Stephen, and Snyder Jaime. 2018. Personal informatics in interpersonal contexts: towards the design of technology that supports the social ecologies of long-term mental health management. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–27.
- [139].Neff Roni, Fry Jillian, et al. 2009. Periodic prompts and reminders in health promotion and health behavior interventions: systematic review. Journal of Medical Internet Research 11, 2 (2009), e1138.
- [140].Ng Johan YY, Ntoumanis Nikos, Thøgersen-Ntoumani Cecilie, Deci Edward L, Ryan Richard M, Duda Joan L, and Williams Geoffrey C. 2012. Self-determination theory applied to health contexts: A meta-analysis. Perspectives on Psychological Science 7, 4 (2012), 325–340.
- [141].Niksirat Kavous Salehzadeh, Silpasuwanchai Chaklam, Cheng Peng, and Ren Xiangshi. 2019. Attention regulation framework: designing self-regulated mindfulness technologies. ACM Transactions on Computer-Human Interaction (TOCHI) 26, 6 (2019), 1–44.
- [142].Ogden Jane, Karim Lubna, Choudry Abida, and Brown Kerry. 2007. Understanding successful behaviour change: the role of intentions, attitudes to the target and motivations and the example of diet. Health Education Research 22, 3 (2007), 397–405.
- [143].Oh Kyo-Joong, Lee Dongkun, Ko Byungsoo, and Choi Ho-Jin. 2017. A chatbot for psychiatric counseling in mental healthcare service based on emotional dialogue analysis and sentence generation. In 2017 18th IEEE International Conference on Mobile Data Management (MDM). IEEE, 371–375.
- [144].O’Leary Kathleen, Schueller Stephen M, Wobbrock Jacob O, and Pratt Wanda. 2018. “Suddenly, we got to become therapists for each other”: Designing Peer Support Chats for Mental Health. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. 1–14.
- [145].Olsen Jeanette M. 2014. Health coaching: a concept analysis. In Nursing Forum, Vol. 49. Wiley Online Library, 18–29.
- [146].Paredes Pablo E, Zhou Yijun, Hamdan Nur Al-Huda, Balters Stephanie, Murnane Elizabeth, Ju Wendy, and Landay James A. 2018. Just breathe: In-car interventions for guided slow breathing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2, 1 (2018), 1–23.
- [147].Park Joon Sung, O’Brien Joseph, Cai Carrie Jun, Morris Meredith Ringel, Liang Percy, and Bernstein Michael S. 2023. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 1–22.
- [148].Patibanda Rakesh, Mueller Florian ‘Floyd’, Leskovsek Matevz, and Duckworth Jonathan. 2017. Life tree: understanding the design of breathing exercise games. In Proceedings of the Annual Symposium on Computer-Human Interaction in Play. 19–31.
- [149].Paul Andrew. 2023. Popular Youth Mental Health Service faces backlash after experimenting with AI-chatbot advice. https://www.popsci.com/technology/koko-ai-chatbot-mental-health/
- [150].Ploderer Bernd, Reitberger Wolfgang, Oinas-Kukkonen Harri, and van Gemert-Pijnen Julia. 2014. Social interaction and reflection for behaviour change. 1667–1676.
- [151].Prochaska James O and Velicer Wayne F. 1997. The transtheoretical model of health behavior change. American Journal of Health Promotion 12, 1 (1997), 38–48.
- [152].Quintana Chris, Zhang Meilan, and Krajcik Joseph. 2018. A framework for supporting metacognitive aspects of online inquiry through software-based scaffolding. In Educational Psychologist. Routledge, 235–244.
- [153].Renfrew Melanie Elise, Morton Darren Peter, Morton Jason Kyle, Hinze Jason Scott, Beamish Peter James, Przybylko Geraldine, and Craig Bevan Adrian. 2020. A web- and mobile app–based mental health promotion intervention comparing email, short message service, and videoconferencing support for a healthy cohort: randomized comparative study. Journal of Medical Internet Research 22, 1 (2020), e15592.
- [154].Rhodes Ryan E and de Bruijn Gert-Jan. 2013. How big is the physical activity intention–behaviour gap? A meta-analysis using the action control framework. British Journal of Health Psychology 18, 2 (2013), 296–309.
- [155].Rick Steven R, Goldberg Aaron Paul, and Weibel Nadir. 2019. SleepBot: encouraging sleep hygiene using an intelligent chatbot. In Proceedings of the 24th International Conference on Intelligent User Interfaces: Companion. 107–108.
- [156].Roach Ashley. 2018. Supportive peer relationships and mental health in adolescence: An integrative review. Issues in Mental Health Nursing 39, 9 (2018), 723–737.
- [157].Ryan Richard M, Brown Kirk Warren, and Creswell J David. 2007. How integrative is attachment theory? Unpacking the meaning and significance of felt security. Psychological Inquiry 18, 3 (2007), 177–182.
- [158].Salehzadeh Niksirat Kavous, Silpasuwanchai Chaklam, Ahmed Mahmoud Mohamed Hussien, Cheng Peng, and Ren Xiangshi. 2017. A framework for interactive mindfulness meditation using attention-regulation process. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. 2672–2684.
- [159].Sas Corina and Chopra Rohit. 2015. MeditAid: a wearable adaptive neurofeedback-based system for training mindfulness state. Personal and Ubiquitous Computing 19 (2015), 1169–1182.
- [160].Schooler Jonathan W, Ohlsson Stellan, and Brooks Kevin. 1993. Thoughts beyond words: When language overshadows insight. Journal of Experimental Psychology: General 122, 2 (1993), 166.
- [161].Schueller Stephen M and Mohr David C. 2015. Initial field trial of a coach-supported web-based depression treatment. In 2015 9th International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth). IEEE, 25–28.
- [162].Schwartz Katrin, Ganster Fabienne Marie, and Tran Ulrich S. 2023. Mindfulness-based mobile apps and their impact on well-being in nonclinical populations: Systematic review of randomized controlled trials. Journal of Medical Internet Research 25 (2023), e44638.
- [163].Seyama Jun’ichiro and Nagayama Ruth S. 2007. The uncanny valley: Effect of realism on the impression of artificial human faces. Presence 16, 4 (2007), 337–351.
- [164].Shanahan Murray, McDonell Kyle, and Reynolds Laria. 2023. Role play with large language models. Nature 623, 7987 (2023), 493–498.
- [165].Shapiro Danielle N, Chandler Jesse, and Mueller Pam A. 2013. Using Mechanical Turk to study clinical populations. Clinical Psychological Science 1, 2 (2013), 213–220.
- [166].Sharma Mrinank, Tong Meg, Korbak Tomasz, Duvenaud David, Askell Amanda, Bowman Samuel R, Cheng Newton, Durmus Esin, Hatfield-Dodds Zac, Johnston Scott R, et al. 2023. Towards understanding sycophancy in language models. arXiv preprint arXiv:2310.13548 (2023).
- [167].Sheeran Paschal and Webb Thomas L. 2016. The intention–behavior gap. Social and Personality Psychology Compass 10, 9 (2016), 503–518.
- [168].Sheldon Kennon M, Williams Geoffrey, and Joiner Thomas. 2008. Self-determination theory in the clinic: Motivating physical and mental health. Yale University Press.
- [169].Siegel Daniel J. 2007. Mindfulness training and neural integration: Differentiation of distinct streams of awareness and the cultivation of well-being. Social Cognitive and Affective Neuroscience 2, 4 (2007), 259–263.
- [170].Singhal Karan, Azizi Shekoofeh, Tu Tao, Mahdavi S Sara, Wei Jason, Chung Hyung Won, Scales Nathan, Tanwani Ajay, Cole-Lewis Heather, Pfohl Stephen, et al. 2023. Large language models encode clinical knowledge. Nature 620, 7972 (2023), 172–180.
- [171].Sinnott-Armstrong Walter. 2003. Consequentialism.
- [172].Song Inhwa, Park SoHyun, Pendse Sachin R, Schleider Jessica Lee, De Choudhury Munmun, and Kim Young-Ho. 2024. ExploreSelf: Fostering user-driven exploration and reflection on personal challenges with adaptive guidance by large language models. arXiv preprint arXiv:2409.09662 (2024).
- [173].Spatharioti Sofia Eleni, Rothschild David M, Goldstein Daniel G, and Hofman Jake M. 2023. Comparing traditional and LLM-based search for consumer choice: A randomized experiment. arXiv preprint arXiv:2307.03744 (2023).
- [174].Stade Elizabeth C, Stirman Shannon Wiltsey, Ungar Lyle H, Boland Cody L, Schwartz H Andrew, Yaden David B, Sedoc João, DeRubeis Robert J, Willer Robb, and Eichstaedt Johannes C. 2024. Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation. npj Mental Health Research 3, 1 (2024), 12.
- [175].Tang Yi-Yuan. 2017. The neuroscience of mindfulness meditation: How the body and mind work together to change our behaviour. Springer.
- [176].Tay Louis, Tan Kenneth, Diener Ed, and Gonzalez Elizabeth. 2013. Social relations, health behaviors, and health outcomes: A survey and synthesis. Applied Psychology: Health and Well-Being 5, 1 (2013), 28–78.
- [177].Terzimehić Nađa, Häuslschmid Renate, Hussmann Heinrich, and Schraefel MC. 2019. A review & analysis of mindfulness research in HCI: Framing current lines of research and future opportunities. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–13.
- [178].Thackara John. 2006. In the bubble: Designing in a complex world. MIT Press.
- [179].Thoits Peggy A. 2011. Mechanisms linking social ties and support to physical and mental health. Journal of Health and Social Behavior 52, 2 (2011), 145–161.
- [180].Tian Xiaoyi, Risha Zak, Ahmed Ishrat, Narayanan Arun Balajiee Lekshmi, and Biehl Jacob. 2021. Let’s talk it out: A chatbot for effective study habit behavioral change. Proceedings of the ACM on Human-Computer Interaction 5, CSCW1 (2021), 1–32.
- [181].Toppo Greg. 2024. A cautionary AI Tale: Why IBM’s dazzling Watson supercomputer made a lousy tutor. https://www.the74million.org/article/a-cautionary-ai-tale-why-ibms-dazzling-watson-supercomputer-made-a-lousy-tutor/
- [182].Torniainen-Holm Minna, Pankakoski Maiju, Lehto Tuomas, Saarelma Osmo, Mustonen Pekka, Joutsenniemi Kaisla, and Suvisaari Jaana. 2016. The effectiveness of email-based exercises in promoting psychological wellbeing and healthy lifestyle: a two-year follow-up study. BMC Psychology 4 (2016), 1–12.
- [183].Tsai Wan-Hsiu Sunny, Liu Yu, and Chuan Ching-Hua. 2021. How chatbots’ social presence communication enhances consumer engagement: the mediating role of parasocial interaction and dialogue. Journal of Research in Interactive Marketing 15, 3 (2021), 460–482.
- [184].Vlaev Ivo and Dolan Paul. 2015. Action change theory: A reinforcement learning perspective on behavior change. Review of General Psychology 19, 1 (2015), 69–95.
- [185].Walsh Kathleen Marie, Saab Bechara J, and Farb Norman AS. 2019. Effects of a mindfulness meditation app on subjective well-being: active randomized controlled trial and experience sampling study. JMIR Mental Health 6, 1 (2019), e10844.
- [186].Wang Yiyi and Farb Norman AS. 2023. Web-based training for post-secondary student well-being during the pandemic: a randomized trial. Anxiety, Stress, & Coping 36, 1 (2023), 1–17.
- [187].Watson David, Clark Lee Anna, and Tellegen Auke. 1988. Development and validation of brief measures of positive and negative affect: the PANAS scales. Journal of Personality and Social Psychology 54, 6 (1988), 1063.
- [188].Williams Alex C, Kaur Harmanpreet, Mark Gloria, Thompson Anne Loomis, Iqbal Shamsi T, and Teevan Jaime. 2018. Supporting workplace detachment and reattachment with conversational intelligence. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. 1–13.
- [189].Wilson Jenna M, Weiss Audrey, and Shook Natalie J. 2020. Mindfulness, self-compassion, and savoring: Factors that explain the relation between perceived social support and well-being. Personality and Individual Differences 152 (2020), 109568.
- [190].Winter Natalie, Russell Lahiru, Ugalde Anna, White Victoria, and Livingston Patricia. 2022. Engagement strategies to improve adherence and retention in web-based mindfulness programs: systematic review. Journal of Medical Internet Research 24, 1 (2022), e30026.
- [191].Wolever Ruth Q, Simmons Leigh Ann, Sforzo Gary A, Dill Diana, Kaye Miranda, Bechard Elizabeth M, Southard Mary Elaine, Kennedy Mary, Vosloo Justine, and Yang Nancy. 2013. A systematic review of the literature on health and wellness coaching: defining a key behavioral intervention in healthcare. Global Advances in Health and Medicine 2, 4 (2013), 38–57.
- [192].Woolley Kaitlin and Fishbach Ayelet. 2016. For the fun of it: Harnessing immediate rewards to increase persistence in long-term goals. Journal of Consumer Research 42, 6 (2016), 952–966.
- [193].Woolley Kaitlin and Fishbach Ayelet. 2017. Immediate rewards predict adherence to long-term goals. Personality and Social Psychology Bulletin 43, 2 (2017), 151–162.
- [194].Woolley Kaitlin and Fishbach Ayelet. 2018. It’s about time: Earlier rewards increase intrinsic motivation. Journal of Personality and Social Psychology 114, 6 (2018), 877.
- [195].Wu Longkai and Looi Chee-Kit. 2012. Agent prompts: Scaffolding for productive reflection in an intelligent learning environment. Journal of Educational Technology & Society 15, 1 (2012), 339–353.
- [196].Wu Ruolan, Yu Chun, Pan Xiaole, Liu Yujia, Zhang Ningning, Fu Yue, Wang Yuhan, Zheng Zhi, Chen Li, Jiang Qiaolei, et al. 2024. MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use Intervention. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 1–24.
- [197].Wu Xiaodong, Duan Ran, and Ni Jianbing. 2023. Unveiling security, privacy, and ethical concerns of ChatGPT. Journal of Information and Intelligence (2023).
- [198].Xiang Chloe. 2023. “He Would Still Be Here”: Man Dies by Suicide After Talking with AI Chatbot, Widow Says. https://www.vice.com/en/article/pkadgm/man-dies-by-suicide-after-talking-with-ai-chatbot-widow-says
- [199].Xu Xuhai, Yao Bingsheng, Dong Yuanzhe, Gabriel Saadia, Yu Hong, Hendler James, Ghassemi Marzyeh, Dey Anind K, and Wang Dakuo. 2024. Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 1 (2024), 1–32.
- [200].Yang Diyi, Ziems Caleb, Held William, Shaikh Omar, Bernstein Michael S., and Mitchell John. 2024. Social Skill Training with Large Language Models. arXiv:2404.04204 [cs.CL]
- [201].Yardley Lucy, Spring Bonnie J, Riper Heleen, Morrison Leanne G, Crane David H, Curtis Kristina, Merchant Gina C, Naughton Felix, and Blandford Ann. 2016. Understanding and promoting effective engagement with digital behavior change interventions. American Journal of Preventive Medicine 51, 5 (2016), 833–842.
- [202].Zavaleta Bernuy Angela, Han Ziwen, Shaikh Hammad, Zheng Qi Yin, Lim Lisa-Angelique, Rafferty Anna, Petersen Andrew, and Williams Joseph Jay. 2022. How can email interventions increase students’ completion of online homework? A case study using A/B comparisons. In LAK22: 12th International Learning Analytics and Knowledge Conference. 107–118.
- [203].Zavaleta Bernuy Angela, Ye Runlong, Tran Elexandra, Sibia Naaz, Mandal Abhijoy, Shaikh Hammad, Simion Bogdan, Liut Michael, Petersen Andrew, and Williams Joseph Jay. 2023. Do Students Read Instructor Emails? A Case Study of Intervention Email Open Rates. In Proceedings of the 23rd Koli Calling International Conference on Computing Education Research. 1–12.
- [204].Zhang Jesse, Zhang Jiahui, Pertsch Karl, Liu Ziyi, Ren Xiang, Chang Minsuk, Sun Shao-Hua, and Lim Joseph J. 2023. Bootstrap your own skills: Learning to solve new tasks with large language model guidance. arXiv preprint arXiv:2310.10021 (2023).
- [205].Zhang Yue, Li Yafu, Cui Leyang, Cai Deng, Liu Lemao, Fu Tingchen, Huang Xinting, Zhao Enbo, Zhang Yu, Chen Yulong, et al. 2023. Siren’s song in the AI ocean: A survey on hallucination in large language models. arXiv preprint arXiv:2309.01219 (2023).
- [206].Zhong Wanjun, Guo Lianghong, Gao Qiqi, Ye He, and Wang Yanlin. 2024. MemoryBank: Enhancing large language models with long-term memory. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 19724–19731.
- [207].Zvereva K, Deviatkov V, Smirnova E, and Manyashev E. 2020. Method of the Student’s Motivation Assessment Using Smart Chatbot. In INTED2020 Proceedings. IATED, 627–633.
