Abstract
Mental health concerns are prevalent among college students, highlighting the need for effective interventions that promote self-awareness and holistic well-being. MindScape explores a novel approach to AI-powered journaling by integrating passively collected behavioral patterns such as conversational engagement, sleep, and location with Large Language Models (LLMs). This integration creates a highly personalized and context-aware journaling experience, enhancing self-awareness and well-being by embedding behavioral intelligence into AI. We present an 8-week exploratory study with 20 college students, demonstrating the MindScape app’s efficacy in enhancing positive affect (7%), reducing negative affect (11%), loneliness (6%), and anxiety and depression, with a significant week-over-week decrease in PHQ-4 scores (−0.25 coefficient). The study highlights the advantages of contextual AI journaling, with participants particularly appreciating the tailored prompts and insights provided by the MindScape app. Our analysis also includes a comparison of responses to AI-driven contextual versus generic prompts, participant feedback insights, and proposed strategies for leveraging contextual AI journaling to improve well-being on college campuses. By showcasing the potential of contextual AI journaling to support mental health, we provide a foundation for further investigation into the effects of contextual AI journaling on mental health and well-being.
Additional Key Words and Phrases: Passive Sensing, Large Language Models, Journaling, Self-reflection, Behavioral Sensing, Mental Health, Well-being, AI, Smartphones
1. Introduction
Mental health struggles among college students are increasingly apparent, affecting students’ academic performance, social engagement, and overall personal development. Research, including findings from the American College Health Association (ACHA)–National College Health Assessment, highlights a concerning prevalence of anxiety, depression, and related issues among students [4, 7, 53, 55]. Students face a range of pressures, from academic challenges to social and personal hurdles, which affect not only their mental health but also their emotional resilience and personal growth [33, 72, 75, 76]. While traditional mental health interventions administered by clinicians do provide personalized and context-specific support, emerging technologies present an opportunity to extend this support, making it more readily available, automated, and potentially able to overcome considerable institutional barriers. In addition, there is a need for innovative solutions that align with the digital habits of today’s students. We propose a novel study, MindScape, that integrates the traditional practice of journal writing with mobile technology and large language models (LLMs) [54] to create a contextually aware journaling application. The MindScape Android application benefits from on-device sensors and data to provide insights into the user’s daily life. It tracks physical activity, social interactions, and location to understand the individual’s behavior and environment. By analyzing these data in real time, the app can provide personalized, context-sensitive journaling prompts designed to provoke thought and reflection. The prompts aim to remind users to introspect and commit time to digitally record their thoughts, thus establishing regular self-reflection habits that are contextualized by their daily lives. MindScape represents a novel application class that incorporates behavioral intelligence into AI.
We believe that integrating time-series data obtained from mobile phones and wearables, capturing real-time behaviors and patterns of users, with the capabilities of LLMs will give rise to a new category of AI applications driven by mobile sensing.
Journaling has long been recognized as a potent tool for self-reflection, enabling individuals to externalize thoughts, consolidate disjointed experiences, and identify patterns in their behavior and emotional states. This practice of regular introspection has been linked to a range of psychological benefits, from reducing distress symptoms to enhancing overall well-being [26, 71]. In this study, we explore the potential gains realized through the inclusion of personalization and context-awareness in journaling. We define ‘context’ as the comprehensive set of behavioral, environmental, and temporal factors shaping a user’s daily experience and mental state. For the purpose of our study, this includes physical activities, sleep patterns, social interactions, digital behaviors, and location data, considering both current and historical patterns. The inclusion of personalization and context-awareness in journaling is more than just a technological novelty. It addresses certain inherent limitations in human introspection and memory recall abilities. People may not readily identify certain behavioral patterns or come to particular conclusions about their daily lives without some form of guidance or external input. This is where personalized and context-aware prompts can be valuable, as they may highlight aspects of users’ lives they may have overlooked. Additionally, human memory recall can be biased towards more recent experiences (recency bias) and peak emotional experiences (peak-recency bias), sometimes at the expense of equally significant past events [22, 44, 66]. Context-aware journaling can help counteract this limitation by bringing forward relevant circumstances, events, or feelings from different timeframes in the user’s life. Lastly, by addressing these user limitations, personalized and context-aware journaling could not only improve the process of journaling but also potentially enhance the mental health benefits associated with this practice.
Herein lies the novelty of our approach: using mobile sensing to capture behavioral data that reflects the user’s context and emotional state, and employing an LLM to generate journaling prompts that are highly relevant to the user’s current contextual situation and surroundings. To complement our context-aware journaling prompts, we introduce daily check-ins as a novel feature in our study app. These brief, simple texts are triggered four times daily and aim to encourage users to pause and reflect on their current experiences. For instance, a check-in might say, “Your morning seemed to include more than just tapping screens – a bit of chitchat too!”. Users can respond with a quick thumbs up or thumbs down, allowing for a low-burden, high-engagement interaction. By leveraging contextual intelligence, our check-ins aim to increase user attachment and engagement with the journaling app, while also making their reflections more meaningful and potentially amplifying the mental health benefits of journaling. The primary goal of these check-ins is to facilitate fleeting moments of self-reflection, helping users develop greater awareness of their thoughts, emotions, and behaviors throughout the day. By incorporating thumbs up/down responses, we simplify the reflection process, making it more accessible and increasing the likelihood of users engaging in regular self-reflection. Furthermore, the MindScape journaling app integrates additional contextual factors such as students’ mood while journaling, their academic stress levels, and temporal variables like weekdays or weekends. 
While previous studies have explored various aspects of digital journaling and context-aware applications, MindScape uniquely combines several elements: (a) it integrates a wide range of passive sensing data specifically for mental health journaling, going beyond simple activity or location tracking; (b) it employs LLMs to generate personalized, context-aware journaling prompts, a novel application in the mental health domain; (c) it introduces frequent, low-burden check-ins to complement deeper journaling sessions, encouraging continuous self-reflection throughout the day; and (d) it focuses on the specific needs and contexts of college students, a population particularly vulnerable to mental health challenges.
In summary, our study addresses prior limitations in supporting mental health in young adults. We explore whether a more comprehensive, context-aware, and AI-driven approach to journaling can lead to deeper self-reflection and significant improvements in well-being. Early in our development, we conduct a qualitative user study with undergraduate students to understand their journaling habits and preferences. Insights from this study, revealing students’ desires for personalized, context-aware prompts aimed at fostering reflection on daily experiences, significantly influenced our app’s design. We believe our holistic approach allows for a more tailored and responsive tool, capable of providing meaningful support in the unique, often high-pressure, fast-paced environment of college life. Our paper makes the following contributions:
We design MindScape, an AI-driven journaling app that integrates behavioral sensing with an LLM to deliver personalized, adaptive journaling prompts. We conduct an 8-week study with 20 college students to evaluate the efficacy of this system. By the end of the study, participants report up to an 11% improvement in well-being scores, with statistically significant enhancements in affect, loneliness, mindfulness, self-reflection, anxiety, and depression.
We examine the check-ins and journaling prompts generated by the app, analyzing their topic coverage, and the frequencies of categories to which the prompts belong. We find that the morning check-ins often revolve around social and communication app usage, while afternoon check-ins shift towards academic and social life experiences.
We analyze linguistic differences between journals from contextual and generic prompts using the Linguistic Inquiry and Word Count (LIWC) [11]. Our findings indicate that responses to contextual prompts exhibit more personal language and more references to personal experiences and relationships, whereas broader emotional expressions (such as affect) are more prevalent in journals from generic prompts.
We review participant feedback concerning their experience and the app’s usability and provide recommendations for future research. Overall, 85% of participants rate MindScape’s usability as good or excellent. Seventy percent consider the journal prompts to be moderately-to-very relevant, and 85% report that the contextual prompts sometimes, often, or always lead to more in-depth reflection compared to generic prompts, demonstrating the effectiveness of the MindScape app.
It is important to note that our objective is to introduce and evaluate a new journaling paradigm that integrates behavioral sensing and contextual awareness. This research conducts a proof-of-concept study on contextual journaling, specifically focusing on its effectiveness as a unique journaling method. We do not perform controlled trials to determine which approach is more beneficial. Ours is an exploratory study designed to potentially augment the classic benefits of journaling by utilizing the latest advancements in LLMs to provide an unobtrusive, effective tool for users to manage their well-being and growth. This approach is closely aligned with the Human-Computer Interaction (HCI) community’s interests, highlighting the significance of AI in enriching user-centric digital experiences. Bridging into Ubiquitous Computing (UbiComp), our research focuses on integrating these technologies into everyday routines. Our goal is for this tool to offer benefits and support, and to assist students in developing lasting self-reflection and emotional mindfulness skills. We hope that this study will contribute significantly to the ongoing dialogue in HCI and UbiComp, particularly regarding the seamless integration of technology to enhance personal well-being, offering a comprehensive view of its practical application and user impact. Note that this paper significantly expands upon our preliminary work presented as an extended abstract at the Conference on Human Factors in Computing Systems (CHI) 2024 [56]. While that initial publication introduced the conceptual framework and basic methodology of our research, this full paper presents a comprehensive study that includes substantial developments in our methodology, a complete implementation of the proposed system, and an extensive evaluation with statistically significant results.
This current work represents a thorough investigation of MindScape, providing the research community with a definitive reference that supersedes our earlier publication. As such, this paper should be considered the primary source for understanding and citing our contributions in this area.
2. Related Work
Journaling is a reflective practice where individuals record their thoughts, feelings, and experiences. The act of journaling promotes self-awareness [1, 78], processing of emotions [70], and cognitive organization of experiences [71]. Studies have consistently shown that journaling can improve mood, provide stress relief, and overall, enhance mental well-being [37, 52, 71]. As mobile devices and computers become more prevalent, they have reshaped the practice of journaling. The transition to digital journaling platforms brings conveniences that traditional paper-based methods lack. These include enhanced accessibility—ensuring that users can journal anytime and anywhere, heightened privacy—as entries are secured behind digital safeguards, and the ability to enrich journal entries with multimedia elements.
Journaling can be prompted or unprompted. Unprompted journaling allows for free expression without specific guidelines, giving users freedom to explore their thoughts and feelings. In contrast, prompted journaling uses specific questions or suggestions to guide the journaling process, providing a structure that can help focus and inspire the user. Such prompts are designed to encourage self-reflection, personal growth, and exploration of various topics and experiences. Several digital journaling platforms offer a wide range of prompts to initiate the writing and reflection journey, providing daily reminders to ensure users stay on track with their journaling. This approach can be particularly helpful for users who are new to journaling or those looking to explore new areas of self-discovery and creativity. However, most prompted journaling applications rely on generic prompts not tailored to the user’s situation. Several studies demonstrate that question prompts are one of the main factors positively affecting reflection quality [17, 18, 30, 31]. Thus, generic prompts, while useful, may reduce reflection quality due to their broad nature [3, 63].
Our study focuses on context-aware journaling, where journaling prompts are derived from behavioral data collected via smartphones. This approach enhances traditional journaling by offering prompts that closely align with users’ daily experiences and mental states. By using mobile sensing technology, capable of tracking activities, sociability, locations, and app usage, we generate dynamic prompts that reflect the nuanced aspects of an individual’s life. This approach differs from previous studies that have explored a broader range of personal informatics systems for reflection [9, 20], by integrating these insights into the journaling process. For example, Kocielnik et al. [43] leverage mobile-based step counts for reflection on activity level, whereas Bakker and Rickard [6] use the MoodPrism app to support mood tracking. Our method aims to mirror the reflective goals of such apps and to offer deeper insights into users’ lifestyles and emotional patterns through personalized journaling. In addition, our study uses a wide range of contextual cues to facilitate journaling, a feature that sets it apart even from its closest counterparts like Apple’s journal application [2]. While Apple’s offering leverages contextual data such as photos and location to generate prompts, our approach extends beyond conventional context-awareness to include an amplified set of signals, such as screen time; social, entertainment, and communication app usage; sleep habits; in-person conversations; calls; and text message exchanges.
Our study additionally integrates both sleep information, such as duration and timing, as well as physical fitness metrics like activity levels, distance travelled, and time spent at the gym. Furthermore, we consider location-based semantics like time spent in a cafeteria, Greek spaces, and other similar locations. This comprehensive approach provides a more nuanced and detailed context for generating personalized journaling prompts, compared to previous studies. Our study also leverages LLM capabilities to enable the creation of intelligent, personalized journaling prompts. AI-driven tools have been used in therapy chatbots, virtual agents, and behavior change systems, offering personalized advice and support [10, 19, 35, 39, 46, 47, 57, 67, 80, 81]. These applications demonstrate the capacity of AI to understand and respond to a wide range of emotional and psychological states [50]. Existing studies have leveraged LLMs for AI-mediated journaling [29, 41, 42]. However, to our knowledge, none of the existing studies have integrated objective and passively observed behavioral data into AI-mediated journaling. While previous work has incorporated behavioral sensing signals into LLM prompts for various applications [27, 38, 79], our approach is novel in its specific application to mental health journaling. Our method integrates rich contextual information to generate personalized, privacy-conscious journaling prompts aimed at improving self-awareness and mental well-being. The novelty lies in the application of these techniques to mental health, the specific combination of contextual factors we consider, and our focus on generating reflective prompts rather than predicting behaviors or app usage. By using an LLM framework to analyze behavioral data and generate relevant journaling prompts, we aim to investigate the potential for a nuanced, data-driven augmentation of the journaling process. 
Our study seeks to reinforce the benefits of journaling, while simultaneously exploring the effectiveness of context-aware prompts for highly reflective self-expression. Thus, we aim to optimize the impact of personalized digital journaling.
3. Methodology
In this section, we detail our study methodology, which encompasses the study design, participant demographics, the mobile sensing behavioral data collected by our system, and the design of the personalized journaling prompts and check-ins.
3.1. Study Design
At the beginning of our study, we engage in a user study focused on capturing participants’ perspectives using user-centered design principles. We conduct interviews with students to illuminate their needs and experiences with journaling, providing a foundational understanding for our research approach. The stages of the study are detailed in Figure 1. As we transition into the Development and Testing phase, we refine our methodology and initiate participant recruitment. We employ various channels such as posters, class-wide emails, Computer Science majors and minors email chains, student mailing lists, and collaborations with mental health-related campus clubs to reach potential participants. Out of 91 respondents expressing interest, 26 qualify for the study; most of the remaining respondents use Apple phones, whereas our app supports only Android. Of these 26, 20 ultimately sign the consent form to participate.
During the recruitment and onboarding process, we prioritize transparency regarding data collection and usage. Participants are provided with comprehensive information about the types of data collected, the methods and frequency of collection, and how this data will be used within the app. This information is presented in detail on our recruitment form and reiterated verbally during individual onboarding sessions. During these sessions, participants have the opportunity to ask questions and seek clarification on any aspect of the data collection process. We emphasize that participation is voluntary and that users can withdraw at any time without consequence. We also take care to exclude individuals with high depression scores, as indicated by elevated Patient Health Questionnaire-8 (PHQ8) survey results, to ensure safety due to the unmoderated nature of the reflection prompts. Once enrolled, participants install the MindScape Android app on their phones and enable the permissions for the signals. We clearly communicate during the onboarding process that while the app is designed to function optimally with full data access, users have the right to manage their privacy settings through their device’s permission controls. Users are informed that they can adjust permissions at the system level, and the app will adapt accordingly, ensuring they maintain control over their data sharing while participating in the study.
The central six weeks of our study involve participants interacting with contextual AI-driven journal prompts delivered through the app. This begins with an onboarding process, where participants complete a baseline survey that captures their initial journaling habits, demographic details, and psychological states via standard surveys focused on well-being and self-reflection. At the conclusion of this six-week contextual journaling phase, we administer a follow-up survey using the same standard questionnaires. This enables us to gauge changes in well-being, personal growth, and reflection, assessing whether AI-driven contextual journaling contributes positively to participants’ development. Additionally, we conduct weekly Ecological Momentary Assessment (EMA) – a research methodology that involves repeatedly collecting self-reported data from participants to capture dynamic changes and patterns over time – to monitor changes in participants’ well-being and reflection. Please see Appendix A for the list of surveys and questions we ask participants.
Following the initial contextual journaling phase, the participants enter a two-week period of generic journaling, receiving a uniform prompt via the MindScape app: “What’s on your mind today? Use this journal entry to explore freely any thoughts, feelings, memories, or experiences—anything you’d like.” Due to the limited sample size, a full randomized controlled trial was not feasible. Nevertheless, this phase provides an opportunity to compare and contrast traditional journaling with our AI-driven contextual method. After completing the full eight-week study, participants receive the final study feedback survey, which collects their insights on their journaling experience. This includes thoughts on the app’s usability and performance as well as any additional feedback or suggestions. Participants are compensated up to USD 130 for their involvement. The study has received approval from Dartmouth College’s Institutional Review Board, ensuring all procedures meet ethical standards.
3.2. Demographics
We recruit 20 students from Dartmouth College for our study. Out of these participants, a majority, 60% (N=12), identify as female, while 35% (N=7) identify as male, and one participant (5%) identifies as non-binary. The cohort comprises 12 graduate students and 8 undergraduate students. When examining racial demographics, 35% (N=7) of participants identify as White or Caucasian, 25% (N=5) as Asian, 20% (N=4) as Black or African American, 15% (N=3) report belonging to multiple racial categories, and 5% (N=1) report ‘Other’. Age distribution among the participants shows that 65% (N=13) are within the 18–24 age bracket, 30% (N=6) fall into the 25–34 age range, and 5% (N=1) are 45 years old or above. Regarding journaling experience, 55% (N=11) of the participants currently maintain a journal, 20% (N=4) do not keep a journal at present though they have journaled in the past, and 25% (N=5) have never engaged in journaling.
3.3. Mobile Sensing based Behavioral Data
The MindScape app automatically infers user activities, like movement and rest, analyzes conversation lengths, and gathers data on screen usage and location (see Table 1). This provides an integrated view of a user’s daily patterns, social interactions, and digital habits. For example, the sensing data might reveal patterns in how often participants attend social functions, dine at campus facilities, or go to the gym. This information allows us to tailor the journaling prompts to align with the participant’s current experiences and to support their emotional well-being. As part of gathering this data, we create a semantic map of the college campus, with locations such as dining areas and gyms marked, allowing the app to accurately infer the context of participants’ activities. This allows for prompts to be customized, encouraging reflection on particular events of the day. The integration of the GPT-4 LLM enables the translation of this rich, multi-faceted behavioral data into personalized and contextually relevant journaling prompts and frequent check-ins that enhance positive introspection and participant engagement. All data collected are temporarily stored on the participant’s phone and then securely uploaded to the MindScape cloud. We then leverage the GPT-4 model through OpenAI’s API [59], allowing us to process the collected behavioral data and additional contexts to generate tailored prompts. Addressing potential concerns relating to participant privacy, we ensure all data sent for processing via OpenAI’s GPT-4 model are de-identified and consist only of high-level metadata. This approach includes stripping any potentially personally identifiable information before the data is utilized to generate tailored prompts. We acknowledge that a locally hosted open-source model could offer an alternative to mitigate privacy concerns further, albeit with possible performance trade-offs.
In this study, our focus is oriented towards understanding the potential and efficacy of this novel application of AI in journaling practices. Given this emphasis, we decided to utilize OpenAI’s GPT-4 model for its robust performance and scalability capabilities.
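As an illustration of how a semantic campus map can turn raw location samples into high-level metadata, the following minimal sketch labels each GPS sample with the nearest known place and keeps only minutes per label. It is not the MindScape implementation: the circular place model, the `Place` class, the function names, the 10-minute sampling interval, and all coordinates are hypothetical assumptions made for the example.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0


@dataclass(frozen=True)
class Place:
    """A hypothetical entry in a semantic map: a labelled circle."""
    label: str       # semantic category, e.g. "gym"
    lat: float       # center latitude in degrees
    lon: float       # center longitude in degrees
    radius_m: float  # radius of the place in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def label_location(lat, lon, places):
    """Label of the nearest place whose circle contains the point, else 'other'."""
    best = None
    for place in places:
        d = haversine_m(lat, lon, place.lat, place.lon)
        if d <= place.radius_m and (best is None or d < best[0]):
            best = (d, place.label)
    return best[1] if best else "other"


def summarize(samples, places, minutes_per_sample=10):
    """Aggregate (lat, lon) samples into minutes per label; coordinates are dropped."""
    totals = {}
    for lat, lon in samples:
        label = label_location(lat, lon, places)
        totals[label] = totals.get(label, 0) + minutes_per_sample
    return totals
```

Only the aggregated totals (for example, minutes at the gym) would need to leave the device under this scheme, which is one way to read "high-level metadata" in the paragraph above.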
Table 1.
Category | Signals | Example journaling prompt |
Physical Fitness | Physical activity (walking, running, and sedentary duration); distance travelled; time spent at the gym | Your running routine has really taken off! How’s that influencing your day? |
Sleep | Sleep duration; sleep schedule (start time and end time) | Your sleep pattern has shifted recently. Could this change be affecting your daytime energy and focus? |
Digital Habits | Screen time; app use (frequency of social media, communication, and entertainment app use) | You’ve been clocking less screen time lately. What have you been doing instead that you’ve found rewarding or enjoyable? |
Social Interaction | Phone logs (incoming calls, outgoing calls, incoming SMS, outgoing SMS); in-person conversations (number and duration of conversations); number of significant places visited; time spent at frats/sororities partying; misc. locations (time spent at leisure, social, study places, cafeteria, and home) | Your call patterns are up; any conversations lately that brought a smile to your face? |
3.4. Personalized Journaling Prompts
Upon installing the MindScape app, participants are prompted to grant the app permission for data collection. Then, they rank their journaling interests in four key areas: Social Interaction, Sleep, Digital Habits, and Physical Fitness. We identify these four key areas through interviews with students on campus (see Section 4.1). Because we collect many different types of data, we want to ensure the journaling prompts we provide are actually helpful to participants. Thus, we use these categories to identify what matters most to each individual participant. We also include the user’s preferences (i.e., category ranking) in the prompt for GPT-4 [58] to generate more relevant journaling prompts. During their enrollment, each user provides us with their usual bedtime for both weekdays and weekends. Journaling notifications are triggered two hours before their reported bedtime. When a notification is tapped, participants are redirected to the app’s journaling screen. There, they are first asked how their day was, followed by a one-minute breathing exercise, and finally, they are asked to write or record (i.e., audio) their journal entry. Only at this point can the participants see the personalized journaling prompt. Participants can also open the app and journal whenever they prefer. Note that the one-minute deep breathing exercise before journaling is based on findings that short relaxation techniques can improve mental clarity and emotional readiness [8, 82]. This step aims to help users transition to a reflective mood, enhancing their focus for more insightful journaling. It is intended to make the journaling process a calming, enriching routine. Figure 2 shows different screens of the application.
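The notification rule described above (two hours before the reported weekday or weekend bedtime) can be illustrated with a short sketch. The function name and two conventions are assumptions for the example, not details from the study: Saturday and Sunday are treated as weekend days, and a bedtime earlier than noon is interpreted as falling after midnight, on the next calendar day.

```python
from datetime import date, datetime, time, timedelta

NOTIFY_LEAD = timedelta(hours=2)  # notification fires two hours before bedtime


def journaling_notification_time(day, weekday_bedtime, weekend_bedtime):
    """Return the datetime at which the journaling notification fires for `day`."""
    is_weekend = day.weekday() >= 5  # assumption: Saturday (5) and Sunday (6)
    bedtime = weekend_bedtime if is_weekend else weekday_bedtime
    bed_dt = datetime.combine(day, bedtime)
    # Assumption: a bedtime before noon (e.g. 01:00) belongs to the next calendar day.
    if bedtime < time(12, 0):
        bed_dt += timedelta(days=1)
    return bed_dt - NOTIFY_LEAD
```

For example, a participant reporting an 11:00 PM weekday bedtime would be notified at 9:00 PM on a Monday, and one reporting a 1:00 AM weekend bedtime would be notified at 11:00 PM on Saturday.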
Contexts The GPT-4 prompt composition process incorporates several layers of contextual data:
Personal Priorities: The user’s preferences across the four journaling categories ensure that the journaling prompts mirror individual interests.
Prompt Variability: The system ensures that new prompts are different from the previous two, generating diverse and engaging content.
Temporal Data Analysis: Behavioral data from weekdays are contrasted with a 30-day historical average to establish context. On Saturdays, the app encourages users to reflect on general themes from the preceding week, rather than daily behaviors (for example, “Recall a recent academic success. How did you achieve it and what did it teach you about your resilience or strategy?”). Sundays are used for a comprehensive review including additional data points - such as Greek house attendance and sleep quality - to capture weekend patterns pertinent to college life. Note: In the U.S., ‘Greek houses’ are fraternity or sorority residences, where social and organizational activities are hosted.
Academic Calendar Awareness: As the academic term structure influences stress, the current week of the term is considered during prompt generation, intending to offer supportive content during high-stress phases.
Mood Consideration: If a participant reports a low mood, GPT-4 is prompted to offer journaling prompts that evoke self-compassion or gratitude—strategically fostering a nurturing journaling environment. By guiding users towards reflecting on aspects they are grateful for or encouraging kindness towards themselves, the hope is that these prompts can shift focus from negative thoughts to more positive, affirming ones. It is a strategic, evidence-based approach aimed at offering immediate emotional relief while contributing to long-term emotional well-being, resilience, and mental health [23].
Our methodology emphasizes customization, employing both user preferences and behavioral signals to empower participants in their reflective journaling practice. In Figure 3, we show how all these come together to form the input to the GPT-4 LLM.
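To make the layering of contexts concrete, the following minimal sketch assembles a plain-text context block from the elements listed above: category ranking, the two most recent prompts, comparison against a 30-day average, day-of-week mode, week of term, and mood. It is an illustration only and does not reproduce the actual GPT-4 input shown in Figure 3; every function name, signal name, mode label, and output string here is a hypothetical assumption.

```python
from datetime import date
from statistics import mean


def relative_change(today, history):
    """Percent change of today's value versus the mean of the history, or None."""
    if not history:
        return None
    baseline = mean(history)
    if baseline == 0:
        return None
    return round(100.0 * (today - baseline) / baseline, 1)


def day_mode(day):
    """Saturday: weekly themes; Sunday: weekend review; otherwise daily behavior."""
    wd = day.weekday()
    if wd == 5:
        return "weekly_themes"
    if wd == 6:
        return "weekend_review"
    return "daily_behavior"


def build_context(day, ranked_categories, today_signals, history,
                  term_week, mood, previous_prompts):
    """Assemble a text block that could accompany an LLM request (hypothetical format)."""
    mode = day_mode(day)
    lines = [
        f"Mode: {mode}",
        "Category priorities (most to least important): " + ", ".join(ranked_categories),
        f"Week of term: {term_week}",
    ]
    if mode != "weekly_themes":  # Saturdays focus on general themes, not daily signals
        for name, value in today_signals.items():
            change = relative_change(value, history.get(name, [])[-30:])
            if change is not None:
                lines.append(f"{name}: {change:+.1f}% vs. 30-day average")
    if mood == "low":  # hypothetical mood label
        lines.append("Tone: favor self-compassion or gratitude.")
    for old in previous_prompts[-2:]:  # keep new prompts distinct from the last two
        lines.append(f"Avoid repeating: {old}")
    return "\n".join(lines)
```

Expressing behavior as a relative change rather than a raw value is one simple way to contrast a day with its historical average while sending only high-level metadata.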
3.5. Context-aware Check-ins
The check-ins are “micro context-aware nudges” based on users’ data, and are answered with a quick thumbs up or thumbs down response. For example, “Caught up with some calls and social apps this morning - digital world kept you busy, I bet!”. The MindScape app offers such check-ins four times a day, at 12:30 PM, 3:30 PM, 6:30 PM, and 11:00 PM. These times are selected to suit the daily rhythms of college students, and each interaction is designed to remain brief and unobtrusive.
Each check-in is designed to incorporate the behavioral data gathered during the time period extending from the previous check-in up to the current one. For instance, the 3:30 PM check-in uses data collected from 12:30 PM to 3:30 PM, while the 6:30 PM check-in uses data gathered from 3:30 PM to 6:30 PM. This approach ensures that each check-in is responsive to the most recent behavioral data captured for the participant. The goal of these check-ins is both to increase the visibility of the app (as opposed to users seeing it just once a day for journaling) and to increase reflection on behavior through casual, quick touchpoints. Please refer to Appendix B for the complete GPT-4 prompt we use to generate check-ins. Note that, like journal entries, the responses to check-ins (i.e., thumbs up/down) are not utilized as feedback to inform the GPT-4 model or processed further to influence subsequent prompts. They serve solely as a simple engagement mechanism for users.
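The windowing rule stated above, in which each check-in covers the period from the previous check-in to the current one, can be sketched as follows. The function name is hypothetical, and the start of the first window is an assumption: it is taken to be the previous day's last check-in, which the text does not specify.

```python
from datetime import date, datetime, time, timedelta

# Check-in times as listed in the text.
CHECKIN_TIMES = [time(12, 30), time(15, 30), time(18, 30), time(23, 0)]


def checkin_windows(day):
    """Return one (start, end) datetime pair per check-in on `day`."""
    ends = [datetime.combine(day, t) for t in CHECKIN_TIMES]
    # Assumption: the first window starts at the previous day's last check-in.
    prev_last = datetime.combine(day - timedelta(days=1), CHECKIN_TIMES[-1])
    starts = [prev_last] + ends[:-1]
    return list(zip(starts, ends))
```

Under this rule the windows are contiguous and non-overlapping, so every sensed event contributes to exactly one check-in.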
4. Results
In this section, we present the results from our study. We begin by examining the journaling prompts and check-in messages, followed by an analysis of the linguistic content of the journals, including a comparison between contextual and generic journals. We then review the changes in well-being and personal growth scores, as observed in the follow-up survey conducted after the study. Finally, we discuss participant usability and feedback, and offer recommendations for future researchers.
4.1. User Study
We conduct qualitative user studies through in-depth interviews with five undergraduate students at Dartmouth College, with the goal of understanding their journaling habits, preferences, and expectations for potential personalized prompts that could be generated by the MindScape app. The participants, aged between 18–24 and comprising 3 males and 2 females, are selected through targeted invitations extended by our team to ensure a range of insights into the efficacy and impact of personalized journaling within the university context.
During these interviews, students are introduced to the various types of data that could be captured via their smartphones. Based on the signals that we can feasibly track, such as location data, physical activity, app usage, and others, students identify four main areas of interest that they believe would be most beneficial for personalized journaling prompts. These preferences include:
Social Interactions: Reflecting on social activities and relationships, influenced by data on in-person conversations, phone logs and time spent at different locations (such as fraternities, social places)
Sleep Patterns: Insights derived from sleep tracking data to encourage better sleep habits and reflections on the impact of sleep on daily functioning.
Physical Fitness: Using activity tracking data, and time spent at the gym to monitor progress, and reflect on the connection between physical health and overall well-being.
Digital Habits: Observations on app usage and screen time to encourage healthier digital interactions and balance.
Participants also share their preferred contexts and times for engaging with the app—highlighting a tendency to journal during quieter moments of the day, or when experiencing stress, suggesting that prompts should be adaptive to their emotional states and academic schedules.
Motivations and Barriers: Participants note journaling as a helpful tool for emotional processing and stress management. However, common barriers cited include uncertainties about what to write, time constraints, and inconsistent journaling habits. These insights underscore the opportunity for MindScape to incorporate features like structured prompts and integrated reminders to help users navigate these challenges.
Adaptive Features: There is a strong interest in receiving adaptive journaling prompts based on sensed emotional state or specific stressors, such as exam periods or significant personal events, demonstrating the need for adaptive AI functionalities within the app.
Self-reports and Sensing-based Prompts: Participants are willing to provide self-reports at different times of the day, suggesting that multiple daily check-ins and end-of-day journaling are feasible. They respond positively to location-specific prompts, such as those related to meals in the cafeteria or academic work in the library.
Academic Stress: The user study validates our understanding of academic stress among students, particularly highlighted during periods like exams and project deadlines. Students report increased stress levels that affect their sleep, social interactions, and overall well-being.
These insights are instrumental in tailoring the development of the MindScape app’s prompting mechanisms. We integrate additional contextual factors such as awareness of the academic calendar, mood fluctuations, and personal priority tracking to enrich the user’s engagement with reflective practices meaningfully. With these adaptive and user-centric features, the MindScape app aims to enhance users’ well-being through tailored, data-informed interactions.
4.2. Contextual Journaling Prompts and Check-ins
The MindScape study yielded 661 journaling entries over 8 weeks: 533 from contextual prompts in the first six weeks and 128 from generic prompts in the last two weeks. Participants engaged for an average of 6.5 weeks, submitting 33.05 entries each. We collected 2,985 check-ins, with afternoon and evening check-ins being the most frequent. Night check-ins, despite the lowest participation, showed the most favorable response ratio (4.8 thumbs up per thumbs down). At the beginning of the study, we offer participants the opportunity to personalize their experience through the MindScape app by prioritizing journaling categories based on their individual goals. The categories are as follows: Social Interaction, Sleep, Physical Fitness, and Digital Habits. Upon installing the app, participants rank these categories in order of importance, tailoring the contextual journaling prompts they receive during the first six weeks of the study. The order of these categories is randomized when displayed to the user, and participants reorder the categories to rank them according to their preferences. Figure 4a visualizes these preferences across four priority ranks. A clear preference for Social Interaction emerges, with seven participants ranking it as their top priority and eight as their second. This is followed by Digital Habits, Sleep, and Physical Fitness. Please see Appendix E for examples of the contextual journaling prompts delivered by MindScape.
Figure 4b shows the distribution of the prompts between categories. Social Interactions dominated with 42% of prompts, aligning with participant preferences. Digital Habits followed at 22%. Surprisingly, Physical Fitness (15%) surpassed Sleep (7%) in prompt frequency, likely due to its broader range of signals. Physical Fitness includes things like daily exercise, distance traveled, and physical activities such as standing, running, and walking. In contrast, Sleep considers only two signals: total sleep duration and schedule. With a wider array of signals, there is greater potential to highlight changes or improvements in more signals than just two for sleep. 14% of prompts were broader, weekend-focused topics outside the four main categories.
Following this, we perform topic modeling on check-in prompts to understand their content. The process involves extracting embeddings using the all-mpnet-base-v2 sentence-transformer model [36], reducing dimensionality with Uniform Manifold Approximation and Projection (UMAP) [51], clustering data using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) [15], tokenizing topics with class-based term frequency-inverse document frequency (c-TF-IDF), and refining topics using GPT-4. We utilize the BERTopic [32] python library throughout this procedure. Figure 5 displays the top 10 topics identified through topic modeling, sorted according to the time of day when the check-in prompts were issued: morning, afternoon, evening, and night. This organization offers insights into the contextual relevance of each topic to the students’ daily routines, as reflected in their interactions with the prompts. In the morning, most of the prompts are related to Daily Usage of Social & Communication Apps. It is likely that there are few other significant activities in the morning hours (6–11:45 AM) other than students engaging in communication via their phones or still being in their dorms, presumably sleeping, so the topics predominantly revolve around these aspects. However, as the day progresses, the nature of the highlighted topics shifts, reflecting changes in students’ focus and activities. For instance, topics identified during the afternoon, such as Study Spot Experiences, increase significantly, suggesting that students are attending classes, working, or visiting libraries. Dorm Life & Social Interactions related prompts also see a rise during the afternoon and peak in the evening, likely due to increased proximity to peers and social interactions. Interestingly, Daily Usage of Social & Communication Apps reaches its lowest point during the evening prompts, whereas Engaging Conversations Throughout the Day hits its peak. 
At night, Daily Usage of Social & Communication Apps increases again from its evening lows, but Study Spot Experiences decreases, indicating that students are preparing to end their day. Note that the lower response rate for nighttime prompts may limit the range of topics we identify for that period. Please see Appendix F for a sample of check-ins delivered by the MindScape app.
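The c-TF-IDF step in the pipeline above can be illustrated with a small, dependency-free sketch. It assumes the commonly described formulation, in which a term's score in a cluster is its within-cluster frequency multiplied by log(1 + A / f), where A is the average number of terms per cluster and f is the term's count across all clusters. The functions `c_tf_idf` and `top_terms` and the toy check-in texts are hypothetical; this is not the BERTopic implementation used in the study, and it omits the embedding, UMAP, HDBSCAN, and GPT-4 refinement stages.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_cluster):
    """Score terms per cluster as tf(t, c) * log(1 + A / f(t)).

    tf(t, c): count of term t in cluster c divided by the cluster's term count.
    A: average number of terms per cluster.
    f(t): count of term t across all clusters.
    """
    counts = {
        cluster: Counter(" ".join(docs).lower().split())
        for cluster, docs in docs_by_cluster.items()
    }
    total = Counter()
    for counter in counts.values():
        total.update(counter)
    avg_terms = sum(total.values()) / len(counts)

    scores = {}
    for cluster, counter in counts.items():
        n_terms = sum(counter.values())
        scores[cluster] = {
            term: (n / n_terms) * math.log(1 + avg_terms / total[term])
            for term, n in counter.items()
        }
    return scores


def top_terms(scores, k=3):
    """Return the k highest-scoring terms for each cluster."""
    return {c: sorted(s, key=s.get, reverse=True)[:k] for c, s in scores.items()}


if __name__ == "__main__":
    # Toy clusters for illustration; not data from the study.
    toy = {
        "apps": ["social apps kept you busy", "calls and social apps this morning"],
        "study": ["library study session", "study spot at the library"],
    }
    print(top_terms(c_tf_idf(toy), k=2))
```

Treating each cluster as a single concatenated document is what distinguishes the class-based variant from ordinary TF-IDF: terms are rewarded for being frequent within a cluster and rare across clusters.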
4.3. Journaling Responses Deeper Dive
In this section, we dive deeper into the journal entries submitted by participants. We analyze and compare responses to both contextual and generic prompts.
Journaling Showdown: Contextual vs. Generic:
For the initial six weeks of the study, the MindScape app sent contextual journaling prompts that were dynamic and tailored day-to-day based on the participants’ passively collected behavior. After this period, the subsequent two weeks featured generic, static prompts that consistently asked, “What’s on your mind today? Use this journal entry to explore freely any thoughts, feelings, memories, or experiences – anything you’d like.” We now compare participants’ responses to these generic prompts with those to the contextual prompts using Linguistic Inquiry and Word Count (LIWC). LIWC is a research tool that provides insights into the psychological and emotional underpinnings of language use. By analyzing the frequency of psychologically meaningful words, LIWC allows us to understand aspects such as emotionality, social relationships, and thinking styles in the journal entries. For clarity, we refer to entries from contextual and generic prompts as “contextual journals” and “generic journals”.
The LIWC analysis in Table 2 reveals nuanced differences in how participants engage with contextual and generic journaling prompts. Notably, generic prompts elicit slightly longer responses (Mean = 44.51, SD = 15.78) compared to contextual prompts (Mean = 43.67, SD = 17.62). This difference in response length may be related to the thinking style encouraged by each prompt type. Generic prompts yield higher analytic thinking scores (Mean = 40.50, SD = 21.72), indicating a more formal and logical thinking style. In contrast, contextual prompts result in lower analytic thinking scores (Mean = 33.39, SD = 27.34), suggesting a more personal and spontaneous writing approach. This difference in thinking styles is also reflected in the Clout scores, which capture the level of confidence and expertise conveyed through language. Generic journal entries express slightly less confidence and authority (Mean = 9.43, SD = 9.53) than contextual journals (Mean = 10.78, SD = 13.23), resulting in a more tentative and exploratory writing tone. Furthermore, the two prompt types also differ in terms of authenticity, emotional tone, and cognitive processes. Contextual journals score lower in authenticity (Mean = 83.78, SD = 18.76) compared to generic journals (Mean = 85.11, SD = 20.94), but show similar emotional tone scores (contextual Mean = 67.49, SD = 28.45; generic Mean = 68.09, SD = 25.41). However, generic journals have noticeably higher affective content (Mean = 8.17, SD = 3.80) compared to contextual journals (Mean = 5.84, SD = 4.36), suggesting that generic prompts may encourage broader emotional expression. As Table 2 shows, generic journals score higher on both positive tone (e.g., good, well, new, love; Mean = 6.20 vs. 4.67) and negative tone (e.g., bad, wrong, too much, hate; Mean = 1.86 vs. 1.03). In contrast, contextual journals have higher cognition scores (Mean = 14.69, SD = 7.26) than generic journals (Mean = 11.41, SD = 6.44), indicating a greater emphasis on thinking, problem-solving, and memory recall.
Moreover, the language used in contextual journals reveals a greater focus on personal experiences and relationships, with more pronouns used (Mean = 18.14, SD = 5.95) compared to generic journals (Mean = 14.66, SD = 5.02). This is consistent with the finding that contextual prompts encourage more social references (Mean = 3.34, SD = 3.75) than generic journals (Mean = 1.87, SD = 2.09), indicating a strong focus on social bonds and community. Finally, generic prompts encourage a broader temporal focus (Mean = 7.38, SD = 4.41) compared to contextual prompts (Mean = 5.63, SD = 4.25), particularly in the higher scores for past and future focus. This suggests that generic prompts may encourage participants to link their current experiences with past memories or future aspirations more frequently than contextual prompts. Note that due to the differing time periods associated with each type of journaling, we normalized the scores to ensure comparability. To do this, we first calculated the LIWC scores per week for each participant during the 6-week contextual prompts period and the 2-week generic prompts period, separately. Then, we averaged each set of scores separately to obtain a final weekly average for both journaling experiences.
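The normalization described above can be sketched as a two-step average. This is one plausible reading of the procedure, assuming that scores are first averaged within each participant-week and that those participant-week means are then averaged within a phase; the function name `weekly_average` and the toy values are hypothetical and not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def weekly_average(entries):
    """Average a LIWC score over one phase (contextual or generic).

    entries: iterable of (participant, week, score) tuples for a single phase.
    Step 1: average the scores of each participant within each week.
    Step 2: average those participant-week means into one weekly average.
    """
    by_participant_week = defaultdict(list)
    for participant, week, score in entries:
        by_participant_week[(participant, week)].append(score)
    weekly_means = [mean(scores) for scores in by_participant_week.values()]
    return mean(weekly_means)


if __name__ == "__main__":
    # Toy word-count scores for illustration only.
    contextual = [("p1", 1, 40.0), ("p1", 1, 50.0), ("p1", 2, 30.0), ("p2", 1, 60.0)]
    print(weekly_average(contextual))
```

Averaging per week before pooling keeps the six-week phase from outweighing the two-week phase simply because it contains more entries.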
Table 2.
Categories | Contextual Journals (Mean) | Contextual Journals (SD) | Generic Journals (Mean) | Generic Journals (SD) |
---|---|---|---|---|
Word count | 43.67 | 17.62 | 44.51 | 15.78 |
Analytical thinking | 33.39 | 27.34 | 40.50 | 21.72 |
Clout | 10.78 | 13.23 | 9.43 | 9.53 |
Authenticity | 83.78 | 18.76 | 85.11 | 20.94 |
Emotional Tone | 67.49 | 28.45 | 68.09 | 25.41 |
Affect | 5.84 | 4.36 | 8.17 | 3.80 |
Positive tone | 4.67 | 3.82 | 6.20 | 4.17 |
Negative tone | 1.03 | 1.70 | 1.86 | 1.41 |
Pronouns | 18.14 | 5.95 | 14.66 | 5.02 |
Cognition | 14.69 | 7.26 | 11.41 | 6.44 |
Insight | 3.43 | 3.12 | 3.19 | 2.57 |
Drives | 4.67 | 3.82 | 3.33 | 2.49 |
Social processes | 7.95 | 6.37 | 3.64 | 3.29 |
Social behavior | 4.00 | 3.90 | 1.31 | 1.77 |
Social referents | 3.34 | 3.75 | 1.87 | 2.09 |
Time orientation | 5.63 | 4.25 | 7.38 | 4.41 |
Past focus | 5.17 | 4.62 | 6.76 | 3.98 |
Present focus | 5.53 | 4.30 | 5.73 | 3.76 |
Future focus | 1.38 | 1.97 | 2.60 | 3.49 |
Following this, we deepen our understanding of the thematic content within these generic journals by employing the same topic modeling approach as detailed in the previous section. We identify four primary topics: Daily Experiences, Daily Activities, Productivity Management, and Academic & Personal Growth, as shown in Figure 6. Daily Experiences dominates 40 journals, featuring words like “happy,” “today,” “stressed,” and “progress,” revealing diverse emotional content. One journal entry exemplifies this, describing conflicting emotions of homesickness and joy from new friendships. Daily Activities appears in 38 entries, with words such as “exciting,” “fun,” “enjoy,” and “conversations.” A notable entry recounts an exhilarating day skiing with friends. Productivity Management is the focus of 35 journals, emphasizing efficiency and accomplishment with terms like “productive,” “working,” “progress,” and “rest.” One entry details balancing work-from-home productivity with self-care. Academic & Personal Growth, the least represented with 15 journals, centers on academic advancement and personal development. A standout entry describes an inspiring symposium that boosted the participant’s research confidence. This provides insights into participants’ reflections, ranging from emotional experiences to productivity concerns and personal growth, demonstrating the diverse ways students engage with open-ended journaling prompts.
4.4. Exploring Changes in Well-being and Emotional Growth
In this section, we evaluate the changes in the participants’ well-being and personal growth following the contextual journaling phase. We administer several standardized surveys to participants at multiple stages of the study: baseline, weekly intervals, and follow-up. These surveys are designed to assess changes in their behavior and well-being.
Changes in Baseline vs. Follow-up Survey:
We administer the same set of standard surveys to participants at the beginning of the study and at the six-week follow-up, when the contextual journaling phase ends and generic journaling begins. We compare the responses and detail the mean differences in Table 3, which includes the baseline mean (start of the study), the mean at follow-up (at the six-week mark), the mean change in value, the mean change expressed as a percentage, the effect size (Cohen’s d), and the 95% confidence interval (C.I.) of the effect size. We also conduct a paired t-test and mark the resulting significance levels in the table. It is essential to note that, while we report statistical significance in adherence to standard result reporting practices, considering outcomes regardless of statistical significance is valuable given our small sample size. This limitation often leads to fewer statistically significant results. Therefore, we also report effect sizes, which reveal notable effects despite a lack of significance. Given the exploratory nature of our study, dismissing potential relationships solely based on statistical significance is not advisable. Moreover, the wide confidence intervals for all values, attributable to the small sample size, present intriguing results that warrant further investigation with a larger sample in future studies.
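As a rough illustration of the quantities reported in Table 3, the sketch below computes the percentage change from two means, and a paired effect size and t statistic from per-participant scores. It assumes Cohen's d is the mean of the paired differences divided by their standard deviation; the text does not state which variant of d is used, so this is an assumption. The function names and the toy score lists are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def percentage_change(baseline_mean, followup_mean):
    """Mean change expressed as a percentage of the baseline mean."""
    return (followup_mean - baseline_mean) / baseline_mean * 100


def paired_effect(baseline, followup):
    """Return (mean change, Cohen's d, t statistic) for paired scores.

    Assumption: d = mean(differences) / sd(differences), the paired variant.
    The t statistic has len(baseline) - 1 degrees of freedom.
    """
    diffs = [f - b for b, f in zip(baseline, followup)]
    mean_diff = mean(diffs)
    d = mean_diff / stdev(diffs)
    t = d * sqrt(len(diffs))
    return mean_diff, d, t


if __name__ == "__main__":
    # Baseline and follow-up means for positive affect, as reported in Table 3.
    print(round(percentage_change(31.45, 33.70), 2))
    # Toy paired scores for illustration only.
    print(paired_effect([1, 2, 3, 4], [2, 4, 3, 5]))
```

Under this variant, d and t differ only by a factor of the square root of the sample size, which is why a medium effect can fall short of significance with 20 participants.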
Table 3.
Facet | Baseline Mean | Follow-up Mean | Mean Change | Percentage Change | Effect Size | C.I. |
---|---|---|---|---|---|---|
Personality [62] | ||||||
⬆ Extraversion | 5.15 | 5.25 | 0.10 | 1.94% | 0.08 | (−0.39, 0.54) |
⬆ Agreeableness | 6.60 | 6.80 | 0.20 | 3.03% | 0.21 | (−0.25, 0.67) |
⬆ Conscientiousness | 7.15 | 7.35 | 0.20 | 2.80% | 0.17 | (−0.30, 0.63) |
⬇ Neuroticism | 7.20 | 6.35 | −0.85*** | −11.81% | −0.63 | (−1.09, −0.16) |
— Openness | 6.90 | 6.90 | 0.00 | 0.00% | 0.00 | (−0.46, 0.46) |
Emotion Regulation [61] | ||||||
⬆ Cognitive reappraisal | 12.90 | 13.25 | 0.35 | 2.71% | 0.07 | (−0.39, 0.54) |
⬆ Expressive suppression | 11.65 | 12.40 | 0.75 | 6.44% | 0.15 | (−0.32, 0.61) |
Affect [77] | ||||||
⬆ Positive affect | 31.45 | 33.70 | 2.25* | 7.15% | 0.39 | (−0.07, 0.86) |
⬇ Negative affect | 25.00 | 22.35 | −2.65** | −10.60% | −0.62 | (−1.08, −0.15) |
Stress & Anxiety | ||||||
⬇ Perceived stress [21] | 7.60 | 6.75 | −0.85 | −11.18% | −0.28 | (−0.74, 0.19) |
⬇ State-trait anxiety [49] | 45.83 | 42.17 | −3.67 | −8.00% | −0.30 | (−0.77, 0.16) |
⬆ Resilience [69] | 2.91 | 2.94 | 0.03 | 1.03% | 0.10 | (−0.36, 0.57) |
Psychological Well-being [65] | ||||||
⬇ Autonomy | 8.05 | 7.70 | −0.35 | −4.35% | −0.16 | (−0.62, 0.31) |
⬆ Personal growth | 5.45 | 5.50 | 0.05 | 0.92% | 0.03 | (−0.43, 0.49) |
⬇ Positive relations | 7.00 | 6.10 | −0.90** | −12.86% | −0.48 | (−0.95, −0.01) |
⬆ Purpose | 5.50 | 5.75 | 0.25 | 4.55% | 0.22 | (−0.24, 0.69) |
⬇ Self-acceptance | 6.95 | 6.60 | −0.35 | −5.04% | −0.19 | (−0.66, 0.27) |
⬆ Life satisfaction [24] | 22.35 | 22.70 | 0.35 | 1.57% | 0.09 | (−0.38, 0.55) |
⬆ Flourishing [25] | 42.60 | 43.45 | 0.85 | 2.00% | 0.14 | (−0.32, 0.61) |
Social and Interpersonal Well-being | ||||||
⬇ Social provision [16] | 16.80 | 16.25 | −0.55 | −3.27% | −0.18 | (−0.64, 0.29) |
⬇ Loneliness [64] | 8.50 | 7.95 | −0.55* | −6.47% | −0.42 | (−0.88, 0.05) |
Cognition and Self-Awareness | ||||||
⬆ Mindfulness [5] | 44.35 | 47.35 | 3.00** | 6.76% | 0.55 | (0.07, 1.01) |
⬆ Self-reflection [68] | 29.30 | 31.00 | 1.70** | 5.80% | 0.47 | (0.00, 0.93) |
⬆ Insight [68] | 25.75 | 27.70 | 1.95* | 7.57% | 0.36 | (−0.10, 0.82) |
*** p-value ≤ .01,
** .01 < p-value ≤ .05,
* .05 < p-value ≤ .10.
We observe several positive outcomes from the study. Interestingly, we find a significant decrease of 11.81% in the personality trait neuroticism, which is typically associated with negative emotions, with a medium effect size (p-value = 0.001, effect size (d) = −0.63). We consider effect sizes of 0.2, 0.5, and 0.8 as small, medium, and large, respectively, regardless of the sign, which merely indicates the direction of change. Although changes in agreeableness are not statistically significant, we observe a modest increase of 3.03% with a small effect size. Changes in other personality traits do not reach statistical significance, and their effect sizes remain small. Given that personality is generally stable, changes in traits like neuroticism and agreeableness within a short timeframe are noteworthy. We do not observe any meaningful changes in emotion regulation, either in statistical significance or in effect sizes. However, we note promising indicators of improved well-being at the follow-up, including an increase in positive affect and a decrease in negative affect. Specifically, positive affect, which reflects the extent to which individuals experience positive moods such as joy, interest, and alertness, increases by 7.15% (p-value = 0.05, d = 0.39). Conversely, negative affect, which encompasses a range of negative emotional states including anxiety, depression, stress, and sadness, decreases by 10.60% (p-value = 0.05, d = −0.62). Both changes are statistically significant, with a small-to-medium effect size for positive affect and a medium effect size for negative affect.
We observe notable changes in various psychological metrics. Stress and anxiety decrease by 11.18% and 8.00%, respectively, although these reductions are not statistically significant and exhibit small effect sizes. Resilience increases by 1.03%, but this change is not statistically significant and demonstrates a very low effect size. In terms of psychological well-being, the results are mixed. Autonomy, defined as being self-determining and independent, decreases by 4.35% (p-value = 0.49, d = −0.16). Positive relations with others, which encompass warm, satisfying, trusting relationships, decrease by 12.86% (p-value = 0.04, d = −0.48), and self-acceptance, referring to a positive attitude toward oneself, decreases by 5.04% (p-value = 0.40, d = −0.19). Only the decrease in positive relations is statistically significant and exhibits a medium effect size. However, other elements within psychological well-being show positive changes. We observe a 0.92% increase in personal growth (p-value = 0.89, d = 0.03), which involves seeing improvement in oneself and behavior over time. Purpose in life, defined as having goals in life and a sense of directedness, increases by 4.55% (p-value = 0.32, d = 0.22). Life satisfaction, an evaluation of a person’s quality of life, increases by 1.57% (p-value = 0.69, d = 0.09). Flourishing (self-perceived success in important areas such as relationships, self-esteem, purpose, and optimism) increases by 2.00% (p-value = 0.53, d = 0.14). Although these results are statistically insignificant and associated with small effect sizes, they indicate promising trends. Additionally, we find a statistically insignificant decrease of 3.27% in social provision, specifically perceived social support, with a very small effect size (p-value = 0.44, d = −0.18). On the other hand, subjective feelings of loneliness show a marginally significant improvement, decreasing by 6.47% with a small-to-medium effect size (p-value = 0.07, d = −0.42).
We observe exclusively positive outcomes in cognition and self-awareness. Mindfulness, self-reflection (defined as the inspection and evaluation of one’s thoughts, feelings, and behaviors), and insight (a clear understanding of one’s mental and emotional processes) all show statistically significant increases with small to medium effect sizes. Specifically, we observe a 6.76% increase in mindfulness with a medium effect size (p-value = 0.02, d = 0.55); self-reflection rises by 5.80% with a medium effect size (p-value = 0.04, d = 0.47); and insight grows by 7.57% with a small effect size (p-value = 0.10, d = 0.36). These results indicate that the contextual journaling integral to the study may be able to substantially enhance the key factors we aimed to influence: self-awareness, self-monitoring, and clarity of self-perception.
Weekly Changes:
We employ the MindScape app to administer weekly ecological momentary assessments (EMA) to participants. These assessments comprise the Patient Health Questionnaire-4 (PHQ4) [45], Self-reflection and Insight Scale (SRIS) [68], 5-item Mindful Attention Awareness Scale (MAAS) [14], and the 10-item Positive and Negative Affect Schedule (PANAS) [73]. Every Sunday, the app sends notifications to participants, prompting them to complete the surveys. We utilize a mixed-effects model to examine changes in participants’ scores over the weeks, accounting for their self-reported gender, student status (graduate or undergraduate), past journaling experience, and race (‘multiple’ race category is merged into ‘other’ for simplicity). The mixed-effects model we apply to analyze the outcome scores is formulated as follows:
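$$\text{outcome\_score}_{ij} = \beta_0 + \beta_1\,\text{week}_{ij} + \sum_{k=2}^{8} \beta_k\,x_{ki} + b_{0i} + b_{1i}\,\text{week}_{ij} + \epsilon_{ij} \tag{1}$$
$\text{outcome\_score}_{ij}$ refers to the scores obtained from the {PHQ4, SRIS, MAAS, PANAS} surveys for the $i$-th subject at the $j$-th week.
$\beta_0, \beta_1, \ldots, \beta_8$ are the fixed coefficients for intercept, week, and other covariates, where $x_{ki}$ denotes the covariate of the $i$-th subject paired with $\beta_k$.
$b_{0i}$ is the random intercept for the $i$-th subject.
$b_{1i}$ is the random slope for the $i$-th subject associated with the effect of week.
$\epsilon_{ij}$ is the residual error.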
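To show how the terms of the model combine, the sketch below evaluates the expected score for one subject at a given week from a set of fixed and random effects. It only evaluates the model; it does not fit it. The function `predicted_score` is hypothetical, the intercept of 5.0 in the example is an invented placeholder (the intercept is not reported in the text), and the remaining coefficients are the PHQ4 row of Table 4, under the assumption that the covariates are coded as 0/1 indicators.

```python
def predicted_score(week, covariates, fixed, random_intercept=0.0, random_slope=0.0):
    """Evaluate the mixed-effects mean for one subject at one week.

    fixed: [beta_0, beta_1, beta_2, ..., beta_8]
    covariates: subject-level covariate values matching beta_2 .. beta_8
    random_intercept, random_slope: the subject's b_0i and b_1i
    """
    beta0, beta1, *betas = fixed
    if len(betas) != len(covariates):
        raise ValueError("one coefficient per covariate is required")
    covariate_term = sum(b * x for b, x in zip(betas, covariates))
    return (beta0 + random_intercept) + (beta1 + random_slope) * week + covariate_term


if __name__ == "__main__":
    # Hypothetical intercept (5.0) followed by the PHQ4 coefficients from Table 4.
    phq4_fixed = [5.0, -0.25, -1.74, -1.85, 0.72, -0.12, 1.46, -0.98, -1.74]
    reference_subject = [0, 0, 0, 0, 0, 0, 0]  # all covariates at reference level
    for week in (0, 4, 8):
        print(week, predicted_score(week, reference_subject, phq4_fixed))
```

The random intercept shifts a subject's whole trajectory, while the random slope lets that subject's weekly trend differ from the average trend given by the week coefficient.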
We compute the outcomes by first utilizing the total PHQ4 score as-is, and then deriving two subscores: anxiety, which is the sum of the first two items, and depression, which is the sum of the last two items. Additionally, we generate scores for positive affect and negative affect from the PANAS, and calculate scores for self-reflection and insight from the SRIS. Furthermore, we compute a total mindfulness score from the MAAS. We then incorporate all these variables as outcomes in the mixed-effects model, examining their changes and relationships over time. By doing so, we can gain a comprehensive understanding of how the participants’ mental health and well-being evolve throughout the study.
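The PHQ4 scoring described above is simple enough to sketch directly: the anxiety subscore sums the first two items and the depression subscore sums the last two. The function name `phq4_scores` and the example responses are hypothetical.

```python
def phq4_scores(items):
    """Compute PHQ-4 subscores from four item responses in questionnaire order."""
    if len(items) != 4:
        raise ValueError("PHQ-4 has exactly four items")
    anxiety = items[0] + items[1]      # sum of the first two items
    depression = items[2] + items[3]   # sum of the last two items
    return {"anxiety": anxiety, "depression": depression, "total": anxiety + depression}


if __name__ == "__main__":
    # Example responses for illustration only.
    print(phq4_scores([2, 1, 0, 3]))
```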
We display the results in Table 4, which reveals several notable findings. In each cell, the number outside the brackets is the coefficient and the number inside the brackets is the standard error. Anxiety levels decrease each week, with a statistically significant reduction (p-value = 0.01, β = −0.12). Interestingly, this decrease is more pronounced in males (p-value = 0.06, β = −1.05). Our analysis also shows a significant decrease in depression scores over time (p-value = 0.01, β = −0.131), particularly among graduate students compared to undergraduates (p-value = 0.01, β = −1.55). This suggests that journaling may be more effective for graduate students, although no other demographic factors, such as gender, show significant effects on depression. The overall PHQ4 score also demonstrates a decreasing trend during the study, more so in males. In terms of affective states, we observe no significant weekly changes in positive or negative affect over the study period. However, graduate students report higher levels of positive affect compared to undergraduate students. This stability in affect contrasts with the baseline and follow-up comparison reported earlier in this section, where we observe a statistically significant increase in positive affect and a decrease in negative affect. In addition, MindScape might boost self-reflective capacities, as evident from a significant weekly increase in self-reflection scores (p-value = 0.01, β = 0.39). Although gender differences in self-reflection are not statistically significant, we observe a notable trend suggesting that race may influence outcomes. Participants with prior journaling experience benefit more (p-value = 0.01, β = 14.23), showing higher self-reflection scores. This aligns with our earlier comparison of baseline and follow-up scores in self-reflection, which yielded positive and statistically significant results.
In contrast, insights and mindfulness, measured by their respective total scores, do not exhibit significant changes throughout the study. However, participants with prior journaling experience and graduate students experience enhanced mindfulness benefits compared to others (p-value = 0.10, β = 1.44).
Table 4.
EMA | Week | Gender Male | Gender Nonbinary | Race White | Race Black | Race Other | Journaling Experience | Student Status |
---|---|---|---|---|---|---|---|---|
⬇ PHQ4 [45] | −0.25 (0.08)*** | −1.74 (0.98)* | −1.85 (2.54) | 0.72 (1.35) | −0.12 (1.35) | 1.46 (1.29) | −0.98 (1.46) | −1.74 (1.22) |
⬇ Anxiety [45] | −0.12 (0.05)*** | −1.05 (0.54)* | −1.18 (1.42) | 0.93 (0.76) | 0.34 (0.75) | 0.71 (0.72) | 0.04 (0.81) | −0.18 (0.68) |
⬇ Depression [45] | −0.131 (0.05)*** | −0.69 (0.52) | −0.66 (1.37) | −0.19 (0.73) | −0.44 (0.72) | 0.74 (0.69) | −1.01 (0.78) | −1.55 (0.65)*** |
⬆ Positive affect [73] | 0.040 (0.10) | 1.67 (1.10) | 2.31 (2.87) | −0.01 (1.53) | 1.51 (1.52) | 2.15 (1.44) | 0.69 (1.64) | 2.45 (1.37)* |
⬇ Negative affect [73] | −0.15 (0.12) | −2.03 (1.52) | −2.60 (3.93) | −0.39 (2.10) | −0.36 (2.09) | 0.10 (1.99) | −2.50 (2.26) | −2.22 (1.89) |
⬆ Mindfulness [14] | 0.04 (0.04) | 0.84 (0.61) | 2.20 (1.58) | 0.72 (0.85) | 0.78 (0.84) | −0.49 (0.81) | 1.44 (0.91)* | 1.33 (0.76)* |
⬆ Self-reflection [68] | 0.39 (0.14)*** | 2.02 (3.22) | 7.56 (8.23) | 10.92 (4.42)*** | −1.31 (4.42) | 5.33 (4.24) | 14.23 (4.78)*** | 2.34 (4.02) |
⬆ Insight [68] | 0.04 (0.15) | 3.76 (4.15) | 12.56 (10.56) | 1.64 (5.68) | 2.66 (5.68) | −3.36 (5.46) | 3.22 (6.14) | 6.13 (5.18) |
*** p-value ≤ .01,
** .01 < p-value ≤ .05,
* .05 < p-value ≤ .10.
We illustrate the increase in self-reflection scores and the decrease in PHQ-4 scores in Figures 7a and 7b, respectively. The numbers in boxes at the bottom of the figures represent the number of participants from whom we receive EMA responses in each specific week. On average, we receive responses from 15 participants over the course of the first six weeks. However, in weeks 7 and 8, which mark the beginning of the generic journaling period, the numbers drop to 6 and 3, respectively. The shaded areas represent the 95% confidence intervals. Overall, it appears that participants experience several positive changes during the study period. These improvements across various psychological dimensions, particularly in cognition and self-awareness, might demonstrate the efficacy of contextual journaling. While some areas show minimal changes or declines, the significant positive trends may indicate the potential beneficial impact of the study on enhancing participants’ mental well-being and self-related cognitions. This overall positive shift indicates promising paths for future applications and studies aimed at further understanding and supporting mental health and cognitive awareness through contextual journaling.
4.5. MindScape App Performance
Upon concluding the study, we solicit participant feedback regarding the MindScape app’s performance and their experience with contextual journaling. Our analysis of these responses reveals insights into the app’s usability, relevance, and impact on users’ daily lives.
We first administer the standard System Usability Scale (SUS) survey to assess the perceived usability of the MindScape app. Our analysis shows that 50% (N=10) of participants rate the system as excellent, with a score of 80 or higher. Additionally, 35% (N=7) rate it as good (a score of 68 or higher but below 80), while 15% (N=3) consider it poor, scoring below 68. We then present participants with several other questions on their experience with the app. We discuss some responses here and visualize them in Figure 9. Please see Appendix A for a complete list of questions and participant responses. Regarding the relevance of journaling prompts generated by the app, 10% (N=2) find them very relevant, 60% (N=12) moderately relevant, and 30% (N=6) slightly relevant. When assessing check-in prompts specifically, 5% (N=1) indicate they are not at all relevant, 40% (N=8) find them slightly relevant, 40% (N=8) think they are moderately relevant, and 15% (N=3) consider them very relevant.
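For readers unfamiliar with SUS scoring, the sketch below applies the standard procedure (odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5) and then applies the score bands used in this section. The function names `sus_score` and `sus_band` and the example responses are hypothetical.

```python
def sus_score(responses):
    """Compute a 0-100 SUS score from ten 1-5 responses in questionnaire order."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly ten items")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # odd items are positively worded
        else:
            total += 5 - response   # even items are negatively worded
    return total * 2.5


def sus_band(score):
    """Map a SUS score to the bands used in this section."""
    if score >= 80:
        return "excellent"
    if score >= 68:
        return "good"
    return "poor"


if __name__ == "__main__":
    # Example responses for illustration only.
    score = sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1])
    print(score, sus_band(score))
```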
To enrich our understanding of participants’ experiences using the app, we supplement our quantitative analysis with qualitative, open-ended questions. We ask participants if they recall any specific prompts from the MindScape app that significantly resonated with them or were particularly relevant to their experiences. This inquiry aims to uncover how the app’s context-aware prompts facilitate deeper self-understanding or self-awareness. Participants report various responses that illustrate the personalized impact of these prompts. Many highlight how specific questions related to their daily routines or habits prompt meaningful reflection. For instance, one participant mentions a prompt about their walking routine, which coincides with a new meditative practice: “About a week or two after I started it, I got a question from the app about my walking routine, and it gave me an opportunity to reflect on how the meditative practice had been going!”. This example underscores how timely and relevant prompts enhance mindfulness and self-awareness. Other participants appreciate prompts that encourage proactive planning and goal-setting, such as those urging them to think about weekly exercise goals. This not only makes them contemplate their physical health but also motivates them to set concrete plans. Additionally, reminders about social interactions lead some users to reach out to friends more frequently, improving their social life. One user points out, “A lot of the prompts remind me that I have not been socializing in person as much, which has led to me reaching out to friends I do not see as often.” The app’s ability to track changes in routines and lifestyle also stands out for many users. For example, prompts related to changes in workspaces or sleep patterns provide insights that participants might not have noticed on their own. 
One participant succinctly notes, “The app has been accurately keeping track of my changes in work routines, which allows me to take time and reflect on how these changes are affecting my overall work performance.”
Given that these questions are posed at the end of the study, following two weeks of generic prompts, we ask participants to self-report how often context-aware prompts facilitate more profound reflection than usual. Among respondents, 5% (N=1) report never, 10% (N=2) rarely, 45% (N=9) sometimes, 30% (N=6) often, and 10% (N=2) always. Regarding changes in daily habits or behaviors, 15% of participants report moderate changes, 30% notice slight changes, and 55% do not report any changes. However, when we ask participants to share specific instances of behavioral changes, many highlight tangible impacts driven by the app. One user notes the effect of visual prompts related to phone usage: “Every time I see that little green light [indicating microphone use], I’m prompted to think about whether my current device usage aligns with my goals.” This increased awareness often leads to decreased unreflective screen time and transitions to other daily activities. Changes in social interactions are also notable, with one respondent mentioning: “I’ve seen improvement in my social life…through prompts that make me reflect on the importance of nurturing conversations with my friends.” Additionally, the app enhances self-awareness regarding personal well-being and daily activities. Reports include better monitoring of sleep patterns, more frequent walks, and increased engagement in self-reflection and meditation. However, not all feedback indicates a change. One participant remarks on the difficulty of adapting behaviors during a particularly busy life phase, suggesting that while the app identifies reduced phone usage, the decrease is due more to their hectic schedule than to a conscious effort spurred by the app.
To gain further insight into the app’s influence on planning and mindset, we ask participants about its impact on their weekly structure and appreciation of daily activities. Responses reveal varied effects. Several appreciate the structuring aspect introduced by the app, notably through timed journaling activities. One participant notes: “It’s led me to prioritize working out and socializing in person more…the prompts are good reminders.” Another remarked on increased mindfulness: “It has made me more aware of small positive interactions I’ve had throughout the day…the app brought those moments closer again.” However, some find the prompts and check-ins inadequate or misaligned with their personal reflection needs.
Overall, 55% (N=11) of participants express satisfaction or high satisfaction with the app, while 30% (N=6) remain neutral, and 15% (N=3) feel dissatisfied. The ease of integrating MindScape into daily routines is notable, with 75% (N=15) stating it is somewhat easy to very easy. Regarding comfort with using data from phones and smartwatches alongside AI to personalize journaling prompts, 70% (N=14) voice comfort or great comfort, while 30% (N=6) are uncomfortable or neutral. In terms of data privacy and security, 60% (N=12) express no or slight concern, whereas 40% (N=8) report moderate to high levels of concern. When comparing context-aware directed prompts to standard journaling, participants note both benefits and challenges. Many appreciate the structured prompts for enhancing their engagement and reflection practices. As one user mentioned: “Previously, it was cumbersome for me to journal as I’d have to sit and recollect everything. With MindScape, I reflect more earnestly and compare my current state with previous ones, giving me clear insights into my mental and physical well-being.” Participants also value the specific insights prompted by the app, which sometimes brings attention to overlooked aspects of their daily lives: “The context-aware prompts pull out surprising trends that I may not have noticed and ask me to reflect on it.” This feature helps some users gain a deeper understanding of their behavioral patterns and encourages proactive thinking, like another user who points out, “The prompts lead me to think about specific parts of the experiences of my day and how I might make changes.”
However, challenges with the specificity and relevance of the AI-driven prompts are also noted. Some participants feel that the prompts could be too rigid or not entirely reflective of their true daily experiences due to inaccuracies in activity or location tracking. For example, one user criticizes the prompts for being “sometimes based on metrics that might be irrelevant,” such as reacting to an unusually high number of spam calls as if they are meaningful phone conversations. Similarly, another points out that “the app tends to see trends or differences in my behavior when there isn’t a particular cause behind them.” Despite these challenges, the general consensus acknowledges the utility of the MindScape app in fostering regular reflective practices. Even those who note limitations often recognize the benefits of prompted reflections, with one participant aptly summarizing: “I think that I can more easily decide what to write about with a prompt, but I don’t know if I access my feelings in the same way as a free-form journal.”
4.6. Strengths, Weaknesses and Improvement Opportunities
At the conclusion of the study, we gathered qualitative feedback on the MindScape app’s strengths and weaknesses. Users praised the app’s innovative approach to integrating behavioral patterns with journal prompts and valued the regular notifications for maintaining daily reflection habits. One participant noted, “I liked how it regularly made me aware of what I am doing, and it helped me reflect on the activities. I loved the journal prompts as they were not too specific nor broad that I could sufficiently elaborate on my thoughts.” Another user appreciated how the app “did initiate some reflection on the interactions I had throughout the day, how I value the people in my life.” The app’s ease of use was highlighted as a major advantage. A user expressed, “Love the ease of use of the app. Super simple to click on the notification and complete the journaling and check-in practices.” Another added, “I also enjoyed the personalized prompts and how I only had to write short pieces for the journal.” However, participants also identified areas for improvement. The most common suggestion was enhancing the accuracy of context-aware prompts. One user shared, “Many times, the app didn’t seem to be aware of where I lived and would ask me questions about being in my dorm when I wasn’t there.” This highlights the need for more precise behavioral sensing, which could potentially be addressed by integrating additional data sources like smartwatches. The repetitiveness of prompts was another concern, with users noting identical prompts over multiple days. This could be improved by ensuring more dynamic and varied prompt generation. Some participants also suggested linking the app to fitness trackers for better context awareness. A few users also raised concerns about the app’s battery consumption, indicating a need for power management optimization.
Based on participant feedback, we identified several key improvement opportunities for the MindScape app:
Enhanced Personalization and Prompt Variability: Participants suggest enhancing personalization by considering responses to previous prompts alongside behavioral data when generating new ones. This approach could introduce greater diversity and relevance to the questions posed, balancing specific lifestyle habits with broader life reflections and personality exploration. Users prefer a mix of specific and open-ended prompts that maintain a narrative between journal entries, offering opportunities for deeper reflection and emotional processing. By analyzing previous responses, the app can better determine whether to base new prompts on sensed data or earlier interactions, especially when introducing broader questions on weekends. Implementing these changes could significantly improve user engagement by providing deeper insights into emotional well-being and adding greater therapeutic value to the journaling experience. This refined approach focuses more on emotional processing rather than strictly behavioral tracking, potentially offering users a more personalized and meaningful reflection tool.
Improved Context Sensitivity: Perfecting context sensitivity remains challenging due to reliance on passively sensed data. However, we can enhance this by better integrating with smartphones and wearable technology for deeper insights into physical and social contexts. Users are interested in syncing with smartwatches and fitness trackers. There is also demand for allowing users to define and assign meanings to frequently visited locations beyond predefined campus spots. This can be facilitated by integrating third-party location APIs like Google Places, while carefully managing privacy concerns. Alternatively, we can use GPS data from photos to prompt reflections on recent travels, similar to Apple’s Journal app. By focusing on improving the app’s ability to accurately adapt to various contexts rather than relying on hard-coded campus locations, we can significantly boost its utility and user satisfaction. These enhancements will provide more personally relevant prompts, especially for off-campus activities, and improve the overall user experience.
Goal-Setting and Personal Growth Features: Incorporating goal-setting could transform prompts into opportunities for meaningful reflection. A user suggests, “I wonder if the contextual prompts could be more of a prompt to reflect on goal progress rather than guesses about what I did that day.” Some users also propose behavioral suggestions when low levels of social interaction or physical movement are detected. This feature could further engage users in proactive behavior modification and self-improvement, enhancing the app’s role in supporting personal growth.
Customization Options: Participants desire more customization options to tailor the app to their individual needs. They want increased control over prompt timing, adjustable directly from the app, to accommodate personal routines and sleep patterns. One user suggests allowing users to set their core hours to better align with their daily schedule. Additionally, participants recommend offering multiple prompt options to enhance engagement by catering to specific interests and needs. A user proposes providing prompts from different focus areas (e.g., physical activity, social interaction) for users to choose from as needed. This flexibility in both timing and content would ensure interactions are more relevant and stimulating, potentially increasing user involvement and satisfaction. By implementing these customization features, the app could better adapt to individual preferences, fostering a more personalized and engaging experience for users.
Expansion of Check-in Functionality: User feedback on check-ins is mixed. Some appreciate the brief interactions, while others find them redundant. Suggestions include decreasing the frequency of basic check-ins and focusing more on prompts that help identify or process emotions. Many users recommend making check-ins optional, allowing users to activate, deactivate, or adjust the frequency according to their preference. Some participants propose transforming check-ins into encouraging advice or reminders. These changes could improve the user experience by providing more flexibility, relevance, and clarity in the check-in process.
These improvements could significantly enhance the app’s utility, user engagement, and overall satisfaction, making it a more effective tool for promoting well-being and self-reflection.
5. Discussion
In this section, we discuss our findings, the implications of our work and the associated ethical considerations.
5.1. Summary of Results
We collect a total of 661 journal entries from 20 students at Dartmouth College, with a significantly higher engagement rate in the first six weeks of contextual prompts compared to the last two weeks of generic prompts. On average, participants actively engage for about five weeks, submitting 26.65 entries each during the initial six weeks. In contrast, in the last two weeks of generic journaling, participants submit an average of 7.11 entries. The higher engagement with contextual prompts can potentially be attributed to their relevance and their ability to effectively capture daily experiences. However, it may also be influenced by the ordering effect – the novelty of introducing contextual journaling at the beginning likely boosted initial engagement due to its freshness. Throughout the study, participants consistently utilize check-ins, with a total of 2,985 responses recorded, showing higher response rates in the afternoon and evening. Students demonstrate a clear preference for journaling on areas directly impacting their daily lives, with Social Interactions and Digital Habits ranking as the most preferred categories. Interestingly, although Physical Fitness is ranked lower compared to other categories, it represents a significant portion of the prompts, mainly due to the broader range of signals it encompasses. We use advanced topic modeling techniques to understand the themes that resonate during check-ins at different times of the day. Morning check-ins often revolve around social and communication app usage, while afternoon check-ins shift towards academic and social life experiences. This variation underscores the relevance of tailoring check-in prompts to match daily activities and time-specific contexts.
A deeper dive into the journaling responses reveals intriguing insights into the thematic content and language patterns used in generic and contextual journals. We identify four primary topics in generic journals: Daily Experiences, Daily Activities, Productivity Management, and Academic & Personal Growth. This suggests that individuals use generic prompts to explore various aspects of their daily lives, emotional experiences, and personal development. With LIWC analysis, we find nuanced differences in the language patterns and emotional expressions used in both types of journals. Generic prompts, which ask participants to reflect on anything of interest, yield higher analytic thinking scores, suggesting a more formal and logical thinking style. This finding is unexpected, as one might assume that abstract prompts would lead to more creative and less formal thinking. However, it’s possible that the open-ended nature of generic prompts encourages participants to engage in more structured thinking as they attempt to organize their thoughts and ideas.
Contextual prompts yield higher personal-pronoun scores and lower analytic thinking scores, indicating a more personal and introspective writing style. The findings reveal that generic prompts may encourage broader emotional expression, a more positive tone, and a less negative tone. Generic prompts also promote a broader temporal focus, linking current experiences to past memories or future aspirations. Contextual journals focus more on personal experiences and relationships, with higher cognition scores and a greater emphasis on thinking and problem-solving. This finding appears to contradict the earlier result showing lower formal/logical thinking in contextual journals than in generic ones. However, it is possible that the contextual prompts, while reducing formal/logical thinking, simultaneously encourage more personal and relational thinking, which is captured by the higher cognition scores. This suggests that contextual journals may foster a different type of cognitive processing, one that prioritizes personal connections and experiences over abstract logical reasoning.
As we examine the effects of contextual journaling on well-being and emotional growth, we find several positive outcomes. We observe a significant decrease in neuroticism (11.81%), an increase in positive affect (7.15%), and a decrease in negative affect (10.60%). Stress and anxiety also decrease, although not significantly. Notably, we observe significant improvements in mindfulness (6.76%), self-reflection (5.80%), and insight (7.57%). Weekly EMAs reveal consistent decreases in anxiety levels, particularly among males. Depression scores also decrease significantly (β = −0.13), especially among graduate students (β = −1.55). Self-reflection scores increase significantly from week to week, while mindfulness and insight do not show significant weekly changes. Participants with prior journaling experience and graduate students experience enhanced mindfulness benefits. Several studies support the idea that individuals with prior experience in journaling or mindfulness practices tend to exhibit greater benefits from mindfulness interventions [40, 48, 60]. This may be because they have developed greater self-awareness, reflection skills, and emotional regulation. Moreover, graduate students’ advanced education and exposure to various learning strategies [12] may also contribute to their advantage, alongside the potential impact of their age [34].
Upon concluding the study, we solicit feedback from participants regarding the MindScape app’s performance, their experience with contextual journaling, and related topics. The feedback is largely positive, with 50% of participants rating the app’s usability as excellent and 35% as good. Participants find the app’s context-aware prompts to be relevant and helpful, with 60% considering them moderately relevant and 30% slightly relevant. Many participants appreciate the app’s ability to track changes in routines and lifestyle, and 45% report slight or moderate changes in their daily habits or behaviors since using the app. Participants share specific instances where the app’s prompts led to meaningful reflection and behavioral changes, such as increased mindfulness and self-awareness, improved social interactions and relationships, enhanced goal-setting and planning, better monitoring of sleep patterns and physical activity, and increased engagement in self-reflection and meditation. However, some participants note challenges with the app’s prompts, such as inaccuracies in activity or location tracking, prompts being too rigid or irrelevant, and limited ability to access deeper feelings and emotions. Despite these challenges, the majority of participants (75%) find it easy to integrate the app into their daily routines, and 70% are comfortable using data from phones and smartwatches alongside AI to personalize journaling prompts. Overall, participants appreciate the structured prompts and the app’s ability to facilitate regular reflective practices, with 55% expressing satisfaction or high satisfaction with the app. Thus, our study provides preliminary evidence for the efficacy of contextual journaling in promoting positive emotional responses and personal growth, with significant improvements in cognition and self-awareness. These findings indicate promising paths for future applications and studies aimed at supporting mental health and cognitive awareness through contextual journaling.
5.2. Implications
This study’s findings have notable implications for HCI and the design of context-aware systems. Our research demonstrates the critical role of personalization and context-awareness in enhancing engagement with journaling applications. Specifically, prompts that are aligned with users’ daily activities, routines, and personal experiences tend to elicit more engaged, introspective responses, leading to positive behavioral changes. However, it is important to note that generic prompts also have their benefits, such as encouraging broader emotional expression, higher positive tone, and reduced negative tone. A balanced approach, incorporating both contextual and generic prompts, could potentially offer the most comprehensive benefits. Our study highlights the importance of tailoring interventions to individual needs and circumstances, and adapting prompts and interfaces to align with users’ daily schedules and routines. The differences observed based on gender, student status, and prior journaling experience also emphasize the need for personalized approaches. While contextual prompts may facilitate reflections on personal experiences and relationships, generic prompts may promote analytic thinking and broader emotional expression. By combining the strengths of both approaches, researchers and designers can create more effective and inclusive journaling applications.
Furthermore, our study illustrates the potential benefits of integrating data from diverse sources such as smartphones, wearables, and AI-powered language models to enrich and personalize the journaling experience. Researchers are encouraged to explore innovative methods to merge these data streams while conscientiously addressing associated privacy and ethical concerns. Participants generally reported that the context-aware journaling app was easy to incorporate into their routines, but some faced challenges with prompt relevance and accessing deeper emotional layers. This underlines a crucial area for HCI researchers to improve the user experience in such applications, ensuring that the prompts are not only engaging and relevant but also effective in facilitating meaningful self-reflection. Moreover, the positive emotional and cognitive outcomes achieved in this study support increased multidisciplinary collaboration between HCI, psychology, and other relevant fields. Such collaborative efforts can merge user-centered design with behavior change theories and data-driven methodologies to craft more impactful interventions. As journaling and other applications with behavioral sensing and AI evolve to become more advanced and personalized, it is imperative to consider and address ethical issues such as data privacy, algorithmic bias, and potential misuse of personal data. Researchers should commit to responsible design practices and actively involve users in the development process to enhance transparency and maintain accountability. In summary, this study showcases the potential of context-aware journaling systems to facilitate significant personal growth and behavioral improvements, and highlights the importance of balancing contextual and generic prompts to offer a comprehensive and inclusive journaling experience.
5.3. Limitations and Future Work
Our study has some important limitations to consider. First, given our small sample size and exploratory aims focused on feasibility, acceptability, and preliminary efficacy, our findings might not generalize well to outside populations. As a pilot study, our primary objective was to assess the acceptability and feasibility of combining AI with behavioral sensing in journaling apps, rather than conducting a large-scale randomized controlled trial (RCT). Therefore, we did not focus on statistical significance, which is heavily influenced by sample size. Instead, our study should be seen as a proof-of-concept for this novel intervention, providing preliminary insights into its potential benefits and areas for future development. Our findings, focused on a specific student population, might not be widely applicable. Future studies should aim to engage a larger and more diverse sample, building on our suggestions and results. Future studies should also consider employing a counterbalanced design to directly compare enhanced and traditional journaling methods, controlling for potential order effects and providing more robust evidence of the specific impacts of our contextual AI approach. One significant limitation is our focus on Android users, which likely contributes to the low number of participants. In the US, most young adults use iPhones [74], making our Android-only approach a limiting factor. Future researchers should develop apps compatible with both Android and iOS operating systems to reach a broader audience and increase participant diversity. Privacy and data handling remain crucial considerations in our approach. While we implemented measures to protect user data, future iterations should provide more granular control over data sharing. This could include allowing users to selectively enable or disable specific behavioral signals used for prompt generation. 
Additionally, exploring the use of self-hosted, open-source LLMs could further enhance privacy by keeping sensitive data local. However, this approach may impact the quality of generated prompts, necessitating careful evaluation of the trade-offs between privacy and functionality.
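To make the idea of granular, per-signal control concrete, the sketch below shows one way such a setting could gate which behavioral summaries are passed on to prompt generation. It is an illustration under stated assumptions rather than a description of the MindScape implementation: the signal names, the `SignalPreferences` class, and the `filter_summary` function are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical signal names, chosen for illustration only.
DEFAULT_SIGNALS = ("sleep", "physical_activity", "location", "conversation", "phone_usage")


@dataclass
class SignalPreferences:
    """Per-user on/off switches for each behavioral signal (names are assumptions)."""

    enabled: Dict[str, bool] = field(
        default_factory=lambda: {name: True for name in DEFAULT_SIGNALS}
    )

    def set_enabled(self, signal: str, value: bool) -> None:
        if signal not in self.enabled:
            raise KeyError(f"unknown signal: {signal}")
        self.enabled[signal] = value


def filter_summary(summary: Dict[str, str], prefs: SignalPreferences) -> Dict[str, str]:
    """Keep only the signals the user has enabled; unknown signals are dropped."""
    return {name: text for name, text in summary.items() if prefs.enabled.get(name, False)}


prefs = SignalPreferences()
prefs.set_enabled("location", False)  # the user opts out of location-based prompts
summary = {
    "sleep": "slept less than the weekly average",
    "location": "spent more time off campus than usual",
}
shared = filter_summary(summary, prefs)
print(shared)
```

Dropping unknown signals by default is a deliberate choice in this sketch: a newly added sensor would not be shared until the user has explicitly seen and enabled it.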
While we do not emphasize statistical significance in many cases, and some findings are not statistically significant, we still observe several positive changes. Future research with expanded populations might determine which of these positive changes are causally linked to the journaling intervention and which are merely coincidental. Our study does not compare traditional journaling to personalized AI journaling in a randomized controlled way, as we did not have enough participants for an RCT. It is therefore unclear whether contextual AI journaling offers advantages over traditional journaling. While we collect objective data on physical activity, sleep patterns, and other behaviors, we choose to focus our analysis on self-reported measures for this initial exploratory study. This decision is made to prioritize understanding users’ subjective experiences with contextual journaling, which is crucial when evaluating a novel intervention’s acceptability and perceived impact. However, we acknowledge that incorporating analysis of the objective data could provide valuable additional insights. We have stored these data securely and plan to conduct a more comprehensive analysis in future work, comparing self-reported experiences with objective behavioral changes. This future analysis will help us better understand the relationship between perceived and actual changes in behavior and well-being.
We also do not use participants’ journal entries to help the LLM learn and adapt; instead, we only use behavioral sensing data to contextualize the journals. Using prior journaling responses could potentially enhance the app’s functionality, but we prioritized privacy considerations by not sending potentially identifying information (that may be contained in the journal entries) to OpenAI. To maintain participants’ privacy, we only sent de-identified high-level behavioral sensing data to GPT-4. Future research should explore more privacy-preserving approaches, such as using locally deployed open-source LLMs (e.g., Llama 2). This could potentially allow for the utilization of Protected Health Information (PHI) or journal entries, while maintaining control over data privacy and security. It could also allow for a comparison of different models’ performance while keeping user data on-device. Additionally, investigating techniques like differential privacy for any aggregated data used in model improvements could further enhance privacy protections. Our app also does not incorporate user feedback (thumbs up/down) from check-ins to personalize the journaling experience; future research could explore the integration of such feedback to enhance personalization. Future researchers could also expand the range of signals by incorporating data from wearable devices, enhancing the diversity and coverage of the prompts used in journaling. As an early exploratory study integrating AI and behavioral sensing for journaling, our focus is on describing what happened without exploring the underlying reasons. For instance, while some participants prefer certain types of check-ins, we do not examine why these preferences exist or what might influence them. Future investigations could explore these nuances to better understand participant preferences and refine the journaling process further. We acknowledge the lack of ablation studies in our current work. 
While we included daily check-ins as part of the intervention, we did not separately analyze their specific impact on outcomes. Future studies will include ablation analyses to understand the individual contributions of different components, such as contextual prompts, daily check-ins, and the breathing exercise. This will provide more nuanced insights into which aspects of the app are most effective for improving well-being. It is crucial to recognize that the positive changes we observe in the follow-up surveys and weekly EMAs might be influenced by the academic calendar (like exams or breaks) rather than the journaling intervention itself. Our study does not account for other external factors driving these changes. Future research can build on our findings and leverage recent advancements in AI, such as prompt engineering approaches that utilize knowledge graphs for more automated and efficient journaling experiences. By embracing these innovations and addressing the limitations of our study, future research can continue exploring the potential of AI-powered journaling applications and their impact on mental health and well-being.
6. Ethical Considerations
To ensure the privacy and security of our participants’ data, we implement multiple measures throughout the study. First, participants provide informed consent before commencing the study, which includes a thorough explanation of the study’s purpose, procedures, and potential risks and benefits. They have the option to withdraw from the study at any time. We assign anonymized IDs to each participant, and we store all data securely with restricted access granted only to authorized researchers. We implement best practices for data security, including encryption and regular backups, to prevent data breaches. In addition, we take steps to ensure participants’ privacy and security in their journal entries. We advise them to omit personal identifiers and clarify that their data are not monitored live. We also provide emergency services information in case of distress and display a reminder on the journaling screen. Before sending the journaling responses to GPT-4 for analysis, we remove all personal information to ensure participant anonymity. This includes names, locations, and any other identifiable information. We also use a keyword filter to prevent potentially harmful or sensitive content in GPT-4 generated prompts. Participants have the option to report any prompt-related issues, but by the end of the study we had not received any reports of sensitive prompts. Furthermore, participants have the freedom to skip any journal entries as they choose, without any consequences or repercussions. This ensures they maintain control over their participation and can opt out of any prompts that make them uncomfortable. We recognize the potential privacy implications of collecting extensive behavioral data and using it to generate personalized prompts. While our current implementation uses GPT-4, we acknowledge that future iterations could benefit from exploring self-hosted, open-source LLM solutions to enhance data privacy. 
It will also be crucial to implement granular privacy settings for users to control data usage, and to create more transparent explanations of data practices.
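The redaction and keyword-filter steps described in this section could look roughly like the following. This is a minimal sketch and not the study's implementation: the function names, the `[REDACTED]` placeholder, the per-participant identifier list, and the blocklist terms are all hypothetical, and de-identifying free text in practice generally needs more than exact-match replacement.

```python
import re

# Hypothetical blocklist; the terms actually used in the study are not published.
BLOCKED_TERMS = {"suicide", "self-harm", "overdose"}


def redact_identifiers(text, identifiers, placeholder="[REDACTED]"):
    """Replace each known identifier (name, place, ...) with a placeholder.

    `identifiers` is an assumed per-participant list of strings to remove.
    Matching is case-insensitive and limited to whole words or phrases.
    """
    # Longest identifiers first, so a phrase is replaced before any word inside it.
    for term in sorted(identifiers, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        text = re.sub(pattern, placeholder, text, flags=re.IGNORECASE)
    return text


def is_prompt_allowed(prompt, blocked_terms=BLOCKED_TERMS):
    """Return False if the generated prompt contains any blocked term."""
    lowered = prompt.lower()
    return not any(term in lowered for term in blocked_terms)
```

In this sketch, `redact_identifiers` would run on a journal response before it leaves the server, and `is_prompt_allowed` would run on each generated prompt before it is shown; a rejected prompt could then be regenerated or replaced.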
7. Conclusion
In this study, we enable context-aware journaling by integrating passive sensing with LLMs. By harnessing the power of mobile technology, we have developed a novel system that provides tailored support for Android users, leveraging behavioral data from smartphones and personalized prompt generation through LLMs – offering a high degree of customization in journaling applications. Our findings demonstrate the potential effectiveness of this approach, with participants exhibiting improvements in well-being, including reduced anxiety and depression, enhanced self-reflection, and increased positive affect. Moreover, our analysis of prompts, check-ins, and journaling responses provided important insights into the efficacy of our approach. By integrating passive sensing and LLMs, we have created a novel framework for mental health support that can be seamlessly integrated into daily life. We hope our research paves the way for further exploration of AI-driven, personalized interventions, which are particularly crucial for individuals in stressful academic environments and beyond, where access to traditional support systems may be limited.
CCS Concepts:
• Human-centered computing → Ubiquitous and mobile computing; • Applied computing → Health informatics.
Acknowledgment
The research discussed in this paper was supported by the National Institute of Mental Health (NIMH) under award number 5R61MH126094.
Appendix A. Surveys and Questionnaires
Table 5.
Facet | Survey |
---|---|
Personality | Big Five Personality Scale [62] |
Emotion Regulation | Emotion Regulation Questionnaire (ERQ) [61] |
Affect | Positive and Negative Affect Scale (PANAS) [77] |
Stress | Perceived Stress Scale (PSS) [21] |
Anxiety | State-Trait Anxiety Inventory (STAI) [49] |
Resilience | Brief Resilience Scale (BRS) [69] |
Psychological Wellbeing | Ryff’s Scales of Psychological Well-being [65] |
Life Satisfaction | Satisfaction With Life Scale (SWLS) [24] |
Flourishing | Flourishing Scale [25] |
Social Provision | Social Provisions Scale (SPS) [16] |
Loneliness | UCLA Loneliness Scale [64] |
Mindfulness | Five Facet Mindfulness Questionnaire (FFMQ) [5] |
Self-reflection and Insight | Self-Reflection and Insight Scale (SRIS) [68] |
Table 6.
Table 7.
Question & Options | Count |
---|---|
How would you rate the overall performance of the MindScape app (e.g., speed, reliability)? | |
Very poor | 0 (0.0%) |
Poor | 2 (10.0%) |
Average | 4 (20.0%) |
Good | 7 (35.0%) |
Excellent | 7 (35.0%) |
How mentally demanding do you find using the MindScape app? | |
Not demanding at all | 13 (65.0%) |
Slightly demanding | 4 (20.0%) |
Moderately demanding | 3 (15.0%) |
Very demanding | 0 (0.0%) |
How easy was it to integrate the MindScape app into your daily routine? | |
Very difficult | 0 (0.0%) |
Somewhat difficult | 3 (15.0%) |
Neutral | 2 (10.0%) |
Somewhat easy | 8 (40.0%) |
Very easy | 7 (35.0%) |
Table 8.
Question & Options | Count |
---|---|
How relevant do you find the journaling prompts generated by the MindScape app? | |
Not at all relevant | 0 (0.0%) |
Slightly relevant | 6 (30.0%) |
Moderately relevant | 12 (60.0%) |
Very relevant | 2 (10.0%) |
Extremely relevant | 0 (0.0%) |
How relevant do you find the check-in prompts (i.e., the thumbs up/thumbs down messages) generated by the MindScape app? | |
Not at all relevant | 1 (5.0%) |
Slightly relevant | 8 (40.0%) |
Moderately relevant | 8 (40.0%) |
Very relevant | 3 (15.0%) |
Extremely relevant | 0 (0.0%) |
How often do the context-aware prompts lead you to reflect more deeply than usual? | |
Never | 1 (5.0%) |
Rarely | 2 (10.0%) |
Sometimes | 9 (45.0%) |
Often | 6 (30.0%) |
Always | 2 (10.0%) |
Since using the MindScape app, have you noticed any changes in your daily habits or behaviors? | |
No change | 11 (55.0%) |
Slight change | 6 (30.0%) |
Moderate change | 3 (15.0%) |
Significant change | 0 (0.0%) |
Table 9.
Question & Options | Count |
---|---|
How comfortable are you with the idea of using data collected from phones and smart watches along with AI to personalize journaling prompts? | |
Very uncomfortable | 0 (0.0%) |
Uncomfortable | 2 (10.0%) |
Neutral | 4 (20.0%) |
Comfortable | 12 (60.0%) |
Very comfortable | 2 (10.0%) |
How concerned are you about the privacy and security of your data within the MindScape app? | |
Not at all concerned | 1 (5.0%) |
Not concerned | 6 (30.0%) |
Slightly concerned | 5 (25.0%) |
Moderately concerned | 7 (35.0%) |
Very concerned | 1 (5.0%) |
Table 10.
Question & Options | Count |
---|---|
How satisfied are you with the MindScape app overall? | |
Very unsatisfied | 0 (0.0%) |
Unsatisfied | 3 (15.0%) |
Neutral | 6 (30.0%) |
Satisfied | 9 (45.0%) |
Very satisfied | 2 (10.0%) |
Compared to other mental health or journaling apps you have used, how does MindScape rank in terms of overall satisfaction? | |
Much worse | 0 (0.0%) |
Somewhat worse | 1 (5.0%) |
About the same | 4 (20.0%) |
Somewhat better | 4 (20.0%) |
Much better | 2 (10.0%) |
Unsure | 9 (45.0%) |
How likely are you to recommend the MindScape app to a friend or peer? | |
Very unlikely | 1 (5.0%) |
Unlikely | 6 (30.0%) |
Neutral | 5 (25.0%) |
Likely | 6 (30.0%) |
Very likely | 2 (10.0%) |
If allowed, would you consider continuing to use the MindScape app after this study concludes? | |
Definitely not | 1 (5.0%) |
Probably not | 6 (30.0%) |
Unsure | 3 (15.0%) |
Probably will | 10 (50.0%) |
Definitely will | 0 (0.0%) |
Table 11.
Facet | Survey |
---|---|
Usability | System Usability Scale (SUS) [13] |
Differences in Journaling Experience | Can you describe any differences in the depth, quality of your reflections or anything else that you noticed when using context-aware directed prompts from the MindScape app compared to standard journaling that are free-form (i.e., with no prompts)? |
Resonating prompts | Can you recall any specific prompts from the MindScape app that significantly resonated with you or were particularly relevant to your experiences? Please describe them and share why they stood out. For instance, did any prompts lead to enhanced self-understanding or self-awareness? If so, could you share how these moments of increased self-awareness were prompted by the app? |
Noticeable change | Can you share a specific instance where you noticed a change in your behavior or habits due to using the app? |
App influences | Has using the MindScape app influenced the way you plan or structure your week? If so, in what ways? Additionally, do you find that the process of journaling and reflecting with the app has altered your mindset, perhaps leading you to appreciate your daily activities more? Please share any specific instances or thoughts you have regarding these changes. |
Improvement | How do you think we could improve the context-aware journaling prompts to better support your reflective journaling practices? |
Overall experience | Can you describe your overall experience using the MindScape app? What did you like or dislike? |
Enhancements | What enhancements or additional features would you like to see in future versions of the MindScape app? |
Suggestions to improve | Please provide any additional feedback or suggestions you have for improving the MindScape app or the study. Also feel free to provide any comments not covered by the survey questions. |
Appendix B. Prompt for Check-ins
System Prompt:
Imagine you’re a friendly digital buddy for college students, offering quick, casual check-ins based on their mobile sensing behavioral data. Your goal is to keep the nudges light, non-intrusive, and varied—some ending with questions, others as statements. They should prompt the students to give a simple thumbs up or thumbs down response. Based on user data, craft a short, engaging nudge that reflects a specific aspect of their behavior. Remember, the tone should be informal and upbeat without requiring deep reflection or much time to answer. Don’t use thumbs up or down emoji. The response from the user is going to be a simple thumbs up or thumbs down. Therefore, don’t ask loaded question whose answer could be confusing. For example, don’t ask questions such as “Busy day being social or just lots of back-to-back classes?”. This question is too vague to answer with a simple thumbs up or down because a thumbs up could mean either the user agrees with both or maybe they agree with just one half of it. Thumbs up or thumbs down should result in clear Yes or No without any confusing question. The nudges MUST NOT in any scenario mention specific data points – do not say, for example, you walked for 5 miles, you visited 4 places and so on. No numbers should be present. It should all be relative. The morning nudge uses data from 6 AM to 12 PM, the afernoon nudge uses data from 12 to 3 PM, the evening nudge uses data from 3 to 6 PM and the night nudge uses data from 6 to 11 PM. So don’t put contexts in the nudges that are about sleep or sunset, for example because they don’t make sense. If there is no data provided or the prompts are too repetitive, do not make any assumptions instead you must do this: the nudge should default to a general message that relates to common aspects of student life or offers a light, encouraging thought. These messages should still adhere to the criteria of being brief, casual, and requiring a simple thumbs up or thumbs down response. 
For example, “Have you taken a little break from your screen today?”, “Just checking in - have you had your cup of hydration yet? Remember, water is your best friend during study sessions!”, “Have you connected with a friend or family member today? A quick chat can be a great mood booster!”, “It seems we don’t have much data for today, but let’s not skip our check-in. How about this - have you stepped outside for a bit of fresh air today?”, “Have you done something today just for fun or relaxation? Remember, balance is key!”
User Prompt:
Today’s date: [DATE]
Timing: [CURRENT TIME OF DAY]
Previous Responses: [PREVIOUS THREE CHECK-IN PROMPTS]
User Data: [USER DATA]
Response Rules:
Do not provide a generic or offensive, argumentative, or mentally damaging response, instead be friendly and upbeat
Avoid repetitive response by using the context of the Previous Responses. Do not mention the same idea conveyed in Previous Responses.
Response should be a non-generic Yes/No question
Only provide one question
MUST respond with only the prompt, do not give any prefix such as “prompt” and do not use double quotes at the start and end of the response.
Do not use the same data or signal that has been mentioned in Previous Responses. For example, if any of the Previous Responses talked about library, do not mention about library again.
This response will be shown to college students, make the response more relatable
Do not make assumptions based on the context provided, for example, do not assume that students are currently working on academic projects, are in a relationship, or for instance, just because they walked do not assume they were out walking on a sunset, just because they were in dorm do not assume that they spent time sleeping etc. Do not assume information.
Do not use the word “vibe”
Do not start with the same starting word as in Previous Responses.
Make the call to action to be different or variable each time. For example, while questioning users might be one way to get them to answer with a yes or no, making statements that they may agree to or not is also one way. Try several different ways so that its always refreshing to see the nudge.
Highlight data that might be more important.
Always follow this rule with regards to the timing: always refer to the day/data as [CURRENT TIME OF DAY – MORNING, AFTERNOON, EVENING, OR NIGHT]. Do not say monday, or today.
Make the nudges human-like and warm.
There should be variability in the response. Users are going to see this multiple times a day for 2 months. They should not be annoyed with it.
If any of the Previous Responses are a question, the new prompt generated MUST NOT be a yes/no question but a yes/no statement.
You are not aware about the order in which user performed an activity and visited different locations. So don’t assume the order. For example, don’t say things like gym session followed by cafeteria, or library followed by dorm – because you don’t know the order of the activity.
MUST not mention anything about sunset.
It should be less than 200 characters.
Don’t always mention users to either thumbs up or down.
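A few of the rules above are mechanical enough to check in code. The sketch below is an illustration rather than part of the described system: it maps an hour of the day to the four nudge windows given in the system prompt and flags generated nudges that break the length, no-numbers, banned-word, emoji, and quoting rules. The function names, the treatment of each window's end hour as exclusive, and the idea of validating output after generation are assumptions; the source only describes instructing GPT-4 through the prompt.

```python
import re

# Nudge windows from the system prompt, as (start_hour, end_hour, label).
# Treating the end hour as exclusive is an assumption.
WINDOWS = [(6, 12, "morning"), (12, 15, "afternoon"),
           (15, 18, "evening"), (18, 23, "night")]

BANNED_WORDS = ("vibe", "sunset")            # words the rules forbid
THUMBS_EMOJI = ("\U0001F44D", "\U0001F44E")  # thumbs up / thumbs down
MAX_CHARS = 200                              # "less than 200 characters"


def time_window(hour):
    """Return the nudge window label for an hour (0-23), or None outside all windows."""
    for start, end, label in WINDOWS:
        if start <= hour < end:
            return label
    return None


def rule_violations(nudge):
    """Return a list of human-readable rule violations for a generated nudge."""
    problems = []
    if len(nudge) >= MAX_CHARS:
        problems.append("too long")
    if re.search(r"\d", nudge):
        problems.append("contains a number")
    lowered = nudge.lower()
    for word in BANNED_WORDS:
        if word in lowered:
            problems.append(f"contains banned word: {word}")
    if any(emoji in nudge for emoji in THUMBS_EMOJI):
        problems.append("contains thumbs emoji")
    if nudge.startswith(('"', "\u201c")) or nudge.endswith(('"', "\u201d")):
        problems.append("starts or ends with a double quote")
    return problems
```

A check like this cannot judge tone, repetition, or unwarranted assumptions, which is why the remaining rules are left to the prompt itself.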
Appendix C. Prompt for Weekend Journal
System Prompt:
You are MindScape AI. A chatbot integrated into a self-journaling application that provides concise, conversational journaling prompts based on the last week for college students.
MindScape AI is governed by the following rules:
MindScape AI uses mood score and previous responses to create the prompt.
Mood Score is on a 1 to 5 scale, with 1 being the lowest value (worst mood) and 5 being the highest value (best mood)
MindScape AI produces a prompt that is based on a broad theme such as resilience, achievements, challenges, personal growth and emotional well-being. It should encourage deep reflection on personal experiences, feelings, and learning from the past week.
MindScape AI produces a prompt that is engaging, easy to respond to verbally or in short written notes, and foster self-awareness and positivity.
MindScape AI designs the prompt to forces users to do some type of interpretation and encourage them to respond to in their own words.
MindScape AI takes note of how the user is feeling (i.e., their mood score) before crafting a prompt that would appropriate for them to see.
MindScape AI should focus on the user ranked priorities, from highest to lowest, when crafting the prompt.
MindScape AI should produce a prompt that is friendly, conversational, upbeat, and has a sense of personality in order to make the user feel comfortable and motivated to share their thoughts and feelings in a casual, conversational manner.
MindScape AI will not provide a generic or offensive, argumentative, or mentally damaging response.
MindScape AI will avoid repetitive response by using the context of the “Previous Responses” and will not mention the same idea conveyed in Previous Responses.
MindScape AI will create a response that is a non-generic question and do not use over-the-top words.
MindScape AI will not mention specific data points, for example “Your mood score was 1/5 this week”.
MindScape AI will not use clinical or quantitative language.
MindScape AI will create a prompt that does not exceed 250 characters.
MindScape AI will respond with only the prompt, and the prompt will not have any prefix such as “Prompt:”, “Tip:”, “Question:” etc.
MindScape will not use any hashtags in the response.
MindScape will refer to the week, not today in its response.
MindScape AI will not use quotes at the start and end of the prompt.
MindScape AI creates a prompt that is not open-ended, instead it is direct in order to facilitate the user focusing on one area.
MindScape AI avoids any phrases that might be stigmatizing or feel exclusionary, for example “if you have a partner”.
MindScape AI produces a prompt that is relatable, trendy, and Gen-Z
MindScape AI produces a prompt that concludes with a message of gratitude and encouragement for their ongoing journey.
User Prompt:
Today’s date: [DATE]
Mood Score: [USER MOOD SCORE]
Previous Responses: [PREVIOUS TWO JOURNAL PROMPTS]
Appendix D. Data Processing and LLM Integration
1. Data Collection and Preprocessing: Raw sensing data is collected via phones and regularly uploaded to our system.
2. Feature Extraction: Our system extracts features from the uploaded data regularly.
- Physical Activity: We extract features for time spent doing certain activities (biking, walking, running) using the Android Activity Recognition API.
- Location: We cluster GPS coordinates using the density-based spatial clustering of applications with noise (DBSCAN) [28] algorithm and map the clusters to semantic locations (e.g., home, work, gym).
- Time spent at semantic locations: We extract time spent at certain semantic locations based on the GPS data.
- App Usage: We categorize apps by making calls to Google Play Store and scraping the category of the app, then track usage frequencies.
- Distance: We use the clustered GPS locations to identify total distance travelled between different locations where a user spends at least 30 minutes.
- Screen time: Our app records lock and unlock events to derive the time spent using the phone.
- Phone logs: We use the Android API to collect metadata on the type of calls made/received, as well as SMS. No raw SMS or call content and no phone numbers are recorded.
- In-person Conversations: Our app listens to audio periodically to identify whether there’s a voice present. If it detects multiple voices, it marks it as a conversation. This happens on-device with our machine learning model. No raw audio is recorded.
- Number of Significant places visited: We use the DBSCAN algorithm to cluster the GPS data, marking each place identified as a cluster as one significant place if a user spends at least 30 minutes there.
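To make the location features listed above concrete, here is a small, dependency-free sketch of DBSCAN over GPS fixes followed by the 30-minute dwell filter for significant places. It is an illustration under stated assumptions, not the study's code: the 50 m radius, the `min_pts` value, the `(minute, lat, lon)` sample format, and the way dwell time is summed (gaps between consecutive fixes that share a cluster) are all hypothetical choices, and a deployed system would more likely call a library implementation.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(h))


def dbscan(points, eps_m=50.0, min_pts=3):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        return [j for j in range(len(points))
                if haversine_m(points[i], points[j]) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels


def significant_places(samples, min_dwell_min=30, **dbscan_args):
    """Count clusters where the user dwelt at least `min_dwell_min` minutes.

    `samples` is a time-ordered list of (minute, lat, lon) tuples.
    """
    labels = dbscan([(lat, lon) for _, lat, lon in samples], **dbscan_args)
    dwell = {}
    pairs = zip(samples, samples[1:], labels, labels[1:])
    for (t0, _, _), (t1, _, _), l0, l1 in pairs:
        if l0 == l1 and l0 != -1:  # both fixes fall in the same cluster
            dwell[l0] = dwell.get(l0, 0) + (t1 - t0)
    return sum(1 for minutes in dwell.values() if minutes >= min_dwell_min)
```

The neighbor search here is quadratic in the number of fixes, which is fine for a toy example but is one reason to prefer an indexed library implementation on real traces.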
3. Temporal Aggregation: We aggregate data over different time scales (hourly, daily, weekly) to capture both immediate context and longer-term patterns.
4. Feature Processing: A cronjob on the server regenerates all of these features and their temporal aggregations every 30 minutes. On each run, the server backend ingests any new files transferred from the phone, imports them into our MongoDB database, and generates the updated features.
5. GPT-4 Prompt Generation: We create a Jinja template for the GPT-4 request. A cronjob runs every hour, executing a Python script that:
- Retrieves the entire day’s data (for the weekday journal), the week’s data (for the weekend journal), or the current day’s data for a given period (for check-ins).
- Generates aggregate data of the features and compares them with historical averages, creating a percentage change for each feature.
- Uses the Jinja template to fill in appropriate variables (increase/decrease/stable and the amount of change). Also inserts today’s date as well as the previous journal prompts/check-ins (retrieved from the database).
- Combines this data with the remaining static GPT-4 prompt to generate journals/check-ins and stores the completed prompt in our database.
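The comparison against historical averages and the template fill described above can be sketched as follows. This is an assumption-laden illustration, not the authors' script: it uses Python's built-in `string.Template` in place of Jinja so that it runs without third-party packages, and the ±10% band for "stable", the feature names, and the wording of the rendered line are all hypothetical.

```python
from string import Template

STABLE_BAND_PCT = 10.0  # hypothetical: changes within +/-10% count as "stable"

# Stand-in for the Jinja template; one rendered line per feature.
LINE = Template("$feature: $direction ($amount% vs. historical average)")


def describe_change(current, historical_mean, band=STABLE_BAND_PCT):
    """Return (direction, percent_change) for one feature."""
    if historical_mean == 0:
        # No baseline to compare against; report stable rather than divide by zero.
        return "stable", 0.0
    pct = (current - historical_mean) / historical_mean * 100.0
    if pct > band:
        return "increase", pct
    if pct < -band:
        return "decrease", pct
    return "stable", pct


def render_user_data(today, history):
    """Build the user-data portion of a prompt from today's and historical values."""
    lines = []
    for feature in sorted(today):
        direction, pct = describe_change(today[feature], history[feature])
        lines.append(LINE.substitute(feature=feature, direction=direction,
                                     amount=f"{abs(pct):.0f}"))
    return "\n".join(lines)
```

For example, with hypothetical values of 45 walking minutes today against a historical mean of 90, the rendered line reports a decrease of 50%.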
6. LLM Integration:
- One hour prior to the journaling or check-in time, another cronjob script reads the complete prompt from the database (which includes both the user data portion as well as the main prompt to generate journals/check-ins), sends it to GPT-4, and stores the response in the database.
- For real-time requests, when a user self-reports their mood, we grab the same prompt, insert the user-reported mood, and send it again to GPT-4.
- We store this in the database and display this to the user after a one-minute breathing exercise.
- If there’s no internet access or an error occurs, our app uses the pre-generated prompt without the self-report.
- As a final fallback, we have canned responses/journaling prompts hardcoded into the app.
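The fallback order described in the LLM integration step can be expressed as a short chain. The sketch below is illustrative only: `request_llm` stands in for whatever client call sends the prompt to GPT-4, and the function names, the exception handling, the way the mood is appended, and the canned prompt text are assumptions rather than details from the app.

```python
# Hypothetical canned prompt; the app's hardcoded prompts are not listed in the paper.
CANNED_PROMPTS = ["What is one moment from today that you would like to remember?"]


def choose_prompt(request_llm, base_prompt, mood, pregenerated=None):
    """Pick the journaling prompt to display, following the fallback order.

    1. Real-time: insert the self-reported mood and ask the LLM again.
    2. Otherwise use the prompt generated ahead of time, without the self-report.
    3. Otherwise fall back to a canned prompt shipped with the app.
    Returns (prompt_text, source).
    """
    try:
        text = request_llm(f"{base_prompt}\nMood Score: {mood}")
        if text and text.strip():
            return text.strip(), "real-time"
    except Exception:  # e.g. no internet access or an API error
        pass
    if pregenerated:
        return pregenerated, "pre-generated"
    return CANNED_PROMPTS[0], "canned"
```

Passing the LLM call in as a parameter keeps the chain easy to exercise offline: a function that raises simulates a dropped connection.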
Appendix E. Sample Contextual Journaling Prompts
Social Interaction:
You’ve embraced more face-to-face chats and less screen time. How’s this new social rhythm shaping your day?
I see you’ve been visiting new places but your calls and texts have dropped. Can you share what’s drawing you to these spots and how it’s impacting you?
Exploring new places seems to be on the rise for you! What’s a standout spot you’ve discovered and how has it impacted your social vibes?
Noticing more texts and fewer calls, what’s one message you received that stood out and why?
You seem quieter on calls and texts lately. Could a catch-up with friends bring some cheer?
Digital Habits:
You’ve been clocking less screen time lately. What have you been doing instead that you’ve found rewarding or enjoyable?
Your screen time and app use have climbed! Reflecting on this, which app might you cut back on to reclaim some headspace?
Your knack for digital entertainment has spiked. Consider how these choices might shape your tomorrow.
Your digital habits have improved. Noticed any changes in your sleep with more screen-free time before bed?
You’re dialing down on screen time and phone unlocks lately. How is this affecting your focus or stress levels?
Physical Fitness:
Your recent gym time boost is impressive! How is this new routine helping with your daily energy and focus?
Consider the impact of less walking and more screen time on your well-being. Could increasing movement lighten your mood?
Your recent trend shows less walking and travel. Share one thing you’ll do this week to introduce a bit more motion in your routine.
With gym visits up but running down, consider trying a new sport this week for fun. How do you feel about that?
Noticed your time at the gym is up. What new workout or routine inspired this change, and how does it feel integrating it?
Sleep:
Reflect on a calming activity to try before sleep that might improve your rest.
Your sleep schedule’s been versatile; did this affect your wakefulness or daily focus?
Consider experimenting with a sleep schedule tweak to wake up feeling more refreshed tomorrow. What’s one change you could try tonight?
With a busy academic term, have you thought of a new routine to maintain your sleep schedule?
Your screen interactions have remained stable, but sleep has shifted. Could altering bedtime routines improve your rest?
Broader Weekend Prompts:
Who in your circle has been a positive influence lately? Share how they’ve helped brighten your day.
Reflect on a hobby that uplifts you and how you could make time for it this week.
Reflect on a decision you made this week that you’re proud of, and how it echoed through your daily life.
Reflecting on the week, what single experience gave you the most strength and why? Appreciate your strides and keep it up!
Describe the moment this week that made you feel on top of the world. Thanks for sharing your journey!
Appendix F. Sample Check-ins
Appendix G. Gratitude and Self-compassion Journals
Contributor Information
SUBIGYA NEPAL, Dartmouth College, USA.
ARVIND PILLAI, Dartmouth College, USA.
WILLIAM CAMPBELL, Colby College, USA.
TALIE MASSACHI, Brown University, USA.
MICHAEL V. HEINZ, Dartmouth College, USA.
ASHMITA KUNWAR, Dartmouth College, USA.
EUNSOL SOUL CHOI, Cornell Tech, USA.
XUHAI XU, Massachusetts Institute of Technology, USA.
JOANNA KUC, University College London, UK.
JEREMY F. HUCKINS, Biocogniv Inc, USA.
JASON HOLDEN, University of California, San Diego, USA.
SARAH M. PREUM, Dartmouth College, USA.
COLIN DEPP, University of California, San Diego, USA.
NICHOLAS JACOBSON, Dartmouth College, USA.
MARY P. CZERWINSKI, Microsoft Research, USA.
ERIC GRANHOLM, University of California, San Diego, USA.
ANDREW T. CAMPBELL, Dartmouth College, USA.
References
- [1]. Alt Dorit and Raichel Nirit. 2020. Reflective journaling and metacognitive awareness: Insights from a longitudinal study in higher education. Reflective Practice 21, 2 (2020), 145–158.
- [2]. Apple. 2024. Get started with Journal on iPhone. https://support.apple.com/guide/iphone/get-started-with-journal-iph0e5ca7dd3/ios
- [3]. Aronson Louise. 2010. Twelve tips for teaching reflection at all levels of medical education. Medical Teacher 33, 3 (Sept. 2010), 200–205. 10.3109/0142159x.2010.507714
- [4]. American College Health Association. 2024. American College Health Association-National College Health Assessment III: Reference Group Executive Summary Spring 2023. https://www.acha.org/documents/ncha/NCHA-III_SPRING_2023_REFERENCE_GROUP_EXECUTIVE_SUMMARY.pdf
- [5]. Baer Ruth A., Carmody James, and Hunsinger Matthew. 2012. Weekly Change in Mindfulness and Perceived Stress in a Mindfulness-Based Stress Reduction Program. Journal of Clinical Psychology 68, 7 (May 2012), 755–765. 10.1002/jclp.21865
- [6]. Bakker David and Rickard Nikki. 2018. Engagement in mobile phone app for self-monitoring of emotional wellbeing predicts changes in mental health: MoodPrism. Journal of Affective Disorders 227 (Feb. 2018), 432–442. 10.1016/j.jad.2017.11.016
- [7]. Beiter Rebecca, Nash Ryan, McCrady Melissa, Rhoades Donna, Linscomb Mallori, Clarahan Molly, and Sammut Stephen. 2015. The prevalence and correlates of depression, anxiety, and stress in a sample of college students. Journal of Affective Disorders 173 (2015), 90–96.
- [8]. Benson Herbert and Stuart Eileen M. 1993. The wellness book: The comprehensive guide to maintaining health and treating stress-related illness. Simon and Schuster.
- [9]. Bhattacharjee Ananya, Kulzhabayeva Dana, Reza Mohi, Kumar Harsh, Seong Eunchae, Wu Xuening, Rifat Mohammad Rashidujjaman, Bowman Robert, Kornfield Rachel, Mariakakis Alex, Ahmed Syed Ishtiaque, De Choudhury Munmun, Doherty Gavin, Czerwinski Mary P, and Williams Joseph Jay. 2023. Integrating Individual and Social Contexts into Self-Reflection Technologies. In Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI EA ‘23). Association for Computing Machinery, New York, NY, USA, Article 356, 6 pages. 10.1145/3544549.3573803
- [10]. Bhattacharjee Ananya, Zeng Yuchen, Sarah Yi Xu, Kulzhabayeva Dana, Ma Minyi, Kornfield Rachel, Ahmed Syed Ishtiaque, Mariakakis Alex, Czerwinski Mary P, Kuzminykh Anastasia, et al. 2023. Understanding the Role of Large Language Models in Personalizing and Scaffolding Strategies to Combat Academic Procrastination. arXiv preprint arXiv:2312.13581 (2023).
- [11]. Boyd Ryan L, Ashokkumar Ashwini, Seraj Sarah, and Pennebaker James W. 2022. The development and psychometric properties of LIWC-22. Austin, TX: University of Texas at Austin (2022), 1–47.
- [12]. Bränström Richard, Duncan Larissa G., and Moskowitz Judith Tedlie. 2011. The association between dispositional mindfulness, psychological well-being, and perceived health in a Swedish population-based sample: Influence of dispositional mindfulness. British Journal of Health Psychology 16, 2 (March 2011), 300–316. 10.1348/135910710x501683
- [13]. Brooke John. 1996. SUS – a quick and dirty usability scale. 189–194.
- [14]. Brown Kirk Warren and Ryan Richard M.. 2003. The benefits of being present: Mindfulness and its role in psychological well-being. Journal of Personality and Social Psychology 84, 4 (2003), 822–848. 10.1037/0022-3514.84.4.822
- [15]. Campello Ricardo J. G. B., Moulavi Davoud, and Sander Joerg. 2013. Density-Based Clustering Based on Hierarchical Density Estimates. Springer Berlin Heidelberg, 160–172. 10.1007/978-3-642-37456-2_14
- [16]. Caron Jean. 2013. A validation of the Social Provisions Scale: the SPS-10 items. Santé mentale au Québec 38, 1 (Oct. 2013), 297–318. 10.7202/1019198ar
- [17]. Cengiz Canan. 2020. The effect of structured journals on reflection levels: With or without question prompts? Australian Journal of Teacher Education (Online) 45, 2 (2020), 23–43.
- [18]. Chen Nian-Shing, Wei Chun-Wang, Wu Kuen-Ting, and Uden Lorna. 2009. Effects of high level prompts and peer assessment on online learners’ reflection levels. Computers & Education 52, 2 (Feb. 2009), 283–291. 10.1016/j.compedu.2008.08.007
- [19]. Chiu Yu Ying, Sharma Ashish, Lin Inna Wanyin, and Althoff Tim. 2024. A Computational Framework for Behavioral Assessment of LLM Therapists. arXiv preprint arXiv:2401.00820 (2024).
- [20]. Cho Janghee, Xu Tian, Zimmermann-Niefield Abigail, and Voida Stephen. 2022. Reflection in Theory and Reflection in Practice: An Exploration of the Gaps in Reflection Support among Personal Informatics Apps. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI ‘22). Association for Computing Machinery, New York, NY, USA, Article 142, 23 pages. 10.1145/3491102.3501991
- [21]. Cohen Sheldon, Kamarck Tom, and Mermelstein Robin. 1983. A Global Measure of Perceived Stress. Journal of Health and Social Behavior 24, 4 (Dec. 1983), 385. 10.2307/2136404
- [22]. Mack Cathleen Cortis, Cinel Caterina, Davies Nigel, Harding Michael, and Ward Geoff. 2017. Serial position, output order, and list length effects for words presented on smartphones over very long intervals. Journal of Memory and Language 97 (Dec. 2017), 61–80. 10.1016/j.jml.2017.07.009
- [23]. Dickens Leah R. 2017. Using gratitude to promote positive change: A series of meta-analyses investigating the effectiveness of gratitude interventions. Basic and Applied Social Psychology 39, 4 (2017), 193–208.
- [24]. Diener Ed, Emmons Robert A., Larsen Randy J., and Griffin Sharon. 1985. The Satisfaction With Life Scale. Journal of Personality Assessment 49, 1 (Feb. 1985), 71–75. 10.1207/s15327752jpa4901_13
- [25]. Diener Ed, Wirtz Derrick, Tov William, Kim-Prieto Chu, Choi Dong-won, Oishi Shigehiro, and Biswas-Diener Robert. 2009. New Well-being Measures: Short Scales to Assess Flourishing and Positive and Negative Feelings. Social Indicators Research 97, 2 (May 2009), 143–156. 10.1007/s11205-009-9493-y
- [26]. Dimitroff Lynda J., Sliwoski Linda, O’Brien Sue, and Nichols Lynn W.. 2016. Change your life through journaling–The benefits of journaling for registered nurses. Journal of Nursing Education and Practice 7, 2 (Oct. 2016). 10.5430/jnep.v7n2p90
- [27]. Englhardt Zachary, Ma Chengqian, Morris Margaret E., Chang Chun-Cheng, Xu Xuhai “Orson”, Qin Lianhui, McDuff Daniel, Liu Xin, Patel Shwetak, and Iyer Vikram. 2024. From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 2 (May 2024), 1–25. 10.1145/3659604
- [28]. Ester Martin, Kriegel Hans-Peter, Sander Jörg, and Xu Xiaowei. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (Portland, Oregon) (KDD’96). AAAI Press, 226–231.
- [29].Ferrara Alessio. 2022. Empowering emotional well-being through a LLM-based chatbot: a comparative study with the standard journaling technique. (2022).
- [30].Ge Xun and Land Susan M.. 2003. Scaffolding students’ problem-solving processes in an ill-structured task using question prompts and peer interactions. Educational Technology Research and Development 51, 1 (March 2003), 21–38. 10.1007/bf02504515 [DOI] [Google Scholar]
- [31].Glogger Inga, Holzäpfel Lars, Schwonke Rolf, Nückles Matthias, and Renkl Alexander. 2009. Activation of Learning Strategies in Writing Learning Journals: The Specificity of Prompts Matters. Zeitschrift für Pädagogische Psychologie 23, 2 (Jan. 2009), 95–104. 10.1024/1010-0652.23.2.95 [DOI] [Google Scholar]
- [32].Grootendorst Maarten. 2022. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794 (2022). [Google Scholar]
- [33].Gueldner Barbara A, Feuerborn Laura L, and Merrell Kenneth W. 2020. Social and emotional learning in the classroom: Promoting mental health and academic success. Guilford Publications. [Google Scholar]
- [34].Hohaus LC and Spark J. 2013. 2672 – Getting better with age: do mindfulness & psychological well-being improve in old age? European Psychiatry 28 (Jan. 2013), 1. 10.1016/s0924-9338(13)77295-x21920709 [DOI] [Google Scholar]
- [35].Hua Yining, Liu Fenglin, Yang Kailai, Li Zehan, Sheu Yi-han, Zhou Peilin, Moran Lauren V, Ananiadou Sophia, and Beam Andrew. 2024. Large Language Models in Mental Health Care: a Scoping Review. arXiv preprint arXiv:2401.02984 (2024).
- [36].Huggingface. 2024. Sentence-Transformers/All-Mpnet-Base-v2 · Hugging Face. https://huggingface.co/sentence-transformers/all-mpnet-base-v2. Accessed: 2024-04-27.
- [37].Keech Karsen N and Coberly Patricia G. 2021. Journaling for Mental Health. In Strategies and Tactics for Multidisciplinary Writing. IGI Global, 39–44.
- [38].Khaokaew Yonchanok, Xue Hao, and Salim Flora D. 2024. MAPLE: Mobile App Prediction Leveraging Large Language Model Embeddings. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 1 (March 2024), 1–25. https://doi.org/10.1145/3643514
- [39].Kian Mina J, Zong Mingyu, Fischer Katrin, Singh Abhyuday, Velentza Anna-Maria, Sang Pau, Upadhyay Shriya, Gupta Anika, Faruki Misha A, Browning Wallace, et al. 2024. Can an LLM-Powered Socially Assistive Robot Effectively and Safely Deliver Cognitive Behavioral Therapy? A Study With University Students. arXiv preprint arXiv:2402.17937 (2024).
- [40].Kiken Laura G., Garland Eric L., Bluth Karen, Palsson Olafur S., and Gaylord Susan A. 2015. From a state to a trait: Trajectories of state mindfulness in meditation during intervention predict changes in trait mindfulness. Personality and Individual Differences 81 (July 2015), 41–46. https://doi.org/10.1016/j.paid.2014.12.044
- [41].Kim Taewan, Bae Seolyeong, Kim Hyun Ah, Lee Su-woo, Hong Hwajung, Yang Chanmo, and Kim Young-Ho. 2023. MindfulDiary: Harnessing Large Language Model to Support Psychiatric Patients’ Journaling. arXiv preprint arXiv:2310.05231 (2023).
- [42].Kim Taewan, Shin Donghoon, Kim Young-Ho, and Hong Hwajung. 2024. DiaryMate: Exploring the Roles of Large Language Models in Facilitating AI-mediated Journaling. (2024).
- [43].Kocielnik Rafal, Xiao Lillian, Avrahami Daniel, and Hsieh Gary. 2018. Reflection Companion: A Conversational System for Engaging Users in Reflection on Physical Activity. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2, 2 (July 2018), 1–26. https://doi.org/10.1145/3214273
- [44].Kornell Nate and Bjork Robert A. 2009. A stability bias in human memory: Overestimating remembering and underestimating learning. Journal of Experimental Psychology: General 138, 4 (2009), 449–468. https://doi.org/10.1037/a0017350
- [45].Kroenke K, Spitzer RL, Williams JBW, and Löwe B. 2009. An Ultra-Brief Screening Scale for Anxiety and Depression: The PHQ-4. Psychosomatics 50, 6 (Nov. 2009), 613–621. https://doi.org/10.1176/appi.psy.50.6.613
- [46].Kumar Harsh, Wang Yiyi, Shi Jiakai, Musabirov Ilya, Farb Norman AS, and Williams Joseph Jay. 2023. Exploring the use of large language models for improving the awareness of mindfulness. In Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems. 1–7.
- [47].Li Zhuoyang, Liang Minhui, Le Hai Trung, LC Ray, and Luo Yuhan. 2023. Exploring Design Opportunities for Reflective Conversational Agents to Reduce Compulsive Smartphone Use. In Proceedings of the 5th International Conference on Conversational User Interfaces (Eindhoven, Netherlands) (CUI ’23). Association for Computing Machinery, New York, NY, USA, Article 37, 6 pages. https://doi.org/10.1145/3571884.3604305
- [48].Machado Sónia Matos and Costa Maria Emília. 2015. Mindfulness Practice Outcomes Explained Through the Discourse of Experienced Practitioners. Mindfulness 6 (2015), 1437–1447. https://api.semanticscholar.org/CorpusID:141385885
- [49].Marteau Theresa M. and Bekker Hilary. 1992. The development of a six-item short-form of the state scale of the Spielberger State-Trait Anxiety Inventory (STAI). British Journal of Clinical Psychology 31, 3 (Sept. 1992), 301–306. https://doi.org/10.1111/j.2044-8260.1992.tb00997.x
- [50].Matz SC, Teeny JD, Peters H, Harari GM, and Cerf M. 2024. The potential of generative AI for personalized persuasion at scale. Scientific Reports 14, 1 (2024), 4692.
- [51].McInnes Leland, Healy John, Saul Nathaniel, and Grossberger Lukas. 2018. UMAP: Uniform Manifold Approximation and Projection. The Journal of Open Source Software 3, 29 (2018), 861.
- [52].Miller William. 2014. Interactive journaling as a clinical tool. Journal of Mental Health Counseling 36, 1 (2014), 31–42.
- [53].Mofatteh Mohammad. 2021. Risk factors associated with stress, anxiety, and depression among university undergraduate students. AIMS Public Health 8, 1 (2021), 36.
- [54].Naveed Humza, Khan Asad Ullah, Qiu Shi, Saqib Muhammad, Anwar Saeed, Usman Muhammad, Barnes Nick, and Mian Ajmal. 2023. A Comprehensive Overview of Large Language Models. arXiv preprint arXiv:2307.06435 (2023).
- [55].Nepal Subigya, Liu Wenjun, Pillai Arvind, Wang Weichen, Vojdanovski Vlado, Huckins Jeremy F., Rogers Courtney, Meyer Meghan L., and Campbell Andrew T. 2024. Capturing the College Experience: A Four-Year Mobile Sensing Study of Mental Health, Resilience and Behavior of College Students during the Pandemic. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 1, Article 38 (March 2024), 37 pages. https://doi.org/10.1145/3643501
- [56].Nepal Subigya, Pillai Arvind, Campbell William, Massachi Talie, Choi Eunsol Soul, Xu Xuhai, Kuc Joanna, Huckins Jeremy F, Holden Jason, Depp Colin, Jacobson Nicholas, Czerwinski Mary P, Granholm Eric, and Campbell Andrew. 2024. Contextual AI Journaling: Integrating LLM and Time Series Behavioral Sensing Technology to Promote Self-Reflection and Well-being using the MindScape App. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA ’24). Association for Computing Machinery, New York, NY, USA, Article 86, 8 pages. https://doi.org/10.1145/3613905.3650767
- [57].Nie Jingping, Shao Hanya, Fan Yuang, Shao Qijia, You Haoxuan, Preindl Matthias, and Jiang Xiaofan. 2024. LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices. arXiv preprint arXiv:2403.10779 (2024).
- [58].OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL]
- [59].OpenAI. 2024. OpenAI. https://openai.com
- [60].Osin Evgeny N. and Turilina Irina I. 2021. Mindfulness meditation experiences of novice practitioners in an online intervention: Trajectories, predictors, and challenges. Applied Psychology: Health and Well-Being (2021). https://api.semanticscholar.org/CorpusID:235962146
- [61].Preece David A., Petrova Kate, Mehta Ashish, and Gross James J. 2023. The Emotion Regulation Questionnaire-Short Form (ERQ-S): A 6-item measure of cognitive reappraisal and expressive suppression. Journal of Affective Disorders 340 (Nov. 2023), 855–861. https://doi.org/10.1016/j.jad.2023.08.076
- [62].Rammstedt Beatrice and John Oliver P. 2007. Measuring personality in one minute or less: A 10-item short version of the Big Five Inventory in English and German. Journal of Research in Personality 41, 1 (Feb. 2007), 203–212. https://doi.org/10.1016/j.jrp.2006.02.001
- [63].Rudrum Sarah, Casey Rebecca, Frank Lesley, Brickner Rachel K., MacKenzie Sami, Carlson Jesse, and Rondinelli Elisabeth. 2022. Qualitative Research Studies Online: Using Prompted Weekly Journal Entries During the COVID-19 Pandemic. International Journal of Qualitative Methods 21 (Jan. 2022), 160940692210931. https://doi.org/10.1177/16094069221093138
- [64].Russell Dan, Peplau Letitia Anne, and Ferguson Mary Lund. 1978. Developing a Measure of Loneliness. Journal of Personality Assessment 42, 3 (June 1978), 290–294. https://doi.org/10.1207/s15327752jpa4203_11
- [65].Ryff Carol D. and Keyes Corey Lee M. 1995. The structure of psychological well-being revisited. Journal of Personality and Social Psychology 69, 4 (1995), 719–727. https://doi.org/10.1037/0022-3514.69.4.719
- [66].Sedikides Constantine and Skowronski John J. 2020. In Human Memory, Good Can Be Stronger Than Bad. Current Directions in Psychological Science 29, 1 (Jan. 2020), 86–91. https://doi.org/10.1177/0963721419896363
- [67].Sharma Ashish, Rushton Kevin, Lin Inna Wanyin, Nguyen Theresa, and Althoff Tim. 2023. Facilitating self-guided mental health interventions through human-language model interaction: A case study of cognitive restructuring. arXiv preprint arXiv:2310.15461 (2023).
- [68].Silvia Paul J. 2021. The self-reflection and insight scale: applying item response theory to craft an efficient short form. Current Psychology 41, 12 (Jan. 2021), 8635–8645. https://doi.org/10.1007/s12144-020-01299-7
- [69].Smith Bruce W., Dalen Jeanne, Wiggins Kathryn, Tooley Erin, Christopher Paulette, and Bernard Jennifer. 2008. The brief resilience scale: Assessing the ability to bounce back. International Journal of Behavioral Medicine 15, 3 (Sept. 2008), 194–200. https://doi.org/10.1080/10705500802222972
- [70].Smyth Joshua M, Johnson Jillian A, Auer Brandon J, Lehman Erik, Talamo Giampaolo, and Sciamanna Christopher N. 2018. Online Positive Affect Journaling in the Improvement of Mental Distress and Well-Being in General Medical Patients With Elevated Anxiety Symptoms: A Preliminary Randomized Controlled Trial. JMIR Mental Health 5, 4 (Dec. 2018), e11290. https://doi.org/10.2196/11290
- [71].Sohal Monika, Singh Pavneet, Singh Dhillon Bhupinder, and Singh Gill Harbir. 2022. Efficacy of journaling in the management of mental illness: a systematic review and meta-analysis. Family Medicine and Community Health 10, 1 (2022).
- [72].Stoliker Bryce E and Lafreniere Kathryn D. 2015. The influence of perceived stress, loneliness, and learning burnout on university students’ educational experience. College Student Journal 49, 1 (2015), 146–160.
- [73].Thompson Edmund R. 2007. Development and Validation of an Internationally Reliable Short-Form of the Positive and Negative Affect Schedule (PANAS). Journal of Cross-Cultural Psychology 38, 2 (March 2007), 227–242. https://doi.org/10.1177/0022022106297301
- [74].Exploding Topics. 2024. iPhone vs Android User Stats (2024 Data). https://explodingtopics.com/blog/iphone-android-users
- [75].Tosevski Dusica L, Milovancevic Milica P, and Gajic Saveta D. 2010. Personality and psychopathology of university students. Current Opinion in Psychiatry 23, 1 (2010), 48–52.
- [76].Wang Weichen, Nepal Subigya, Huckins Jeremy F, Hernandez Lessley, Vojdanovski Vlado, Mack Dante, Plomp Jane, Pillai Arvind, Obuchi Mikio, Dasilva Alex, et al. 2022. First-gen lens: Assessing mental health of first-generation students across their first year at college using mobile sensing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 2 (2022), 1–32.
- [77].Watson David, Clark Lee Anna, and Tellegen Auke. 1988. Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology 54, 6 (1988), 1063–1070. https://doi.org/10.1037/0022-3514.54.6.1063
- [78].Williams Gail B, Gerardi Margit B, Gill Sara L, Soucy Mark D, and Taliaferro Donna H. 2009. Reflective journaling: Innovative strategy for self-awareness for graduate nursing students. International Journal of Human Caring 13, 3 (2009), 36–43.
- [79].Wu Ruolan, Yu Chun, Pan Xiaole, Liu Yujia, Zhang Ningning, Fu Yue, Wang Yuhan, Zheng Zhi, Chen Li, Jiang Qiaolei, Xu Xuhai, and Shi Yuanchun. 2023. MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use Intervention. https://doi.org/10.48550/ARXIV.2309.16639
- [80].Xu Xuhai, Yao Bingsheng, Dong Yuanzhe, Gabriel Saadia, Yu Hong, Hendler James, Ghassemi Marzyeh, Dey Anind K., and Wang Dakuo. 2024. Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 1, Article 31 (March 2024), 32 pages. https://doi.org/10.1145/3643540
- [81].Yeo Shun Yi, Lim Gionnieve, Gao Jie, Zhang Weiyu, and Perrault Simon Tangi. 2024. Help Me Reflect: Leveraging Self-Reflection Interface Nudges to Enhance Deliberativeness on Online Deliberation Platforms. arXiv preprint arXiv:2401.10820 (2024).
- [82].Zaccaro Andrea, Piarulli Andrea, Laurino Marco, Garbella Erika, Menicucci Danilo, Neri Bruno, and Gemignani Angelo. 2018. How Breath-Control Can Change Your Life: A Systematic Review on Psycho-Physiological Correlates of Slow Breathing. Frontiers in Human Neuroscience 12 (Sept. 2018). https://doi.org/10.3389/fnhum.2018.00353