Abstract
Purpose:
This study illustrates the application of time series clustering and feature engineering techniques to small data obtained at a fast time scale from biobehavioral interventions in order to link engagement with slower time-scale health outcomes.
Methods:
Using data from 26 adult kidney stone patients engaged with mini-sipIT, a month-long digital health intervention targeting increased fluid intake, we identified distinct patterns of engagement with both manual app tracking and automated smart water bottles and examined how those patterns were related to subsequent urine volume.
Results:
Time-series based analysis of engagement revealed that manual tracking was significantly associated with increased urine volume, highlighting the potential for active self-monitoring to improve health behaviors. In contrast, differential patterns of engagement with automated tracking were not related to differences in urine volume.
Conclusion:
These findings suggest that small data approaches can effectively bridge time scales in behavioral interventions, and that manual engagement methods may be more beneficial than automated ones in fostering behavior change. Absent large datasets to support identification of engagement patterns via deep learning, time series clustering and feature engineering provide valuable tools for linking fast time-scale engagement processes with slow time-scale health outcome processes.
IRB Approval:
This study was conducted with the approval of the Institutional Review Board (STUDY00015017), granted on 9/22/2021.
Keywords: Biobehavioral Interventions, Small Data, Time Series Clustering, Feature Engineering, Digital Health, Manual Tracking
Behavioral interventions have been extensively employed across diverse health domains, including clinical and public health settings, to foster healthier lifestyles and mitigate maladaptive behaviors (Wilson et al., 2019). Two challenges that researchers and clinicians face in this domain are ensuring that participants are actively engaged with the intervention content and translating this engagement into behavior change and clinically-meaningful improvements in health outcomes (Walton et al., 2020). The advent of digital technologies has provided a promising avenue to address these challenges. By leveraging mobile and connected technologies, it is now possible to sense contextual variations in real-time and engage participants with intervention content at crucial junctures in participants’ daily routines (Cole-Lewis et al., 2019; Dallery et al., 2014; Yardley et al., 2016). The potential for these interventions to modify behavior and improve health is contingent on their ability to engage participants in using the technology; however, realizing that potential requires methods to link changes in faster and slower time scale processes (i.e., engagement and health outcomes, respectively).
Engagement can be studied through various lenses. Within the behavioral domain, engagement typically refers to “the extent of usage” (Saleem et al., 2022). Engagement with a hydration monitoring app, for example, could be measured by a variety of metrics, including how often water intake is recorded (frequency) and the volume of water logged (amount). However, no single metric offers a fully comprehensive perspective on user interaction. Cole-Lewis et al. (2019) introduced a nuanced perspective on engagement by differentiating between ‘small e’ engagement, which pertains to user interactions with specific intervention features (i.e., frequency or amount), and ‘big E’ engagement, which pertains to user enactment of the targeted health behavior (i.e., hydration outcomes). To better understand how ‘small e’ engagement can drive behavioral change and clinical outcomes, a recent study suggested that summary measures of engagement with digital interventions (e.g., number of logins or activities completed) might predict clinical outcomes seen over longer periods, like months or years (Gan et al., 2021). Reducing engagement, a time-varying phenomenon, to a single summary metric discards substantial information about how engagement changes over time. These dynamic patterns may be informative for understanding how ‘small e’ engagement on a fast time scale (e.g., daily) influences ‘big E’ engagement on a slower time scale (e.g., 6 months). Deep learning methods could be applied to identify patterns of change in engagement that predict clinical outcomes (Norgeot et al., 2019); however, deep learning models generally require vast amounts of training data (Chen & Lin, 2014; Cascarano et al., 2023). Thus, a challenge arises when dealing with many clinical and biobehavioral issues, which are frequently characterized by small, intensive longitudinal datasets (Hekler et al., 2019). These datasets, often derived from a limited number of participants, are not always suitable for deep learning techniques, which are vulnerable to overfitting with small data.
Given the growing interest in digital health interventions for improving public health (Karpathakis et al., 2021), policymakers will need to understand engagement patterns to optimize health strategies and governance. In these small data scenarios, alternative methodologies, such as time series clustering or feature engineering, might offer valuable insights into patterns of engagement that correlate with improved health outcomes. Time series clustering, wherein similar sequences of data are grouped together based on their temporal patterns, may be particularly useful when seeking to identify common trajectories or patterns of behavior when the order and timing of behavior play a crucial role in producing health outcomes (Aghabozorgi et al., 2015). For instance, in the context of engagement with a digital health intervention, Xu et al. (2023) used time series clustering to reveal users who engaged in similar patterns of dietary monitoring behaviors. The identified trajectories of food tracking adherence—ranging from high to low—differentiated between weight loss patterns over 6 months. There is thus already some evidence that insights into engagement patterns might usefully guide the optimization of intervention strategies, pinpointing critical junctures where users may require additional support to maintain their engagement with the program.
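To make the intuition concrete, the following is a minimal sketch in R (the language used for the analyses reported below) with two hypothetical participants whose daily engagement drops off at different times. The series and object names are illustrative only; Dynamic Time Warping (DTW), a shape-aware distance commonly used for time series clustering, treats the two shifted trajectories as far more similar than a day-by-day Euclidean comparison does.

```r
# Hypothetical daily engagement series: both participants show the same
# drop-off pattern, but participant B drops off five days later.
library(TSclust)

user_a <- c(rep(5, 10), rep(1, 20))   # drops off after day 10
user_b <- c(rep(5, 15), rep(1, 15))   # same shape, shifted by five days

# Shape-aware DTW distance vs. day-by-day Euclidean distance
diss(rbind(user_a, user_b), METHOD = "DTWARP")
diss(rbind(user_a, user_b), METHOD = "EUCL")
```

Distances like these, computed for every pair of participants, are the input on which the hierarchical clustering described in the Method section operates.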
Feature engineering and feature selection offer another approach to identifying the characteristics of engagement that may be relevant for improving intervention effects. Feature engineering is a fundamental process in machine learning where raw data is transformed into a structured format that algorithms can interpret and utilize effectively. It involves creating meaningful variables, or features, that capture the underlying patterns in the data, which are crucial for the predictive modeling of outcomes (Verdonck et al., 2021). The process of feature engineering typically involves several steps. It begins with domain expertise to hypothesize which data attributes could be relevant. Next, data is preprocessed to handle issues like missing values or outliers. Then, new features are constructed from the raw data, which may involve encoding, deriving new metrics, or aggregating multiple data points. These features are then evaluated for their predictive power and relevance to the intervention’s outcomes. Finally, redundant or irrelevant features are pruned to refine the model (Butcher & Smith, 2020; Verdonck et al., 2021). In a study by Gavin and colleagues, for example, specific self-monitoring behaviors, such as the total number of gaps in tracking, the average length of these gaps, and the timing of the first gap in tracking physical activity, were analyzed for their impact on weight loss and physical activity levels over 24 months. The study found that early and prolonged gaps in self-monitoring were associated with lower physical activity and higher weight two years later, highlighting the importance of sustained and regular self-monitoring in weight loss interventions (Gavin et al., 2021).
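As an illustration of this kind of feature construction, the sketch below derives the three gap-based features highlighted by Gavin et al. (2021) (number of gaps, average gap length, and timing of the first gap) from a short, hypothetical daily tracking record in R; all values and object names are illustrative rather than taken from any study dataset.

```r
# Hypothetical daily self-monitoring record: 1 = tracked that day, 0 = gap day
tracked <- c(1, 1, 1, 0, 0, 1, 1, 0, 1, 1)

runs        <- rle(tracked)                    # run-length encode the 0/1 series
gap_lengths <- runs$lengths[runs$values == 0]  # length of each run of 0s (a gap)

features <- data.frame(
  n_gaps        = length(gap_lengths),                                   # total number of gaps
  mean_gap_days = if (length(gap_lengths) > 0) mean(gap_lengths) else 0, # average gap length
  first_gap_day = if (any(tracked == 0)) which(tracked == 0)[1] else NA  # day of first gap
)
features
```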
Feature engineering can help to optimize intervention design, providing direction on which features of engagement to emphasize for enhanced effectiveness. By meticulously analyzing data and extracting features that exhibit a strong correlation with positive outcomes, scientists can pinpoint the most influential engagement features within an intervention. This targeted approach allows for the reinforcement of strategies that show tangible benefits, ensuring that these elements are given prominence in the intervention’s framework. Conversely, features that demonstrate minimal impact can be reevaluated, refined, or even discarded, streamlining the intervention to eliminate inefficiencies. Such strategic pruning not only sharpens the focus of the intervention but also contributes to a more judicious allocation of resources. The end result is a refined intervention strategy that is not only more aligned with the desired outcomes but also more cost-effective, maximizing the potential for success while minimizing unnecessary expenditure.
The insights garnered from small data approaches not only advance scientific knowledge but also provide actionable guidance for health administrators and policymakers who are tasked with the design and implementation of effective and sustainable health policies. To compare time series clustering and feature engineering approaches for linking fast time-scale variability in various indicators of engagement with a slower time-scale clinical outcome, the present study undertook a secondary analysis of data from a one-month, single-group trial of a novel digital intervention, mini-sipIT, that promoted fluid intake among patients diagnosed with kidney stones (Streeper et al., 2023). Increasing fluid intake and urine output are core strategies in clinical guidelines for preventing kidney stone recurrence (Pearle et al., 2014). Unfortunately, often fewer than half of patients consume enough fluids to produce the recommended 2.5 L/day of urine (Borghi et al., 1996; Khambati et al., 2017). The mini-sipIT intervention was developed to provide support for tracking fluid intake and reminding patients to drink when they lapse and do not achieve an hourly fluid intake goal (Conroy et al., 2020). Self-monitoring of behavior is a common and potent element of behavior change programs (Burke et al., 2011; Compernolle et al., 2019; Michie et al., 2009; Sanders et al., 2016). The mini-sipIT intervention used semi-automated tracking with manual and automated components to track patients’ fluid intake in real time. Manual self-tracking can be done via a mobile application but has historically proven difficult to sustain for extended periods (Gavin et al., 2021). Automated self-tracking can be done with a connected water bottle that senses changes in volume due to consumption and may alleviate long-term engagement problems (Wright et al., 2022). This project sought to identify the patterns of engagement with the manual and automated self-tracking tools in mini-sipIT that were associated with greater urine output at one month.
Method
Participants
Participants were 35 adults, enrolled between February and September 2021 through Penn State Health Stone Clinics, who had previously experienced a kidney stone and recorded a 24-hour urine output of less than 2 liters per day (L/d). Additional inclusion criteria were being age 18 or older, being proficient in English, being able to provide informed consent, and owning a smartphone. We excluded individuals with a history of cystinuria, with medical conditions that either limited fluid consumption or led to increased fluid loss (e.g., congestive heart failure, chronic diarrhea, or post-bariatric surgery status), and with professions that made use of the connected water bottle impractical (e.g., truck drivers, surgeons).
Intervention
The mini-sipIT intervention was developed to support fluid intake and included education about fluid intake goals, a semi-automated tool for tracking fluid intake, and lapse-contingent text message reminders to drink. This deployment of the intervention incorporated the H2OPal connected water bottle and its associated mobile application (Out of Galaxy LLC, San Francisco, CA), but did not include a smartwatch that detected drinking gestures as in the original mini-sipIT intervention (Conroy et al., 2020). As part of the intervention, the bottle automatically tracked participants’ fluid intake from the bottle, and participants could also manually log drinks from other sources. Both automatic and manual drink entries were transmitted in real time to the intervention server (via application programming interfaces) and used to determine whether users met their hourly fluid intake goals during a 12-hour daily monitoring period selected by the participant. If a participant had not met their fluid intake goal after one hour, an automated text message accompanied by a small image file (chosen at random from a set of 56) was sent to promote fluid intake in a visually appealing way. This study was conducted with the approval of the Institutional Review Board (STUDY00015017).
Procedures
The study commenced with a screening process in which patients’ medical records were assessed to identify those with a 24-hour urine output of less than 2 L in the past three months. Eligible participants were then invited to join the study, where they first provided informed consent and completed questionnaires about their demographics. Participants were provided with a connected water bottle and an associated mobile application designed to monitor their hydration levels. For the next month, participants received text message prompts during a 12-hour monitoring period each day. Specifically, these prompts encouraged increased water intake whenever the participant failed to achieve their fluid intake goal within an hour of the last goal achievement or reminder message. The program aimed to achieve a minimum daily fluid consumption of 96 ounces. Upon completion of the month-long intervention, a final 24-hour urine sample was collected using kits that a certified laboratory delivered and retrieved by mail.
Measures
Urine Volume.
The primary outcome was 24-hour urine volume at the end of the one-month intervention period. A certified clinical laboratory handled the analysis of all 24-hour urine samples.
Demographics.
Demographic information was self-reported and included age, sex, race, ethnicity, height, and weight.
Daily Tracking.
Engagement with the app and connected water bottle for manual and automated tracking, respectively, was measured using time-stamped logs from the H2OPal server. Manual entries in the app and automated entries from the bottle were recorded as timestamped volumes.
Data Analysis
Two sets of analyses were used to explore how time series clustering and feature engineering inform understanding of participants’ engagement with the digital intervention and how those engagement patterns are related to differences in fluid consumption. All analyses were done in R (R Core Team, 2022).
Time Series Clustering.
Time series clustering was done in a series of steps. First, we computed a distance matrix for each of the six daily tracking behaviors. Specifically, the ‘diss’ function in the TSclust package (Montero & Vilar, 2014) was applied to the N = 29 time series for each daily behavior to compute, via Dynamic Time Warping (DTW), distances between each pair of individuals’ time series. Second, the ‘hclust’ function (complete linkage) in the stats package (R Core Team, 2022) was used to obtain hierarchical clustering solutions that would support identification of the optimal number of clusters. Silhouette scores and dendrograms (obtained using the ‘heatmap.2’ function from the gplots package; Warnes et al., 2022) were used to identify the number of clusters, with higher silhouette scores (which can range from −1 to +1) indicating a preferred solution in terms of separation and cohesion of clusters (Shahapure & Nicholas, 2020). Third, for each of the six behaviors, participants were assigned to clusters, and the average engagement patterns over the 30-day intervention period for each cluster were visualized (using the ggplot2 package; Wickham, 2016) and interpreted to develop descriptive labels for each cluster. Finally, differences in 24-hour urine volumes between clusters were tested using two-sample t-tests.
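A minimal sketch of this pipeline is shown below for a single tracking indicator, assuming `engagement` is an N × 30 matrix (one row per participant, one column per intervention day) and `urine_volume` is the vector of one-month 24-hour urine volumes; both object names are placeholders rather than variables from the study code, and the silhouette computation uses the cluster package as one common implementation.

```r
library(TSclust)   # diss(): pairwise time-series distances
library(cluster)   # silhouette(): cluster cohesion/separation index

# Step 1: Dynamic Time Warping distances between all pairs of participants' series
d <- diss(engagement, METHOD = "DTWARP")

# Step 2: hierarchical clustering with complete linkage
hc <- hclust(d, method = "complete")

# Step 3: compare candidate solutions by average silhouette width (-1 to +1)
sapply(2:4, function(k) mean(silhouette(cutree(hc, k = k), d)[, "sil_width"]))

# Step 4: assign participants to the chosen 2-cluster solution and test for a
# difference in 24-hour urine volume between clusters (two-sample t-test)
cluster_id <- cutree(hc, k = 2)
t.test(urine_volume ~ factor(cluster_id))
```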
Feature Engineering.
Specific features of participants’ daily engagement with both the smart water bottle and manual app input were curated based on domain knowledge. In particular, we examined six theoretically informed descriptive features of each of the two time series: the mean, standard deviation, maximum, minimum, latency to the maximum and minimum, and the frequency of days with no use. First, these features were extracted using the tsfeaturex package (Roque, 2019). Second, the extracted features were incorporated as predictors in a series of multiple linear regression models of one-month urine volumes. This allowed us to evaluate the strength of their associations with the primary urine volume outcome and to better understand how different engagement behaviors relate to it.
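The sketch below illustrates this feature-based approach for the manual tracking series, assuming `manual_volume` is an N × 30 matrix of daily manually logged volumes and `urine_volume` is the one-month 24-hour urine volume; the object names are placeholders, and the hand-rolled feature calculations stand in for the automated extraction performed by tsfeaturex.

```r
# Engineer person-level features from each participant's 30-day series
features <- data.frame(
  mean_use    = apply(manual_volume, 1, mean, na.rm = TRUE),   # average daily volume
  sd_use      = apply(manual_volume, 1, sd,   na.rm = TRUE),   # day-to-day variability
  max_use     = apply(manual_volume, 1, max,  na.rm = TRUE),   # highest daily volume
  min_use     = apply(manual_volume, 1, min,  na.rm = TRUE),   # lowest daily volume
  day_of_max  = apply(manual_volume, 1, which.max),            # latency to maximum
  day_of_min  = apply(manual_volume, 1, which.min),            # latency to minimum
  days_disuse = rowSums(manual_volume == 0, na.rm = TRUE)      # frequency of daily disuse
)

# Regress the one-month urine volume on the engineered features
features$urine_volume <- urine_volume
summary(lm(urine_volume ~ ., data = features))
```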
Results
The demographic profile and engagement levels of participants are detailed in Tables 1 and 2, respectively. Study participants were predominantly middle-aged (69% aged 35–60; M = 50.38, SD = 14.19 years), female (77%), overweight or obese (73%), and white (92%). As shown in Table 2, participants recorded drinks an average of 14.5 times per day from the bottle and app combined, with an average total daily drink volume of 1.99 L. Drinks were recorded more frequently from the connected water bottle than from the app, but similar volumes were recorded automatically from the water bottle (M = 0.98 L/d, SD = 1.02) and manually from the app (M = 1.02 L/d, SD = 0.86).
Table 1.
Demographic Characteristics of Participants (N = 26)
| Demographic | Category | N | % |
|---|---|---|---|
| Age (years) | < 35 | 4 | 15.38 |
| | 35–60 | 18 | 69.23 |
| | 60+ | 4 | 15.38 |
| Sex | Female | 20 | 76.92 |
| | Male | 6 | 23.08 |
| Body Mass Index Classification | Normal | 6 | 23.08 |
| | Obese | 9 | 34.62 |
| | Overweight | 10 | 38.46 |
| | Underweight | 1 | 3.85 |
| Race | White | 24 | 92.31 |
| | Hispanic or Latino | 1 | 3.85 |
| | Two or more, including Hispanic or Latino | 1 | 3.85 |
Note. Body Mass Index is calculated as weight in kilograms divided by the square of height in meters (kg/m2).
Table 2.
Descriptive Statistics for Daily Drinking Behaviors
| Variable | Mean | SD | Median | Min | Max |
|---|---|---|---|---|---|
| Total Daily Drink Frequency | 14.49 | 11.51 | 12.00 | 0 | 65.00 |
| Total Daily Drink Amount (L) | 1.99 | 1.11 | 1.87 | 0 | 5.43 |
| Bottle Drinking Frequency per Day | 10.88 | 11.89 | 8.00 | 0 | 65.00 |
| Manual Drinking Frequency per Day | 3.60 | 3.28 | 3.00 | 0 | 24.00 |
| Daily Drink Amount from Bottle (L) | 0.98 | 1.02 | 0.69 | 0 | 4.62 |
| Daily Drink Amount from Manual Use (L) | 1.02 | 0.86 | 0.86 | 0 | 3.99 |
Time-series clustering identified distinct behavioral patterns in participants’ engagement for each of the six longitudinal engagement indicators. Using the dendrograms shown in Figure 1 and the silhouette scores, we settled on a 2-cluster solution for each indicator, with clusters that could be described as exhibiting either high or low engagement. The average engagement trajectories shown in Figure 2 illustrate the contrast between high and low user engagement on each of the six indicators over the course of the month.
Figure 1. Time Series Cluster Dendrograms with Heatmap Visualization of 30-Day Fluid Consumption and Device Engagement Data.
Note. Each subplot presents a heatmap with a corresponding dendrogram clustering participants based on their 30-day engagement patterns with the smart hydration monitoring system. The color gradient indicates the intensity of each of the N = 29 individuals’ daily fluid intake (Manual, Bottle, or Total Amount in liters) or engagement frequency (Manual, Bottle, or Total Drink Frequency) across the month-long intervention.
Figure 2. Patterns of Device Engagement Across Different Cluster Groups Over a 30-Day Period.
Note. This figure illustrates six line plots that represent 30-day trends in recorded fluid consumption and engagement frequency with the smart bottle and app. The top left plot shows daily fluid consumption in liters as recorded by the smart bottle, and the top right plot shows the number of times participants engaged with the smart bottle. The middle left plot shows daily fluid consumption in liters as recorded by manual input in the app, and the middle right plot shows the number of manual entries. The bottom left plot shows the combined fluid volume recorded by manual app input and the smart bottle, and the bottom right plot shows the combined count of app and smart bottle recording events. The shaded areas around each line denote the standard deviation, illustrating the variability around the mean for each cluster group.
Table 3 summarizes statistical tests of differences in 24-hour urine volumes between the high and low engagement cluster groups derived from each of the six indicators of self-monitoring (each shown in a separate row). The only significant difference between cluster groups emerged for manually tracked volume: the cluster with high engagement in manual volume tracking had significantly higher 24-hour urine volumes, t(10.15) = 2.36, p = .04, d = 1.00. None of the other high and low engagement clusters differed in 24-hour urine volumes (all p > .05), although it is worth noting the large (non-significant) difference between the clusters defined by the frequency of bottle use (d = −1.28).
Table 3.
Urine Volume Differences Between Latent Classes of Engagement Defined by Each Tracking Method
| Engagement Indicator | Class 1: High Engagement, M (SD) | Class 2: Low Engagement, M (SD) | t (df) | p | Mean Difference | d |
|---|---|---|---|---|---|---|
| Combined Tracking Methods | | | | | | |
| Daily Drink Frequency | 2187.50 (904.15) | 1926.11 (1027.52) | −0.65 (15.27) | .52 | 261.39 | 0.28 |
| Daily Volume (L) | 2468.89 (1150.85) | 1761.77 (810.53) | −1.64 (12.33) | .13 | 707.12 | 0.68 |
| Automated Bottle Recordings | | | | | | |
| Daily Drinking Frequency | 1670.00 (42.43) | 2034.58 (1017.29) | 1.74 (23.73) | .10 | −364.58 | −1.28 |
| Daily Volume (L) | 2151.11 (1287.13) | 1930.00 (810.40) | −0.46 (11.46) | .65 | 221.11 | 0.19 |
| Manual App Entries | | | | | | |
| Daily Drinking Frequency | 3020.00 (1152.45) | 1765.24 (786.72) | 2.31 (4.92) | .07 | 1254.76 | 1.15 |
| Daily Volume (L) | 2705.00 (1098.71) | 1696.11 (765.50) | 2.36 (10.15) | .04 | 1008.89 | 1.00 |
In the feature engineering approach, separate multiple linear regression models were estimated to evaluate associations between the engineered features and urine volume. Neither model was statistically significant (automated features model: F(7, 18) = 0.81, p = .59, R2 = .23; manual features model: F(7, 18) = 1.48, p = .24, R2 = .37).
Discussion
This paper compared two methods for linking faster and slower time scale processes that inform research on biobehavioral intervention development. The findings make clinical and scientific contributions. First, our research addressed a significant gap in hydration intervention research, a field that has received relatively scant scientific focus despite the importance of fluid intake as a health behavior (De la Hoz et al., 2023). Notably, our study leveraged the mini-sipIT intervention, one of the pioneering initiatives to use digital tools, such as smart water bottles, to improve hydration outcomes. With the increasing integration of digital devices in health monitoring, connected water bottles are poised for wider adoption (Wright et al., 2022). Our findings offer crucial insights into the usage patterns of connected water bottles and the design of accompanying apps, setting a foundation for future research to refine intervention strategies and practical applications in this emergent area of digital health monitoring.
Second, we addressed the challenge of linking faster time scale engagement dynamics with slower time scale outcomes through innovative small data approaches. We applied time series clustering and feature engineering to a small dataset of the kind often used to develop and evaluate digital health biobehavioral interventions. These methods revealed detailed behavioral patterns over time and identified key engagement features that were associated with an important health outcome, even with a limited number of participants. Time-series clustering was more sensitive to differences in urine volume, a key outcome of our study, suggesting that this approach may be valuable for interrogating small-scale, personalized health care studies where large-scale data collection is not feasible. Feature engineering may be more useful for bridging time scales in studies with larger sample sizes for the between-person analyses.
Third, this study revealed that high engagement with manually tracking fluid intake volume via an app was positively associated with greater urine volume whereas engagement in automated tracking with a smart water bottle was not linked with increased urine volume. Indeed, and to our surprise, patients who used their smart water bottle less frequently over time showed signs of greater (although not statistically significant) urine volumes than those who used the bottle more frequently over time. Digital health solutions have generated hope for more effortless behavior change through automation of intervention processes. Our finding challenges speculation that increasing automation and removing manual elements from digital health interventions necessarily improves outcomes. The mere act of manual logging may heighten user awareness and lead to better behavioral outcomes, such as adherence to fluid intake recommendations. Similar findings about the value of cognitively-demanding self-monitoring compared to automated behavioral feedback have emerged in research on physical activity interventions (Conroy et al., 2023). Regarding the dosage of self-monitoring, our findings align with previous studies that have shown high-frequency manual tracking correlates with superior health outcomes (Xu et al., 2023).
The utility of small data approaches, such as time series clustering and feature engineering, extends beyond their contribution to scientific knowledge, also providing actionable guidance for health administrators and policymakers engaged in designing and implementing health interventions. These methods reveal nuanced behavioral trends within constrained datasets and suggest how social governance structures can be developed to support and monitor health behaviors. Specifically, the findings suggest that regulatory frameworks prioritizing active user engagement in self-monitoring over fully automated behavioral feedback may be valuable in certain health contexts. However, it is essential to consider the potential burden that intensive self-monitoring may place on long-term adherence (Burke et al., 2011; Krukowski et al., 2022). Therefore, future research should focus on striking an optimal balance between manual and automated tracking to enhance user compliance and maximize behavioral adherence and health outcomes.
Several limitations of this study warrant attention. First, the sample was small and homogeneous. Further work with larger and more representative samples is needed to determine whether the findings generalize to more diverse populations. Second, the study design was prospective but observational and did not include either a pretest or a randomized control group. Thus, we cannot definitively attribute the observed differences in urine volume to the intervention itself, as opposed to other external influences, time-related trends, or a placebo effect. Third, measurement error was likely introduced when devices were not present, connected, or charged (Cohen et al., 2022). Finally, because the time series clustering analysis was not specifically designed to represent all of the interdependencies and complex temporal dynamics inherent in individual fluid intake behaviors, we likely obtained only an oversimplified representation of participants’ engagement. These limitations highlight the need for further research with larger experimental designs that include randomization and control conditions, multivariate time-series models of individual-level behavior change, and additional validation of technology-based data collection methods.
Conclusion
In conclusion, this study demonstrated the potential of small data approaches, such as time series clustering and feature engineering, to extract actionable insights from limited datasets and predict health outcomes. Our findings notably challenge the prevailing emphasis on automation in digital health interventions by revealing that manual tracking of fluid intake was associated with improved hydration outcomes whereas automated tracking was not. These findings underscore the importance of (‘small e’) user engagement and the potential benefits of manual input in enhancing health behaviors. They also point to the potential value for policymakers of attending to engagement dynamics in regulatory review of digital health interventions.
Acknowledgments
I am thankful to Dr. Zhang Zhiyong for his additional feedback on the methodologies and results sections.
Funding
Jingchuan Wu: Source of Funding: National Institute on Aging (T32 AG049676). David Conroy: Source of Funding: National Institute of Diabetes and Digestive and Kidney Diseases (R01 DK124469). Necole Streeper: Source of Funding: Keith and Lynda Harring Fund for Kidney Research at Penn State Health and National Institute of Diabetes and Digestive and Kidney Diseases (R01 DK124469).
Footnotes
Statements and Declarations
On behalf of all authors, the corresponding author states that there is no conflict of interest.
References
- Aghabozorgi S, Seyed Shirkhorshidi A, & Ying Wah T (2015). Time-series clustering – a decade review. Information Systems, 53, 16–38. 10.1016/j.is.2015.04.007
- Borghi L, Meschi T, Amato F, Briganti A, Novarini A, & Giannini A (1996). Urinary volume, water and recurrences in idiopathic calcium nephrolithiasis: A 5-year randomized prospective study. The Journal of Urology, 155(3), 839–843.
- Burke LE, Wang J, & Sevick MA (2011). Self-monitoring in weight loss: A systematic review of the literature. Journal of the American Dietetic Association, 111(1), 92–102. 10.1016/j.jada.2010.10.008
- Butcher B, & Smith BJ (2020). Feature engineering and selection: A practical approach for predictive models: By Max Kuhn and Kjell Johnson. Boca Raton, FL: Chapman & Hall/CRC Press, 2019, xv + 297 pp. The American Statistician, 74(3), 308–309. 10.1080/00031305.2020.1790217
- Cascarano A, Mur-Petit J, Hernandez-Gonzalez J, Camacho M, de Toro Eadie N, Gkontra P, ... & Lekadir K (2023). Machine and deep learning for longitudinal biomedical data: A review of methods and applications. Artificial Intelligence Review, 56(Suppl 2), 1711–1771.
- Chen XW, & Lin X (2014). Big data deep learning: Challenges and perspectives. IEEE Access, 2, 514–525.
- Cohen R, Fernie G, & Roshan Fekr A (2022). Monitoring fluid intake by commercially available smart water bottles. Scientific Reports, 12(1), Article 1. 10.1038/s41598-022-08335-5
- Cole-Lewis H, Ezeanochie N, & Turgiss J (2019). Understanding health behavior technology engagement: Pathway to measuring digital behavior change interventions. JMIR Formative Research, 3(4), e14052. 10.2196/14052
- Compernolle S, DeSmet A, Poppe L, Crombez G, De Bourdeaudhuij I, Cardon G, van der Ploeg HP, & Van Dyck D (2019). Effectiveness of interventions using self-monitoring to reduce sedentary behavior in adults: A systematic review and meta-analysis. International Journal of Behavioral Nutrition and Physical Activity, 16(1), 1–16. 10.1186/s12966-019-0824-3
- Conroy DE, Wu J, Lee AM, Brunke-Reese D, & Lagoa CM (2023). Dose-response relations between the frequency of two types of momentary feedback prompts and daily physical activity. Health Psychology, 42(3), 151–160. 10.1037/hea0001271
- Conroy DE, West AB, Brunke-Reese D, Thomaz E, & Streeper NM (2020). Just-in-time adaptive intervention to promote fluid consumption in patients with kidney stones. Health Psychology, 39(12), 1062–1069. 10.1037/hea0001032
- Dallery J, Kurti A, & Erb P (2014). A new frontier: Integrating behavioral and digital technology to promote health behavior. The Behavior Analyst, 38(1), 19–49. 10.1007/s40614-014-0017-y
- De la Hoz A, Melo L, Álvarez A, Cañada F, & Cubero J (2023). The promotion of healthy hydration habits through educational robotics in university students. Healthcare, 11(15), Article 15. 10.3390/healthcare11152160
- Gan DZQ, McGillivray L, Han J, Christensen H, & Torok M (2021). Effect of engagement with digital interventions on mental health outcomes: A systematic review and meta-analysis. Frontiers in Digital Health, 3. https://www.frontiersin.org/articles/10.3389/fdgth.2021.764079
- Gavin KL, Sherwood NE, Wolfson J, Pereira MA, & Linde JA (2021). Characterizing self-monitoring behavior and its association with physical activity and weight loss maintenance. American Journal of Lifestyle Medicine, 15(2), 173–183. 10.1177/1559827618790556
- Hekler EB, Klasnja P, Chevance G, Golaszewski NM, Lewis D, & Sim I (2019). Why we need a small data paradigm. BMC Medicine, 17(1), 133. 10.1186/s12916-019-1366-x
- Karpathakis K, Libow G, Potts HWW, Dixon S, Greaves F, & Murray E (2021). An evaluation service for digital public health interventions: User-centered design approach. Journal of Medical Internet Research, 23(9), e28356. 10.2196/28356
- Khambati A, Matulewicz RS, Perry KT, & Nadler RB (2017). Factors associated with compliance to increased fluid intake and urine volume following dietary counseling in first-time kidney stone patients. Journal of Endourology, 31(6), 605–610. 10.1089/end.2016.0836
- Krukowski RA, Harvey J, Borden J, Stansbury ML, & West DS (2022). Expert opinions on reducing dietary self-monitoring burden and maintaining efficacy in weight loss programs: A Delphi study. Obesity Science & Practice, 8(4), 401–410. 10.1002/osp4.586
- Michie S, Abraham C, Whittington C, McAteer J, & Gupta S (2009). Effective techniques in healthy eating and physical activity interventions: A meta-regression. Health Psychology, 28(6), 690–701. 10.1037/a0016136
- Montero P, & Vilar JA (2014). TSclust: An R package for time series clustering. Journal of Statistical Software, 62(1), 1–43.
- Norgeot B, Glicksberg BS, Trupin L, Lituiev D, Gianfrancesco M, Oskotsky B, ... & Butte AJ (2019). Assessment of a deep learning model based on electronic health record data to forecast clinical outcomes in patients with rheumatoid arthritis. JAMA Network Open, 2(3), e190606.
- Pearle MS, Goldfarb DS, Assimos DG, Curhan G, Denu-Ciocca CJ, Matlaga BR, Monga M, Penniston KL, Preminger GM, Turk TMT, & White JR (2014). Medical management of kidney stones: AUA guideline. Journal of Urology, 192(2), 316–324. 10.1016/j.juro.2014.05.006
- R Core Team. (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
- Roque N (2019). tsfeaturex: An R package for automating time series feature extraction. https://github.com/nelsonroque/tsfeaturex
- Saleem M, Kühne L, Santis KKD, Christianson L, Brand T, & Busse H (2021). Understanding engagement strategies in digital interventions for mental health promotion: Scoping review. JMIR Mental Health, 8(12), e30000. 10.2196/30000
- Sanders JP, Loveday A, Pearson N, Edwardson C, Yates T, Biddle SJ, & Esliger DW (2016). Devices for self-monitoring sedentary time or physical activity: A scoping review. Journal of Medical Internet Research, 18(5), e90. 10.2196/jmir.5373
- Shahapure KR, & Nicholas C (2020, October). Cluster quality analysis using silhouette score. In 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA) (pp. 747–748). IEEE.
- Streeper NM, Fairbourn JD, Marks J, Thomaz E, Ram N, & Conroy DE (2023). Feasibility of mini sipIT behavioral intervention to increase urine volume in patients with kidney stones. Urology. 10.1016/j.urology.2023.06.019
- Verdonck T, Baesens B, Óskarsdóttir M, & vanden Broucke S (2021). Special issue on feature engineering editorial. Machine Learning. 10.1007/s10994-021-06042-2
- Walton H, Spector A, Williamson M, Tombor I, & Michie S (2020). Developing quality fidelity and engagement measures for complex health interventions. British Journal of Health Psychology, 25(1), 39–60. 10.1111/bjhp.12394
- Warnes GR, Bolker B, Bonebakker L, Gentleman R, Huber W, Liaw A, Lumley T, Maechler M, Magnusson A, Moeller S, Schwartz M, & Venables B (2022). gplots: Various R programming tools for plotting data. https://CRAN.R-project.org/package=gplots
- Wickham H (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org
- Wilson DK, Christensen A, Jacobsen PB, & Kaplan RM (2019). Standards for economic analyses of interventions for the field of health psychology and behavioral medicine. Health Psychology, 38(8), 669–671. 10.1037/hea0000770
- Wright HC, Alshara L, DiGennaro H, Kassis YE, Li J, Monga M, Calle J, & Sivalingam S (2022). The impact of smart technology on adherence rates and fluid management in the prevention of kidney stones. Urolithiasis, 50(1), 29–36. 10.1007/s00240-021-01270-6
- Xu R, Bannor R, Cardel MI, Foster GD, & Pagoto S (2023). How much food tracking during a digital weight-management program is enough to produce clinically significant weight loss? Obesity, 31(7), 1779–1786. 10.1002/oby.23795
- Yardley L, Spring BJ, Riper H, Morrison LG, Crane DH, Curtis K, Merchant GC, Naughton F, & Blandford A (2016). Understanding and promoting effective engagement with digital behavior change interventions. American Journal of Preventive Medicine, 51(5), 833–842. 10.1016/j.amepre.2016.06.015