EntroLLM: Leveraging Entropy and Large Language Model Embeddings for Enhanced Risk Prediction with Wearable Device Data

Xueqing Huang; Tian Gu

2025 Jun 10;2025:225–234.

PMCID: PMC12150754 PMID: 40502232

Abstract

Wearable devices collect complex structured data with high-dimensional and time-series features that are challenging for traditional models to handle efficiently. We propose EntroLLM, a new method that combines entropy measures and the low-dimensional representation (embedding) generated from large language models (LLMs) to enhance risk prediction using wearable device data. In EntroLLM, the entropy quantifies the variability of a subject’s physical activity patterns, while the LLM embedding approximates the latent temporal structure. We evaluate the feasibility and performance of EntroLLM using NHANES data to predict overweight status using demographics and physical activity collected from wearable devices. Results show that combining entropy with GPT-based embedding improves model performance compared to baseline models and other embedding techniques, leading to an average increase in AUC from 0.56 to 0.64. EntroLLM showcases the potential of combining entropy and LLM-based embedding and offers a promising approach to wearable device data analysis for predicting health outcomes.

Introduction

In the era of big data, high-dimensional time series data are becoming increasingly available in various fields, including finance, healthcare, and environmental studies¹,², which poses challenges for analyzing such data effectively and efficiently. In particular, wearable devices continuously generate streams of health-related data such as heart rate, step count, and sleep patterns, representing high-dimensional time series data in healthcare. Traditional methods, such as ridge³ and Lasso regression⁴, although effective in many high-dimensional settings, can struggle to capture complex interactions and non-linear relationships, particularly among numerous categorical variables, leading to suboptimal prediction performance.

Recent advancements in machine learning, particularly in time series embeddings, have shown great potential for improving prediction performance⁵⁻⁸. Traditional methods like Dynamic Time Warping⁹ and Word2Vec¹⁰ have been widely applied to time series tasks, while newer embedding techniques convert time series data into low-dimensional vector representations that capture key temporal dependencies. Recent studies have explored embedding methods for predicting various health outcomes, including cardiovascular disease risk¹¹, medical events¹², and patient deterioration¹³. A growing body of research has shown the potential of applying embedding techniques to wearable device data for enhanced disease risk prediction and patient health monitoring¹⁴⁻¹⁷.

Large language models (LLMs) have recently emerged as promising tools for sequence modeling, demonstrating significant success in natural language processing applications¹⁸. With their deep architectures, LLMs have the potential to capture both local and global contexts in time-series data, enabling them to model complex dependencies over long time horizons. This capability positions LLMs as a promising approach for analyzing health-related time-series data, where contextual understanding may be critical¹⁹. Additionally, these models are well suited to handling high-dimensional and sparse data, such as that generated by wearable devices, suggesting they could provide robust embeddings for downstream prediction tasks²⁰⁻²³. Further investigation into their applicability and effectiveness in this domain remains an exciting avenue for exploration.

While embeddings, including LLM embeddings, are effective at capturing patterns and context, they may not fully account for the variability and uncertainty inherent in wearable device data. Accelerometer data, for instance, often exhibits significant irregularity, such as fluctuating activity levels or unpredictable physiological responses, which embeddings alone might overlook. This is where entropy becomes a valuable complement. As a measure of unpredictability or randomness in time-series data, entropy quantifies the variability that embeddings cannot capture²⁴,²⁵. By incorporating entropy, we can capture this additional layer of complexity, potentially enhancing both the predictive power and interpretability of models for wearable device data. While prior studies have explored the use of entropy with LLMs in various contexts, such as adjusting temperature sampling²⁶, detecting LLM hallucinations²⁷, and improving decoding strategies in retrieval-augmented LLMs²⁸, these efforts primarily focus on enhancing LLM outputs or refining natural language processing tasks, which differ from our objective.

In this paper, we propose a novel predictive model that leverages entropy and LLM embeddings to improve health outcome predictions using wearable device data. Our approach directly combines LLM embeddings with entropy measures from wearable device data, bridging traditional time-series embedding techniques with the advanced representation learning capabilities of LLMs. By quantifying pattern variability through time-series entropy, we demonstrate that this integration can improve risk prediction performance, offering a new pathway for leveraging wearable device data to predict health outcomes more effectively.

Methods

We have the outcome of interest Y ∈ ℝⁿ, a set of p predictors X ∈ ℝ^{n×p}, and a set of q predictors Z ∈ ℝ^{n×q} in data of size n. X contains the baseline covariates, such as demographics. Z contains the time series data collected from wearable devices, such as step counts recorded every minute, and is often high-dimensional. Our goal is to model the relationship between Y, X, and Z by fitting the conditional model Y|X,Z. We consider a generic class of machine-learning methods where fitting the model involves a loss function indexed by a set of parameters, denoted by L(β; Y, X, Z) with β ∈ ℝ^p. For example, in generalized linear models (GLMs), L(β; Y, X, Z) is the negative log-likelihood function and β corresponds to the regression coefficients and over-dispersion parameters²⁹. We propose a new prediction model and compare its performance with several benchmark models suitable for the given case. We introduce the details of each model and key variables below.

Penalized Regression

Given the high-dimensional covariate Z, we can directly apply penalized regression

β_{λ} = \arg\min_{β} \{ L(β; Y, X, Z) + λ ‖β‖_{γ} \},    (1)

where λ is a tuning parameter and γ denotes the type of penalty. For example, γ = 1 gives the Lasso penalty⁴, while γ = 2 gives the ridge penalty³. The objective function L(β; Y, X, Z) depends on the type of outcome Y. For example, for continuous Y, we may use the linear model with the following loss function

L(β; Y, X, Z) = \frac{1}{n} ‖Y - X β_{X} - Z β_{Z}‖_{2}^{2}.

For binary Y, we may use the logistic regression model, and L(β; Y, X, Z) is of the following form

L (β; Y, X, Z) = - \frac{1}{n} \sum_{i = 1}^{n} [Y_{i} \log (p_{i}) + (1 - Y_{i}) \log (1 - p_{i})]

where p_i is the probability of Y_i = 1 for the i-th observation and i ∈ {1, … ,n}.
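As a minimal sketch of how the penalized objective in equation (1) can be evaluated for a binary outcome, the following pure-Python function computes the logistic loss plus a penalty. The function name, the stacked design rows, and the use of the squared L2 norm for the ridge case are assumptions made for illustration, not details taken from the paper.

```python
import math


def penalized_logistic_loss(beta, rows, y, lam, gamma):
    """Mean logistic negative log-likelihood plus lam * penalty(beta).

    beta  : coefficients, one per column of the stacked design [X, Z]
    rows  : feature rows, each already concatenating X and Z for one subject
    y     : 0/1 outcomes
    lam   : tuning parameter (lambda in equation (1))
    gamma : 1 for a Lasso-type penalty, 2 for a ridge-type penalty
    """
    n = len(y)
    nll = 0.0
    for row, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, row))  # linear predictor
        p = 1.0 / (1.0 + math.exp(-eta))             # p_i = P(Y_i = 1)
        nll -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    nll /= n

    if gamma == 1:
        penalty = sum(abs(b) for b in beta)          # L1 norm
    elif gamma == 2:
        penalty = sum(b * b for b in beta)           # squared L2 norm (assumed ridge form)
    else:
        raise ValueError("gamma must be 1 (Lasso) or 2 (ridge)")
    return nll + lam * penalty


# Toy data (hypothetical): two subjects, two features each.
toy_rows = [[1.0, 0.5], [-1.0, 2.0]]
toy_y = [1, 0]
print(penalized_logistic_loss([0.0, 0.0], toy_rows, toy_y, lam=0.1, gamma=2))
```

With all coefficients at zero every p_i equals 0.5, so the printed value is log(2) regardless of λ; any fitted model should do better than that on its training data.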

Entropy

Entropy, particularly Shannon entropy³⁰, measures the uncertainty or randomness in a probability distribution. In the context of wearable devices, entropy allows us to quantify the unpredictability or regularity of health-related variables, such as physical activity level over time. Specifically, we calculate the Shannon entropy of the covariate Z, defined as

H = - \sum_{j = 1}^{q} p (Z_{j}) \log_{2} [p (Z_{j})]

where p(Z_j) represents the probability of occurrence of Z_j for each calendar day. For wearable device data like step counts, higher entropy indicates greater variability in activity levels over the measured time period, which suggests more irregular activity patterns. Conversely, lower entropy suggests more consistent, predictable behavior, such as regularly walking a similar number of daily steps. Such variability can be crucial for predicting health outcomes, as irregular activity patterns may be linked to conditions such as fatigue, mobility issues, or chronic illness, while more regular activity may indicate better overall health. In the entropy-only model, we fit a GLM using H as the key predictor in addition to X, i.e., g[E(Y|X,H)] = Xβ_X + Hβ_H, where g(·) is the link function depending on the outcome type.
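A short sketch of the daily entropy computation may help make the definition concrete. It assumes that p(·) is the empirical frequency of each observed activity level within one day and that the sum runs over the distinct levels, which is the standard form of Shannon entropy; the function name and the example days are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Base-2 Shannon entropy of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical days with 288 five-minute activity codes each.
constant_day = [0] * 288              # same level all day: fully predictable
mixed_day = [0] * 144 + [1] * 144     # two levels, equally frequent

print(shannon_entropy(constant_day))  # 0.0
print(shannon_entropy(mixed_day))     # 1.0
```

Under this reading, a day spent entirely at one intensity level has zero entropy, and a day split evenly across k levels has entropy log2(k), so five equally frequent levels would give the maximum of about 2.32 bits.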

Large Language Model Embeddings

LLM embeddings were designed to turn large-scale text into low-dimensional numerical vectors for downstream analysis. We propose to use LLM embeddings to capture low-dimensional representations of the time-series covariate Z. Because LLMs only process text input, we first convert Z into a character-based format by transforming its numerical values into corresponding characters. As illustrated in Figure 1, these characters are then concatenated in sequence, separated by single spaces, forming a single text string, which serves as the input for the LLM embeddings. We denote the resulting embeddings as E ∈ ℝ^{n×m}, where m is the embedding dimension. Given E, we can fit the penalized GLM, Y|X, E, using both E and the baseline covariate X as predictors, similar to equation (1). We can also add penalty terms when necessary to account for the potential sparsity and high dimensionality.

Figure 1. — A schematic illustration of generating LLM embeddings.

We implement three popular and established LLMs to compute embeddings:

OpenAI Generative Pre-training Transformer (GPT)-based model (text-embedding-3-small): The input text is tokenized and passed through multiple transformer layers, producing embeddings in the default 1,536-dimensional space³¹. To test the robustness, we also reduce the embedding dimension to 50 by truncating the resulting vectors and normalizing them to unit length³¹.
BERT model (bert-base-uncased)³²: We use the default 768-dimensional embeddings and also reduce them to 50 dimensions, similar to the GPT embedding.
Cohere model (embed-english-v3.0)³³: We use the default 1,024-dimensional embeddings and then reduce them to 50 dimensions, similar to the GPT and BERT embeddings.
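The text conversion and the dimension reduction described above can be sketched without calling any embedding service. The helper names are hypothetical, and the placeholder vector stands in for a real model output; only the space-separated format and the truncate-then-normalize step come from the text.

```python
import math


def series_to_text(codes):
    """Join activity codes into one space-separated string for the embedding model."""
    return " ".join(str(c) for c in codes)


def truncate_and_normalize(embedding, dim=50):
    """Keep the first `dim` coordinates, then rescale to unit Euclidean length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(v * v for v in head))
    if norm == 0.0:
        return head  # an all-zero vector cannot be normalized
    return [v / norm for v in head]


text = series_to_text([0, 0, 1, 3, 4, 2])
print(text)  # 0 0 1 3 4 2

# Placeholder vector (NOT a real embedding) with the GPT model's default length.
fake_embedding = [0.1 * (i % 7 - 3) for i in range(1536)]
small = truncate_and_normalize(fake_embedding, dim=50)
print(len(small))  # 50
```

A sequence of 2,016 single-digit codes joined this way is 4,031 characters long (2,016 digits plus 2,015 spaces). How that maps to tokens depends on the tokenizer of each model.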

Proposed EntroLLM Model

We propose to incorporate both time series entropy and LLM embeddings to create a more comprehensive predictive model. While LLM embeddings capture complex temporal dependencies, entropy provides an additional layer of information about the variability and uncertainty within the time series data. As illustrated in Figure 2, EntroLLM first computes the entropy, H, and LLM embeddings, E, using covariate Z, respectively. Then it includes H and E as predictors, in addition to X, in the target model, i.e., we can fit the penalized GLM Y|X, E, H similar to equation (1).
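To illustrate how H and E enter the target model alongside X, here is a minimal pure-Python sketch that stacks the three blocks into one design row per subject and fits a ridge-penalized logistic regression by gradient descent. Everything in it is an assumption for illustration: the data are synthetic, the optimizer, learning rate, and unpenalized intercept are choices made here, and the authors' released code may differ.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_ridge_logistic(rows, y, lam=0.1, lr=0.1, steps=300):
    """Gradient descent on mean logistic loss + lam * ||beta||_2^2.

    The intercept is left unpenalized (an assumption of this sketch).
    """
    n, d = len(rows), len(rows[0])
    beta0, beta = 0.0, [0.0] * d
    for _ in range(steps):
        g0, g = 0.0, [0.0] * d
        for row, yi in zip(rows, y):
            err = sigmoid(beta0 + sum(b * v for b, v in zip(beta, row))) - yi
            g0 += err / n
            for j, v in enumerate(row):
                g[j] += err * v / n
        beta0 -= lr * g0
        beta = [b - lr * (gj + 2.0 * lam * b) for b, gj in zip(beta, g)]
    return beta0, beta


def predict_proba(beta0, beta, row):
    return sigmoid(beta0 + sum(b * v for b, v in zip(beta, row)))


# Synthetic stand-ins (NOT NHANES data): 2 covariates, 7 daily entropies,
# and 5 embedding coordinates per subject.
random.seed(0)
subjects, labels = [], []
for _ in range(200):
    x = [random.gauss(0, 1) for _ in range(2)]
    h = [random.uniform(0, 2) for _ in range(7)]
    e = [random.gauss(0, 1) for _ in range(5)]
    signal = 0.8 * x[0] + 0.5 * (sum(h) / 7 - 1.0) + 0.7 * e[0]
    labels.append(1 if random.random() < sigmoid(signal) else 0)
    subjects.append(x + h + e)  # design row [X, H, E]

beta0, beta = fit_ridge_logistic(subjects, labels)
print(len(beta))  # 14 coefficients: 2 for X, 7 for H, 5 for E
```

Dropping the E block from each row gives the entropy-only model, and dropping H as well gives the demographic-only model, so the same routine covers the nested comparisons.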

Figure 2. — Flowchart of the proposed EntroLLM model.

Data Application

We evaluate EntroLLM using physical activity data from wearable devices in NHANES to predict overweight status. The Centers for Disease Control and Prevention established the NHANES program in the 1960s to record the health status of Americans. The dataset from wearable devices was collected during the survey cycles of 2003-2006 and 2011-2014³⁴,³⁵. In our analysis, we focus on data from the 2003-2004 and 2005-2006 periods. Participants wore ActiGraph devices for seven consecutive days, which recorded their physical activity intensity every 60 seconds. We apply standard quality control procedures, including excluding pregnant females and only retaining reliable and calibrated NHANES measurements from adults aged 20-85 years old³⁶,³⁷. We include participants who had complete covariate values and activity data over a seven-day period, resulting in a total of 6,943 subjects.

Outcome Variables

Our outcome of interest is overweight status, defined by Body Mass Index (BMI) based on World Health Organization guidelines³⁸. Individuals with a BMI over 25.0 kg/m² are classified as overweight (1, including obese), while those with a BMI of 25.0 kg/m² or lower are classified as non-overweight (0).

Accelerometer Data from Wearable Devices

We use the intensity values of physical activity as the exposure variables (Z), along with baseline demographic data (X). In Figure 3(a), we present three randomly selected case samples that illustrate the time series data capturing physical activity patterns over a span of seven days. Each subplot represents an individual’s weekly activity pattern, with the x-axis showing time in minutes, covering 10,080 minutes for the entire week, while the y-axis indicates the intensity level.

We preprocess the raw accelerometer data to ensure consistency with the LLM input format and to keep the total number of tokens, i.e., the number of characters, within the input limit. We first average the physical activity data over five-minute intervals for each subject. Next, we categorize these averaged values into five groups based on intensity levels: sedentary (intensity < 100 counts/min), light (100 ≤ intensity < 760 counts/min), lifestyle (760 ≤ intensity < 2200 counts/min), moderate (2200 ≤ intensity < 6000 counts/min), and vigorous (intensity ≥ 6000 counts/min)³⁹,⁴⁰. Among the three LLMs we implement, BERT is available for free, while the OpenAI API and Cohere API charge computing fees based on the number of input tokens. All three have input token limits. To ensure cost-effectiveness for computing embeddings, we recode the original intensity values as follows to reduce the total number of token inputs: “0” for sedentary, “1” for light, “2” for lifestyle, “3” for moderate, and “4” for vigorous activities. Figure 3(b) displays the activity patterns of the same three individuals in panel 3(a) after preprocessing. The y-axis represents numeric values from 0 to 4 for illustration purposes.

To summarize, we transform the raw accelerometer data into continuous intensity values, which are then converted into categorical variables. This process preserves the overall trends and key characteristics of the data. As a result, we reduce the number of data points for each sample from 10,080 to 2,016. This reduction shortens the input token length for the LLM, making computation more cost-efficient. Entropy H is calculated for each day using q=288, resulting in seven daily entropy values, all of which are included in the model.
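The preprocessing steps can be sketched as follows. The cut-points and codes are the ones stated above; the function names and the constant example week are hypothetical, and real data would also need the quality-control filtering described earlier.

```python
def intensity_to_code(mean_counts):
    """Map a five-minute mean intensity (counts/min) to an activity code 0-4."""
    if mean_counts < 100:
        return 0  # sedentary
    if mean_counts < 760:
        return 1  # light
    if mean_counts < 2200:
        return 2  # lifestyle
    if mean_counts < 6000:
        return 3  # moderate
    return 4      # vigorous


def preprocess_week(minute_counts, window=5):
    """Average minute-level counts over `window`-minute blocks, then categorize."""
    if len(minute_counts) % window != 0:
        raise ValueError("series length must be a multiple of the window size")
    codes = []
    for start in range(0, len(minute_counts), window):
        block = minute_counts[start:start + window]
        codes.append(intensity_to_code(sum(block) / window))
    return codes


def split_days(codes, per_day=288):
    """Split the weekly code sequence into consecutive days of q = 288 intervals."""
    return [codes[i:i + per_day] for i in range(0, len(codes), per_day)]


# Hypothetical week: 7 days x 1,440 minutes at a constant light intensity.
week = [250] * 10080
codes = preprocess_week(week)
days = split_days(codes)
print(len(codes), len(days), len(days[0]))  # 2016 7 288
```

Each of the seven daily lists can then be passed to an entropy routine to obtain the seven daily entropy values, while the full 2,016-code sequence is the one converted to text for the embedding step.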

Baseline Covariates

In addition to the wearable device data, we also consider several baseline covariates, including gender, age, race, education level, marital status, and family poverty income ratio (PIR)³⁶. Race is categorized into five groups: non-Hispanic white, non-Hispanic black, Mexican American, other Hispanic, and other racial identities (including multiracial). Education level is assessed by the highest grade completed, level of schooling attained, or the highest degree earned. Marital status is measured using a six-point scale: married, widowed, divorced, separated, never married, or living with a partner. The family PIR compares an individual’s family income to the national poverty threshold, with values from zero to five. Values over five are recorded as five to protect privacy³⁵.

Comparison Models

Since the outcome is binary, we apply logistic regression using the logit link function across all models. The ridge penalty is added when embeddings are included as the predictor. In each of the nine models we compare, the baseline covariates are always adjusted:

Demographic-only model: Y|X, using only baseline covariates, X.
Base model: Y|X, Z, using baseline covariates and the raw accelerometer data, Z.
Entropy-only model: Y|X, H, using baseline covariates and the entropy, H, without embeddings.
EntroLLM: Y|X, H, E, using baseline covariates, entropy, and LLM embeddings, in six variants:
1. EntroBERT768: BERT embeddings of size 768 (the default embedding dimension of the BERT model).
2. EntroCohere1024: Cohere embeddings of size 1,024 (the default embedding dimension of the Cohere model).
3. EntroGPT1536: GPT embeddings of size 1,536 (the default embedding dimension of the GPT model).
4. EntroBERT50: BERT embeddings of size 50.
5. EntroCohere50: Cohere embeddings of size 50.
6. EntroGPT50: GPT embeddings of size 50.

In addition to the ridge penalty, we also assess the Lasso penalty to explore the effectiveness of different regularization methods in these models.

Model Evaluation

We split the data into a training set (80%) for model fitting and a testing set (20%) for evaluation. The Area Under the Receiver Operating Characteristic Curve (AUC) is used as the evaluation metric. We also assess precision, defined as true positives divided by the sum of true positives and false positives, and recall, defined as true positives divided by the sum of true positives and false negatives, to classify high-risk (HR) patients. For each model, HR patients are defined as those predicted to be overweight. They are classified based on the ROC curve by selecting the threshold that maximizes the F1 score, calculated as 2 × (precision × recall) / (precision + recall). True positives are those correctly identified as HR, while false positives are those incorrectly classified as HR.
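The evaluation metrics defined above can be written out in a few lines. This is a sketch on a toy example, with hypothetical function names; the AUC here uses the pairwise-comparison form, and candidate thresholds are taken from the observed scores, which is one common convention and an assumption of this sketch.

```python
def confusion_counts(y_true, scores, threshold):
    """Count true positives, false positives, and false negatives at a threshold."""
    tp = fp = fn = 0
    for yt, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and yt == 1:
            tp += 1
        elif pred == 1 and yt == 0:
            fp += 1
        elif pred == 0 and yt == 1:
            fn += 1
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_f1_threshold(y_true, scores):
    """Return (threshold, F1) for the observed score that maximizes F1."""
    best_f1, best_t = -1.0, None
    for t in sorted(set(scores)):
        f1 = precision_recall_f1(*confusion_counts(y_true, scores, t))[2]
        if f1 > best_f1:
            best_f1, best_t = f1, t
    return best_t, best_f1


def auc(y_true, scores):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for yt, s in zip(y_true, scores) if yt == 1]
    neg = [s for yt, s in zip(y_true, scores) if yt == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores, not study results).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
threshold, f1 = best_f1_threshold(y_true, scores)
print(threshold, round(f1, 3), round(auc(y_true, scores), 3))  # 0.6 0.857 0.889
```

In the toy example the F1-maximizing threshold of 0.6 flags four subjects as high risk, three of them correctly, which gives precision 0.75 and recall 1.0.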

Results

Demographic Characteristics

Table 1 summarizes the baseline demographic information of the dataset, categorized by outcome status and overall population. The total population is nearly evenly split between male (51%) and female (49%) participants. The average age in the overweight group is 50 years, slightly higher than the non-overweight group, which has an average age of 47 years. Non-Hispanic Whites make up the largest demographic in both BMI groups and the overall population, at over 48%. Mexican Americans have a higher proportion in the overweight group (22%) compared to the non-overweight group (16%). College graduates or individuals with higher education are less prevalent in the overweight group (18%) than in the non-overweight group (24%). Additionally, a higher percentage of married individuals are found in the overweight group (57%) compared to the non-overweight group (51%). The mean family poverty income ratio is similar in both groups, slightly above 2.6.

Table 1.

Demographic information of samples stratified by overweight status

Characteristic	Overall (N = 6,943)¹	Non-overweight (N = 2,081)¹	Overweight (N = 4,862)¹	p-value²
Gender				< 0.001
Male	3,536 (51%)	975 (47%)	2,561 (53%)
Female	3,407 (49%)	1,106 (53%)	2,301 (47%)
Age (years)	49 (18)	47 (19)	50 (17)	< 0.001
Race				< 0.001
Mexican American	1,394 (20%)	332 (16%)	1,062 (22%)
Other Hispanic	206 (3.0%)	67 (3.2%)	139 (2.9%)
Non-Hispanic White	3,542 (51%)	1,198 (58%)	2,344 (48%)
Non-Hispanic Black	1,526 (22%)	350 (17%)	1,176 (24%)
Others	275 (4.0%)	134 (6.4%)	141 (2.9%)
Education				< 0.001
<9th Grade	877 (13%)	227 (11%)	650 (13%)
9-11th Grade (No diploma)	1,029 (15%)	325 (16%)	704 (14%)
High School Grad/GED	1,692 (24%)	471 (23%)	1,221 (25%)
Some College or AA degree	1,998 (29%)	567 (27%)	1,431 (29%)
College Graduate or Above	1,347 (19%)	491 (24%)	856 (18%)
Marital Status				< 0.001
Married	3,830 (55%)	1,060 (51%)	2,770 (57%)
Widowed	548 (7.9%)	157 (7.5%)	391 (8.0%)
Divorced	737 (11%)	209 (10%)	528 (11%)
Separated	201 (2.9%)	52 (2.5%)	149 (3.1%)
Never Married	1,125 (16%)	430 (21%)	695 (14%)
Living with Partner	502 (7.2%)	173 (8.3%)	329 (6.8%)
Family Poverty Income Ratio	2.66 (1.59)	2.65 (1.61)	2.66 (1.58)	0.7

¹ n (%); Mean (SD).

² Pearson’s Chi-squared test; Wilcoxon rank sum test.

AUC and ROC Results

Figures 4(a) and 4(b) present the AUC boxplots and ROC curves for the nine models, respectively. The base model performs the worst across both metrics, with the lowest median AUC of around 0.56 and its ROC curve positioned farthest from the top-left corner, indicating poor classification ability. The demographic-only model shows slight improvement, with a median AUC of approximately 0.58 and the ROC curve closer to the top-left corner, reflecting marginally better classification.

Figure 4. — Model performance across nine predictive models. (a) AUC boxplots; (b) ROC plots; (c) True positives and false positives based on the maximum F1 scores; (d) Maximum F1 scores, precision, and recall.

Models incorporating entropy with BERT (EntroBERT768, EntroBERT50) and Cohere (EntroCohere1024, EntroCohere50) embeddings show slight improvements over the baseline, but still underperform compared to the entropy-only model. The entropy-only model demonstrates better performance, with a median AUC of 0.62 and a ROC curve positioned closer to the top-left corner, particularly in the mid-to-high sensitivity range.

The best performance is seen with EntroGPT1536 and EntroGPT50, which achieve the highest AUCs of approximately 0.64 and exhibit superior classification performance, especially at higher sensitivity levels. These models’ ROC curves are noticeably closer to the top-left corner, confirming the AUC results. Notably, EntroGPT1536 has relatively lower variability, reflecting a consistent and robust improvement in performance. The improved performance of these models emphasizes that combining entropy with GPT embeddings provides the most substantial and stable gains, with the dimensionality of the embeddings playing a critical role depending on the model used.

High-risk patient classification

Figure 4(c) illustrates the HR patient classification results based on the ROC curve in Figure 4(b), showing the number of true positives (blue) and false positives (red) for each model. The base model has 958 true positives and 411 false positives, performing the worst in terms of misclassification. The demographic-only model achieves 962 true positives and 415 false positives, slightly outperforming the base model in true positives but with a modestly higher number of false positives. EntroBERT768 and EntroBERT50 show similar performance, detecting 951 and 964 true positives, respectively, with 393 and 413 false positives. EntroCohere1024 and EntroCohere50 also demonstrate comparable results, with 959 and 957 true positives and slightly fewer false positives (406 and 403). The entropy-only model achieves a balance, detecting 945 true positives with 391 false positives. The best performance is observed in the EntroGPT1536 and EntroGPT50 models, which have the lowest number of false positives (382 and 372) while maintaining 936 and 935 true positives, respectively, indicating superior classification accuracy.

While the EntroGPT1536 and EntroGPT50 models achieve the lowest number of false positives, this comes with a slightly reduced number of true positives compared to the entropy-only model. Nevertheless, these models outperform others in terms of balancing classification accuracy with the risk of misclassification, making them particularly effective for HR patient classification, where minimizing false positives is crucial.

Figure 4(d) shows the corresponding maximum F1 scores, precision, and recall derived from the HR patient classification results. Recall (blue curve) remains consistently high, close to 1.0 for all models, indicating strong performance in identifying true positives. In HR patient classification, high recall is essential to ensure at-risk patients are identified, minimizing the risk of missing those in need of intervention. The maximum F1 score (red curve), which balances precision and recall, is around 0.8 for all models, with minimal variation across the models. Precision (green curve) shows slightly more variability, with the demographic-only model having the lowest precision due to a higher number of false positives. In contrast, EntroGPT1536 and EntroGPT50 models show slightly better precision, reflecting their ability to minimize false positives while maintaining high recall. This reduction in false positives is particularly important in clinical settings, where false positives can lead to unnecessary follow-ups or treatments.
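The precision differences described here can be checked directly from the true-positive and false-positive counts reported for Figure 4(c). The short script below only recomputes precision = TP / (TP + FP) from those reported counts; recall is not recomputed because the number of false negatives is not given in the text.

```python
# (true positives, false positives) as reported for Figure 4(c).
reported = {
    "Base": (958, 411),
    "Demographic-only": (962, 415),
    "EntroBERT768": (951, 393),
    "EntroBERT50": (964, 413),
    "EntroCohere1024": (959, 406),
    "EntroCohere50": (957, 403),
    "Entropy-only": (945, 391),
    "EntroGPT1536": (936, 382),
    "EntroGPT50": (935, 372),
}

precision = {name: tp / (tp + fp) for name, (tp, fp) in reported.items()}

# Print models from lowest to highest precision.
for name, value in sorted(precision.items(), key=lambda item: item[1]):
    print(f"{name:18s} {value:.3f}")
```

The resulting values span roughly 0.70 to 0.72, with the demographic-only model lowest and EntroGPT50 highest, which matches the ordering described in the text and shows how narrow the precision range is.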

Interestingly, the results highlight that GPT-based models tend to outperform others slightly in precision, while recall remains consistently high across all models. This suggests that combining entropy with GPT embeddings offers the most balanced and robust solution for HR patient classification. The improved precision seen with these models helps reduce unnecessary resource allocation, making them a more efficient choice for high-risk patient identification. When evaluating the model using the Lasso penalty (results not shown), we found a similar trend to ridge regression.

Discussion and Conclusion

In this paper, we propose a novel risk prediction model, EntroLLM, designed for analyzing accelerometer data from wearable devices. By combining LLM embeddings with time-series entropy, EntroLLM quantifies variability in physical activity patterns while leveraging LLMs to capture low-dimensional representations of the data. The results show that EntroLLM is more effective than traditional models, like ridge regression, in predicting overweight status from physical activity data in the NHANES dataset. Notably, models using GPT embeddings show the most significant improvement, offering the best AUC performance with comparatively low variability across simulations. Additionally, entropy provides valuable insights into the variability of wearable data, especially in capturing irregular activity patterns crucial for health predictions. The code for implementing EntroLLM is available on GitHub*.

EntroLLM shows the potential for real-time health monitoring and risk prediction, particularly in identifying patients at high risk, such as those with overweight status. While its application to other health outcomes is still to be explored, EntroLLM’s ability to capture activity variability also improves interpretability, providing healthcare providers with actionable insights into how activity patterns relate to health risks, which could be valuable in proactive healthcare interventions. Beyond healthcare, EntroLLM’s ability to integrate entropy and LLM embeddings holds promise for various domains that rely on time-series data for risk prediction. Its framework could be adapted to diverse applications where capturing variability and temporal patterns is critical. Careful validation would be necessary to ensure its effectiveness in these fields. Future work could also extend EntroLLM to other health outcomes and explore the integration of multimodal data to enhance predictive accuracy. Additionally, transfer learning, leveraging LLM embeddings trained on diverse data, offers the potential for improving generalization across domains.

In EntroLLM, raw accelerometer data is converted into numerical intensity values that contain no language content. Despite this, the method seems to effectively capture temporal patterns and achieves strong predictive performance. To further investigate the robustness of the model and explore why LLM embeddings perform well in this context, we examined the effect of shuffling the raw accelerometer data. Specifically, the columns of Z are shuffled before generating LLM embeddings. Interestingly, while the embeddings changed, the final predictions remained consistent. This suggests that the model may rely on cross-sample patterns that persist even after shuffling, or that LLMs capture intrinsic properties of the accelerometer data that are independent of the exact sequence of features. Existing work suggests other approaches for merging numerical time-series data with language model outputs, offering promising avenues for future exploration⁴¹. Furthermore, while LLM embeddings proved effective, their reliance on pre-trained models presents an opportunity to develop measurement-specific embeddings tailored to wearable data, which could improve accuracy, especially when integrating multiple wearable measurements like physical activity, heart rate variability, and sleep patterns.

While the results are promising, there are areas for future improvement. The current method relies on predefined categorical groupings of physical activity intensity, which simplifies the data representation but may lead to a loss of granularity. More refined categorization or dynamic representations could potentially enhance the model’s ability to capture nuanced activity patterns. Additionally, the token input limitations of certain models provide another avenue for optimization. The OpenAI GPT-based model, with a maximum token limit of 8,191, fully processed our input sequence of 4,031 tokens, while BERT and Cohere models, limited to 512 tokens, required truncation, leading to a potential loss of temporal information and reduced predictive performance. This limitation likely contributes to EntroGPT outperforming EntroBERT and EntroCohere. While the model’s AUC of 0.64 is a promising outcome, it highlights opportunities for improvement through the use of richer datasets and advanced modeling techniques.

Related studies have achieved higher predictive accuracies by integrating multimodal wearable data or employing alternative model architectures. For instance, Xue et al.⁴² achieved accuracies of 77% to 86% using Recurrent Neural Networks, while Romero-Tapiador et al.⁴³ reported an AUC of over 0.8 with gradient-boosting classifiers. These results underline the potential of incorporating additional data modalities and optimizing feature selection to achieve further gains in predictive accuracy.

Acknowledgment

This work is supported by the National Institutes of Health (R01CA296289).

Footnotes

* https://github.com/huangxq63/EntroLLM

References

1.Almeida A, Brás S, Sargento S, Pinto Filipe Cabral. Time series big data: a survey on data stream frameworks, analysis and algorithms. 2023 May 28;10(1) doi: 10.1186/s40537-023-00760-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Gudziunaite S, Shabani Z, Weitensfelder L, Moshammer Hanns. Time series analysis in environmental epidemiology: challenges and considerations. International Journal of Occupational Medicine and Environmental Health. 2023 Oct 2;36(6):704–16. doi: 10.13075/ijomeh.1896.02237. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67. [Google Scholar]
4.Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Series B Stat Methodol. 1996;58(1):267–88. [Google Scholar]
5.Fraikin AF, Bennetot A, Allassonnière S. T-Rep: representation learning for time series using time-embeddings. Proceedings of the Twelfth International Conference on Learning Representations (ICLR) 2024 Available from: https://openreview.net/forum?id=3y2TfP966N . [Google Scholar]
6.Nalmpantis Christoforos, Vrakas Dimitris. Signal2Vec: time series embedding representation. Communications in computer and information science. 2019 Jan 1;80:90. [Google Scholar]
7.Gugulothu N, Tv V, Malhotra P, Vig L, Agarwal P, Shroff G. Predicting remaining useful life using time series embeddings based on recurrent neural networks. arXiv preprint arXiv:1709.01073. 2017 Sep 4 [Google Scholar]
8.Jiang S, Koch B, Sun Y. HINTS: citation time series prediction for new publications via dynamic heterogeneous information network embedding. 2021 Apr 19 [Google Scholar]
9.Bemdt D, Clifford J. Using dynamic time warping to find patterns in time series [Internet] 1994 Available from: https://cdn.aaai.org/Workshops/1994/WS-94-03/WS94-03-031.pdf . [Google Scholar]
10.Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space [Internet] 2013 Available from: https://arxiv.org/pdf/1301.3781 . [Google Scholar]
11.Moon J, Posada-Quintero HF, Chon KH. A literature embedding model for cardiovascular disease prediction using risk factors, symptoms, and genotype information. Expert Systems with Applications. 2023 Mar 1;213:118930. [Google Scholar]
12. Farhan W, Wang Z, Huang Y, Wang S, Wang F, Jiang X. A predictive model for medical events based on contextual embedding of temporal sequences. JMIR Medical Informatics. 2016 Nov 25;4(4):e39. doi: 10.2196/medinform.5977.
13. Li Y, Dong W, Ru B, Black A, Zhang X, Guan Y. Generic medical concept embedding and time decay for diverse patient outcome prediction tasks. iScience. 2022 Aug 1;25(9):104880. doi: 10.1016/j.isci.2022.104880.
14. Franceschi JY, Jaggi M. Unsupervised scalable representation learning for multivariate time series [Internet]. [cited 2024 Sep 14]. Available from: https://arxiv.org/pdf/1901.10738
15. Spathis D, Perez-Pozuelo I, Brage S, Wareham NJ, Mascolo C. Self-supervised transfer learning of physiological representations from free-living wearable data. In: Proceedings of the Conference on Health, Inference, and Learning. 2021 Apr 8:69–78.
16. Ghods A, Cook DJ. Activity2Vec: learning ADL embeddings from sensor data with a sequence-to-sequence model [Internet]. 2019 [cited 2024 Sep 16]. Available from: https://arxiv.org/abs/1907.05597
17. Jiang JY, Chao Z, Bertozzi AL, Wang W, Young SD, Needell D. Learning to predict human stress level with incomplete sensor data from wearable devices. Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2019 Nov 3:2773–2781.
18. Min B, Ross H, Sulem E, Veyseh APB, Nguyen TH, Sainz O, et al. Recent advances in natural language processing via large pre-trained language models: a survey. arXiv preprint arXiv:2111.01243. 2021.
19. Jin M, Zhang Y, Chen W, Zhang K, Liang Y, Yang B, et al. Position: what can large language models tell us about time series analysis [Internet]. 2024 [cited 2024 Sep 16]. Available from: https://arxiv.org/abs/2402.02713
20. Kim Y, Xu X, McDuff D, Breazeal C, Park HW. Health-LLM: large language models for health prediction via wearable sensor data [Internet]. 2024 [cited 2024 Sep 16]. Available from: https://arxiv.org/abs/2401.06866
21. Hota A, Chatterjee S, Chakraborty S. Evaluating large language models as virtual annotators for time-series physical sensing data [Internet]. 2024 [cited 2024 Sep 16]. Available from: https://arxiv.org/abs/2403.01133
22. Liu Z, Chen C, Cao J, Pan M, Liu J, Li N, et al. Large language models for cuffless blood pressure measurement from wearable biosignals [Internet]. 2024 [cited 2024 Sep 16]. Available from: https://arxiv.org/abs/2406.18069
23. Jörke M, Sapkota S, Warkenthien L, Vainio N, Schmiedmayer P, Brunskill E, et al. Supporting Physical Activity Behavior Change with LLM-Based Conversational Agents. arXiv preprint arXiv:2405.06061. 2024 May 9.
24. Balzter H, Tate N, Kaduk J, Harper D, Page S, Morrison R, et al. Multi-scale entropy analysis as a method for time-series analysis of climate data. Climate. 2015 Mar 6;3(1):227–40.
25. Shardt YAW, Huang B. Statistical properties of signal entropy for use in detecting changes in time series data. Journal of Chemometrics. 2013 Sep 9;27(11):394–405.
26. Zhang S, Bao Y, Huang S. EDT: Improving Large Language Models’ Generation by Entropy-based Dynamic Temperature Sampling. arXiv preprint arXiv:2403.14541. 2024 Mar 21.
27. Farquhar S, Kossen J, Kuhn L, Gal Y. Detecting hallucinations in large language models using semantic entropy. Nature. 2024 Jun 20;630(8017):625–30. doi: 10.1038/s41586-024-07421-0.
28. Qiu Z, Ou Z, Wu B, Li J, Liu A, King I. Entropy-based decoding for retrieval-augmented large language models. arXiv preprint arXiv:2406.17519. 2024 Jun 25.
29. Müller HG, Stadtmüller U. Generalized functional linear models. The Annals of Statistics. 2005 Apr 1;33(2).
30. Shannon CE. A mathematical theory of communication. The Bell System Technical Journal. 1948 Jul;27(3):379–423.
31. OpenAI API [Internet]. platform.openai.com. Available from: https://platform.openai.com/docs/guides/embeddings/what-are-embeddings
32. Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding [Internet]. arXiv.org. 2018. Available from: https://arxiv.org/abs/1810.04805
33. Embeddings [Internet]. Cohere AI. Available from: https://docs.cohere.com/docs/embeddings
34. NHANES questionnaires, datasets, and related documentation [Internet]. wwwn.cdc.gov. Available from: https://wwwn.cdc.gov/nchs/nhanes/
35. Johnson CL, Paulose-Ram R, Ogden CL, Carroll MD, Kruszon-Moran D, Dohrmann SM, et al. National health and nutrition examination survey: analytic guidelines, 1999-2010. Vital and Health Statistics Series 2, Data Evaluation and Methods Research [Internet]. 2013 Sep 1;161:1–24. Available from: https://pubmed.ncbi.nlm.nih.gov/25090154/
36. Cheng X, Lin S, Liu J, Liu S, Zhang J, Nie P, et al. Does physical activity predict obesity—A machine learning and statistical method-based analysis. International Journal of Environmental Research and Public Health. 2021 Apr 9;18(8):3966. doi: 10.3390/ijerph18083966.
37. Chang HW, McKeague IW. Empirical likelihood-based inference for functional means with application to wearable device data. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2022 Nov;84(5):1947–68.
38. World Health Organization. Obesity and overweight [Internet]. World Health Organization; 2024. Available from: https://www.who.int/news-room/fact-sheets/detail/obesity-and-overweight
39. Wanner M, Richard A, Martin B, Faeh D, Rohrmann S. Associations between self-reported and objectively measured physical activity, sedentary behavior and overweight/obesity in NHANES 2003–2006. International Journal of Obesity. 2016 Sep 28;41(1):186–93. doi: 10.1038/ijo.2016.168.
40. Matthews CE, Chen KY, Freedson PS, Buchowski MS, Beech BM, Pate RR, et al. Amount of time spent in sedentary behaviors in the United States, 2003-2004. American Journal of Epidemiology. 2008 Mar 14;167(7):875–81. doi: 10.1093/aje/kwm390.
41. Zhang X, Chowdhury RR, Gupta RK, Shang J. Large language models for time series: A survey. arXiv preprint arXiv:2402.01801. 2024 Feb 2.
42. Xue Q, Wang X, Meehan S, Kuang J, Gao JA, Chuah MC. Recurrent neural networks based obesity status prediction using activity data. In: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE; 2018 Dec 17. pp. 865–870.
43. Romero-Tapiador S, Tolosana R, Morales A, Lacruz-Pleguezuelos B, Pastor SB, Marcos-Zambrano LJ, Bazán GX, Freixer G, Vera-Rodriguez R, Fierrez J, Ortega-Garcia J. Personalized Weight Loss Management through Wearable Devices and Artificial Intelligence. arXiv preprint arXiv:2409.08700. 2024 Sep 13.
