Finding Optimal Alphabet for Encoding Daily Continuous Glucose Monitoring Time Series Into Compressed Text

Tobore Igbe; Boris Kovatchev

doi:10.1177/19322968251323913

Journal of Diabetes Science and Technology. 2025 Mar 20:19322968251323913. Online ahead of print.

PMCID: PMC11924066 PMID: 40109176

Abstract

Background:

The emergence of continuous glucose monitoring (CGM) devices has not only revolutionized diabetes management but has also opened new avenues for research. This article presents a novel approach to encoding a daily CGM profile into a CGM string and a CGM text that compresses the data while preserving the information needed for clinical metrics.

Methods:

Eight alphabets were defined to represent glucose ranges. For each alphabet, the encoding error and the compression ratio were estimated and combined into an Akaike information criterion (AIC) score to determine the optimal alphabet for encoding the daily CGM profile. The analysis used data from six distinct studies covering different treatment modalities in individuals with type 1 diabetes (T1D) or type 2 diabetes (T2D), as well as individuals without diabetes. The data set was divided into 70% for training and 30% for validation.

Result:

Results from the training data showed that a 9-letter alphabet was optimal for encoding daily CGM profiles in T1D or T2D, yielding the lowest AIC score and thus minimizing information loss. In health, fewer letters were needed, as expected given the lower variability of the data. Further testing with Pearson correlation showed that the coefficient of variation computed from 9-letter encoded profiles approximated that of the original profiles, with correlations between 0.945 and 0.965.

Conclusion:

Encoding CGM data into text could enhance the classification of CGM profiles and enable the use of well-established search engines with CGM data. Other potential applications include predictive modeling, anomaly detection, indexing, trend analysis, or future generative artificial intelligence applications for diabetes research and clinical practice.

Keywords: CGM analytics, CGM string, CGM text, decision support system, pattern discovery

Introduction

The emergence of continuous glucose monitoring (CGM) devices has not only revolutionized diabetes management but has also opened new avenues for research, such as advanced classification of diverse therapeutic approaches and patient cohorts,^1,2 innovative data-driven methods for predicting glucose dynamics,³ categorization of CGM patterns,^4,5 and various applications of machine learning.^6,7 The T1D Exchange quality improvement collaborative, a network of over 40 type 1 diabetes (T1D) clinics in the United States, recorded a marked increase in CGM use among patients with T1D, from 59% in 2016 to 72% in 2021.⁸ A similar trend has been observed globally in both T1D and type 2 diabetes (T2D) populations.⁹

The CGM data are commonly presented to patients and health care professionals as an Ambulatory Glucose Profile (AGP) accompanied by aggregated glycemic metrics. The AGP representation displays the median glucose value, the 25th and 75th percentiles, the lower and upper deciles, and the percentage of time spent in various glucose ranges to summarize glucose data.^10,11 This method allows for easy comparison of glucose data over time and between individuals. Advanced time-series data representations have been proposed for analysis, such as singular value decomposition,¹² piecewise aggregate approximation,¹³ and symbolic aggregate approximation.¹⁴

Systematic representations of CGM traces have been used to describe patterns corresponding to numerical measures or to responses to treatment. For example, to illustrate the effect of therapy on 72-hour CGM traces recorded before and four weeks after islet transplantation, the aggregated process was depicted as squares colored red for hypoglycemia, green for the target range, and yellow for hyperglycemia.^15,16 Similarly, the percentages of time spent within, below, or above preset target limits indicate the general behavior of CGM fluctuations.¹⁷ Furthermore, single patterns have been used to describe CGM traces, eg, a data-driven approach that determines a finite set of representative daily profiles (motifs) such that almost any daily CGM profile generated by a patient can be matched to one of the motifs.^18,19 This paradigm differed markedly from traditional classification and monitoring methods based on standard glycemic metrics.^20,21

One way to harness large amounts of CGM data effectively is to transform CGM traces into a representation over a small alphabet that retains important clinical characteristics and supports further analysis. Because text processing is highly developed, an alphabet representation of CGM data is expected to facilitate data indexing and reduce the time needed to search for strings with particular glycemic meaning. Computational complexity could also be reduced by using text-based data structures and machine learning algorithms, supporting personalized therapeutic strategies and refined diabetes management paradigms. Beyond these technological advantages, encoding CGM profiles holds significant potential for clinical care. Encoding CGM data into text-based formats enables a more standardized interpretation of glycemic patterns, facilitating informed clinical decision-making. Encoded CGM profiles can be integrated into electronic health records and decision support systems, allowing health care providers to detect trends such as excessive glycemic variability. This method also supports efficient data sharing across platforms, enhancing its utility for population-level studies and longitudinal tracking of patient outcomes. Finally, because text processing by artificial intelligence (AI) algorithms and large language models is highly advanced, encoding CGM data into text opens possibilities for applying contemporary AI methods. In this context, the major finding of this article, the establishment of an optimal alphabet length, is critical for all of the potential applications listed above.

Methods

In this article, we propose an approach for transforming a CGM trace into an alphabet representation using a glycemic hashing function, visualized in Figure 1. The hashing function takes two inputs: a CGM trace covering a standardized 24-hour period (one full day of glucose monitoring) and a set of categorical glycemic states, each defined as a glucose range labeled with a letter of an alphabet. It maps each value in the CGM trace to a letter of the alphabet to create a CGM-alphabet trace, hereafter called the CGM string. Consecutive occurrences of the same letter in the CGM string are then compressed into a sequence of letter-number pairs, where the number indicates how many times the letter repeats; this sequence is hereafter called the CGM text. Figure 2 illustrates the relationship between the CGM trace, CGM string, and CGM text. The optimal alphabet for representing a CGM trace is selected using the Akaike information criterion (AIC), computed from the encoding error and the compression ratio. The encoded CGM string and the corresponding CGM text can then be used for advanced data analytics.

Figure 1. — A systematic approach for transforming CGM profiles into discrete alphabet representations using a glycemic hashing function. This function takes as input a CGM profile and a categorical glycemic state, defined as a glucose range described with an alphabet, and produces an encoded CGM trace. The encoded CGM trace is analyzed using an information criterion (balancing error and compression ratio) to determine the optimal length of the alphabet needed to encode the CGM profile. Repeated patterns in the encoded CGM trace can be compressed into letters and numbers for data analytics.

Figure 2. — An example of a CGM string and CGM text derived from the CGM trace of a CGM profile. The CGM string is derived from the CGM trace with a hash function, and the CGM text is derived from the CGM string through serial compression.

Clinical Category Description

Figure 3 shows how the glucose range is stratified into the letters of each alphabet. For example, a 4-letter alphabet corresponds to three glucose ranges, 70 to 180 mg/dL, >180 mg/dL, and <70 mg/dL, which are used to estimate time in range (TIR), time above range (TAR), and time below range (TBR), respectively,^10,11 plus one letter for missing values. The <70 mg/dL range is further split into 55 to 70 mg/dL and <55 mg/dL, which are used to assess the severity of hypoglycemia in the AGP report. Similarly, the >180 mg/dL range is divided into 180 to 250 mg/dL and >250 mg/dL, which are used to assess the extent of hyperglycemia. Reports from the World Health Organization (WHO) and the American Diabetes Association (ADA) suggest that glucose >140 mg/dL 2 hours into an oral glucose tolerance test (OGTT) indicates prediabetes or diabetes.^22,23 Similarly, fasting glucose >100 mg/dL denotes possible prediabetes or diabetes.²⁴ More precisely, fasting glucose between 100 and 126 mg/dL (approximated here as 100-130 mg/dL) indicates prediabetes.^24,25 In addition, a 2-hour OGTT glucose >200 mg/dL is diagnostic of diabetes.²⁵ The >250 mg/dL range is split at 300 mg/dL (extreme hyperglycemia), a threshold that has been used to denote significant hyperglycemia in clinical trials.¹⁷ The subsequent alphabets are derived from these thresholds and reports on glucose measurements.

Accordingly, the 6-letter alphabet further splits the 4-letter alphabet to accommodate different severities of hyperglycemia and hypoglycemia. In the 7-letter alphabet, the 70 to 180 mg/dL range of the 6-letter alphabet is divided into 70 to 140 mg/dL and 140 to 180 mg/dL. The 9-letter alphabet builds on the 7-letter alphabet by dividing the 70 to 180 mg/dL range more finely, into 70 to 100, 100 to 130, 130 to 150, and 150 to 180 mg/dL. The 11-letter alphabet extends the 9-letter alphabet above 180 mg/dL, with letters for the ranges 180 to 200, 200 to 250, 250 to 300, and >300 mg/dL. Figure S1 (see Supplementary File) describes the 13-letter, 19-letter, and 21-letter alphabets obtained by further splitting the 11-letter alphabet. Each alphabet also includes one letter to encode missing values in the CGM trace.
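The ranges above can be written as lists of cut points, from which a glycemic hashing function follows directly. The sketch below is a minimal Python illustration, not the authors' implementation: the letters ('A' for the lowest range upward, and a hypothetical 'Z' for missing readings), the names `CUTS` and `glycemic_hash`, and the rule that a reading equal to a cut point belongs to the range above it are all assumptions. The consistency check reflects the convention stated in the text: each alphabet has one more glucose range than cut points, plus one letter for missing values.

```python
import bisect
import math
from typing import Optional, Sequence

# Cut points (mg/dL) for the alphabets described above. An alphabet with k cut
# points has k + 1 glucose ranges plus one letter for missing readings.
CUTS = {
    4: [70, 180],
    6: [55, 70, 180, 250],
    7: [55, 70, 140, 180, 250],
    9: [55, 70, 100, 130, 150, 180, 250],
    11: [55, 70, 100, 130, 150, 180, 200, 250, 300],
}
for size, cuts in CUTS.items():
    assert len(cuts) + 2 == size, f"{size}-letter alphabet is inconsistent"

MISSING = "Z"  # hypothetical letter for a missing reading


def letters_for(size: int) -> str:
    """Hypothetical labels: 'A' for the lowest glucose range, 'B' for the next, ..."""
    return "ABCDEFGHIJKLMNOPQRSTUVWXY"[: len(CUTS[size]) + 1]


def glycemic_hash(trace: Sequence[Optional[float]], size: int) -> str:
    """Map each CGM reading to a letter, producing a CGM string.

    Assumption: a reading equal to a cut point goes to the range above it.
    """
    cuts, letters = CUTS[size], letters_for(size)
    out = []
    for x in trace:
        if x is None or math.isnan(x):
            out.append(MISSING)
        else:
            out.append(letters[bisect.bisect_right(cuts, x)])
    return "".join(out)
```

For example, `glycemic_hash([65, 72, 95, 140, 190, None, 260], 9)` returns `"BCCEGZH"` under these assumed labels. Using `bisect` keeps the lookup logarithmic in the number of cut points, so the same function serves every alphabet size.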

Evaluating Glucose Categorical Values

Each letter in the alphabet for encoding the CGM profile is bounded by a pair of values: a lower bound, representing the minimum value ($x_{\min}$), and an upper bound, representing the maximum value ($x_{\max}$). To establish a single value representing each letter, the lower and upper boundary glucose values are translated from glucose space to risk space. This conversion uses equations (1) and (2), formulated in 1997 by Kovatchev et al¹⁶ and later expounded upon in 2017 by Kovatchev.²⁰ The risk space is based on a logarithmic transformation of the glucose values that equalizes the extent of the hypoglycemic and hyperglycemic ranges by expanding the former and compressing the latter. Consequently, the transformed data are approximately symmetric around zero and approximately normally distributed.

V(x_{r}) = 1.509\left[\left(\ln x_{r}\right)^{1.084} - 5.381\right]

(1)

R(x_{r}) = 10\,V(x_{r})^{2}

(2)

where $x_{r}$ is the glucose value in mg/dL, $V(x_{r})$ is its symmetrized value, and $R(x_{r})$ is the corresponding value in risk space. To obtain a value, $C(x)$, that represents a letter of the alphabet, the mean risk between the letter's boundaries is evaluated in risk space:

C(x) = \frac{\sum_{i = x_{\min}}^{x_{\max}} R(x_{i})}{x_{\max} - x_{\min}}

(3)

The resulting risk value is then translated back to the equivalent glucose value, so that each letter is expressed on the same scale as the glucose readings from the sensor.
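Equations (1) to (3) and the back-translation to glucose can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code. It uses the exponent 1.084 given for mg/dL in Kovatchev et al (1997). It caps the open-ended ranges at 40 and 400 mg/dL, the sensor range used in the Optimization Procedure below. Because $R = 10V^{2}$ discards the sign of $V$, it also assumes that the side of the scale (hypoglycemic or hyperglycemic) is taken from the midpoint of each range. The helper names are hypothetical.

```python
import math

# Symmetrization coefficients for glucose in mg/dL (Kovatchev et al, 1997).
GAMMA, ALPHA, BETA = 1.509, 1.084, 5.381

# 9-letter glucose ranges (mg/dL); open ends capped at the 40-400 mg/dL sensor range.
NINE_LETTER_RANGES = [(40, 55), (55, 70), (70, 100), (100, 130),
                      (130, 150), (150, 180), (180, 250), (250, 400)]


def symmetrized(x: float) -> float:
    """Equation (1): V(x) = 1.509 * ((ln x)^1.084 - 5.381)."""
    return GAMMA * (math.log(x) ** ALPHA - BETA)


def risk(x: float) -> float:
    """Equation (2): R(x) = 10 * V(x)^2."""
    return 10.0 * symmetrized(x) ** 2


def letter_risk(x_min: int, x_max: int) -> float:
    """Equation (3) as printed: R summed over the integer glucose values
    x_min..x_max (inclusive) and divided by x_max - x_min."""
    return sum(risk(x) for x in range(x_min, x_max + 1)) / (x_max - x_min)


def glucose_from_risk(r: float, hyper: bool) -> float:
    """Invert equation (2). R = 10 V^2 loses the sign of V, so the caller
    states whether the value lies on the hyperglycemic (V > 0) side."""
    v = math.sqrt(r / 10.0) * (1.0 if hyper else -1.0)
    return math.exp((v / GAMMA + BETA) ** (1.0 / ALPHA))


def representative_glucose(x_min: int, x_max: int) -> float:
    """Hypothetical helper: one glucose value per letter.
    Assumption: the side of the scale is taken from the range midpoint."""
    midpoint = (x_min + x_max) / 2
    return glucose_from_risk(letter_risk(x_min, x_max), symmetrized(midpoint) > 0)


for lo, hi in NINE_LETTER_RANGES:
    print(f"{lo}-{hi} mg/dL -> {representative_glucose(lo, hi):.1f} mg/dL")
```

As a check of the symmetrization, `symmetrized(112.5)` is close to 0, while `risk(70)` and `risk(180)` are approximately 7.76 and 7.73, so the two target-range bounds carry nearly equal risk even though they lie 42.5 and 67.5 mg/dL from the center.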

Optimization Procedure

To identify the optimal alphabet for encoding a CGM profile, we use an information criterion based on the root mean square error (RMSE) and the compression ratio of the alphabet. Selecting an alphabet involves a trade-off between the compression achieved by the encoding and the resulting error. The AIC²⁶ is computed to find the optimal alphabet. The AIC quantifies the quality of the encoding by considering its goodness of fit, measured with error metrics, and the number of parameters (letters) defined in each alphabet, as described previously.^27,28 In general, the smaller the AIC value, the “better” the alphabet represents the CGM trace, and the alphabet with the smallest AIC value is “best” at minimizing information loss.

In this assessment, the RMSE quantifies the difference between the alphabet-encoded CGM profile and the CGM trace representing the CGM profile. The trace of a CGM profile is represented as $y(t) = \{x : 40 \leq x \leq 400\}$, and the alphabet as $y(c) = \{y : y \in \text{letters of the alphabet}\}$. The CGM profile is encoded by a glycemic hashing function, $H_{y}$, that maps each glucose value in the CGM trace to a letter of the alphabet to generate a CGM string, $y(e)$, as given in equation (4) and illustrated in Figure 2. Listing 1 describes the run-length encoding (RLE) of the CGM string into the compressed CGM text.

y(e) = \{x : H_{y}[y(t) \to y(c)]\}

(4)

Therefore, the RMSE, $R_{c}$, for encoding a CGM profile with an alphabet is given by

R_{c} = \sqrt{E_{c}}

(5)

and

E_{c} = \frac{\sum_{i=1}^{N} \left[y(t_{i}) - y(e_{i})\right]^{2}}{N}

(6)

where $y(t_{i})$ is the $i$th glucose value in the CGM trace, $y(e_{i})$ is the glucose value assigned to the corresponding letter of the CGM string after encoding (that is, the letter's representative value), and $N$ is the number of observations in the CGM profile.
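Equations (5) and (6) reduce to a few lines once each letter of the CGM string has been replaced by its representative glucose value. In this sketch, the function name is hypothetical, and skipping pairs in which either value is missing (so that $N$ counts only complete pairs) is an assumption; the text does not state how missing readings enter the error.

```python
import math
from typing import Optional, Sequence, Tuple


def encoding_error(
    trace: Sequence[Optional[float]],
    encoded: Sequence[Optional[float]],
) -> Tuple[float, float]:
    """Return (E_c, R_c): the mean squared error of equation (6) and the RMSE
    of equation (5) between CGM readings y(t_i) and encoded values y(e_i)."""
    if len(trace) != len(encoded):
        raise ValueError("trace and encoded profile must have equal length")
    # Assumption: readings missing in either sequence are left out of N.
    pairs = [(t, e) for t, e in zip(trace, encoded) if t is not None and e is not None]
    if not pairs:
        raise ValueError("no complete observations")
    e_c = sum((t - e) ** 2 for t, e in pairs) / len(pairs)
    return e_c, math.sqrt(e_c)
```

Returning $E_c$ alongside $R_c$ is deliberate: equation (8) uses $E_c$ directly, so the AIC step does not need to square the RMSE again.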

Listing 1: Procedure for run-length encoding for a continuous glucose monitoring string

Let $S = s_{1}, s_{2}, s_{3}, \dots, s_{N}$ be the input sequence of letters from an alphabet representing the CGM string after encoding the CGM profile, where $s_{i}$ is the $i$th letter in the sequence.

The RLE algorithm outputs a compressed sequence, the CGM text, $T = T_{1}, T_{2}, T_{3}, \dots, T_{m}$, where each $T_{i}$ is a pair $(s, \mathrm{count})$ indicating that the letter $s$ is repeated $\mathrm{count}$ times consecutively.

The process of generating CGM text, $T$ , can be described as:

Initialize an empty CGM text, $T$.
Start with $i = 1$.
While $i \leq N$:
1. Set $s = s_{i}$ and $\mathrm{count} = 1$.
2. While $i + 1 \leq N$ and $s_{i+1} = s$, increment $i$ to $i + 1$ and $\mathrm{count}$ to $\mathrm{count} + 1$.
3. Append $(s, \mathrm{count})$ to $T$.
4. Increment $i$ to $i + 1$.
Repeat steps 1 to 4 until all letters in $S$ are processed.
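Listing 1 translates directly into Python. The sketch below uses 0-based indexing instead of the listing's 1-based $i$. The function names are hypothetical, and serializing the pairs as a letter followed by its count (for example, C3E2G1) is an assumed textual form of the CGM text.

```python
from typing import List, Tuple


def run_length_encode(cgm_string: str) -> List[Tuple[str, int]]:
    """Listing 1: compress a CGM string into CGM text as (letter, count) pairs."""
    text: List[Tuple[str, int]] = []
    n = len(cgm_string)
    i = 0  # 0-based, whereas Listing 1 starts at i = 1
    while i < n:
        letter, count = cgm_string[i], 1
        # Extend the run while the next letter is the same.
        while i + 1 < n and cgm_string[i + 1] == letter:
            i += 1
            count += 1
        text.append((letter, count))
        i += 1  # move to the first letter of the next run
    return text


def to_cgm_text(pairs: List[Tuple[str, int]]) -> str:
    """Hypothetical textual form: each letter followed by its run length."""
    return "".join(f"{letter}{count}" for letter, count in pairs)
```

For example, `to_cgm_text(run_length_encode("CCCEEG"))` returns `"C3E2G1"`. Because each pair records both the letter and its run length, the CGM string can be recovered exactly from the CGM text; the compression step is lossless, and any information loss comes from the hashing step.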

Therefore, $T$ is represented as $T = (s_{1}, \mathrm{count}_{1})(s_{2}, \mathrm{count}_{2}) \dots (s_{m}, \mathrm{count}_{m})$, where $s_{j}$ is the letter of the $j$th run in $S$ and $\mathrm{count}_{j}$ is the number of consecutive occurrences of that letter. The compression ratio, $L_{c}$, is defined in terms of the number of RLE pairs that constitute the CGM text, $|T|$, and the number of observations in the CGM profile, $N$:

L_{c} = \frac{| T |}{N}

(7)
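Because $|T|$ is the number of runs in the CGM string, equation (7) can also be computed without building the CGM text: $|T|$ equals one plus the number of positions at which the letter changes. A minimal sketch (the function name is hypothetical):

```python
def compression_ratio(cgm_string: str) -> float:
    """Equation (7): L_c = |T| / N.

    |T| (the number of run-length pairs) is 1 plus the number of positions
    where the letter differs from the previous one; N is the string length.
    """
    n = len(cgm_string)
    if n == 0:
        raise ValueError("empty CGM string")
    runs = 1 + sum(1 for prev, cur in zip(cgm_string, cgm_string[1:]) if prev != cur)
    return runs / n
```

For example, `compression_ratio("CCCEEG")` returns 0.5 (three runs over six readings). A value of 1 means no compression, and smaller values mean longer runs. Note that $L_c$ counts pairs, not characters, so writing out the CGM text takes more than $|T|$ characters.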

The AIC score, $I(e)$, for encoding a CGM profile with an alphabet is given as follows:

I(e) = N \ln(E_{c}) + 2\,|y(L_{c})| + b(e)

(8)

and

b (e) = \frac{2 | y (c) | [| y (c) | + 1]}{N - | y (c) | - 1}

(9)

where $|y(c)|$ is the length (number of letters) of the alphabet used to encode a CGM profile and $b(e)$ is the bias correction for $I(e)$, which tends to zero as $N$ increases. $I(e)$ is computed for each alphabet, and the optimal alphabet is the one with the smallest $I(e)$.
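Equations (8) and (9) can be sketched as follows. The text does not define the middle term $|y(L_c)|$ precisely, so this sketch leaves it as a caller-supplied quantity (`penalty_size`, a hypothetical parameter) rather than guessing; $E_c$ must be positive for the logarithm to exist. One possible reading, not confirmed by the article, is `penalty_size` $= |y(c)|$, in which case equations (8) and (9) coincide with the small-sample corrected AIC (AICc) with the alphabet size as the parameter count. How per-profile scores are aggregated across a data set is also left open here.

```python
import math
from typing import Dict


def bias_correction(alphabet_size: int, n: int) -> float:
    """Equation (9): b(e) = 2k(k + 1) / (N - k - 1), with k = |y(c)|."""
    k = alphabet_size
    if n - k - 1 <= 0:
        raise ValueError("equation (9) needs N > |y(c)| + 1")
    return 2 * k * (k + 1) / (n - k - 1)


def aic_score(e_c: float, n: int, alphabet_size: int, penalty_size: float) -> float:
    """Equation (8): I(e) = N ln(E_c) + 2|y(L_c)| + b(e).

    `penalty_size` stands in for |y(L_c)|, whose definition is left to the
    caller (assumption); e_c is the mean squared error of equation (6).
    """
    if e_c <= 0:
        raise ValueError("E_c must be positive for the logarithm")
    return n * math.log(e_c) + 2 * penalty_size + bias_correction(alphabet_size, n)


def select_alphabet(scores: Dict[int, float]) -> int:
    """Return the alphabet size with the smallest AIC score."""
    return min(scores, key=lambda size: scores[size])
```

With illustrative scores, `select_alphabet({4: 10.0, 9: 5.0, 11: 7.0})` returns 9.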

To evaluate the optimal alphabet and confirm that clinical relevance is preserved, Pearson correlations were computed between clinical metrics derived from the original CGM traces and from the alphabet-encoded traces. The following clinical metrics were considered: coefficient of variation (CV), mean blood glucose, low blood glucose index, high blood glucose index, and standard deviation (SD).

Results

Data

The data for the encoding analysis were derived from six studies. Detailed descriptions and demographic information for these studies can be found in Montaser et al²⁹ and Lobo et al.¹⁹ The data include various treatment modalities tested in individuals with T1D or T2D, as well as data from individuals without diabetes (HEALTHY). Table 1 provides an overview of the CGM profile records, including the mean, SD, and CV. The records were stratified into 70% for training and 30% for validation. The training data were used to find the optimal alphabet for encoding CGM profiles, and the validation data were used to estimate the correlation between clinical metrics derived from the original and the encoded CGM profiles. Figure 4 and Figure S2 (see Supplementary File) give the glucose value for each letter of each alphabet, evaluated with equation (3).

Table 1.

Description of the Data for Finding and Validating Optimal Alphabet Set for Encoding CGM Profile.

Study	Training: CGM profiles	Training: mean ± SD	Training: average CV	Validation: CGM profiles	Validation: mean ± SD	Validation: average CV
DCLP1	3879	163.3 ± 32.1	0.318	1663	168.4 ± 38.3	0.329
DCLP3	7000	159.4 ± 29.0	0.305	3000	157.3 ± 33.0	0.308
DIAMONDT1D	9170	178.1 ± 40.4	0.338	3930	176.1 ± 39.6	0.331
DIAMONDT2D	5185	172.3 ± 36.1	0.266	2223	163.0 ± 31.3	0.266
DSS	3031	170.6 ± 44.2	0.317	1299	165.6 ± 40.0	0.324
HEALTHY	339	16.4 ± 8.2	0.163	146	18.0 ± 11.4	0.172
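A per-study split with a training size of $\lfloor 0.7n \rfloor$ profiles reproduces every training count in Table 1 (for example, DIAMONDT2D has $5185 + 2223 = 7408$ profiles, and $\lfloor 0.7 \times 7408 \rfloor = 5185$). The sketch below assumes a random shuffle of daily profiles within each study; the seed, the function name, and the profile-level (rather than participant-level) split are assumptions, since the text does not describe how records were assigned.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Item = TypeVar("Item")


def split_70_30(profiles: Sequence[Item], seed: int = 0) -> Tuple[List[Item], List[Item]]:
    """Shuffle one study's daily profiles and split them 70% / 30%.

    The training size is floor(0.7 * n), computed with integers to avoid
    floating-point rounding; shuffling and the seed are assumptions.
    """
    items = list(profiles)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 7 // 10
    return items[:n_train], items[n_train:]


# Profile counts (training, validation) from Table 1.
TABLE_1 = {
    "DCLP1": (3879, 1663), "DCLP3": (7000, 3000), "DIAMONDT1D": (9170, 3930),
    "DIAMONDT2D": (5185, 2223), "DSS": (3031, 1299), "HEALTHY": (339, 146),
}
for study, (n_train, n_val) in TABLE_1.items():
    train, val = split_70_30(range(n_train + n_val))
    print(study, len(train), len(val))
```

Applied to the study totals, the function returns training sets of 3879, 7000, 9170, 5185, 3031, and 339 profiles, matching Table 1.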


Figure 4. — Glucose values and the alphabets for the 4-letter, 6-letter, 7-letter, 9-letter, and 11-letter alphabets.

Encoding Continuous Glucose Monitoring Profile

The CGM profiles after encoding are illustrated in Figures 5a and 5b, which show 4-letter and 21-letter encodings, respectively. Each figure shows the unencoded CGM trace alongside its encoded representation. Encoded CGM profiles for the other alphabets are presented in Figures S3–S10 (see Supplementary File).

Figure 5. — (a) Daily CGM profile encoded with the 4-letter alphabet. (b) Daily CGM profile encoded with the 21-letter alphabet. The top panel compares the CGM trace (black line) with the alphabet encoding (red line). The bottom panel shows the CGM profile as letters of the alphabet forming a CGM string, with colour codes indicating consecutive occurrences of the same letter.

The encoded traces reveal a consistent pattern. The 6-letter alphabet improves on the 4-letter alphabet, particularly in encoding glucose values above 180 mg/dL and below 70 mg/dL. Similarly, the 7-letter alphabet refines the representation of glucose values from 70 to 180 mg/dL, and increasing the alphabet to nine letters improves the encoding of this range further. The 11-letter alphabet improves on the 9-letter alphabet above 180 mg/dL, where the 9-letter encoding cuts off high glucose values. The 13-letter, 19-letter, and 21-letter alphabets bring further consistent improvement, yielding encoded traces that closely mirror the CGM profile, as exemplified in Supplementary Figure S10.

Information Criteria Assessment for Finding Optimal Alphabet

RMSE results are shown in Figures S11–S15 and compression ratios in Figures S16–S22 (see Supplementary File) for each alphabet used to encode the CGM profiles.

The minimum AIC score identifies the optimal alphabet for encoding CGM profiles, balancing error and compression. The results are shown in Figure 6: for most data sets, the 9-letter alphabet has the lowest AIC score and thus minimizes information loss. The one exception is the healthy study (Figure 6f), which shows distinct dynamics. Because the 9-letter alphabet has the lowest AIC score in five of the six data sets, it is considered appropriate for encoding CGM profiles.

Figure 6. — (a) Information criterion for the DCLP1 study. (b) Information criterion for the DCLP3 study. (c) Information criterion for the DiamondT1D study. (d) Information criterion for the DiamondT2D study. (e) Information criterion for the DSS1 study. (f) Information criterion for the healthy study.

Figures 7 and 8 show the CV scatter plots and Pearson correlations between the original and encoded CGM profiles for the 7-letter and 9-letter alphabets, respectively. In general, the Pearson correlation is higher with the 9-letter alphabet than with the 7-letter alphabet: in Figure 7, the correlations range from 0.832 to 0.893, whereas in Figure 8 they range from 0.945 to 0.965. Comparisons between the 7-letter and 9-letter alphabets for the other clinical metrics are given in Figures S23–S30 in the Supplementary File. Regardless of the metric considered, encoding with the 9-letter alphabet correlates more strongly with the original CGM profile than encoding with the 7-letter alphabet. Therefore, the 9-letter alphabet is considered optimal for encoding CGM profiles.

Figure 7. — Scatter plot of coefficient of variation showing the correlation between 7-letter alphabet encoding and CGM profile for (a) DCLP1 study, (b) DCLP3 study, (c) DiamondT1D study, (d) DiamondT2D study, (e) DSS1 study, and (f) healthy study.

Figure 8. — Scatter plot of coefficient of variation showing the correlation between 9-letter alphabet encoding and CGM profile for (a) DCLP1 study, (b) DCLP3 study, (c) DiamondT1D study, (d) DiamondT2D study, (e) DSS1 study, and (f) healthy study.

Discussion

This article derives an optimal alphabet for encoding CGM traces. The approach aims to optimize the text encoding of CGM profiles by balancing error reduction and compression efficiency. Eight alphabets based on clinical ranges were investigated for optimality. The analysis included data from six distinct studies representing healthy individuals and diverse treatment modalities used by individuals with T1D or T2D. In all studies except the healthy cohort, a 9-letter alphabet yielded the optimal text encoding of the CGM data. In health, the optimum was the 7-letter alphabet, consistent with the more stable CGM profiles of healthy individuals.

One of the primary applications of this approach is optimal data compression and text representation of CGM profiles. Text representation of CGM data enhances interoperability and facilitates data exchange among different health care systems and devices. When CGM profiles are represented uniformly across systems and devices, data from various sources can be integrated seamlessly, promoting collaboration and consistent interpretation of CGM profiles. By compressing and structuring the data, our approach also enables integration into decision support system frameworks, where these structured text-based representations can serve as inputs to automated clinical decision-making tools. For instance, this method could support the identification of glycemic excursions and trigger personalized feedback for lifestyle modifications. Furthermore, pattern discovery can be enhanced by using encoded CGM data in machine learning algorithms, enabling the detection of long-term trends such as glycemic variability, forecasting, or responses to specific interventions and events.

In this article, the AIC guides the selection of an optimal alphabet for encoding CGM traces. Identifying an alphabet that balances error reduction and efficient compression is crucial for encoding CGM profiles for simplified trend description and data-driven decision-making. We show that daily CGM traces can be encoded with a 9-letter alphabet that preserves CGM trends while minimizing error. Our approach remains consistent with the AGP and with accepted glucose metrics such as TIR, TBR, and TAR.

Our findings support the effectiveness of alphabet encoding for approximating clinical metrics derived from CGM traces. The Pearson correlation coefficients show strong associations between the alphabet-encoded CGM traces and key parameters of glycemic variability and control. Specifically, the observed correlations of 0.945 to 0.965 for CV, 0.99 for mean glucose, 0.92 for SD, 0.99 for the high blood glucose index, and 0.98 for the low blood glucose index underscore the robustness of alphabet encoding in capturing essential features of CGM data. These findings suggest that symbolic representations offer a promising avenue for simplifying and interpreting complex glucose dynamics and for assessing CGM profiles more precisely. For instance, the symbolic representation simplifies the identification of TIR metrics, glycemic excursions, or variability, allowing quicker assessment of patient status. In addition, the encoded data can facilitate comparative analyses across patients or populations and support predictive modeling in clinical decision-making.

Conclusion

This article identifies the optimal alphabet for encoding CGM time series by balancing error minimization and compression efficiency. The results indicate that a 9-letter alphabet is optimal for encoding CGM traces in people with diabetes. Further analysis showed that the 9-letter encoding achieved Pearson correlations of 0.92 or higher with the clinical metrics examined. This work enables CGM profiles to be encoded into text and used in applications that benefit from text-processing methods, such as fast search, anomaly detection, indexing, and pattern analysis.

Supplemental Material

sj-docx-1-dst-10.1177_19322968251323913 – Supplemental material for Finding Optimal Alphabet for Encoding Daily Continuous Glucose Monitoring Time Series Into Compressed Text


Supplemental material, sj-docx-1-dst-10.1177_19322968251323913 for Finding Optimal Alphabet for Encoding Daily Continuous Glucose Monitoring Time Series Into Compressed Text by Tobore Igbe and Boris Kovatchev in Journal of Diabetes Science and Technology

Acknowledgments

This study was conducted under a research agreement between UVA and Dexcom.

Footnotes

Abbreviations: ADA, American Diabetes Association; CGM, continuous glucose monitoring; AGP, ambulatory glucose profile; TIR, time in range; TAR, time above range; TBR, time below range; OGTT, oral glucose tolerance test; RMSE, root mean square error; RLE, run-length encoding; AIC, Akaike Information Criterion; CV, coefficient of variation; SD, standard deviation; T1D, type 1 diabetes; T2D, type 2 diabetes; WHO, World Health Organization.

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Research support from Dexcom, Inc handled by the University of Virginia. Patent royalties from Dexcom, Inc are handled by the UVA Licensing and Ventures Group.

Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Dexcom-UVA master research agreement.

ORCID iDs: Tobore Igbe https://orcid.org/0000-0002-8216-9576

Boris Kovatchev https://orcid.org/0000-0003-0495-3901

Supplemental Material: Supplemental material for this article is available online.

References

1. Kahkoska AR, Adair LA, Aiello AE, et al. Identification of clinically relevant dysglycemia phenotypes based on continuous glucose monitoring data from youth with type 1 diabetes and elevated hemoglobin A1c. Pediatr Diabetes. 2019;20(5):556-566.
2. Augstein P, Heinke P, Vogt L, et al. Q-Score: development of a new metric for continuous glucose monitoring that enables stratification of antihyperglycaemic therapies. BMC Endocr Disord. 2015;15:1-10.
3. Woldaregay AZ, Årsand E, Walderhaug S, et al. Data-driven modeling and prediction of blood glucose dynamics: Machine learning applications in type 1 diabetes. Artif Intell Med. 2019;98:109-134.
4. Merino J, Linenberg I, Bermingham KM, et al. Validity of continuous glucose monitoring for categorizing glycemic responses to diet: implications for use in personalized nutrition. Am J Clin Nutr. 2022;115(6):1569-1576.
5. Shah VN, DuBose SN, Li Z, et al. Continuous glucose monitoring profiles in healthy nondiabetic participants: a multicenter prospective study. J Clin Endocrinol Metab. 2019;104(10):4356-4364.
6. Xie J, Wang Q. Benchmarking machine learning algorithms on blood glucose prediction for type I diabetes in comparison with classical time-series models. IEEE Trans Biomed Eng. 2020;67(11):3101-3124.
7. Munoz-Organero M. Deep physiological model for blood glucose prediction in T1DM patients. Sensors. 2020;20(14):3896.
8. Ebekozien O, Mungmode A, Miyazaki B, et al. 911-P: longitudinal trends in CGM and pump use: real-world data from the T1D exchange QI collaborative. Diabetes. 2022;71(suppl 1):911-P.
9. Pathak S, Kearin K, Kahkoska AR, et al. Impact of expanding access to continuous glucose monitoring systems among insulin users with type 1 or type 2 diabetes. Diabetes Technol Ther. 2023;25(3):169-177.
10. Bergenstal RM, Ahmann AJ, Bailey T, et al. Recommendations for standardizing glucose reporting and analysis to optimize clinical decision making in diabetes: the Ambulatory Glucose Profile (AGP). Diabetes Technol Ther. 2013;15(3):198-211.
11. Battelino T, Danne T, Bergenstal RM, et al. Clinical targets for continuous glucose monitoring data interpretation: recommendations from the international consensus on time in range. Diabetes Care. 2019;42(8):1593-1603.
12. Epps BP, Krivitzky EM. Singular value decomposition of noisy data: noise filtering. Exp Fluids. 2019;60:1-23.
13. Fotso VS, Nguifo EM, Vaslin P. Grasp heuristic for time series compression with piecewise aggregate approximation. RAIRO Oper Res. 2019;53(1):243-259.
14. Wilson SJ. Data representation for time series data mining: time domain approaches. Wiley Interdiscip Rev Comput Stat. 2017;9(1):e1392.
15. Kovatchev BP, Clarke WL, Breton M, Brayman K, McCall A. Quantifying temporal glucose variability in diabetes via continuous glucose monitoring: mathematical methods and clinical application. Diabetes Technol Ther. 2005;7(6):849-862.
16. Kovatchev BP, Cox DJ, Gonder-Frederick LA, Clarke W. Symmetrization of the blood glucose measurement scale and its applications. Diabetes Care. 1997;20(11):1655-1658.
17. Clarke W, Kovatchev B. Statistical tools to analyze continuous glucose monitor data. Diabetes Technol Ther. 2009;11(suppl 1):S45-S54.
18. Lobo B, Farhy L, Shafiei M, Kovatchev B. A data-driven approach to classifying daily continuous glucose monitoring (CGM) time series. IEEE Trans Biomed Eng. 2021;69(2):654-665.
19. Lobo BJ, Kovatchev BP. External validation of a classifier of daily continuous glucose monitoring (CGM) profiles. Comput Biol Med. 2022;143:105293.
20. Kovatchev BP. Metrics for glycaemic control—from HbA1c to continuous glucose monitoring. Nat Rev Endocrinol. 2017;13(7):425-436.
21. Herrero P, Reddy M, Georgiou P, Oliver NS. Identifying continuous glucose monitoring data using machine learning. Diabetes Technol Ther. 2022;24(6):403-408.
22. Coustan DR, Dyer AR, Metzger BE. One-step or 2-step testing for gestational diabetes: which is better? Am J Obstet Gynecol. 2021;225(6):634-644.
23. American Diabetes Association Professional Practice Committee, American Diabetes Association Professional Practice Committee. 16. Diabetes care in the hospital: Standards of Medical Care in Diabetes—2022. Diabetes Care. 2022;45(suppl1):S244-S253. [DOI] [PubMed] [Google Scholar]
24. American Diabetes Association. “Statistics about diabetes.” Published 2023. Accessed February 26, 2025. https://diabetes.org/about-us/statistics/about-diabetes.
25. Paddock E, Looker HC, Piaggi P, Knowler WC, Krakoff J, Chang DC. One-hour plasma glucose compared with two-hour plasma glucose in relation to diabetic retinopathy in American Indians. Diabetes Care. 2018;41(6):1212-1217. [DOI] [PMC free article] [PubMed] [Google Scholar]
26. Akaike H. A new look at the statistical model identification. IEEE Trans Autom Control. 1974;19(6):716-723. [Google Scholar]
27. Panchal G, Ganatra A, Kosta YP, Panchal D. Searching most efficient neural network architecture using Akaike’s information criterion (AIC). Int J Comput Appl. 2010;1(5):41-44. [Google Scholar]
28. Burnham KP, Anderson DR, Huyvaert KP. AIC model selection and multimodel inference in behavioral ecology: some background, observations, and comparisons. Behav Ecol Sociobiol. 2011;65:23-35. [Google Scholar]
29. Montaser E, Fabris C, Kovatchev B. Essential continuous glucose monitoring metrics: the principal dimensions of glycemic control in diabetes. Diabetes Technol Ther. 2022;24(11):797-804. [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

sj-docx-1-dst-10.1177_19322968251323913 – Supplemental material for Finding Optimal Alphabet for Encoding Daily Continuous Glucose Monitoring Time Series Into Compressed Text

sj-docx-1-dst-10.1177_19322968251323913.docx (2.7 MB, docx)

Finding Optimal Alphabet for Encoding Daily Continuous Glucose Monitoring Time Series Into Compressed Text

Tobore Igbe, PhD

Boris Kovatchev, PhD
