CGMacros: a pilot scientific dataset for personalized nutrition and diet monitoring

Anurag Das; David Kerr; Namino Glantz; Wendy Bevier; Rony Santiago; Ricardo Gutierrez-Osuna; Bobak J Mortazavi

doi:10.1038/s41597-025-05851-7

2025 Sep 25;12:1557. doi: 10.1038/s41597-025-05851-7

PMCID: PMC12462512 PMID: 40998842

Abstract

Tracking food intake is key to using nutrition to prevent or manage common diseases including type 2 diabetes (T2D) and obesity. Several datasets are publicly available to promote research in diet monitoring, but generally contain data from a limited set of sensors (e.g., accelerometry, food images), which limits their application to specific use cases such as activity recognition or image recognition. Also lacking are publicly available datasets with food macronutrients and their associated continuous glucose measurements; datasets containing such rich information are proprietary. To address this gap, we present CGMacros, a dataset containing multimodal information from an activity tracker, two continuous glucose monitors (CGM), food macronutrients, and food photographs, as well as anonymized participant demographics, anthropometric measurements and health parameters from blood analyses and gut microbiome profiles. CGMacros contains data for 45 participants (15 healthy, 16 pre-diabetes, 14 T2D) who consumed meals with varying and known macronutrient compositions in a free-living setting for ten consecutive days. To our knowledge, this is the first database of its kind to be made publicly available. CGMacros, and larger publicly available datasets that we hope may follow, are essential to democratize academic research in personalized nutrition and algorithmic approaches to automated diet monitoring.

Subject terms: Metabolic disorders, Biomedical engineering

Background and related datasets

Poor dietary habits are a major contributor to the development of chronic diseases such as type 2 diabetes, obesity, heart disease, and some cancers. A recent survey examining food consumption across 195 countries estimated that improving diet can prevent one of every five deaths worldwide¹. Therefore, monitoring food intake is an important step towards maintaining a healthy diet and preventing chronic diseases later in life. Current methods for diet monitoring are based on self-report measures (e.g., food journals and mobile apps). Such methods allow users to track their eating habits, adhere to a diet plan, and reflect on situations when they stray away from their plans. However, these methods often require prolonged manual entry, which is cumbersome and can lead to errors, or barcodes, which may promote consumption of processed foods².

The advent (and increased prevalence) of wearable sensors has led to the development of automated methods to detect and recognize moments of food intake. These sensors include smartwatches and smart utensils with embedded accelerometers that track hand-to-mouth gestures as a proxy for detecting eating instances³,⁴. Datasets based on these technologies have been publicly released to advance the field of nutrition monitoring. As an example, the Food Intake Cycle (FIC) dataset⁵ contains accelerometer data with annotated start and end meal instances from 21 meal sessions and 12 unique subjects in a cafeteria setting. The original dataset has been extended to include information from participants eating in free-living conditions⁶. Additional sensing modalities, such as audio recordings, can also be used to detect sounds associated with food intake, such as chewing and swallowing. This type of data is also publicly available. As an example, the audio-based calorie estimation (ACE) dataset⁷ contains accelerometry and audio recordings from seven participants wearing sensors on their head and wrist, and annotations of the amount and type of food and drinks from a video feed recorded using Google Glass. The Clemson Cafeteria dataset⁸ contains data from multiple sensors (wrist motion, a scale embedded in a food tray, and synchronized video recordings from overhead and chest-worn cameras) on a larger set of 264 participants. Similarly, the OREBA dataset⁹ contains inertial measurement unit (IMU) readings (accelerometer and gyroscope) for both hands and synchronized frontal videos of 100 participants consuming a discrete dish and 102 participants sharing a dish, with a total of 9,069 intake gestures. Given the pervasiveness of accelerometers in wearable devices, these types of datasets can support the development of technological approaches to nutrition monitoring at scale, but inertial measurements are restricted to detecting eating moments.

The widespread use of smartphones has also made it possible to log food choices by simply taking a photograph, which opens a broad range of possibilities in free-living conditions compared with worn or ambient cameras in controlled environments, which may raise privacy concerns. Modern machine/deep learning techniques can be used to analyze food photographs. Several datasets have been publicly released for this purpose, such as UECF Food 100¹⁰, FoodX-251, and Food2K¹¹, which contain thousands of images with diverse food items and the corresponding nutrition labels. However, extracting accurate estimates of food macronutrients and caloric content is challenging given the variety of ways in which foods can be prepared (e.g., oil amounts, fat content). To address this gap, the Recipe1M¹² dataset contains one million structured cooking recipes and their corresponding images and has been used extensively for food recognition. The closely related Nutrition5k¹³ dataset also contains annotated nutritional information for over 5000 diverse, real-world food dishes along with the food photographs.

In prior work¹⁴,¹⁵, we have shown that CGMs may be used to estimate food macronutrients, avoiding the need for manual annotation or food photography, though the latter can provide complementary information to glucose measurements¹⁶. Since postprandial glucose responses (PPGR) depend not only on the carbohydrate content of a meal but also on the amounts of protein and fat, analyzing the shape of a PPGR with machine learning (ML) models can provide estimates of a meal’s macronutrient content. At present, however, CGM datasets with macronutrient information are very limited, restricting their use to prediction of hypo/hyperglycemia events in patients with type 1 diabetes (T1D). Several datasets are publicly available for this purpose, including the OhioT1DM¹⁷, D1NAMO¹⁸, and T1DiabetesGranada¹⁹ datasets. The remaining datasets that we are aware of (ARISES, ABC4D) are not publicly available²⁰. Only recently, a dataset of 112 Chinese patients (12 with T1D, 100 with T2D) was publicly released²¹. The dataset contains 24-hour glucose data, food names and their quantity (e.g., boiled egg 40 g, boiled vegetable 116 g) but not the macronutrient composition of those foods.

To our knowledge, our proposed CGMacros dataset²² is the first to include CGM recordings with meal macronutrients, food photographs, physical activity (accelerometry data), and health and demographic parameters. Table 1 summarizes this information for CGMacros and other publicly available datasets.

Table 1.

Summary of CGMacros²² and other existing publicly available datasets for diet monitoring, according to population health, availability of CGM recordings, food macronutrients, food photographs, physical activity, fasting insulin, microbiota, and experimental setting.

Dataset	Population	CGM	Macros	Images	Activity	Insulin	Microbiome	Setting
ACE⁷	NA	No	Yes	No	No	No	No	Controlled
OREBA⁹	NA	No	No	No	No	No	No	Controlled
UECF Food 100¹⁰	NA	No	No	Yes	No	No	No	Ambulatory
Nutrition5k¹³	NA	No	Yes	Yes	No	No	No	Ambulatory
OhioT1DM¹⁷	T1DM	Yes	No	No	No	Yes	No	Ambulatory
D1NAMO¹⁸	T1DM	Yes	No	No	No	Yes	No	Ambulatory
Zhao et al.²¹	T1/T2DM	Yes	Yes	No	No	Yes	No	Ambulatory
CGMacros²²	T2DM*	Yes	Yes	Yes	Yes	Yes	Yes	Ambulatory

* Includes healthy adults and adults with pre-diabetes.

Methods

Participants for the study were recruited at Sansum Diabetes Research Institute (SDRI), in Santa Barbara, CA. On day 1 of the study, potential participants cleared an initial screening and signed a consent form (Advarra IRB Pro00049227; ClinicalTrials.gov NCT04991142). As part of the screening process, we measured the participant’s body mass index (BMI), glycated hemoglobin (HbA1c), fasting glucose, fasting insulin, triglycerides, and cholesterol levels. At this time, we also recorded their demographic information (age, gender, and race). Exclusion criteria for subjects with type 2 diabetes (T2D) were treatment with oral medicines other than metformin, any injectable GLP-1 receptor agonist, or insulin. After the initial screening, an Abbott FreeStyle Libre Pro CGM (15-min sampling period) and a Dexcom G6 Pro CGM (5-min sampling period) were placed on the participant’s upper arm and abdomen, respectively. Both CGMs were blinded to prevent glucose readings from influencing participants. Participants were also provided with a Fitbit smartwatch (Fitbit Sense) to log exercise, and were trained to use the MyFitnessPal mobile app to log their meals and to take pictures of their foods using the WhatsApp mobile app.

Forty-five participants, aged 18-69 years and with a body mass index (BMI) of 21-46 kg/m², completed the study. Table 2 summarizes the demographic information of study participants. All participants were recruited between 2021 and 2024. Out of 45 participants, 15 had no pre-existing diabetes (HbA1c < 5.7%), 16 had pre-diabetes (5.7% ≤ HbA1c ≤ 6.4%), and 14 had type 2 diabetes (T2D) (HbA1c > 6.4%).
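The grouping rule above can be written as a short function. This is a minimal Python sketch, assuming HbA1c is expressed in percent; the function name and the returned labels are illustrative and are not taken from the dataset files.

```python
def metabolic_group(hba1c_percent):
    """Assign a metabolic status from HbA1c (%), using the cut-offs in the text.

    Illustrative helper: the name and return labels are assumptions.
    """
    if hba1c_percent < 5.7:
        return "healthy"
    if hba1c_percent <= 6.4:
        return "pre-diabetes"
    return "T2D"
```

Both boundary values (5.7% and 6.4%) fall in the pre-diabetes group, matching the inequalities stated in the text.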

Table 2.

Summary of demographic information of all participants in the CGMacros dataset²².

Characteristics	Measurement
Age (years)	48.11 ± 12.70
Self-reported gender (male/female)	16/29
BMI (kg/m²)	31.15 ± 6.65
Race (White/Hispanic or Latino/African American)	7/34/4
HbA1c (%)	6.16 ± 0.91
Healthy/Pre-diabetes/Type 2 diabetes	15/16/14
Fasting glucose (mg/dL)	120.69 ± 30.23
Fasting insulin	14.43 ± 8.16

Data are presented as mean ± standard deviation (SD) or number of subjects in the group. BMI: body mass index; HbA1c: glycated hemoglobin, a measure that correlates with the average glucose levels over the previous 2-3 months.

Each subject recorded their meals for 10 days, including breakfast, lunch and dinner. Breakfasts consisted of protein shakes with varying amounts of carbohydrates, protein, fat, and fiber. Lunches were ordered from a local, fast-casual restaurant chain (Chipotle Mexican Grill). The breakfast and lunch meals were designed to cover a range of macronutrient contents (see Tables 3 and 4). For dinners, participants ate foods of their own choice. To minimize interference in glucose responses from prior meals, participants were instructed to eat lunch at least 3 hours after breakfast, with only water or coffee (without sugar) in between, and dinner at least 3 hours after lunch. They also took photographs of the meals before and after eating, from which we extracted the meal timestamps and the proportion of the meal they consumed. Stool samples were collected at the start of the study and analyzed using a Viome microbiome kit (Viome Life Sciences, Inc.).

Table 3.

Macronutrient composition of breakfast shakes.

Study day	Meal	Carb (g)	Protein (g)	Fat (g)	Fiber (g)
1	HLLL	66	22	11	0
2	HHLL	66	66	11	0
3	HLHL	66	22	42	0
4	HHHH	73	66	42	7
5	LLLL	24	22	11	0
6	HLLL	66	22	11	0
7	HHLL	66	66	11	0
8	HLHL	66	22	42	0
9	LLLL	24	22	11	0
10	HLHH	66	22	42	7

Meals are coded as having high (H) or low (L) macronutrient composition based upon US average daily intake of carbs, proteins, fat, and fiber, respectively³⁵. For example, HLLL denotes a meal high in carbohydrates and low in protein, fat and fiber, whereas HHLL denotes a meal high in carbohydrates and protein, but low in fat and fiber.

Table 4.

Macronutrient composition of lunch meals.

Study day	Meal	Carbs (g)	Protein (g)	Fat (g)	Fiber (g)
1	HHHH	81	88	54.5	18
2	HLHL	92	17	42	10
3	LHLL	16	66	14	4
4	HLLL	94	12	13	5
5	LLLL	19	32	15	5
6	HHHL	93	84	44	4
7	HLLH	76	22	18.5	11
8	LHLH	40	76	17	13
9	LLLH	43	20	20	13
10	HHLL	94	44	20	4

Meals are coded as having high (H) or low (L) macronutrient composition based upon US average daily intake of carbs, proteins, fat, and fiber³⁵.

To illustrate the type of postprandial glucose responses (PPGRs) to meals, Fig. 1 shows the glucose profile of the two CGM devices along with meal and exercise information in metabolic equivalents of task (METs) over a 24-hour period for one participant. Red markings atop the CGM curve denote times at which meals started and ended. Also shown are food photographs that the participant sent via WhatsApp. The timestamp of those photographs was extracted from WhatsApp as well. To compute METs, we used measurements from the Fitbit device provided on a minute-by-minute basis and applied a mean filter with a window size of 20 minutes.
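The MET smoothing step can be sketched as a simple moving average. The Python example below is an illustration under stated assumptions, not the authors' implementation: the text does not say whether the 20-minute window is centered or trailing, so the sketch assumes a centered window that shrinks at the edges of the recording. The function name is hypothetical.

```python
from statistics import fmean


def mean_filter(values, window=20):
    """Smooth a minute-by-minute series with a moving average.

    Assumption: the window is roughly centered on each sample
    (window // 2 samples before it, the rest from the sample onward)
    and is truncated at the start and end of the series.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i - half + window)
        smoothed.append(fmean(values[lo:hi]))
    return smoothed
```

With one MET value per minute, `mean_filter(mets, window=20)` returns a series of the same length in which short bursts of activity are spread over neighboring minutes.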

Data Records

All data records in the dataset are accessible on figshare²². The dataset consists of 45 main CSV files (CGMacros-#.csv), where # denotes participant number, and three supplementary CSV files. The main files contain CGM and fitness tracker readings, one row per measurement (plus a heading), and one column per variable. Since the Abbott and Dexcom CGM devices have different sampling rates, we used linear interpolation to obtain a uniform sampling rate of one minute. For the Abbott FreeStyle Libre Pro, we took the two consecutive CGM readings separated by 15 minutes and linearly interpolated between them to obtain CGM readings every minute. Similarly, for the Dexcom G6 Pro, we linearly interpolated between consecutive readings separated by 5 minutes. We also report heart rate and MET from Fitbit at one-minute intervals. At the appropriate time stamp (i.e., row) from WhatsApp photographs, we also report the total caloric content and carbohydrate, protein, fat, and fiber amounts of each meal, the type of meal (breakfast, lunch, dinner) and a path to the file containing the corresponding photograph. Data from each participant spans approximately ten days.
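The resampling described above can be illustrated with a short Python sketch. It assumes timestamps have already been converted to integer minutes and are strictly increasing; the function name and the input layout are illustrative and do not reflect the structure of the released files.

```python
def interpolate_to_minutes(times_min, glucose):
    """Linearly interpolate sparse CGM readings onto a 1-minute grid.

    times_min: strictly increasing integer timestamps, in minutes
               (e.g., every 15 min for Abbott, every 5 min for Dexcom).
    glucose:   readings in mg/dL, same length as times_min.
    Returns (grid, values) with one entry per minute.
    """
    pairs = list(zip(times_min, glucose))
    grid, values = [], []
    for (t0, g0), (t1, g1) in zip(pairs, pairs[1:]):
        for t in range(t0, t1):
            frac = (t - t0) / (t1 - t0)  # position between the two readings
            grid.append(t)
            values.append(g0 + frac * (g1 - g0))
    # The last reading closes the grid.
    grid.append(times_min[-1])
    values.append(glucose[-1])
    return grid, values
```

For example, two Abbott readings of 100 and 115 mg/dL taken 15 minutes apart yield 16 one-minute values that rise by 1 mg/dL per minute.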

The supplementary spreadsheet (bio.csv) contains demographics (age, gender, ethnicity), anthropometric measurements (height, weight, BMI), blood analytics (HbA1c, fasting glucose, insulin, triglyceride, cholesterol, high-density lipoprotein (HDL), non-HDL, low-density lipoprotein (LDL), and very low-density lipoprotein (VLDL) levels), three finger-stick glucose measurements, and the microbiome profile of each study participant, all taken on the first day, with the corresponding date and time stamp. For each participant, Viome provides two reports, the first one listing all the bacteria that are present in the stool sample, and the second one providing digestive health scores and recommendations that Viome generates (recommendations are not included in this dataset). We combined the Viome reports of bacteria of the 45 participants and generated an indicator variable as a separate column for each of 1,979 bacteria, denoting whether it was present (1) or absent (0) in the corresponding Viome report. From the report of 22 gut health scores generated by Viome, we developed an ordinal variable for each of the tests coded as Good, Average, or Not Optimal. These scores are Viome’s estimate of gut health based upon the bacteria identified and include their estimates of overall gut health²³. Examples of such scores include an overall Gut Health test, Metabolic Fitness, Inflammatory Activity, Digestive Efficiency, Gut Active Microbial Diversity, and summaries of present or non-present bacteria from the first report.
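The construction of the presence/absence indicators and the ordinal scores can be sketched in Python as follows. The input layout, function names, and taxon names are hypothetical, and the numeric ranks for the ordinal levels (0, 1, 2) are an assumption made for illustration; the text only states that the levels are Good, Average, and Not Optimal.

```python
def presence_matrix(reports):
    """Build 0/1 indicator columns from per-participant lists of taxa.

    reports: dict mapping participant id -> iterable of taxon names
             found in that participant's report (hypothetical layout).
    Returns (taxa, rows): the sorted union of taxa, and for each
    participant a list of 0/1 flags aligned with that column order.
    """
    taxa = sorted(set().union(*reports.values()))
    rows = {}
    for pid, found in reports.items():
        present = set(found)
        rows[pid] = [1 if taxon in present else 0 for taxon in taxa]
    return taxa, rows


# Assumed numeric ranks for the three ordinal levels named in the text.
SCORE_LEVELS = {"Not Optimal": 0, "Average": 1, "Good": 2}


def encode_scores(scores):
    """Map each gut-health test label to its assumed ordinal rank."""
    return {test: SCORE_LEVELS[label] for test, label in scores.items()}
```

Applied to all 45 reports, the first function would produce one column per distinct bacterium (1,979 in the dataset) and one row per participant.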

Technical Validation

To establish the validity of the CGMacros dataset²², we report initial analyses of average glucose readings for each of the two CGM devices, stratified by metabolic status (healthy, pre-diabetes, T2D). We also provide an analysis of the time in range (TIR) per group and CGM device at two hyperglycemic thresholds (180 mg/dL, 250 mg/dL). Finally, we predict the 2-hour incremental area under the curve (iAUC) and absolute area under the curve (AUC) of the breakfast shakes from features derived from CGM, blood parameters, anthropometrics, and macronutrients, and rank those features in order of importance.

Figure 2 summarizes the average glucose level across the ten study days, grouped by metabolic status. As expected, we observed a clear increase in average glucose levels from healthy adults to those with pre-diabetes and T2D, for both CGM devices. The average glucose from the Abbott FreeStyle device was 84.89 mg/dL for healthy adults, 105.2 mg/dL (p < 0.01) for pre-diabetes, and 138.04 mg/dL (p < 0.01) for T2D. The average glucose from the Dexcom G6 device was 122.36 mg/dL for healthy adults, 135.1 mg/dL (p < 0.05) for pre-diabetes, and 165.7 mg/dL (p < 0.01) for T2D. We also found significant differences between the two CGM devices for each group: healthy (p < 0.001), pre-diabetes (p < 0.01), and T2D (p < 0.05), with Dexcom G6 glucose readings being higher by up to 58.7 mg/dL (for healthy adults). Inconsistencies between earlier generations of the two devices have been reported in the literature²⁴, and may be related to the anatomical locations of the CGM (upper arm for Abbott vs. abdomen for Dexcom) and the corresponding differences in subcutaneous fat²⁴.

Fig. 2 — Glucose response for healthy adults, pre-diabetes and T2D for the Abbott FreeStyle Libre (red) and the Dexcom G6 (blue) CGMs. Triangles indicate average glucose; solid lines denote median glucose.

Figure 3 shows the proportion of time (i.e., CGM readings) in different glucose regions for participants, organized by CGM device and metabolic state. To analyze these results, we conducted a repeated-measures two-way ANOVA with CGM device and metabolic state as independent factors. For time below range (<70 mg/dL), we found main effects for device (p < 0.001) and state (p < 0.01), as well as interactions (p < 0.01). Post-hoc tests using Tukey’s Honest Significant Difference (HSD) found that time in hypoglycemia was significantly different between the two devices (p < 0.01) as well as between healthy and T2D groups (p < 0.05). For time in range (70-180 mg/dL; TIR), two-way ANOVA revealed a main effect for state (p < 0.01) and interactions (p < 0.01) but no main effect for device (p = 0.31). A post-hoc test using Tukey’s HSD found that the average TIR was significantly different between pre-diabetes and T2D groups (p < 0.05). For time above range (>180 mg/dL; %HG), we found a main effect for device (p < 0.001) and state (p < 0.001) but no interactions (p = 0.18). Tukey’s HSD for multiple comparisons found that the average time in hyperglycemia was significantly different between the two sensors (p < 0.05) as well as between healthy and T2D groups (p < 0.01) and healthy and pre-diabetes groups (p < 0.01). Thus, these results indicate that the average time in hypoglycemia and hyperglycemia is influenced by metabolic status (as predicted) and by the CGM device (as previously reported²⁴). However, for time in range the effect is only significant for metabolic state.

Fig. 3 — Time in range for individual participants in the three metabolic status groups (healthy, pre-diabetes, type-2 diabetes) for the Abbott FreeStyle Libre (top) and Dexcom G6 (bottom).
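The three glucose regions used in this analysis can be computed directly from a list of CGM readings. The Python sketch below treats each reading as an equal share of time, as the text does, and uses the 70 and 180 mg/dL cut-offs quoted above; the function name and the keys of the returned dictionary are illustrative.

```python
def time_in_ranges(glucose, low=70, high=180):
    """Fraction of CGM readings below, within, and above a target range.

    Readings equal to `low` or `high` count as in range, matching the
    <70 and >180 mg/dL definitions in the text.
    """
    n = len(glucose)
    below = sum(g < low for g in glucose)
    above = sum(g > high for g in glucose)
    return {
        "below": below / n,
        "in_range": (n - below - above) / n,
        "above": above / n,
    }
```

Calling the function with `high=250` gives the fraction of readings above the second hyperglycemic threshold mentioned in the Technical Validation overview.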

Prediction of breakfast postprandial glucose responses

As a final validation step, we built a machine-learning model to reproduce results in the personalized nutrition study by Zeevi et al.²⁵. The model used anthropometric features, blood parameters, and macronutrient amounts to predict breakfast PPGRs. Anthropometric parameters included BMI, age, and gender. Blood parameters included HbA1c, fasting blood glucose (BG), cholesterol, fasting insulin, triglycerides (TG), HDL, non-HDL, LDL, ratio of cholesterol to HDL (CHO/HDL), VLDL, and homeostatic model assessment for insulin resistance (HOMA-IR). We also included baseline glucose at the start of the meal as a predictor. Following Zeevi et al.²⁵, we computed the 2-hour iAUC for each PPGR recorded using the Abbott FreeStyle Libre Pro (results on the Dexcom G6 Pro device are comparable, and thus not reported). Also following Zeevi et al.²⁵, we used an extreme gradient boosting (XGBoost)²⁶ model with 80 tree estimators, max depth of 1, learning rate of 0.2, L1 regularization of 1, and no L2 regularization to predict iAUC and AUC from those features. Using a leave-one-subject-out procedure (i.e., train on data from 44 participants, test on the remaining participant), we obtained a Pearson correlation of 0.89 (p < 0.001) between ground truth and predicted AUC; see Fig. 4a. A separate XGBoost model predicts iAUC with a correlation of ρ = 0.64 (p < 0.001) with respect to ground-truth iAUC; see Fig. 4b. These correlations between predicted and actual iAUC in our dataset are consistent with those reported by Zeevi et al. on a cohort of 800 participants (ρ = 0.68) and a separate validation cohort of 100 participants (0.70)²⁵, and by Mendes-Soares et al. (ρ = 0.62)²⁷ on a different cohort with 327 participants, which supports the validity of the CGMacros dataset²². It is notable that, despite having an order of magnitude fewer participants, predictions from CGMacros are similar to those reported in those studies, which adds further credence to the validity of the CGMacros dataset²².
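The two response measures used as prediction targets can be illustrated with a short Python sketch based on the trapezoidal rule. Two points are assumptions rather than details given in the text: the baseline is taken as the first reading at meal start, and glucose excursions below baseline are clipped to zero when computing the iAUC, which is one common convention. Function names are hypothetical.

```python
def auc_trapezoid(values, dt_min=1.0):
    """Area under a uniformly sampled curve (trapezoidal rule), in mg/dL x min."""
    return sum((a + b) / 2.0 * dt_min for a, b in zip(values, values[1:]))


def ppgr_areas(glucose, dt_min=1.0, horizon_min=120):
    """Return (AUC, iAUC) for a postprandial response starting at meal time.

    glucose: readings in mg/dL, the first one taken at the start of the meal.
    Assumptions: baseline is the first reading; increments below baseline
    are clipped to zero before integrating the iAUC.
    """
    n = int(round(horizon_min / dt_min)) + 1
    window = glucose[:n]
    baseline = window[0]
    increments = [max(g - baseline, 0.0) for g in window]
    return auc_trapezoid(window, dt_min), auc_trapezoid(increments, dt_min)
```

On a flat trace the AUC equals baseline times duration and the iAUC is zero, which is why baseline glucose is expected to matter more for AUC than for iAUC.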

Fig. 4 — Correlation between (a) predicted and ground truth AUC, and (b) predicted and ground truth iAUC.
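The leave-one-subject-out evaluation can be sketched independently of the regression model. In the Python example below, `fit` and `predict` are placeholder callables supplied by the caller (any regressor can be wrapped this way); they and the other function names are illustrative assumptions, not the authors' code.

```python
from math import sqrt


def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx), one split per distinct subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def loso_predictions(X, y, subject_ids, fit, predict):
    """Predict every meal using a model trained on all other subjects."""
    preds = [None] * len(y)
    for _, train, test in leave_one_subject_out(subject_ids):
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = predict(model, X[i])
    return preds


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Because all meals from the held-out participant are excluded from training, `pearson(y, loso_predictions(...))` measures how well the model generalizes to people it has never seen, which is the quantity reported in Fig. 4.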

In a final analysis, we examined the importance of each feature when predicting AUC and iAUC. For this purpose, we trained the XGBoost model on data from all study participants, and computed the SHapley Additive exPlanations (SHAP)²⁸. Figure 5 shows a beeswarm plot of the predictors’ “importance” and their relationship with the dependent variable. Predictors are ranked by their importance (from top to bottom), each point representing an instance (i.e., a meal for a given participant). Each point is color coded according to the value of the corresponding feature on that instance (red: high; blue: low), and its horizontal position denotes whether the feature leads to a higher (right) or lower (left) prediction of AUC or iAUC. Fasting glucose (measured during participant recruitment) is the strongest predictor for iAUC and the second strongest for AUC, in both cases showing a positive correlation that is consistent with the literature²⁹. The amount of carbohydrates in a meal (converted into calories) is also a strong predictor for iAUC (#2) and AUC (#3), in both cases showing an expected positive correlation (i.e., carbohydrates are the main determinant of postprandial glucose). Baseline glucose (i.e., immediately prior to meal intake) is the strongest predictor for AUC, as expected, and third for iAUC, consistent with the fact that baseline glucose is subtracted when computing the iAUC. HbA1c is the fifth strongest predictor for iAUC and fourth for AUC, also with a positive effect that is consistent with findings that elevated HbA1c leads to higher postprandial glucose responses³⁰. The amount of protein in the meal (again converted into calories) is the fourth strongest predictor for iAUC and fifth for AUC, with a negative correlation that is consistent with its suppressing effect on postprandial glucose³¹,³². The amount of fat in the meal also shows a negative correlation, reflecting its suppressing effect on postprandial glucose (due in part to delayed gastric emptying), but is not as strong a predictor (#12 for iAUC and #9 for AUC) as protein. Overall, these results agree with prior literature on the main contributors to elevated postprandial glucose, further supporting the validity of the CGMacros dataset²².
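To make the sign and magnitude of the attributions in the beeswarm plot concrete, the following Python sketch computes exact Shapley values for a tiny model by averaging each feature's marginal contribution over all feature orderings. It is a simplified illustration, not the SHAP library's tree algorithm: it uses a single reference point as the background instead of an expectation over the training data, and it is only tractable for a handful of features. The toy model and its coefficients are invented for the example.

```python
from itertools import permutations


def shapley_values(predict, instance, background):
    """Exact Shapley values of `predict` at `instance` relative to `background`.

    Features absent from a coalition take their value from `background`
    (a single reference point, which is a simplifying assumption).
    """
    n = len(instance)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        x = list(background)
        prev = predict(x)
        for j in order:
            x[j] = instance[j]       # add feature j to the coalition
            cur = predict(x)
            phi[j] += cur - prev     # marginal contribution of feature j
            prev = cur
    return [p / len(orders) for p in phi]


def toy_model(x):
    """Invented linear model: x[0] = carbohydrate, x[1] = protein."""
    return 100.0 + 2.0 * x[0] - 0.5 * x[1]
```

For the toy model, a meal with more carbohydrate than the reference receives a positive attribution for that feature and a meal with more protein receives a negative one, and the attributions sum to the difference between the prediction for the meal and the prediction for the reference.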

Fig. 5 — Beeswarm plot of SHAP values for (a) iAUC and (b) AUC predictions. Features are ranked from highest (top) to lowest (bottom). Each point represents an instance (i.e., a meal for a participant), color coded by the feature’s value (red: high; blue: low) on that instance and its horizontal position representing its impact on the corresponding AUC/iAUC (right: positive; left: negative). For example, high carbohydrates (red color) have a positive (right side) contribution to iAUC.

Beyond prediction of postprandial glucose responses to meals, CGMacros could support further research in diet monitoring. A prime example is the development of models and algorithms to predict the macronutrient composition of meals based on postprandial glucose responses with minimal user intervention¹⁵,¹⁶. Such models can be thought of as the “inverse” problem³³ of the one discussed in this section, whose goal is to predict postprandial glucose responses to meals based on their macronutrient content (i.e., the “direct” problem). CGMacros may also be used to develop “meal announcement” algorithms³⁴ for closed-loop insulin delivery systems by identifying moments of dietary intake from CGM recordings. Finally, CGMacros may also be used to develop interpretable models (e.g., parametric) of the relationship between health parameters (e.g., HbA1c, lipid profiles) and macronutrients in mixed meals.

Acknowledgements

This work was supported by National Science Foundation award No. 2014475.

Author contributions

Anurag Das was responsible for data curation, data analytics, and implementation of machine learning models, and was a major contributor to manuscript preparation. David Kerr, MD, was co-Investigator on the NSF award, led the human subject study design and provided medical oversight to the research team. Namino Glantz, PhD, Wendy Bevier, PhD, and Rony Santiago were responsible for participant recruitment and project management and contributed to study design and implementation. Ricardo Gutierrez-Osuna, PhD, was co-Investigator on the NSF award; he co-led data analytics for the NSF award and was responsible for manuscript preparation. Bobak J. Mortazavi, PhD, was Principal Investigator of the NSF award; he was responsible for project oversight and co-led data analytics and meal design. We would like to acknowledge Hooman Sajjadi and Sicong Huang at TAMU for contributions to data collection and curation, Anna Spence at SDRI for contributing to coordination of the human subjects study, and Megan McCrory for designing the macronutrient composition of breakfast meals and Chipotle lunch meals.

Data availability

The CGMacros dataset is available on figshare²². The code for analyzing the dataset and generating all figures is also available at https://github.com/PSI-TAMU/CGMacros.

Competing interests

Dr. Mortazavi discloses a relationship with McAndrews, Held, and Malloy Ltd and Kirkland & Ellis, LLP for expert testimony.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Afshin, A. et al. Health effects of dietary risks in 195 countries, 1990–2017: a systematic analysis for the global burden of disease study 2017. The Lancet 393, 1958–1972 (2019).
2. Cordeiro, F. et al. Barriers and negative nudges: Exploring challenges in food journaling. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 1159–1162 (2015).
3. Bedri, A., Li, D., Khurana, R., Bhuwalka, K. & Goel, M. Fitbyte: Automatic diet monitoring in unconstrained situations using multimodal sensing on eyeglasses. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 1–12 (2020).
4. Bell, B. M. et al. Automatic, wearable-based, in-field eating detection approaches for public health research: a scoping review. NPJ digital medicine 3, 38 (2020).
5. Kyritsis, K., Diou, C. & Delopoulos, A. Modeling wrist micromovements to measure in-meal eating behavior from inertial sensor data. IEEE journal of biomedical and health informatics 23, 2325–2334 (2019).
6. Kyritsis, K., Diou, C. & Delopoulos, A. A data driven end-to-end approach for in-the-wild monitoring of eating behavior using smartwatches. IEEE Journal of Biomedical and Health Informatics 25, 22–34 (2020).
7. Mirtchouk, M., Merck, C. & Kleinberg, S. Automated estimation of food type and amount consumed from body-worn audio and motion sensors. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 451–462 (2016).
8. Shen, Y., Salley, J., Muth, E. & Hoover, A. Assessing the accuracy of a wrist motion tracking method for counting bites across demographic and food variables. IEEE journal of biomedical and health informatics 21, 599–606 (2016).
9. Rouast, P. V., Heydarian, H., Adam, M. T. & Rollo, M. E. Oreba: A dataset for objectively recognizing eating behavior and associated intake. IEEE Access 8, 181955–181963 (2020).
10. Matsuda, Y., Hoashi, H. & Yanai, K. Recognition of multiple-food images by detecting candidate regions. In 2012 IEEE international conference on multimedia and expo, 25–30 (IEEE, 2012).
11. Min, W. et al. Large scale visual food recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 9932–9949 (2023).
12. Salvador, A. et al. Learning cross-modal embeddings for cooking recipes and food images. In Proceedings of the IEEE conference on computer vision and pattern recognition, 3020–3028 (2017).
13. Thames, Q. et al. Nutrition5k: Towards automatic nutritional understanding of generic food. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 8903–8911 (2021).
14. Sajjadi, S. et al. Towards the development of subject-independent inverse metabolic models. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3970–3974 (IEEE, 2021).
15. Das, A. et al. Predicting the macronutrient composition of mixed meals from dietary biomarkers in blood. IEEE Journal of Biomedical and Health Informatics 26, 2726–2736 (2021).
16. Zhang, L. et al. Joint embedding of food photographs and blood glucose for improved calorie estimation. In 2023 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI), 1–4 (IEEE, 2023).
17. Marling, C. & Bunescu, R. The ohiot1dm dataset for blood glucose level prediction: Update 2020. In CEUR workshop proceedings, vol. 2675, 71 (NIH Public Access, 2020).
18. Dubosson, F. et al. The open d1namo dataset: A multi-modal dataset for research on non-invasive type 1 diabetes management. Informatics in Medicine Unlocked 13, 92–100 (2018).
19. Rodriguez-Leon, C. et al. T1diabetesgranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus. Scientific Data 10, 916 (2023).
20. Zhu, T., Li, K., Herrero, P. & Georgiou, P. Personalized blood glucose prediction for type 1 diabetes using evidential deep learning and meta-learning. IEEE Transactions on Biomedical Engineering 70, 193–204 (2022).
21. Zhao, Q. et al. Chinese diabetes datasets for data-driven machine learning. Scientific Data 10, 35 (2023).
22. Gutierrez-Osuna, R., Kerr, D., Mortazavi, B. & Das, A. CGMacros: a scientific dataset for personalized nutrition and diet monitoring. figshare (2025).
23. Tily, H. et al. Gut microbiome activity contributes to individual variation in glycemic response in adults. Biorxiv 641019 (2019).
24. Howard, R., Guo, J. & Hall, K. D. Imprecision nutrition? different simultaneous continuous glucose monitors provide discordant meal rankings for incremental postprandial glucose in subjects without diabetes. The American journal of clinical nutrition 112, 1114–1119 (2020).
25. Zeevi, D. et al. Personalized nutrition by prediction of glycemic responses. Cell 163, 1079–1094 (2015).
26. Chen, T. Xgboost: extreme gradient boosting. R package version 0.4-21 (2015).
27. Mendes-Soares, H. et al. Model of personalized postprandial glycemic response to food developed for an israeli cohort predicts responses in midwestern american individuals. The American journal of clinical nutrition 110, 63–75 (2019).
28. Lundberg, S. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30 (2017).
29. Berry, S. et al. Personalised responses to dietary composition trial (predict): an intervention study to determine inter-individual differences in postprandial response to foods (2020).
30. Ketema, E. B. & Kibret, K. T. Correlation of fasting and postprandial plasma glucose with hba1c in assessing glycemic control; systematic review and meta-analysis. Archives of Public Health 73, 1–9 (2015).
31. Rytz, A. et al. Predicting glycemic index and glycemic load from macronutrients to accelerate development of foods and beverages with lower glucose responses. Nutrients 11, 1172 (2019).
32. Wolever, T. M., Zurbau, A., Koecher, K. & Au-Young, F. The effect of adding protein to a carbohydrate meal on postprandial glucose and insulin responses: a systematic review and meta-analysis of acute controlled feeding trials. The Journal of Nutrition (2024).
33. Mortazavi, B. J. & Gutierrez-Osuna, R. A review of digital innovations for diet monitoring and precision nutrition. Journal of Diabetes Science and Technology 17, 217–223 (2023).
34.Petrovski, G. et al. Simplified meal announcement versus precise carbohydrate counting in adolescents with type 1 diabetes using the minimed 780g advanced hybrid closed loop system: A randomized controlled trial comparing glucose control. Diabetes Care46, 544–550 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
35.Wolfe, R. R., Cifelli, A. M., Kostas, G. & Kim, I.-Y. Optimizing protein intake in adults: interpretation and application of the recommended dietary allowance compared with the acceptable macronutrient distribution range. Advances in Nutrition8, 266–275 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]

Data Availability Statement

The CGMacros dataset is available on figshare²². The code for analyzing the dataset and generating all figures is also available at https://github.com/PSI-TAMU/CGMacros.
