Abstract
The offering of grocery stores is a strong driver of consumer decisions. While highly processed foods like packaged products, processed meat, and sweetened soft drinks have been increasingly associated with unhealthy diets, information on the degree of processing characterizing an item in a store is not straightforward to obtain, limiting the ability of individuals to make informed choices. GroceryDB, a database with over 50,000 food items sold by Walmart, Target, and Whole Foods, shows the degree of processing of food items and potential alternatives in the surrounding food environment. The extensive data gathered on ingredient lists and nutrition facts enables a large-scale analysis of ingredient patterns and degrees of processing, categorized by store, food category, and price range. Furthermore, it allows the quantification of the individual contribution of over 1,000 ingredients to ultra-processing. GroceryDB makes this information accessible, guiding consumers toward less processed food choices.
Food ultra-processing has drastically increased productivity and shelf-time, addressing the issue of food availability to the detriment of food systems sustainability and health [1–4]. Indeed, there is increasing evidence that over-reliance on ultra-processed food (UPF) has fostered unhealthy diets [5]. The sheer number of peer-reviewed articles investigating the link between the degree of food processing and health embodies a general consensus among independent researchers on the health relevance of UPF, contributing up to 60% of consumed calories in developed nations [6–8]. For instance, recent studies have linked the consumption of UPF to non-communicable diseases like metabolic syndrome [9–15], and exposure to industrialized preservatives and pesticides [16–20]. This body of work has driven a paradigm shift from focusing solely on food security, which emphasizes access to affordable food, to prioritizing nutrition security [21, 22]. Nutrition security stresses equitable access to healthy, safe, and affordable foods essential for optimal health and well-being, as defined by the USDA [23, 24], echoing the recent White House Conference on Hunger, Nutrition, and Health [25].
Much of UPF reaches consumers through grocery stores, as documented by the National Health and Nutrition Examination Survey (NHANES), indicating that in the U.S. over 60% of the food consumed comes from grocery stores (Figure S1). The high reliance on UPF and their potential negative health effects raise numerous critical questions, such as: 1) How can the degree of processing of food items be determined? 2) What methods can be used to quantify the extent of food processing in the food supply? 3) What alternatives can be identified to reduce UPF consumption?
Measuring the degree of food processing is a key step in addressing these questions, but it is not straightforward. Indeed, food labels often display mixed messages, partly driven by reductionist metrics focusing on one nutrient at a time [26], and partly because of the contrasting criteria on how to classify processed foods [27]. The ambiguity and inconsistency of current food processing classification systems (FPCS) have led to conflicting results regarding their role as risk factors for non-communicable chronic diseases [28, 29]. Some of these classification systems also suffer from poor inter-rater reliability and lack of reproducibility, issues rooted in purely descriptive expertise-based approaches, leaving room for ambiguity and differences in interpretation [27, 28, 30]. Hence, there is a growing call among scientists for a more objective definition of the degree of food processing, grounded in biological mechanisms instead of varying subjective interpretations across research groups [28]. Among the proposed areas for aligning food processing definitions, the nutritional profile of food is currently the only aspect consistently regulated and reported worldwide [27, 28, 31].
The research efforts outlined in [28] align with a growing demand for high-quality and internationally comparable statistics to promote objective metrics, reproducibility, and data-driven decision-making, advancing convergence towards the Sustainable Development Goals (SDGs) [32, 33]. Artificial intelligence (AI) methodologies [33–36], in particular, are increasingly being utilized for their potential as more objective, data-driven tools to advance nutrition security, a concept underpinning SDGs such as ‘zero hunger’, ‘good health and well-being’, ‘industry, innovation, and infrastructure’, and ‘reduced inequalities’.
Responding to the need for objective and scalable metrics to ensure nutrition security, recent efforts harnessed machine learning (ML) to create and fully automate a Food Processing Score (FPro) [37]. FPro is a continuous index derived by training an ML model to predict manual labels of processing techniques based on the overall nutrient profile of a food (see Methods and Section S4). To teach the algorithm how to score processing from nutrients, labels provided by NOVA, the most widely used system for classifying foods based on processing-related criteria, were leveraged, offering a rich array of epidemiological literature for comparative analysis [9, 38, 39]. However, the FPro algorithm can accommodate different FPCS such as EPIC [40], UNC [41], or SIGA [42]. The predictive power of FPro was rigorously tested for epidemiological outcomes with an Environment-Wide Association Study (EWAS), leveraging multiple cycles of the USDA model food databases and national food consumption surveys [37].
Here, building on the versatility and scalability of the FPro algorithm, we extend our analysis beyond “model foods” tailored for epidemiological databases, analyzing real-world data encompassing over 50,000 products from major U.S. grocery store websites. This extensive dataset underpins the development of GroceryDB, an open-source database of foods and beverages, featuring comprehensive metadata on nutritional content, ingredient list, and price for each item, collected from publicly available online markets of Walmart, Target, and Whole Foods. Our objective is to demonstrate how ML can effectively analyze large-scale real-world food composition data, and translate this wealth of information into the degree of processing for any food in grocery stores, facilitating consumer decision-making and informing public health initiatives aimed at enhancing the overall quality of the food environment. This initiative not only lays the groundwork for similar efforts globally, aimed at promoting better-informed dietary choices, but also underscores the critical role of open-access, internationally comparable data in advancing global nutrition security.
Results
For each food, we automated the process of determining the extent of food processing using FPro, which translates the nutritional content of a food item into its degree of processing [37]. Figure 1 illustrates the use of FPro by presenting the processing scores of three products in the bread and yogurt categories, enabling the comparison of their processing levels. For example, the Manna Organics multi-grain bread is made from whole wheat kernels, barley, rice without additives, added salt, oil, and yeast, resulting in a low processing score of . In contrast, the Aunt Millie’s () and Pepperidge Farmhouse () breads include ‘resistant corn starch’, ‘soluble corn fiber’, and ‘oat fiber’, requiring additional processing to extract starch and fiber from corn and oat to be used as an independent ingredient (Figure 1a), yielding higher processing scores. Similarly, in the yogurt category, the Seven Stars Farm yogurt () is a whole milk yogurt made from ‘grade A pasteurized organic milk’, while the Siggi’s yogurt () uses ‘Pasteurized Skim Milk’ that requires further processing to obtain 0% fat. Finally, the Chobani Cookies & Cream yogurt relies on cane sugar as the second most dominant ingredient, and contains cocktails of additives like ‘caramel color’, ‘fruit pectin’, and ‘vanilla bean powder’ making it a highly processed yogurt, resulting in a high processing score .
Figure 1: Degrees of Food Processing in Three Categories.

FPro can assess the extent of food processing in three major U.S. grocery stores, and it is best suited to rank foods within the same category. (a) In breads, the Manna Organics multi-grain bread, offered by Whole Foods, is mainly made from ‘whole wheat kernels’, barley, and brown rice without any additives, added salt, oil, and yeast, with . However, the Aunt Millie’s () and Pepperidge Farmhouse () breads, found in Target and Walmart, include ‘soluble corn fiber’ and ‘oat fiber’ with additives like ‘sugar’, ‘resistant corn starch’, ‘wheat gluten’, and ‘monocalcium phosphate’. (b) The Seven Stars Farm yogurt () is made from ‘grade A pasteurized organic milk’. The Siggi’s yogurt () declares ‘Pasteurized Skim Milk’ as the main ingredient that has 0% fat milk, requiring more food processing to eliminate fat. Lastly, the Chobani Cookies & Cream yogurt () has cane sugar as the second most dominant ingredient combined with multiple additives like ‘caramel color’, ‘fruit pectin’, and ‘vanilla bean powder’, making it a highly processed yogurt. Image credits: “round glossy ice cream cup. Photo-realistic packaging mockup template with sample design. 3d illustration. White round matte ice cream cup” (by Shubby Studio for Adobe Stock), “rural meadow and cow” (by djvstock2 from Giuseppe Ramos D for CanvaPro via Canva.com), “cow gradient icon” (by Eucalyp from amethyststudio for CanvaPro via Canva.com), “Blue Ceramic Vase” (by BlueRingMedia from GraphicsRF for CanvaPro via Canva.com), “Yogurt And Ice Cream Tub” (by DEVASHISH˙RAVAT from Getty images), “cookie bite icon” (by otomedream from otomedream for CanvaPro via Canva.com), “ice cream topping” (by otomedream from otomedream for CanvaPro via Canva.com), “cookie bite icon” (by otomedream from otomedream for CanvaPro via Canva.com), “Mauve Paint Brushstroke” (by DSAP Project from DSAP Project’s Images).
GroceryDB assigns an FPro score to all foods collected from Walmart, Target, and Whole Foods leveraging the ML classifier FoodProX, which takes mandatory information from nutrition labels as input (see Methods). The distribution of the FPro scores in the three stores displays a high degree of similarity: each store exhibits a monotonically increasing curve (Figure 2a), indicating that minimally-processed products (low FPro) represent a relatively small fraction of the inventory of grocery stores, the majority of the offerings being in the ultra-processed category (high FPro). Although less-processed items make up a smaller share of the overall inventory, they likely account for a proportionally larger portion of actual purchases, highlighting a discrepancy between sales data and available food options. Nevertheless, systematic differences between stores emerge: Whole Foods offers a greater selection of minimally processed items and fewer ultra-processed options, whereas Target has a particularly high proportion of ultra-processed products (high FPro). FPro also captures the inherent variability in the degree of processing per food category. As illustrated in Figure 2b, there is a small variability of FPro scores in categories like jerky, popcorn, chips, bread, biscuits, and mac & cheese, indicating that consumers have limited choices in terms of degree of processing for these food groups (Section S7 for harmonizing categories between stores). Yet, in categories like cereals, milk & milk-substitute, pasta-noodles, and snack bars, FPro shows considerable variation, reflecting a wider extent of possible choices from a food processing perspective.
Figure 2: Food Processing in Grocery Stores.

(a) The distribution of FPro scores from the three stores follows a similar trend, a monotonically increasing curve, indicating that the number of low FPro items (unprocessed and minimally-processed) offered by the grocery stores is relatively lower than the number of high FPro items (highly-processed and ultra-processed items), and the majority of offerings are ultra-processed (Methods for FPro calculation). (b) Distribution of FPro scores for different categories of GroceryDB. The distributions indicate that FPro has a remarkable variability within each food category, confirming the different degrees of food processing offered by the stores. Unprocessed foods like eggs, fresh produce, and raw meat are excluded (Section S7). Sample sizes range from 126 for Baby Food to 2,043 for Prepared Meals Dishes (see Source Data for exact values). For the box plots, the minimum is the lower quartile, the central line is the median, and the maximum is the upper quartile. The whiskers show data outside of the inter-quartile range. Diamonds represent outliers. (c) The distributions of FPro scores in GroceryDB compared to two USDA nationally representative food databases: the USDA Food and Nutrient Database for Dietary Studies (FNDDS) and FoodData Central Branded Products (BFPD). The similarity between the distributions of FPro scores in GroceryDB, BFPD, and FNDDS suggests that GroceryDB offers a comprehensive coverage of foods and beverages (Section S6).
The distribution of FPro in GroceryDB was compared with the latest USDA Food and Nutrient Database for Dietary Studies (FNDDS), offering a representative sample of the consumed food supply (Figure 2c). The similarity between the distributions of FPro scores obtained from GroceryDB and FNDDS suggests that GroceryDB also offers a representative sample of foods and beverages in the supply chain. Additionally, the comparison of GroceryDB with the USDA Global Branded Food Products Database (BFPD), which contains 1,142,610 branded products, reveals that the FPro distributions in GroceryDB and BFPD follow similar trends (Figure 2c). While BFPD contains 22 times more foods than GroceryDB, only an estimated 44% of GroceryDB’s products are represented in BFPD, even after accounting for potential variability in food names and ingredient lists (Section S6). This indicates that while BFPD offers an extensive representation of branded products, it does not fully capture the current offering of stores. Furthermore, a comparison of GroceryDB with Open Food Facts (OFF) [43], an extensive crowd-sourced collection of branded products containing 426,000 items with English ingredient lists, reveals that fewer than 40% of the products in GroceryDB are present in OFF (Figure S4). This limited overlap suggests that monitoring products currently offered in grocery stores may provide a more accurate account of the food supply available to consumers.
Food Processing and Caloric Intake
The depth and the resolution of the data collected in GroceryDB unveil some of the complexity regarding the relation between price and calories. Among all categories in GroceryDB, a 10% increase in FPro corresponds to an 8.7% decrease in the price per calorie of products, as captured by the dashed line in Figure 3a. However, the relationship between FPro and price per calorie strongly depends on the food category (Section S8). For example, in soups & stews the price per calorie drops by 24.3% for a 10% increase in FPro (Figure 3b), a trend observed also in cakes, mac & cheese, and ice cream (Figure S8). This means that on average, the most processed soups & stews, with , are 67.72% cheaper per calorie than the minimally-processed alternatives with (Figure 3e). In contrast, the price per calorie for cereals drops only by 1.2% for a 10% increase in FPro (Figure 3c), a slow decrease observed also for seafood and yogurt products (Figure S8). Interestingly, there is an increasing trend between FPro and price in the milk & milk-substitute category (Figure 3d), partially explained by the higher price of plant-based milk substitutes, which require more extensive processing than dairy-based milks.
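To make the "percent change in price per calorie for a 10% increase in FPro" reading concrete, the following is a minimal sketch under stated assumptions, not the analysis pipeline itself. It assumes a log-linear relation between price per calorie and FPro, reads a 10% increase as an absolute step of 0.10 on the 0 to 1 FPro scale, and swaps in ordinary least squares for the robust linear models used in Figure 3. The function names and the synthetic data are hypothetical.

```python
import math

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (a stand-in for a robust fit)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def pct_change_per_step(fpro, price_per_calorie, step=0.10):
    """Percent change in price per calorie for a `step` increase in FPro,
    assuming log(price per calorie) is linear in FPro."""
    log_price = [math.log(p) for p in price_per_calorie]
    slope = ols_slope(fpro, log_price)
    return 100.0 * (math.exp(slope * step) - 1.0)

# Synthetic, noise-free data built so that each 0.10 step in FPro
# multiplies the price per calorie by 0.913 (an 8.7% decrease).
fpro = [0.05 + 0.9 * i / 49 for i in range(50)]
price = [0.02 * 0.913 ** (f / 0.10) for f in fpro]

change = pct_change_per_step(fpro, price)
print(f"{change:.1f}% per 0.10 increase in FPro")
```

On real data a robust estimator would down-weight unusually expensive or cheap items, so the fitted percentage could differ from the plain least-squares value computed here.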
Figure 3: Price and Food Processing.

(a) Using robust linear models, the relationship between price and food processing is assessed (Figure S8 for regression coefficients of all categories). The price per calorie drops by 24.3% (soup & stew, n = 505) and 1.2% (cereal, n = 659) for a 10% increase in FPro. Also, an 8.7% decrease is observed across all foods in GroceryDB (n = 19,345) for a 10% increase in FPro. Interestingly, in milk & milk-substitute (n = 240), price per calorie increases by 1.6% for a 10% increase in FPro, partially explained by the higher price of plant-based milks that are more processed than regular dairy milk. The shaded area for each line is the 95% confidence interval of the standard error. (b-d) Distributions of price per calorie in the linear bins of FPro scores for each store (Figure S7 illustrates the correlation between price and FPro for all categories). In soup & stew, there is a steep decreasing slope between FPro and price per calorie, while in cereals the effect is smaller. In milk & milk-substitute, price tends to slightly increase with higher values of FPro. For the box plots, the minimum is the lower quartile, the central line is the median, and the maximum is the upper quartile. The whiskers show data outside of the inter-quartile range. Diamonds represent outliers. (e) Percentage of change in price per calorie from the minimally-processed products to ultra-processed products in different food categories. This analysis was performed by comparing the average price per calorie of the top 10% most processed items with the top 10% least processed items within each category. In the full GroceryDB, on average, the ultra-processed items are 52.09% cheaper than their minimally-processed alternatives. n > 4 for all statistics, see Source Data for exact sample sizes.
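The comparison in panel (e) can be illustrated with a short sketch. It assumes that each item is a pair of FPro and price per calorie, that the 10% groups are taken by rank on FPro, and that the change is expressed relative to the least processed group. The function name `decile_price_gap` and the toy data are hypothetical.

```python
def decile_price_gap(items, fraction=0.10):
    """Percent change in mean price per calorie from the least processed
    `fraction` of items to the most processed `fraction`.

    `items` is a list of (fpro, price_per_calorie) pairs.
    """
    ranked = sorted(items, key=lambda item: item[0])
    k = max(1, round(len(ranked) * fraction))
    least = [price for _, price in ranked[:k]]
    most = [price for _, price in ranked[-k:]]
    mean_least = sum(least) / len(least)
    mean_most = sum(most) / len(most)
    return 100.0 * (mean_most - mean_least) / mean_least

# Toy category with 20 items: the less processed half costs 0.010 per
# calorie and the more processed half costs 0.004 per calorie.
items = [(i / 20, 0.010 if i < 10 else 0.004) for i in range(20)]

gap = decile_price_gap(items)
print(f"{gap:.0f}% change from least to most processed")
```

A negative value means that the most processed group is cheaper per calorie than the least processed group.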
Choice Availability and Food Processing
Not surprisingly, GroceryDB documents differences in the product offerings of the three stores analyzed. For instance, in the cereal category — one of the most popular staple foods, consumed by 283 million Americans in 2020 [44] — Whole Foods offers a selection with a broad spectrum of processing levels, while Walmart’s cereal options are primarily limited to products with higher FPro (Figure 4a). To investigate the roots of these differences, we examined the ingredients of cereals available at each store. The analysis showed that cereals sold at Whole Foods typically contain less sugar, fewer artificial and natural flavors, and fewer added vitamins compared to those at Walmart and Target, where products are more likely to include corn syrup, a sweetener associated with enhanced dietary fat absorption and weight gain (Figure 4b) [45]. Additives such as butylated hydroxytoluene (a preservative) and calcium carbonate (an acidity regulator and anti-caking agent) are largely absent in the Whole Foods cereals, partially explaining the wider range of processing scores characterizing cereals at this store (Figure 4a).
Figure 4: The Difference between Stores in Terms of Processing.

The degree of processing of food choices depends on the grocery store and food category. (a) The degree of processing of food items offered in grocery stores, stratified by food category. For example, in cereals, Whole Foods shows a higher variability of FPro, implying that consumers have a choice between low and high processed cereals. Yet, in pizzas, all supermarkets offer choices characterized by high FPro values. Lastly, all cheese products are minimally-processed, showing consistency across different grocery stores. For the box plots, the minimum is the lower quartile, the central line is the median, and the maximum is the upper quartile. The whiskers show data outside of the inter-quartile range. Diamonds represent outliers. n > 4 for all statistics, see Source Data for exact sample sizes. (b) The top 30 most reported ingredients in cereals show that Whole Foods tends to eliminate corn syrup, uses more sunflower oil and less canola oil, and relies less on vitamin fortification. In total, GroceryDB has 1,168 cereals, of which 973 have ingredient lists (Walmart=309, Target=260, Whole Foods=395). (c) The brands of cereals offered in stores partially explain the different patterns of ingredients and variation of FPro. While Walmart and Target have a larger intersection in the brands of their cereals, Whole Foods tends to supply cereals from brands not available elsewhere.
The brands offered by each store could also explain the different FPro patterns. Indeed, while Walmart and Target have a large overlap in the list of brands they carry, Whole Foods relies on different suppliers (Figure 4c), largely unavailable in other grocery stores. In general, Whole Foods offers less processed soups & stews, yogurt & yogurt drinks, and milk & milk-substitute (Figure 4a). In these categories Walmart’s and Target’s offerings are limited to higher FPro values. Lastly, some food categories like pizza, mac & cheese, and popcorn are highly processed in all stores (Figure 4a). Pizzas available in all three chains, for example, consistently have high FPro values, partly due to the use of substitute ingredients like “imitation mozzarella cheese” instead of real “mozzarella cheese”.
While grocery stores sell a large variety of products, the offered processing choices can be identical in multiple stores. For example, GroceryDB has a comparable number of cookies & biscuits in each chain, with 453, 373, and 402 items in Walmart, Target, and Whole Foods, respectively. The degree of processing of cookies & biscuits in Walmart and Target is nearly identical (), limiting consumer nutritional choices to a narrow range of processing (Figure 4a). In contrast, Whole Foods not only offers a large number of items (402 cookies & biscuits), but also provides a wider choice of processing ().
Organization of Ingredients in the Food Supply
Food and beverage companies are required to report the list of ingredients in descending order of the amount used in the final product. When an ingredient itself is a composite, consisting of two or more ingredients, the FDA mandates parentheses to declare the corresponding sub-ingredients (Figure 5a–b) [48]. By organizing the ingredient list as a tree (Methods), differences between highly processed and less processed options can be analyzed (Figure 5). In general, products with complex ingredient trees are more processed than products with simpler and fewer ingredients (Section S9.3). For example, the ultra-processed cheesecake in Figure 5a has 43 ingredients, 26 additives, and 3 branches with sub-ingredients. In contrast, the minimally-processed cheesecake has only 14 ingredients, 5 additives, and 2 sub-ingredient branches (Figure 5b). As illustrated by the cheesecake example, the ingredients used in the food supply provide valuable insights into the type and extent of processing of the final product, prompting the question: which ingredients contribute the most to the degree of processing of a product? To answer this, we introduce the Ingredient Processing Score (IgFPro), defined as
\[ \mathrm{IgFPro}(g) = \frac{\sum_{f \in F_g} r_g^f \, \mathrm{FPro}_f}{\sum_{f \in F_g} r_g^f} \tag{1} \]
where $r_g^f$ ranks an ingredient $g$ in decreasing order based on its position in the ingredient list of each food $f$ that contains $g$, and $F_g$ denotes the set of foods containing $g$ (Section S9.5). IgFPro ranges between 0 (unprocessed) and 1 (ultra-processed), enabling the rank-order of ingredients based on their contribution to the degree of processing of the final product. This analysis reveals that not all additives contribute equally to ultra-processing. For example, the ultra-processed cheesecake (Figure 5a) has sodium tripolyphosphate (a stabilizer used to improve the whipping properties with ), polysorbate 60 (an emulsifier used in cakes for increased volume and fine grain with ), and corn syrup (a corn sweetener with ) [49], each of which emerges as a signal of ultra-processing with a high IgFPro score. In contrast, both the minimally-processed and ultra-processed cheesecakes (Figure 5) contain xanthan gum (), guar gum (), locust bean gum (), and salt (). Indeed, the European Food Safety Authority (EFSA) reported that xanthan gum as a food additive does not pose any safety concern for the general population, and the FDA classified guar gum and locust bean gum as ‘generally recognized as safe’ (GRAS) [49].
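As a rough illustration of a position-weighted ingredient score of this kind, the sketch below averages the FPro of every food that contains an ingredient, giving more weight to foods in which the ingredient appears earlier in the list. The weight used here, the reciprocal of the list position, is an assumption made for illustration only; the exact rank definition is given in Section S9.5. The function name `igfpro` and the toy foods are hypothetical.

```python
from collections import defaultdict

def igfpro(foods):
    """Position-weighted average FPro per ingredient.

    `foods` is a list of (fpro, ingredients) pairs, where `ingredients`
    is the label-ordered list of ingredient names for that food.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for fpro, ingredients in foods:
        for position, name in enumerate(ingredients, start=1):
            weight = 1.0 / position  # assumed rank weight, not from the source
            weighted_sum[name] += weight * fpro
            weight_total[name] += weight
    return {name: weighted_sum[name] / weight_total[name] for name in weighted_sum}

# Toy foods with made-up FPro values.
foods = [
    (0.2, ["milk", "salt"]),
    (0.9, ["sugar", "corn syrup", "salt"]),
    (0.8, ["corn syrup", "salt"]),
]

scores = igfpro(foods)
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: {score:.3f}")
```

Because each score is a weighted average of FPro values, it stays within the 0 to 1 range of FPro. In this toy example salt appears in both low and high FPro foods and lands in between, whereas corn syrup appears only in high FPro foods.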
Figure 5: Ingredient Trees.

GroceryDB organizes the ingredient list of products into structured trees, where the additives are marked as orange nodes (Methods and Section S9). (a) The highly processed cheesecake contains 43 ingredients from which 26 are additives, resulting in a complex ingredient tree with 3 branches of sub-ingredients. (b) The minimally-processed cheesecake has a simpler ingredient tree with 14 ingredients, 5 additives, and 2 sub-ingredient branches. Additives are identified according to the FDA [46, 47]. Image credits: ”Watercolor Cheesecake Illustration” (by greywalnutstudio from greywalnutstudio for CanvaPro via Canva.com), ”Gold Line Stripes” (by Laut Biru Studio for CanvaPro via Canva.com), ”Delicious cheesecake on white background” (by Africa Studio for Adobe Stock).
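The tree structure described in this caption can be sketched with a small recursive parser. This is an illustrative sketch, not the GroceryDB implementation: it assumes that top-level ingredients are separated by commas and that parentheses enclose sub-ingredients, and it counts a "branch" as any ingredient that has sub-ingredients. The function names and the example label are hypothetical.

```python
def _flush(nodes, name_chars, children):
    """Append a node if any ingredient name has been accumulated."""
    label = "".join(name_chars).strip().lower()
    if label:
        nodes.append({"name": label, "children": children})

def _parse_list(text, pos):
    """Parse a comma-separated list starting at `pos`; stop at ')' or end."""
    nodes, name_chars, children = [], [], []
    while pos < len(text):
        ch = text[pos]
        if ch == "(":
            # Sub-ingredients of the ingredient currently being read.
            children, pos = _parse_list(text, pos + 1)
            continue
        if ch == ")":
            pos += 1
            break
        if ch == ",":
            _flush(nodes, name_chars, children)
            name_chars, children = [], []
        else:
            name_chars.append(ch)
        pos += 1
    _flush(nodes, name_chars, children)
    return nodes, pos

def parse_ingredients(text):
    """Turn an ingredient list into a list of {'name', 'children'} nodes."""
    nodes, _ = _parse_list(text, 0)
    return nodes

def count_nodes(nodes):
    return sum(1 + count_nodes(n["children"]) for n in nodes)

def count_branches(nodes):
    return sum((1 if n["children"] else 0) + count_branches(n["children"])
               for n in nodes)

label = ("Cream cheese (milk, cream, salt, carob bean gum), sugar, eggs, "
         "graham crust (wheat flour, butter (cream, salt)), vanilla")
tree = parse_ingredients(label)
print(count_nodes(tree), "ingredients,", count_branches(tree), "branches")
```

Real labels contain irregularities such as unbalanced parentheses, brackets, and "contains 2% or less of" clauses, which a production parser would need to handle separately.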
By the same token, when evaluating oils used as ingredients in branded products, IgFPro identifies brain octane oil (), flaxseed oil (), and olive oil () as the highest quality options, having the smallest contribution to ultra-processing. On the other hand, palm oil (), vegetable oil (), and soybean oil () represent strong signals of ultra-processing (Figure 6a). Notably, flaxseed oil is high in omega-3 fatty acids with several health benefits [50]. In contrast, blending of vegetable oils — a signature of UPF — is a straightforward practice to achieve desired texture, stability, and nutritional profiles [51].
Figure 6: Ingredient Processing Score (IgFPro).

To investigate which ingredients contribute most to ultra-processed products, Eq. 1 is used. With the introduction of IgFPro, over 12,000 ingredients are ranked by their prevalence and contribution to ultra-processed products, prioritizing ingredients and food groups for targeted intervention. (a) The IgFPro of all ingredients that appeared in at least 10 products is calculated, rank-ordering ingredients based on their contribution to UPF. The popular oils used as ingredients are highlighted, with the brain octane, flaxseed, and olive oils contributing the least to ultra-processed products. In contrast, the palm, vegetable, and soybean oils contribute the most to ultra-processed products (Section S9.5). (b) The patterns of ingredients in the least-processed tortilla chips vs. the ultra-processed tortilla chips. The bold fonts track the IgFPro of the oils used in the three tortilla chips. The minimally-processed tortilla chips () use avocado oil (), and the more processed El Milagro tortilla () has corn oil (). In contrast, the ultra-processed Doritos () rely on a blend of vegetable oils (), and are accompanied by a much more complex ingredient tree, indicating that there is no single ingredient “biomarker” for UPF. Ingredient trees contain both ingredients (blue) and additives (orange). Additives are identified according to the FDA [46, 47].
Image credits: “Food Packaging, Foil and plastic snack bags mockup isolated on white background, Purple colored pillow packages for food production on White PNG File” (by Juraiwan for Adobe Stock), “Nachos by Magic Design Educational Worksheet Geometric” (by sparklestroke for CanvaPro via Canva.com), “brown paper bag” (by Graphic for Adobe Stock), “Foil and plastic snack bags mockup isolated on white background, Dark blue colored pillow packages for food production, snack wrappers on White Background With clipping path” (by MERCURY studio for Adobe Stock), “Flying mexican nachos chips, isolated on white background” (by Yeti Studio for Adobe Stock), “Tortilla Chip Illustration” (by eyewave for CanvaPro via Canva.com).
Finally, to illustrate the ingredient patterns characterizing UPF in Figure 6b, three tortilla chips are ranked from minimally-processed to ultra-processed. Relative to the snack-chips category, the Siete tortilla is minimally-processed (), made with avocado oil and a blend of cassava and coconut flours. The more processed El Milagro tortilla () is cooked with corn oil and ground corn, and has calcium hydroxide, a GRAS additive made by adding water to calcium oxide (lime) to promote dispersion of ingredients [49]. In contrast, the ultra-processed Doritos () have corn flours and a blend of vegetable oils, and rely on 12 additives to ensure a palatable taste and the texture of the tortilla chip, demonstrating the complex patterns of ingredients and additives needed for ultra-processing (Figure 6b).
In summary, complex ingredient patterns accompany the production of UPF (Section S9.4). IgFPro enables the assessment of processing characteristics across the entire food supply, as well as the contribution of individual ingredients.
Discussion
GroceryDB, accessible to the public at http://TrueFood.Tech/, offers both the data and methodologies needed to quantify food processing and analyze the structure of ingredients within the U.S. food supply. By combining large-scale data on food composition and ML, GroceryDB uncovers insights on the current state of food processing in the U.S. grocery landscape, obtaining distributions of food processing scores that capture a remarkable variability in the offerings of different grocery stores. The differences in FPro’s distributions (Figure 2a) indicate that multiple factors drive the range of choices available in grocery stores, from the cost of food and the socioeconomic status of the consumers to the distinct declared missions of the supermarket chains: “quality is a state of mind” for Whole Foods Market and “helping people save money so they can live better” for Walmart [52, 53]. Furthermore, the continuous nature of FPro allows for data-driven investigations on the relationship between price and food processing stratified by food category. Overall, food processing in GroceryDB tends to be associated with the production of more affordable calories, a positive correlation that raises the likelihood of habitual consumption among lower-income populations, ultimately contributing to growing socioeconomic disparities in terms of nutrition security [54–59]. However, it is important to note that the strength and direction of this correlation vary depending on the specific food category under consideration, as exemplified by the opposite trend of milk & milk-substitutes compared to soups & stews (Section S8). Further in-depth analyses are needed to evaluate the effectiveness of intervention strategies targeting specific food groups within diverse food environments.
Governments increasingly acknowledge the impact of processed foods on population health, and their long-term effect on healthcare [60, 61]. For example, the UK spends £18 billion annually on direct medical costs related to non-communicable diseases like obesity [62], while the U.S. incurs $1.1 trillion in yearly food-related human health costs [63, 64]. GroceryDB serves as a valuable resource for both consumers and policymakers, offering essential insights to gauge the level of food processing within the food supply. For instance, in categories like cereals, milk & milk alternatives, pasta-noodles, and snack bars, FPro exhibits a wide range, highlighting the substantial variations in the processing levels of products. If consumers had access to this processing data, they could make informed choices, selecting items with significantly different degrees of processing (Figure 2b). Yet, the comprehension of nutrient and ingredient data disclosed on food packaging often poses a challenge to consumers due to unrealistic serving sizes and confusing health claims based on one or a few nutrients. Our primary objective lies in translating this wealth of data into an actionable scoring system, enabling consumers to make healthier food choices and embrace effective dietary substitutions, without overwhelming them with excessive information. Additionally, this approach holds great potential for public health initiatives aimed at improving the overall quality of the food environment, such as strategies reorganizing supermarket layouts, optimizing shelf placements, and thoughtfully designing counter displays [56, 65, 66]. Transforming health-related behaviors is a challenging task [67, 68], hence easily adoptable dietary modifications along with environmental nudges could make it easier for individuals to embrace healthier choices.
Currently, FPro partially draws from expertise-based food processing classifications due to limited data concerning compound concentrations indicative of food matrix alterations, such as cellular wall transformations or industrial processing techniques. However, a comprehensive mapping of the “Dark Matter of Nutrition”, encompassing chemical concentrations for additives and processing byproducts, aims to evolve FPro into an unsupervised system, independent of manual classifications [69, 70]. Unlike expertise-based systems, FPro functions as a quantitative algorithm, utilizing standardized inputs to generate reproducible continuous scores, facilitating sensitivity analysis and uncertainty estimations [37] (Section S5). These important features enhance reliability, transparency, and interpretability of the analyses while reducing errors associated with the descriptive nature of manual classifications [28], which have displayed a low degree of consistency among nutrition specialists [30].
The chemical composition of branded products is partially captured by the nutrition facts table and partially reported in the ingredient list, which includes additives like artificial colors, flavors, and emulsifiers. However, comprehensive and internationally well-regulated data on food ingredients is currently limited, as documented by the GS1 UK data crunch analysis which reported an average of 80% inconsistency in products’ data [31], leading us to focus on the nutrition facts to enhance the algorithm’s portability and reproducibility. The nutrition facts alone exhibit excellent performance in discriminating between NOVA classes, confirming how food processing consistently alters nutrient concentrations with reproducible patterns, effectively harnessed by ML [37]. While FPro assesses the degree of food processing by holistically evaluating nutrient concentrations, the few nutrients available on food packaging increase the risk of treating as equivalent products that have similar nutrition facts but distinct food matrices (e.g., pre-frying, puffing, extrusion-cooking). Indeed, if the chemical panel used to train the algorithm fails to exhaustively capture matrix modifications induced by processing and cooking, FPro and the substitution algorithm implemented at http://TrueFood.Tech/ remain blind to these chemical-physical changes. Incorporating disambiguated ingredients in FPro, such as the ultra-processing markers characterized by SIGA [71], may offer a solution until larger composition tables for branded products become available (Section S5).
In summary, this work represents a departure from traditional food classification systems, advancing toward the use of ML methodologies to model the chemical complexity of food [72] (Section S1). Despite the limited information provided by FDA-regulated nutrition labels, GroceryDB and FPro offer a data-driven approach that enables a substitution algorithm capable of recommending similar but less processed alternatives for any food in GroceryDB. Together, GroceryDB and the TrueFood platform highlight the importance of data transparency in grocery store inventories, a key factor that directly shapes consumer choices.
Methods
Data Collection
Publicly accessible data on food products were compiled from the online platforms of Walmart, Target, and Whole Foods. Each store organizes its food items hierarchically. Utilizing these categorizations, the stores’ websites were systematically navigated to identify specific food items. To ensure consistency, the food category hierarchy within GroceryDB was standardized by comparing and aligning the classification systems employed by each store. The stores sourced nutrition facts from physical food labels and provided digital versions for each food item. These data allowed us to standardize nutrient concentrations to a uniform measure of 100 grams and employ FoodProX to evaluate the degree of food processing for each item. Lastly, all data for this manuscript were collected in May 2021.
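As an illustration of the 100-gram standardization step, the following minimal sketch rescales per-serving label values. The function name, the dictionary keys, and the example label are hypothetical and are not taken from the GroceryDB code base; the sketch also assumes that the serving size is already expressed in grams.

```python
def per_100g(nutrients_per_serving, serving_size_g):
    """Rescale per-serving nutrient amounts to amounts per 100 g of food.

    Assumes the serving size is given in grams; volume-based servings
    would need a density conversion that is not covered here.
    """
    if serving_size_g <= 0:
        raise ValueError("serving size must be positive")
    factor = 100.0 / serving_size_g
    return {name: amount * factor
            for name, amount in nutrients_per_serving.items()}


# Hypothetical label: a 30 g serving with 3 g protein and 150 mg sodium.
label = {"protein_g": 3.0, "sodium_mg": 150.0}
normalized = per_100g(label, serving_size_g=30.0)
```

Expressing every item per 100 grams makes nutrient concentrations comparable across products whose declared serving sizes differ.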
Calculation of the Food Processing Score (FPro)
Processing alters the nutrient profile of food, changes that are detectable and categorizable using ML [37, 72, 73]. Hence, FoodProX [37], a random forest classifier, translates the combinatorial changes in the nutrient amounts induced by food processing into a food processing score (FPro). Extensive tests and validations on the stability of FPro were performed in several databases such as the U.S. Food and Nutrient Database for Dietary Studies (FNDDS) and the international Open Food Facts. FPro enabled the implementation of an in-silico study based on U.S. cross-sectional population data, showing that on average substituting only a single food item in a person’s diet with a minimally-processed alternative from the same food category can significantly reduce the risk of developing metabolic syndrome (12.25% decrease in odds ratio) and increase vitamin blood levels (4.83% and 12.31% increase of vitamin B12 and vitamin C blood concentration) [37].
FoodProX takes as input 12 nutrients reported in the nutrition facts (Table S1), and returns FPro, a continuous score ranging between 0 (unprocessed foods like fruits and vegetables) and 1 (UPF like instant soups and shelf-stable breads). The manual NOVA classifications were applied to the USDA Standard Reference (SR) and FNDDS databases to train FoodProX. In the original classification, NOVA labels were assigned by inspecting the ingredient list and the food description, but without taking into account nutrient content.
FPro does not assess individual nutrients in isolation but, rather, learns from the configurations of correlated nutrient changes within a fixed quantity of food (100 grams) [37]. Consequently, a single high or low nutrient value does not dictate a food’s FPro but the final score depends on the likelihood of observing the overall pattern of nutrient concentrations in unprocessed food versus UPF. For instance, while fortified food may mirror mineral and vitamin content in unprocessed food, the algorithm identifies unique concentration signatures unlikely to be found in minimally processed food, resulting in a higher FPro [37].
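To make the mapping from classifier output to a continuous score concrete, the sketch below assumes the definition given in [37], where FPro combines the predicted probabilities of NOVA class 1 and NOVA class 4 as (1 - p1 + p4)/2. This formula is not restated in this section, so it should be read as an assumption here. The ensemble summary (mean, minimum, standard deviation) mirrors the f_FPro, f_min_FPro, and f_std_FPro columns of the database; the use of the population standard deviation and all example probabilities are illustrative choices.

```python
from statistics import mean, pstdev


def fpro_from_probabilities(p):
    """Map NOVA class probabilities (p1, p2, p3, p4) to a score in [0, 1].

    Assumed form, following [37]: FPro = (1 - p1 + p4) / 2.
    """
    p1, _, _, p4 = p
    return (1.0 - p1 + p4) / 2.0


def summarize_ensemble(prob_per_model):
    """Summarize FPro across an ensemble of classifiers."""
    scores = [fpro_from_probabilities(p) for p in prob_per_model]
    return {
        "f_FPro": mean(scores),
        "f_min_FPro": min(scores),
        # Population standard deviation is an arbitrary choice in this sketch.
        "f_std_FPro": pstdev(scores),
    }


# Hypothetical class probabilities from three classifiers for one food.
ensemble = [(0.70, 0.20, 0.05, 0.05),
            (0.60, 0.20, 0.10, 0.10),
            (0.80, 0.10, 0.05, 0.05)]
summary = summarize_ensemble(ensemble)
```

Under this assumed form, a food predicted with certainty as NOVA 1 scores 0 and one predicted with certainty as NOVA 4 scores 1, matching the range described above.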
The calculation of FPro for all food in GroceryDB represents a generalization task, where the model faces “never-before-seen” data [72, 74]. More details on the training dataset, including class heterogeneity and imbalance, are available in Section S4.
Price for calories trends
Robust linear models with Huber’s t-norm [75–77] were applied to calculate regression coefficients and p-values for the relationship between price per calorie and FPro. The detailed regression results for each food category are presented in Figure S8, while the overall trend across GroceryDB is depicted in Figure 3A. To illustrate the price disparity at the extremes of food processing, the percentage change in price per calorie shown in Figure 3E was calculated by comparing the average price per calorie of the top 10% minimally processed items to that of the top 10% ultra-processed items within each category.
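The comparison between the two extremes of processing can be sketched as follows. This is a minimal illustration rather than the analysis code: the function name and the synthetic data are hypothetical, and taking the minimally processed group as the baseline of the percentage change is an assumption.

```python
from statistics import mean


def price_gap_percent(items, fraction=0.10):
    """Percentage change in mean price per calorie between the extremes.

    items: list of (fpro, price_per_calorie) pairs for one food category.
    Compares the `fraction` of items with the highest FPro against the
    `fraction` with the lowest FPro (baseline, by assumption).
    """
    ranked = sorted(items, key=lambda item: item[0])
    k = max(1, int(len(ranked) * fraction))
    least_processed = mean([price for _, price in ranked[:k]])
    most_processed = mean([price for _, price in ranked[-k:]])
    return 100.0 * (most_processed - least_processed) / least_processed


# Synthetic category of 20 items in which price per calorie falls with FPro.
category = [(i / 19, 2.0 - 0.05 * i) for i in range(20)]
gap = price_gap_percent(category)
```

A negative value indicates that, in this synthetic category, calories from the most processed items are cheaper than calories from the least processed ones.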
Ingredient Trees
An ingredient list is a reflection of the recipe used to prepare a branded food item. The ingredient lists are sorted based on the amount of ingredients used in the preparation of an item as required by the FDA. An ingredient tree can be created in two ways: (a) with emphasis on capturing the main and sub-ingredients, similar to a recipe, as illustrated in Figure S17A; (b) with emphasis on the order of ingredients as a proxy for their amount in a final product, as illustrated in Figure S17B, where the distance from root, $d_g^f$, reflects the amount of an individual ingredient relative to all ingredients. We opted for (b) to calculate IgFPro, as ranking the amount of an ingredient in a food is essential to quantify the contribution of individual ingredients to ultra-processing. In Eq. 1, $r_g^f = 1/d_g^f$ ranks the amount of an ingredient $g$ in food $f$, where $d_g^f$ captures the distance from the root (see Figure S17B for an example). Finally, IgFPro shows remarkable variability when compared to the average FPro of products containing the selected ingredient (Figure S18), suggesting distinctive patterns of correlation between the products’ FPro and the ranking of ingredients in their ingredient lists [78].
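A minimal sketch of a rank-weighted ingredient score is given below. It assumes that Eq. 1 is a weighted average of the FPro of all products containing an ingredient, with each product weighted by the inverse of the ingredient's distance from the root of that product's ingredient tree. Both the weighting and the example values are assumptions made for illustration.

```python
def ig_fpro(occurrences):
    """Rank-weighted average FPro for a single ingredient.

    occurrences: list of (fpro_of_food, distance_from_root) pairs, one per
    food containing the ingredient. The weight 1 / distance is an assumed
    form: ingredients listed earlier (closer to the root) weigh more.
    """
    weights = [1.0 / distance for _, distance in occurrences]
    weighted_sum = sum(w * fpro for w, (fpro, _) in zip(weights, occurrences))
    normalization = sum(weights)  # plays the role of a normalization term
    return weighted_sum / normalization


# Hypothetical ingredient: first-listed in a highly processed product
# (FPro 0.9) and fourth-listed in a less processed one (FPro 0.3).
score = ig_fpro([(0.9, 1), (0.3, 4)])
plain_average = (0.9 + 0.3) / 2
```

In this toy case the weighted score lies closer to 0.9 than the plain average does, because the ingredient is a major component of the more processed product.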
Database Structure
The database comprises two main files, both stored in CSV format for ease of use and accessibility:
- GroceryDB Foods File. This file contains comprehensive information about all the foods included in GroceryDB. Each row represents a distinct food item. This file includes the following columns:
name: The name of the food item, typically as it appears on the product packaging.
brand: The brand or manufacturer of the food item.
harmonized single category: The general category or type of food (e.g., seafood, cereal, etc.).
store: The retail store where the food item is available (e.g., Walmart, Target, Whole Foods).
f_FPro: Average FPro score of the food across the ensemble of classifiers. The FPro score is calculated using the FoodProX algorithm, taking into account the nutrition facts of the food.
f_FPro_P: a string indicating if the food has enough nutritional descriptors as detailed in Section S4.
f_min_FPro: Minimum FPro score across the ensemble of classifiers.
f_std_FPro: The standard deviation of the FPro score across the ensemble of classifiers.
f_FPro_class: expected NOVA class assigned according to FoodProX.
has10_nuts: boolean value indicating if the food is described by the 10 key nutrients described in Section S4.
is_Nuts_Converted_100g: Indicator if the food nutrients are converted per 100 grams.
nutritional information: Detailed nutritional information for the food item, including protein, total fat, carbohydrate, total sugars, total dietary fiber, calcium, iron, sodium, vitamin C, cholesterol, total saturated fatty acids, and total vitamin A.
Please note that the prices of the food items are not included in this public release due to potential restrictions on public disclosure. However, this information is available upon request. The file is available at https://github.com/Barabasi-Lab/GroceryDB/blob/main/data/GroceryDB_foods.csv.
- GroceryDB IgFPro File. This file contains data related to the IgFPro score of the ingredients listed in GroceryDB. Each row corresponds to a specific ingredient. The file is available at https://github.com/Barabasi-Lab/GroceryDB/blob/main/data/GroceryDB_IgFPro.csv. The columns in this file are as follows:
ingredient_name: The standardized name of the ingredient.
count_of_products: The total number of products in the database that contain this ingredient.
ingredient_FPro: IgFPro calculated for the selected ingredient.
average_FPro_of_products: The average FPro score of the products containing the selected ingredient.
average_distance_to_root: The average distance of the ingredient from the root in the ingredient tree, representing its relative amount in the food item. Ingredients closer to the root contribute more significantly to the calculation of IgFPro.
ingredient_normalization_term: A numerical value used to normalize a food’s contribution to the IgFPro score, based on the ingredient’s overall ranking across all foods.
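As a usage illustration for the foods file, the sketch below computes the mean FPro per store from CSV text using only the Python standard library. The column names store and f_FPro follow the description above; the three sample rows are invented for the example and do not come from GroceryDB, and the way stores are spelled in the real file may differ.

```python
import csv
import io
from collections import defaultdict


def mean_fpro_by_store(csv_text):
    """Average the f_FPro column for each value of the store column."""
    totals = defaultdict(lambda: [0.0, 0])  # store -> [sum, count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["f_FPro"]:
            continue  # skip items without a score
        entry = totals[row["store"]]
        entry[0] += float(row["f_FPro"])
        entry[1] += 1
    return {store: total / count for store, (total, count) in totals.items()}


# Invented sample rows that reuse the documented column names.
sample = """name,brand,store,f_FPro
Plain oats,BrandA,Walmart,0.20
Frosted cereal,BrandB,Walmart,0.90
Rolled oats,BrandC,WholeFoods,0.10
"""
averages = mean_fpro_by_store(sample)
```

The same function could be applied to the contents of GroceryDB_foods.csv once the file has been downloaded and read into a string.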
Substitution Algorithm at TrueFood.Tech
http://TrueFood.Tech/ provides food substitution recommendations aimed at gently nudging consumers towards less processed alternatives. To accomplish this, we first identify food items that belong to the same category and share partial semantic similarity with the targeted item (range 0.10–0.95), based on both food names and ingredient lists. This approach increases the diversity of displayed recommendations while ensuring they remain within the same category.
The popular term frequency–inverse document frequency (Tf–idf) algorithm is used to measure the significance of words to foods in GroceryDB, adjusting for commonality across entries [79]. The similarity between weighted word vectors is calculated leveraging cosine similarity. The final similarity between the queried food and other food items is determined by multiplying the ingredient-list-based similarity and the food-name-based similarity.
Next, the semantically filtered foods are sorted by their FPro scores, ranking the recommendations in ascending order of FPro. This method can identify the most similar food items with a lower FPro compared to the targeted item. Up to 50 items, listed in increasing order of FPro, are displayed on the website.
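The steps above can be sketched in a few lines of standard-library Python. This is not the code running at TrueFood.Tech: the whitespace tokenization, the smoothed inverse document frequency, the dictionary field names, and the toy foods are all assumptions made for illustration. Only the overall logic follows the description: same category, product of name and ingredient similarities within 0.10–0.95, lower FPro than the target, ascending FPro, at most 50 items.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Build Tf-idf weighted word vectors (smoothed idf, an assumed variant)."""
    docs = [Counter(text.lower().split()) for text in texts]
    n = len(docs)
    doc_freq = Counter(word for doc in docs for word in doc)
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in doc_freq.items()}
    return [{w: tf * idf[w] for w, tf in doc.items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(foods, target, low=0.10, high=0.95, limit=50):
    """Return same-category, partially similar foods with lower FPro."""
    name_vecs = tfidf_vectors([f["name"] for f in foods])
    ingr_vecs = tfidf_vectors([f["ingredients"] for f in foods])
    query = foods[target]
    picks = []
    for i, food in enumerate(foods):
        if i == target or food["category"] != query["category"]:
            continue
        similarity = (cosine(name_vecs[i], name_vecs[target])
                      * cosine(ingr_vecs[i], ingr_vecs[target]))
        if low <= similarity <= high and food["fpro"] < query["fpro"]:
            picks.append(food)
    return sorted(picks, key=lambda f: f["fpro"])[:limit]


# Toy catalogue with invented names, ingredients, and scores.
foods = [
    {"name": "frosted corn flakes cereal",
     "ingredients": "corn sugar malt flavor salt",
     "category": "cereal", "fpro": 0.95},
    {"name": "organic corn flakes cereal", "ingredients": "corn salt",
     "category": "cereal", "fpro": 0.60},
    {"name": "whole grain oat cereal", "ingredients": "oats salt",
     "category": "cereal", "fpro": 0.40},
    {"name": "corn chips", "ingredients": "corn oil salt",
     "category": "snacks", "fpro": 0.80},
]
suggestions = recommend(foods, target=0)
```

In this toy catalogue only the organic corn flakes are returned for the frosted cereal: the oat cereal is too dissimilar to pass the lower similarity bound, and the corn chips belong to a different category. Multiplying the two similarities means a candidate must resemble the query in both its name and its ingredient list to be retained.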
Acknowledgments
We thank Dwijay Shanbhag at Northeastern University for his help on data collection and cleaning. We thank Daria Koshkina for help in designing the figures. A.-L.B. is partially supported by NIH grant 1P01HL132825, American Heart Association grant 151708, and ERC grant 810115-DYNASET. G.M. is supported by NIH/NHLBI K25HL173665 and AHA 24MERIT1185447.
Footnotes
Competing Interests
A.-L.B. is the founder of Scipher Medicine and Naring Health, companies that explore the use of network-based tools in health and food, and of Datapolis, which focuses on urban data. All other authors declare no competing interests.
Code Availability
All code generated for the analysis is available at the BarabasiLab GitHub repository via https://github.com/Barabasi-Lab/GroceryDB/. The analysis was done in Python==3.11.7 with the packages: jupyter notebook==6.5.4, pymongo==4.8.0, pandas==2.1.4, numpy==1.26.4, seaborn==0.12.2, statsmodels==0.14.0, scipy==1.11.4, matplotlib==3.8.0, plotly==5.9.0, and certifi==2024.6.2.
Data Availability
The data in GroceryDB was scraped from Walmart, Target, and Whole Foods in 2021. GroceryDB is available to the public and consumers at http://TrueFood.Tech/. The data is also openly available on MongoDB servers with a read-only key found at the BarabasiLab GitHub repository via https://github.com/Barabasi-Lab/GroceryDB/. The USDA Food and Nutrient Database for Dietary Studies (FNDDS) dataset can be found at the same GitHub repository. Source data is published with this paper.
References
- [1]. Seferidi P et al. The neglected environmental impacts of ultra-processed foods. The Lancet Planetary Health 4, e437–e438 (2020).
- [2]. Fardet A & Rock E Ultra-processed foods and food system sustainability: What are the links? Sustainability 12, 6280 (2020).
- [3]. Macdiarmid JI The food system and climate change: are plant-based diets becoming unhealthy and less environmentally sustainable? Proceedings of the Nutrition Society 81, 162–167 (2022).
- [4]. Ambikapathi R et al. Global food systems transitions have enabled affordable diets but had less favourable outcomes for nutrition, environmental health, inclusion and equity. Nature Food 3, 764–779 (2022).
- [5]. Lane MM et al. Ultra-processed food exposure and adverse health outcomes: umbrella review of epidemiological meta-analyses. BMJ 384, e077310 (2024).
- [6]. Lustig RH Processed Food—An Experiment That Failed. JAMA Pediatrics 171, 212–214 (2017).
- [7]. Milanlouei S et al. A systematic comprehensive longitudinal evaluation of dietary factors associated with acute myocardial infarction and fatal coronary heart disease. Nature Communications 11, 1–14 (2020).
- [8]. Martínez Steele E, Popkin BM, Swinburn B & Monteiro CA The share of ultra-processed foods and the overall nutritional quality of diets in the US: evidence from a nationally representative cross-sectional study. Population Health Metrics 15, 6 (2017).
- [9]. Monteiro CA et al. NOVA. The star shines bright. World Nutrition 7, 28–38 (2016).
- [10]. Steele EM et al. Ultra-processed foods and added sugars in the U.S. diet: Evidence from a nationally representative cross-sectional study. BMJ Open 6, e009892 (2016).
- [11]. Steele EM & Monteiro CA Association between dietary share of ultra-processed foods and urinary concentrations of phytoestrogens in the US. Nutrients 9, 209 (2017).
- [12]. Adjibade M et al. Prospective association between ultra-processed food consumption and incident depressive symptoms in the French NutriNet-Santé cohort. BMC Medicine 17, 1–13 (2019).
- [13]. Fiolet T et al. Consumption of ultra-processed foods and cancer risk: Results from NutriNet-Santé prospective cohort. BMJ 360, k322 (2018).
- [14]. Srour B et al. Ultra-processed food intake and risk of cardiovascular disease: Prospective cohort study (NutriNet-Santé). BMJ 365, l1451 (2019).
- [15]. Hall KD, Ayuketah A, Brychta R, Cai H, Cassimatis T, Chen KY et al. Ultra-Processed Diets Cause Excess Calorie Intake and Weight Gain: An Inpatient Randomized Controlled Trial of Ad Libitum Food Intake. Cell Metabolism 30, 1–11 (2019).
- [16]. Martínez Steele E, Khandpur N, da Costa Louzada ML & Monteiro CA Association between dietary contribution of ultra-processed foods and urinary concentrations of phthalates and bisphenol in a nationally representative sample of the US population aged 6 years and older. PLOS ONE 15, 1–21 (2020).
- [17]. Nerín C, Aznar M & Carrizo D Food contamination during food process. Trends in Food Science & Technology 48, 63–68 (2016).
- [18]. Rather IA, Koh WY, Paek WK & Lim J The sources of chemical contaminants in food and their health implications. Frontiers in Pharmacology 8, 830 (2017).
- [19]. Arisseto AP Chapter 21 - furan in processed foods. In Kotzekidou P (ed.) Food Hygiene and Toxicology in Ready-to-Eat Foods, 383–396 (Academic Press, San Diego, 2016).
- [20]. Buckley JP, Kim H, Wong E & Rebholz CM Ultra-processed food consumption and exposure to phthalates and bisphenols in the US National Health and Nutrition Examination Survey, 2013–2014. Environment International 131, 105057 (2019).
- [21]. Mozaffarian D, Fleischhacker S & Andrés JR Prioritizing Nutrition Security in the US. JAMA 325, 1605–1606 (2021).
- [22]. Livings MS et al. Food and nutrition insecurity: Experiences that differ for some and independently predict diet-related disease, Los Angeles County, 2022. The Journal of Nutrition (2024).
- [23]. Food and nutrition security — USDA. URL https://www.usda.gov/nutrition-security.
- [24]. Volpp KG et al. Food is medicine: A presidential advisory from the American Heart Association. Circulation 148, 1417–1439 (2023).
- [25]. Mozaffarian D, Andrés JR, Cousin E, Frist WH & Glickman DR The White House conference on hunger, nutrition and health is an opportunity for transformational change. Nature Food 3, 561–563 (2022).
- [26]. Mozaffarian D, Rosenberg I & Uauy R History of modern nutrition science-implications for current research, dietary guidelines, and food policy. BMJ 361, k2392 (2018).
- [27]. Sadler CR et al. Processed food classification: Conceptualisation and challenges. Trends in Food Science & Technology 112, 149–162 (2021).
- [28]. Gibney MJ & Forde CG Nutrition research challenges for processed food and health. Nature Food 3, 104–109 (2022).
- [29]. Lacy-Nichols J & Freudenberg N Opportunities and limitations of the ultra-processed food framing. Nature Food 3, 975–977 (2022).
- [30]. Braesco V et al. Ultra-processed foods: how functional is the NOVA system? European Journal of Clinical Nutrition 76, 1245–1253 (2022).
- [31]. Data crunch report: The impact of bad data on profits and customer service in the UK grocery industry. GS1 UK and Cranfield University School of Management. https://dspace.lib.cranfield.ac.uk/bitstream/handle/1826/4135/Data_crunch_report.pdf (2009). (accessed April 4, 2022).
- [32]. THE 17 GOALS — Sustainable Development. URL https://sdgs.un.org/goals.
- [33]. Methods and Standards — Food and Agriculture Organization of the United Nations. URL https://www.fao.org/statistics/methods-and-standards/en/.
- [34]. Sarku R, Clemen UA & Clemen T The application of artificial intelligence models for food security: A review. Agriculture 13, 2037 (2023).
- [35]. Hu G, Ahmed M & L’Abbé MR Natural language processing and machine learning approaches for food categorization and nutrition quality prediction compared with traditional methods. The American Journal of Clinical Nutrition 117, 553–563 (2023).
- [36]. AI for Good — Impact Initiative. URL https://aiforgood.itu.int/.
- [37]. Menichetti G, Ravandi B, Mozaffarian D & Barabási A-L Machine learning prediction of the degree of food processing. Nature Communications 14, 2312 (2023).
- [38]. Chen X et al. Consumption of ultra-processed foods and health outcomes: A systematic review of epidemiological studies. Nutrition Journal 19, 86 (2020).
- [39]. Mendoza K, Smith-Warner SA, Rossato SL, Khandpur N, Manson JE, Qi L et al. Ultra-processed foods and cardiovascular disease: analysis of three large US prospective cohorts and a systematic review and meta-analysis of prospective cohort studies. The Lancet Regional Health - Americas 37, 100859 (2024).
- [40]. Slimani N et al. Contribution of highly industrially processed foods to the nutrient intakes and patterns of middle-aged populations in the European Prospective Investigation into Cancer and Nutrition study. European Journal of Clinical Nutrition 63, S206–S225 (2009).
- [41]. Poti JM, Mendez MA, Ng SW & Popkin BM Is the degree of food processing and convenience linked with the nutritional quality of foods purchased by US households? American Journal of Clinical Nutrition 101, 1251–1262 (2015).
- [42]. Davidou S, Christodoulou A, Fardet A & Frank K The holistico-reductionist SIGA classification according to the degree of food processing: an evaluation of ultra-processed foods in French supermarkets. Food & Function 11, 2026–2039 (2020).
- [43]. Open Food Facts. https://world.openfoodfacts.org/discover. (accessed March 1, 2022).
- [44]. U.S. population: Consumption of breakfast cereals (cold) from 2011 to 2024. https://www.statista.com/statistics/281995/us-households-consumption-of-breakfast-cereals-cold-trend/. 2021. (accessed February, 2022).
- [45]. Bray GA, Nielsen SJ & Popkin BM Consumption of high-fructose corn syrup in beverages may play a role in the epidemic of obesity. The American Journal of Clinical Nutrition 79, 537–543 (2004).
- [46]. FDA Substances Added to Food. https://www.cfsanappsexternal.fda.gov/scripts/fdcc/?set=FoodSubstancesl. 2021. (accessed November 1, 2021).
- [47]. FDA Substances Added to Food. https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?CFRPart=172. 2021. (accessed November 1, 2021).
- [48]. Guidance for industry: Food labeling guide. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/guidance-industry-food-labeling-guide. 2021. (accessed Nov 1, 2021).
- [49]. Igoe RS Dictionary of food ingredients (Springer Science & Business Media, 2011).
- [50]. Goyal A, Sharma V, Upadhyay N, Gill S & Sihag M Flax and flaxseed oil: an ancient medicine & modern functional food. Journal of Food Science and Technology 51, 1633–1653 (2014).
- [51]. Hashempour-Baltork F, Torbati M, Azadmard-Damirchi S & Savage GP Vegetable oil blending: A review of physicochemical, nutritional and health effects. Trends in Food Science & Technology 57, 52–58 (2016).
- [52]. Whole Foods mission and values. https://www.WholeFoodsmarket.com/mission-values. (accessed March 1, 2022).
- [53]. Walmart history. https://corporate.walmart.com/about/history. (accessed March 1, 2022).
- [54]. Gupta S, Hawk T, Aggarwal A & Drewnowski A Characterizing ultra-processed foods by energy density, nutrient density, and cost. Frontiers in Nutrition 6 (2019).
- [55]. Zenk SN, Tabak LA & Pérez-Stable EJ Research Opportunities to Address Nutrition Insecurity and Disparities. JAMA 327, 1953–1954 (2022).
- [56]. Venkataramani AS, O’Brien R, Whitehorn GL & Tsai AC Economic influences on population health in the United States: Toward policymaking driven by data and evidence. PLoS Medicine 17, e1003319 (2020).
- [57]. Erndt-Marino J, O’Hearn M & Menichetti G An integrative analytical framework to identify healthy, impactful, and equitable foods: a case study on 100% orange juice. International Journal of Food Sciences and Nutrition 74, 668–684 (2023).
- [58]. Coletro HN et al. The combined consumption of fresh/minimally processed food and ultra-processed food on food insecurity: COVID Inconfidentes, a population-based survey. Public Health Nutrition 26, 1414–1423 (2023).
- [59]. Hutchinson J & Tarasuk V The relationship between diet quality and the severity of household food insecurity in Canada. Public Health Nutrition 25, 1013–1026 (2022).
- [60]. Griffith R, Jenneson V, James J & Taylor A The impact of a tax on added sugar and salt. Tech. Rep., IFS Working Paper (2021). URL http://hdl.handle.net/10419/242920.
- [61]. Mozaffarian D, Blanck HM, Garfield KM, Wassung A & Petersen R A Food is Medicine approach to achieve nutrition security and improve health. Nature Medicine 28, 2238–2240 (2022).
- [62]. The national food strategy: The plan. https://www.nationalfoodstrategy.org/ (2021). (accessed March 23, 2022).
- [63]. True Cost of Food: Measuring What Matters to Transform the U.S. Food System - The Rockefeller Foundation. URL https://www.rockefellerfoundation.org/report/true-cost-of-food-measuring-what-matters-to-transform-the-u-s-food-system/.
- [64]. Nasirian F & Menichetti G Molecular Interaction Networks and Cardiovascular Disease Risk: The Role of Food Bioactive Small Molecules. Arteriosclerosis, Thrombosis, and Vascular Biology 43, 813–823 (2023).
- [65]. Adams J Rebalancing the marketing of healthier versus less healthy food products. PLoS Medicine 19, e1003956 (2022).
- [66]. Shaw SC, Ntani G, Baird J & Vogel CA A systematic review of the influences of food store product placement on dietary-related outcomes. Nutrition Reviews 78, 1030–1045 (2020).
- [67]. Shepherd R Resistance to changes in diet. Proceedings of the Nutrition Society 61, 267–272 (2002).
- [68]. Kelly MP & Barker M Why is changing health-related behaviour so difficult? Public Health 136, 109–116 (2016).
- [69]. Barabási AL, Menichetti G & Loscalzo J The unmapped chemical complexity of our diet. Nature Food 1, 33–37 (2020).
- [70]. Menichetti G, Barabasi A-L & Loscalzo J Decoding the Foodome: Molecular Networks Connecting Diet and Health. Annual Review of Nutrition 44, 257–288 (2024).
- [71]. Davidou S, Christodoulou A, Frank K & Fardet A A study of ultra-processing marker profiles in 22,028 packaged ultra-processed foods using the SIGA classification. Journal of Food Composition and Analysis 99, 103848 (2021).
- [72]. Menichetti G & Barabási A-L Nutrient concentrations in food display universal behaviour. Nature Food 3, 375–382 (2022).
- [73]. Hooton F, Menichetti G & Barabási AL Exploring food contents in scientific literature with FoodMine. Scientific Reports 10, 16191 (2020).
- [74]. Chatterjee A et al. Improving the generalizability of protein-ligand binding predictions with AI-Bind. Nature Communications 14, 1989 (2023).
- [75]. Robust linear models - statsmodels 0.14.1. URL https://www.statsmodels.org/stable/rlm.html.
- [76]. Huber PJ Robust regression: asymptotics, conjectures and Monte Carlo. The Annals of Statistics 1, 799–821 (1973).
- [77]. Croux C & Rousseeuw PJ Time-efficient algorithms for two highly robust estimators of scale. In Computational Statistics, 411–428 (Springer, 1992).
- [78]. Brown GG & Rutemiller HC Means and Variances of Stochastic Vector Products with Applications to Random Linear Models. Management Science 24, 210–216 (1977).
- [79]. Beel J, Gipp B, Langer S & Breitinger C Research-paper recommender systems: a literature survey. International Journal on Digital Libraries 17, 305–338 (2016).
