Over the last century, nutritional science uncovered that proper nutrition is a lot more nuanced than the total amount of calories a person consumes per day. This understanding permeated popular culture, as is evident, for example, from the huge supplement market (>290 billion USD/year) (1). Revealing the connection between nutrition and health is the challenging realm of nutritional epidemiology, and in this issue of PNAS, Petrone et al. (2) provide strong support for the utility of a new and potentially transformative tool in the search for links between diet and health: sequencing plant DNA from stool samples.
Humans typically consume a diverse diet that can change drastically and quickly, and it is easy to imagine how hard it is to infer causal links between what and how much a person eats and their health status, even if given perfect information about a person’s diet. On top of this complexity, researchers must deal with incomplete information because the main tool used in the field is based on self-reporting, which suffers from strong biases. For example, it is known that there is a systematic underreporting of total dietary intake (3). Recognizing these problems led researchers to look for alternative, less subjective sources of data such as metabolic biomarkers collected from urine or stool samples (4).
Finding good nutritional biomarkers is a hard problem. Humans consume a large and diverse array of dietary items that can vary strongly with geographical location and dietary preferences. A good biomarker should capture this diversity with high accuracy and specificity in a quantitative manner and in a minimally intrusive way. Establishing a statistically significant relationship between nutrition and health is difficult because of the high variability in individual diet and the possible influence of confounding covariates. Nutritional epidemiological studies therefore analyze large numbers of individuals to gain statistical power, and require methods that can be applied at such large scales. Just imagine the logistical challenge of collecting blood several times a month or analyzing questionnaires from a cohort of >10,000 people. Finally, a good biomarker should be easily harmonized between different studies to unlock the full statistical power of meta-analysis. It turns out that plant-derived DNA found in human stool ticks all the boxes to be a useful nutritional biomarker (Fig. 1).
Fig. 1.

Inferring the diversity of plant items consumed in the past using a DNA-based crystal ball. Plant items consumed 12 to 48 h before stool samples are collected are represented in stool-derived DNA. The crystal ball looks into the past by converting highly variable DNA samples into consumed plant diversity measures. DNA found in human feces contains mostly nonplant DNA, but in this study, it is shown that “fishing out” the trnL gene using a molecular technique called PCR and then sequencing the amplified DNA are sufficient to infer consumed plant diversity measures that correlate well with “gold standard” nutritional epidemiology questionnaire-based data.
What makes plant-derived DNA, an admittedly surprising source of nutritional information, so promising? First, it is cheap to obtain in a noninvasive way. Isolating DNA from stool is common practice in clinical settings, and the cost of sequencing has dropped dramatically in the last decade, far outpacing Moore’s law. Second, DNA sequences are both universal and highly specific. For example, the trnL gene used by Petrone et al. as a “molecular barcode” is found in all plants, and the specific sequence of nucleotides in this gene can accurately identify the plant species it came from. DNA sequences therefore have the potential to inform a researcher about a large diversity of food sources at a very high resolution.
There are, however, possible limitations to the use of DNA content in stool. First, human stool contains a complicated community of microorganisms, and plant-derived sequences represent a small minority of total DNA, requiring clever methods to specifically study their variation. Even after contaminating DNA is dealt with, it may be that processing plants before consuming them, for instance, by cooking them or making chocolate out of them, alters their representation in the stool DNA. If cooking degrades DNA, the DNA content in your stool can vary depending on whether you ate a cooked or uncooked carrot. Additionally, different physiological conditions and feeding patterns may alter the way plants are represented in the stool DNA pool, making comparison between studies challenging and limiting the promise of data harmonization.
In their current study, Petrone et al. addressed all these issues directly, providing strong support for the utility of trnL metabarcoding in nutritional studies. Using data from two dietary intervention studies where food intake is tightly controlled and well documented, the authors were able to match a high fraction of foods consumed 1 to 2 d prior to sampling to stool-derived DNA. The authors conjured up a computational crystal ball, clearing the clouds obscuring our vision into the past to reveal what study participants ate.
Although these studies exposed a quantitative relationship between the representation of plant DNA in feces and consumed volumes, the strength of this relationship varied widely between plant species. Importantly, this variability was conserved for the different plant species between the two studies, indicating that factors intrinsic to each plant, such as digestibility, mediate the differences between plant species, rather than extrinsic factors such as feeding regime or physiology. The authors therefore chose the diversity of plant sequences found in stool, which is also correlated with the diversity of consumed plant items, as their summary metric for downstream analyses. The diversity of consumed foods is a highly useful metric in nutritional epidemiology linked to many health outcomes (5) and, importantly, is not affected by the systematic differences in the detection of specific plant species. This approach allowed the study of a third and large cohort of individuals consuming their regular diets where questionnaire data were unavailable, recapitulating previous findings linking diet diversity to age and household income. Analyzing dietary plant consumption diversity from stool is therefore an accurate drop-in replacement for questionnaire-based assessment that is highly scalable and harmonizable across studies.
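To make the idea of a diversity summary concrete, the following is a minimal sketch that counts how many distinct plant taxa are detected in each stool sample. It assumes richness (the number of taxa with at least one read) as an illustrative stand-in for the diversity measure; the exact metric and filtering used by Petrone et al. may differ. The function name, the `min_reads` threshold, and the read counts are all hypothetical.

```python
from typing import Dict


def plant_richness(read_counts: Dict[str, int], min_reads: int = 1) -> int:
    """Count distinct plant taxa with at least `min_reads` reads in one sample.

    `min_reads` is a hypothetical detection threshold, not a value from the study.
    """
    return sum(1 for count in read_counts.values() if count >= min_reads)


# Made-up trnL read counts per plant taxon for two stool samples.
samples = {
    "sample_A": {"wheat": 1200, "tomato": 340, "coffee": 15, "oat": 0},
    "sample_B": {"wheat": 50, "rice": 900},
}

# Richness depends only on presence or absence, not on how many reads each taxon has.
richness = {name: plant_richness(counts) for name, counts in samples.items()}

# Scaling one taxon's reads (e.g., a plant whose DNA survives digestion better)
# leaves richness unchanged, which is why a presence-based summary is less
# sensitive to species-specific detection differences.
boosted_A = dict(samples["sample_A"], wheat=samples["sample_A"]["wheat"] * 10)
richness_boosted_A = plant_richness(boosted_A)

print(richness, richness_boosted_A)
```

Because the summary relies on whether a taxon is detected and not on its read abundance, a plant that is systematically over- or underrepresented in reads contributes the same single count in this sketch.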
It is an exciting time for sequence-based approaches. Better databases, improved computational tools, and low sequencing costs all contribute to the adoption of sequence-based analysis in diverse and unexpected fields of science, health, and agriculture. Examples range from following the abundance of specific COVID lineages in wastewater, to predicting ecosystem-level processes, to figuring out what a person ate in the past. Elegant studies like the one by Petrone et al. make it abundantly clear that exciting new avenues are on the horizon.
Acknowledgments
Author contributions
S.P. wrote the paper.
Competing interests
The author declares no competing interest.
Footnotes
See companion article, “Diversity of plant DNA in stool is linked to dietary quality, age, and household income,” 10.1073/pnas.2304441120.
References
- 1. Grand View Research, “Global nutraceuticals market size & trends [2023 Report]” (12 June 2023). https://www.bloomberg.com/press-releases/2022-03-15/nutraceuticals-market-size-worth-991-09-billion-by-2030-grand-view-research-inc.
- 2. Petrone B. L., et al., Diversity of plant DNA in stool is linked to dietary quality, age, and household income. Proc. Natl. Acad. Sci. U.S.A. 120, e2304441120 (2023).
- 3. Subar A. F., et al., Addressing current criticism regarding the value of self-report dietary data. J. Nutr. 145, 2639–2645 (2015).
- 4. Hedrick V. E., et al., Dietary biomarkers: Advances, limitations and future directions. Nutr. J. 11, 109 (2012).
- 5. Herforth A., et al., A global review of food-based dietary guidelines. Adv. Nutr. 10, 590–605 (2019).
