It is generally recognized that obesity is a heterogeneous state: individuals with obesity can have different underlying body compositions and, consequently, different health risks. Understanding the independent contributions of tissue types (adipose tissues, muscles, bones, etc.) to specific disease processes can provide etiologic insight and help tailor interventions; however, the current body composition analytic paradigm ignores the compositional nature of body composition data. This Perspective highlights the need for viewing body composition through a compositional data analytic (CoDA) lens.
What are compositional data?
Compositions are data consisting of two or more components that sum to a whole. These data are non-negative, with each component value restricted to between 0 (not present in the composition) and the total composition size (the composition consists entirely of that component). Compositional data can also be rescaled as proportions of the total composition size, such that they consist of dimensionless values ranging from 0 to 100%. Because compositional data are assessed with respect to some total, they convey only relative information (1, 2). Examples of data in the health sciences that are currently being treated as compositional include high-throughput sequencing output (3), physical activity/time allocation (4), dietary intake (5), and, relevant to this Perspective, body composition (6, 7).
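The rescaling to proportions described above is often called closure. The following is a minimal sketch of it, assuming hypothetical tissue masses for two individuals; the function name `closure` and the example values are illustrative and do not come from the cited works.

```python
import numpy as np

def closure(parts, total=1.0):
    """Rescale each row of non-negative parts so that it sums to `total`."""
    parts = np.asarray(parts, dtype=float)
    if np.any(parts < 0):
        raise ValueError("compositional parts must be non-negative")
    return total * parts / parts.sum(axis=-1, keepdims=True)

# Hypothetical tissue masses in kg (fat, lean, bone) for two individuals.
# The second individual has exactly twice the mass of the first in every tissue.
masses_kg = np.array([[20.0, 55.0, 3.0],
                      [40.0, 110.0, 6.0]])

proportions = closure(masses_kg)
print(proportions)
```

Both rows yield the same proportions, which illustrates the sense in which a composition conveys only relative information: doubling every tissue leaves the composition unchanged.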
Because compositional data are constrained to some total, their components have a unique relationship to one another. For a composition of D components, knowledge of the first D-1 components fixes the final Dth component once the total is known. Additionally, the value of one component cannot change without changing the value of at least one other component. This inherent negative correlational structure among components defines their sample space, which in mathematical terms is called a simplex.
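For readers who prefer notation, the simplex and the dependence of the last component on the others can be written as follows. The symbols are our own choices for illustration: x_i is the i-th component and κ is the fixed total (for example 1 or 100%). The strict positivity shown here is the convention under which logratios are defined; components equal to zero need separate handling.

```latex
\mathcal{S}^{D} = \left\{ \mathbf{x} = (x_1, \dots, x_D) \;:\; x_i > 0,\ \sum_{i=1}^{D} x_i = \kappa \right\},
\qquad
x_D = \kappa - \sum_{i=1}^{D-1} x_i .
```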
Why does compositionality matter?
The statistical models we use should be appropriate to the sample space of our data. Many commonly used statistics rely on data being in a Euclidean sample space; for example, ordinary least squares (OLS) regression fits the line that minimizes the sum of squared Euclidean distances between the observed and fitted values. Importantly, the simplex is not a Euclidean sample space; it has an entirely different geometry in which Euclidean distance does not make sense. This is not a statistical modeling assumption that can be acceptably violated or adjusted away. The application of Euclidean distance-based statistical methods to data in a simplex sample space can lead to numerous problems, a key one being the generation of spurious correlations (8). Thus, statistical approaches such as OLS regression should not be applied directly to data with a simplex sample space (i.e., compositional data).
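The spurious correlation problem can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis from the article: the three parts are drawn independently from the same gamma distribution (an arbitrary choice), and the seed and sample size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical, mutually independent tissue amounts (arbitrary units).
parts = rng.gamma(shape=5.0, scale=1.0, size=(10_000, 3))

# Closure: express each row as proportions of its own total.
proportions = parts / parts.sum(axis=1, keepdims=True)

# Pearson correlation between the first two parts, before and after closure.
raw_corr = np.corrcoef(parts[:, 0], parts[:, 1])[0, 1]
closed_corr = np.corrcoef(proportions[:, 0], proportions[:, 1])[0, 1]

print(f"correlation of raw parts:   {raw_corr:+.2f}")
print(f"correlation of proportions: {closed_corr:+.2f}")
```

The raw parts are uncorrelated by construction, yet their proportions are clearly negatively correlated (in expectation, -0.5 for three identically distributed gamma parts). The correlation is produced by the constraint alone, not by any relationship among the tissues.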
However, methods exist that transform data from the simplex to a Euclidean space, removing the correlational structure imposed by the simplex and allowing traditional statistical methodologies to be applied to compositional predictors and outcomes. The theoretical concepts and methods for compositional data analysis (CoDA) transformations were developed by Aitchison (1) in the 1980s and have since been expanded upon (9). These transformations are explained in greater detail elsewhere (1, 2, 9); at the heart of all of them is the logratio. CoDA transformations capture the relative relationships in compositional data by using ratios of characteristics of the composition (such as ratios of the components to each other, or of components to the composition’s geometric mean). Because ratios can lead to spurious correlations (8), a log transformation (which provides better mathematical properties) is applied to each ratio, and the resulting data reside in a D-1 dimensional Euclidean sample space.
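Two of the logratio transformations mentioned above can be sketched in a few lines. The centered logratio (clr) takes the log of each component relative to the geometric mean, and the isometric logratio (ilr) maps a D-part composition to D-1 coordinates. The ilr basis used here (so-called pivot coordinates) is one common choice among many; the function names and example values are our own, and strictly positive components are assumed.

```python
import numpy as np

def clr(x):
    """Centered logratio: log of each part relative to the row's geometric mean."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def ilr_pivot(x):
    """Isometric logratio in pivot coordinates (one possible orthonormal basis).

    Coordinate j contrasts part j with the geometric mean of all later parts,
    so a D-part composition gives D-1 coordinates.
    """
    logx = np.log(np.asarray(x, dtype=float))
    D = logx.shape[-1]
    coords = []
    for j in range(D - 1):
        r = D - j - 1  # number of parts after part j
        contrast = logx[..., j] - logx[..., j + 1:].mean(axis=-1)
        coords.append(np.sqrt(r / (r + 1)) * contrast)
    return np.stack(coords, axis=-1)

# Hypothetical tissue masses in kg: fat, lean, bone.
x = np.array([20.0, 55.0, 3.0])
print("clr:", clr(x))        # D values that sum to zero
print("ilr:", ilr_pivot(x))  # D-1 unconstrained values
```

Because both transformations depend only on ratios, rescaling the composition (kilograms, grams, or proportions) leaves the coordinates unchanged. The ilr coordinates are the ones typically entered into standard regression models, since the clr values still sum to zero.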
Body composition data should be thought of in compositional terms and analyzed compositionally
Beyond the name itself evoking compositionality, several reasons suggest that CoDA methods are appropriate for body composition analyses. First, partial body imaging (such as single-slice scans) yields incomplete estimates of total tissue amounts, and these estimates are affected by scan location, scanner size, and slice number and thickness. The physical constraints imposed by the measuring instrument, the choice of scanning location, and the resulting partial tissue measures argue against treating the “absolute” (measured) tissue sizes as representative of their respective totals, and in favor of treating them as percentages of the scanned region.
Additionally, compositionality is an inevitable consequence of current modeling approaches. We explicitly assume compositionality whenever we utilize ratios, such as the percentage of lean tissue in a body region, or the ratio of visceral to subcutaneous adipose tissue – these are relative approaches that use some “total” or compositional characteristic as a reference. Compositionality also arises implicitly in modeling when controlling for body size metrics (weight and height, BMI, waist circumference, etc.). Because tissues have a size and mass that contribute to body size metrics, if we wish to hold body size constant in an analysis, then a unit increase in one tissue type must come at the expense of at least one other tissue type. In essence, we are constraining the data by controlling for body size, and thus are modeling the data as if they were in a simplex. The choice, then, is either to not control for body size metrics and use the absolute (measured) tissue sizes as independent variables, or to control for body size and treat the tissues as compositional multivariate data.
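The implicit constraint can be seen in its most extreme form with a small sketch. It assumes, hypothetically, that body weight is exactly the sum of three measured tissues; in practice the relationship is only approximate, so the constraint is softer than shown here. All values are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical tissue masses in kg (fat, lean, bone), simulated independently.
tissues = rng.gamma(shape=5.0, scale=[4.0, 11.0, 0.6], size=(n, 3))

# Assumption for illustration: body weight is exactly the sum of the tissues.
weight = tissues.sum(axis=1)

# Design matrix with an intercept, all three tissues, and body weight.
design = np.column_stack([np.ones(n), tissues, weight])

n_columns = design.shape[1]
rank = np.linalg.matrix_rank(design)
print(f"columns: {n_columns}, rank: {rank}")
```

The design matrix has one more column than its rank: once weight is held fixed, the three tissues no longer vary freely, and any change in one must be offset by the others. This is the same constraint that defines a simplex.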
Consequences of ignoring compositionality in body composition data
CoDA requires a reframing of body composition problems and a different interpretation of results, both of which may be unfamiliar to many body composition analysts. One may wonder, then, whether there are any practical benefits to applying CoDA methodology beyond adherence to some theoretical mathematical properties. We provide in the supplementary materials a simulation study demonstrating that non-CoDA approaches to a 3-component body composition analysis result in biased estimates and, consequently, that the non-CoDA methods have inflated type 1 error rates that intensify with increasing sample sizes (Figure). Thus, ignoring the compositional nature of body composition data can lead to spurious and biased findings.
Figure: Type 1 error rate for a non-predictive component using different body composition models
Applications of CoDA in body composition
For examples of applications of CoDA specifically in body composition, we direct readers to the works of Dumuid et al, who provide excellent examples of body composition as a compositional outcome (6) and as a compositional predictor (7). Further, Dumuid et al (7) provide extensive background and a guided application of CoDA transformations, interpretation of results, and visualization of compositional data. This work also contains unique descriptions and applications of CoDA to body composition analyses, discussed below.
Incorporating absolute amounts
As CoDA focuses on relative associations, the absolute amounts of tissue(s) are no longer considered. However, it may be the case that absolute compositional size is still relevant to understanding health risks. CoDA with a total (tCoDA) (10) allows for the investigation of absolute composition size independent of the relative tissue amounts.
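One simple way to carry the total alongside the relative information is sketched below. It is an illustration under our own assumptions rather than the specific procedure of reference 10: the total is taken to be the arithmetic sum of the parts and enters on the log scale, and the relative information is represented by additive logratios (each part relative to the last part). Reference 10 discusses the choice of total and of coordinates in detail.

```python
import numpy as np

def logratios_with_total(masses):
    """Return additive logratios (reference: last part) plus the log of the total."""
    masses = np.asarray(masses, dtype=float)
    if np.any(masses <= 0):
        raise ValueError("all parts must be strictly positive")
    alr = np.log(masses[..., :-1] / masses[..., -1:])
    log_total = np.log(masses.sum(axis=-1, keepdims=True))
    return np.concatenate([alr, log_total], axis=-1)

# Hypothetical tissue masses in kg (fat, lean, bone): same composition, different size.
masses_kg = np.array([[20.0, 55.0, 3.0],
                      [40.0, 110.0, 6.0]])

features = logratios_with_total(masses_kg)
print(features)
```

The two individuals share the same logratio columns and differ only in the final column, so a model that includes all columns can estimate an association with overall size separately from associations with relative tissue amounts.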
Isocompartmental substitution
One can also assess the theoretical impact of component reallocations on health outcomes. In non-CoDA applications, these include the isotemporal substitution in time-use epidemiology and the isocaloric substitution in nutritional epidemiology. These methods were recently adapted for compositional approaches to time-use (4), and for body composition (“isocompartmental substitution”) analyses (7). Interestingly, CoDA iso-substitution models can allow for asymmetry in effects depending on the direction of reallocation (4, 7), which may provide unique insights unobtainable in non-CoDA iso-substitution models.
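The asymmetry can be seen in a minimal sketch of a substitution calculation. The model is written in log-contrast form (coefficients on the log of each tissue that sum to zero), which is one way of expressing a linear model on logratios. The coefficients, baseline masses, and function names are hypothetical and chosen only for illustration; they are not estimates from references 4 or 7.

```python
import numpy as np

def predict(masses_kg, intercept, log_contrast):
    """Linear predictor intercept + sum_i g_i * ln(x_i), with the g_i summing to zero."""
    g = np.asarray(log_contrast, dtype=float)
    if not np.isclose(g.sum(), 0.0):
        raise ValueError("log-contrast coefficients must sum to zero")
    return intercept + np.log(np.asarray(masses_kg, dtype=float)) @ g

def substitution_effect(masses_kg, take_from, give_to, delta_kg, intercept, log_contrast):
    """Predicted change when delta_kg moves between two tissues at constant total mass."""
    new = np.array(masses_kg, dtype=float)
    new[take_from] -= delta_kg
    new[give_to] += delta_kg
    if np.any(new <= 0):
        raise ValueError("reallocation would leave a tissue with non-positive mass")
    return predict(new, intercept, log_contrast) - predict(masses_kg, intercept, log_contrast)

baseline = [20.0, 55.0, 3.0]   # hypothetical fat, lean, bone masses in kg
g = [0.8, -0.6, -0.2]          # hypothetical coefficients, summing to zero

fat_to_lean = substitution_effect(baseline, 0, 1, 5.0, 0.0, g)
lean_to_fat = substitution_effect(baseline, 1, 0, 5.0, 0.0, g)
print(f"5 kg fat -> lean: {fat_to_lean:+.3f}")
print(f"5 kg lean -> fat: {lean_to_fat:+.3f}")
```

The two reallocations have opposite signs but different magnitudes, because the same 5 kg is a different relative change depending on which tissue gives it up. A model that is linear in the absolute masses would instead return effects of equal size and opposite sign.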
A call for compositional approaches in obesity/body composition research
Compositional data, which are multivariate and constrained, behave differently than standard univariate data; thus, they require different statistical treatment. Body composition data are compositional, and ignoring this property in our statistical analyses is likely to result in misleading findings and contribute to inconsistencies across studies. Furthermore, interpretations in non-compositional approaches do not account for the codependence of tissues and thus are not as meaningful as they first appear. Applying CoDA methods in the body composition field will require both a conceptual shift and reframing of body composition problems. Still, this paradigmatic change in approach is necessary for the identification of true associations and meaningful interpretations.
Supplementary Material
FUNDING:
Dr. Tilves was supported by an NHLBI grant 2T32HL083825-11 to the University of Pittsburgh. Dr. Miljkovic was supported by National Institutes of Health grant R01-DK097084 (PI: Miljkovic) from the National Institute of Diabetes and Digestive and Kidney Diseases.
Footnotes
DISCLOSURE: The authors declared no conflict of interest.
References
- 1. Aitchison J. The Statistical Analysis of Compositional Data. Journal of the Royal Statistical Society Series B (Methodological). 1982;44(2):139–77.
- 2. Egozcue JJ, Pawlowsky-Glahn V. Basic Concepts and Procedures. Compositional Data Analysis. 2011:12–28. doi: 10.1002/9781119976462.ch2.
- 3. Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. Microbiome Datasets Are Compositional: And This Is Not Optional. Frontiers in Microbiology. 2017;8(2224). doi: 10.3389/fmicb.2017.02224.
- 4. Chastin SFM, Palarea-Albaladejo J, Dontje ML, Skelton DA. Combined Effects of Time Spent in Physical Activity, Sedentary Behaviors and Sleep on Obesity and Cardio-Metabolic Health Markers: A Novel Compositional Data Analysis Approach. PLoS One. 2015;10(10):e0139984. doi: 10.1371/journal.pone.0139984.
- 5. Leite ML. Applying compositional data methodology to nutritional epidemiology. Statistical methods in medical research. 2016;25(6):3057–65. Epub 2014/11/21. doi: 10.1177/0962280214560047.
- 6. Dumuid D, Wake M, Clifford S, Burgner D, Carlin JB, Mensah FK, et al. The Association of the Body Composition of Children with 24-Hour Activity Composition. The Journal of pediatrics. 2019;208:43–9.e9. Epub 2019/02/02. doi: 10.1016/j.jpeds.2018.12.030.
- 7. Dumuid D, Martín-Fernández JA, Ellul S, Kenett RS, Wake M, Simm P, et al. Analysing body composition as compositional data: An exploration of the relationship between body composition, body mass and bone strength. Statistical methods in medical research. 2020:962280220955221. Epub 2020/09/18. doi: 10.1177/0962280220955221.
- 8. Pearson K. Mathematical Contributions to the Theory of Evolution.--On a Form of Spurious Correlation Which May Arise When Indices Are Used in the Measurement of Organs. Proceedings of the Royal Society of London. 1896;60:489–98.
- 9. Egozcue JJ, Pawlowsky-Glahn V, Mateu-Figueras G, Barceló-Vidal C. Isometric Logratio Transformations for Compositional Data Analysis. Mathematical Geology. 2003;35(3):279–300. doi: 10.1023/A:1023818214614.
- 10. Pawlowsky-Glahn V, Egozcue JJ, Lovell D. Tools for compositional data with a total. Statistical Modelling. 2014;15(2):175–90. doi: 10.1177/1471082X14535526.
