We thank McSkimming et al. (1) for their comments on our paper (2), which provide an opportunity for us to expand the discussion of quantification of temporal instability of the human microbiome and the implications for large-scale prospective epidemiology studies.
In our original paper (2), we reported relatively low intraclass correlation coefficients (ICCs) for phylum-level relative abundance (RA) data from the Human Microbiome Project (HMP) (3, 4). McSkimming et al. (1) reported much higher ICC values after centered log-ratio (CLR) transformation (5). Building upon this, we developed a data-generative model to investigate whether CLR improved ICC values only for the HMP data or consistently in other scenarios. We assume that the underlying long-term RA composition vector for K taxa is π = (π1, ..., πK). The time-specific RA vector follows a Dirichlet distribution, Dirichlet(θπ). Here, the parameter θ models the over-time instability, with a larger value of θ implying smaller variability and thus a higher ICC. We used this model to simulate RA vectors for 1,000 subjects at 2 time points and evaluated ICC numerically. We found that naive RA estimates without transformation typically had the lowest ICC values and that CLR transformation consistently improved ICC for individual taxa, particularly for rare taxa. In Figure 1, for uncommon taxa (average RA = 1.2%), the ICC based on the naive estimate is practically zero; log transformation modestly improves ICC; CLR improves ICC to 0.83. While it is mathematically very complicated, if not impossible, to derive an explicit formula for ICC after CLR, numerical examination suggested that the high ICC after CLR was driven primarily by other, more common taxa that have relatively high ICC values.
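The simulation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used for the reported results: the name `theta` for the concentration parameter, the population-level Dirichlet used to draw each subject's long-term composition (`pop_alpha`), the small `floor` that guards against zeros before taking logarithms, and the choice of a one-way random-effects ICC estimator are all hypothetical choices made for this example.

```python
import numpy as np


def clr(x):
    """Centered log-ratio transform of each composition (last axis) in x."""
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)


def icc_oneway(y):
    """One-way random-effects ICC for an (n_subjects, n_times) array."""
    n, k = y.shape
    subject_means = y.mean(axis=1)
    msb = k * np.sum((subject_means - y.mean()) ** 2) / (n - 1)       # between subjects
    msw = np.sum((y - subject_means[:, None]) ** 2) / (n * (k - 1))   # within subjects
    return (msb - msw) / (msb + (k - 1) * msw)


def simulate_ra(n_subjects=1000, n_times=2, theta=50.0,
                pop_alpha=(50.0, 30.0, 15.0, 5.0), floor=1e-6, seed=0):
    """Simulate time-specific RA vectors, shape (n_subjects, n_times, K)."""
    rng = np.random.default_rng(seed)
    # Assumption: long-term compositions vary across subjects as Dirichlet(pop_alpha).
    pi = rng.dirichlet(pop_alpha, size=n_subjects)
    n_taxa = pi.shape[1]
    # Time-specific draws from Dirichlet(theta * pi), generated as normalized
    # gamma variates so that each subject can have its own parameter vector.
    shape = np.broadcast_to(theta * pi[:, None, :], (n_subjects, n_times, n_taxa))
    g = rng.gamma(shape)
    ra = g / g.sum(axis=-1, keepdims=True)
    # Assumption: tiny floor plus renormalization so that log and CLR stay finite.
    ra = np.maximum(ra, floor)
    return ra / ra.sum(axis=-1, keepdims=True)


def icc_by_transform(ra, taxon):
    """ICC of a single taxon under naive, log, and CLR scales."""
    return {
        "naive": icc_oneway(ra[:, :, taxon]),
        "log": icc_oneway(np.log(ra[:, :, taxon])),
        "clr": icc_oneway(clr(ra)[:, :, taxon]),
    }


if __name__ == "__main__":
    ra = simulate_ra()
    # Index 3 is the rarest taxon under the default (hypothetical) pop_alpha.
    for name, value in icc_by_transform(ra, taxon=3).items():
        print(f"{name}: {value:.3f}")
```

Varying `theta` and `pop_alpha` lets one examine how the ICC on each scale responds to the degree of temporal instability and to the abundance of the taxon; the numerical values depend on these hypothetical settings and need not match those in Figure 1.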
The motivation for calculating ICC metrics is to evaluate the loss of power to detect associations that results from temporal instability. Consequently, an ICC value is relevant only to a specific disease model. If the risk of developing a disease depends linearly on the RA of a taxon, the relevant ICC should be evaluated using the naive RA of the taxon. If the disease risk depends on log(RA), the relevant ICC should be evaluated using the log-transformed RA. The high ICC observed after CLR transformation in the HMP data would suggest a potentially small power loss due to over-time variability. However, the change in effect size due to the CLR transformation is another important factor that affects statistical power. CLR-based ICC might yield overly optimistic power estimates if this change in effect size is ignored. Thus, careful investigation of transformations is warranted under different disease models, considering ICC and effect sizes jointly. Given that very few prospective studies have been performed to suggest an unambiguous disease model, we would recommend evaluating ICCs under different transformations that do or do not account for the compositional nature of the microbiome (i.e., log transformation, CLR (5), and isometric log-ratio transformation (6), as was suggested by McSkimming et al. (1)). In conclusion, transformation of microbiome data should be compatible not only with the compositional structure of the data but also with the postulated disease model.
ACKNOWLEDGMENTS
This work was funded by the Intramural Research Program of the National Cancer Institute.
Conflict of interest: none declared.
REFERENCES
- 1. McSkimming DI, Banack HR, Genco R, et al. Re: “Quantification of human microbiome stability over 6 months: implications for epidemiologic studies”. Am J Epidemiol. 2019;188(4):808–809.
- 2. Sinha R, Goedert JJ, Vogtmann E, et al. Quantification of human microbiome stability over 6 months: implications for epidemiologic studies. Am J Epidemiol. 2018;187(6):1282–1290.
- 3. Human Microbiome Project Consortium. A framework for human microbiome research. Nature. 2012;486(7402):215–221.
- 4. Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature. 2012;486(7402):207–214.
- 5. Aitchison J. The Statistical Analysis of Compositional Data. London, United Kingdom: Chapman and Hall Ltd; 1986.
- 6. Egozcue JJ, Pawlowsky-Glahn V, Mateu-Figueras G, et al. Isometric logratio transformations for compositional data analysis. Math Geol. 2003;35(3):279–300.