2021 Apr 25;60(8):4115–4130. doi: 10.1007/s00394-021-02545-9

Table 2. Selected complementary exploratory dietary pattern analysis approaches

Columns: exploratory dietary pattern approach; features/aims; peculiarity; strengths; limitations; references. DP, dietary pattern.
Treelet transform analysis (TT)
Features/aims: TT is a hybrid of principal component analysis and hierarchical clustering analysis.
Peculiarity:
1. Forces sparse DPs.
2. Produces a cluster tree that reflects the local dependency relationships among the dietary variables, together with a coordinate system for the dietary data at each level of the tree.
3. The choice of cluster-tree level (cut-level) influences both the sparsity and the composition of the DPs.
4. Implemented as a Mata function in Stata statistical software.
Strengths:
1. Uses the correlation or covariance matrix of dietary intake to derive DPs.
2. The TT procedure allows the number of DPs and the cut-level to be selected by a data-driven (cross-validation) method.
3. Provides several functions to aid model selection and output analysis.
4. Stability can be assessed by a subsampling approach.
5. Sparsity is optimal because each dietary variable loads on only one DP.
6. DPs are potentially simpler to interpret.
7. The latent variable is numerical, so DP scores (continuous values) can be assigned to study participants.
Limitations: DPs are not independent.
References: [17, 49, 50]

Sparse latent factor models (SLFM)
Features/aims: SLFM provide parsimonious relations between high-dimensional variables and latent factors by forcing less influential associations to zero in the model.
Peculiarity:
1. Uses the posterior inclusion probability obtained from the sparse latent factor analysis to determine the food-DP pairs that have non-zero associations.
2. Provides a model-based solution for dealing with missing dietary data.
3. Sparsity of DPs: sparsity-inducing priors are an integral part of SLFM.
4. Implemented in software for Bayesian factor regression models.
Strengths:
1. Interpretability of DPs is achieved through their sparsity and an inclusion threshold: a dietary variable is retained for a given DP only if it shows a significant association with it, i.e. a posterior inclusion probability > 0.95.
2. Several approaches are available to incorporate biological or other non-dietary variables.
Limitations: Sparsity is not optimal because a dietary intake variable may load on a few DPs rather than on only one.
References: [13, 51]

Gaussian graphical models (GGM)
Features/aims: GGMs identify the conditional independence structure among dietary variables by assessing the pairwise (partial) correlation between two variables while controlling for the effects of all other variables in the network model.
Peculiarity:
1. Selects a model on rank-based dietary variables.
2. Estimates a single DP as a unique solution for the estimated model.
3. Sparsity of DPs depends on a regularization parameter that can be selected using different criteria.
4. Implemented in R statistical software.
Strengths:
1. Can also be used for ordinal data or for data comprising a mixture of categorical and continuous dietary variables.
2. Stability assessment analysis.
Limitations:
1. Requires Gaussian-distributed data.
2. The latent variable is categorical, so DP scores (continuous values) cannot be assigned to study participants.
3. Spurious relationships may affect network interpretation.
4. A single DP may not fully exploit multivariate dietary data sets.
References: [18, 53, 54]

Random forest with classification tree analysis (RF-CTA)
Features/aims: RF-CTA belongs to the family of classification tree methods. It generates many decision trees, each constructed using a different, randomly selected set of dietary variables.
Peculiarity:
1. Intuitive approach: easy to implement and understand.
2. Produces decision trees that can be visualized.
3. Implemented in R and other statistical software.
Strengths:
1. A highly accurate classifier that provides an internal, unbiased estimate of the generalization error (the out-of-bag error).
2. The model is robust to outliers.
3. No standardization or scaling is required before the analysis.
4. Can handle missing values.
Limitations:
1. The latent variable is categorical, so DP scores (continuous values) cannot be assigned to study participants.
2. Less transparent (harder to interpret) than a single classification tree.
3. Highly data dependent.
References: [55–57]
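The TT entry describes a method that merges variables along a cluster tree while rotating them, as in principal component analysis. The pure-Python sketch below shows one reading of that merge-and-rotate step on a small covariance matrix. It assumes absolute correlation as the similarity measure and uses hypothetical helper names (`treelet`, `matmul`, `transpose`). It is not the Stata Mata implementation cited in the table. After `levels` merges, each sum variable loads only on the dietary variables merged into it, which is the sparsity the table refers to. The number of levels plays the role of the cut-level.

```python
import math


def matmul(A, B):
    """Product of two dense matrices stored as lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def treelet(cov, levels):
    """Sketch of a treelet-style transform on a covariance matrix.

    Assumption: similarity = absolute correlation between sum variables.
    At each level the two most correlated sum variables are merged by a
    Jacobi rotation that decorrelates them. The higher-variance output stays
    a sum variable; the other becomes a difference variable that is never
    merged again. Returns the rotated covariance, the basis B (column j holds
    the loadings of new variable j on the original dietary variables) and the
    merge history as (kept, retired, correlation) tuples.
    """
    p = len(cov)
    C = [row[:] for row in cov]
    B = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    active = set(range(p))  # indices of the current sum variables
    merges = []
    for _ in range(levels):
        best, pair = -1.0, None
        for a in sorted(active):
            for b in sorted(active):
                if a < b:
                    r = abs(C[a][b]) / math.sqrt(C[a][a] * C[b][b])
                    if r > best:
                        best, pair = r, (a, b)
        if pair is None:  # only one sum variable left: the tree is complete
            break
        a, b = pair
        # Angle that zeroes the covariance of the rotated pair; with atan2 the
        # rotated variable kept at index a is the one with the larger variance.
        theta = 0.5 * math.atan2(2 * C[a][b], C[a][a] - C[b][b])
        c, s = math.cos(theta), math.sin(theta)
        J = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
        J[a][a], J[a][b], J[b][a], J[b][b] = c, -s, s, c
        C = matmul(matmul(transpose(J), C), J)  # covariance of rotated data
        B = matmul(B, J)                        # updated loading vectors
        active.discard(b)                       # b is now a difference variable
        merges.append((a, b, best))
    return C, B, merges
```

For a hypothetical three-food covariance matrix in which foods 0 and 1 are strongly correlated (0.9), `treelet(cov, 1)` merges that pair first. The resulting sum variable loads equally on foods 0 and 1, about 0.707 each, and exactly 0 on food 2.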
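The SLFM entry keeps a food-DP pair only when its posterior inclusion probability exceeds 0.95. The sketch below illustrates that rule. It assumes that MCMC draws of binary inclusion indicators are already available from some sparse Bayesian factor model, for example one with a spike-and-slab style prior. The draws are hypothetical toy values, not output of the Bayesian factor regression software named in the table, and the function names are illustrative.

```python
def posterior_inclusion_probabilities(draws):
    """PIP for every food-by-DP pair.

    draws[s][j][k] is 1 if, in MCMC draw s, food j has a non-zero loading on
    DP k (assumed sparsity indicator of the prior), else 0.
    """
    n_draws = len(draws)
    n_foods, n_dps = len(draws[0]), len(draws[0][0])
    return [[sum(d[j][k] for d in draws) / n_draws for k in range(n_dps)]
            for j in range(n_foods)]


def sparse_dietary_patterns(pip, foods, threshold=0.95):
    """Keep only foods whose PIP for a DP strictly exceeds the threshold."""
    return {k: [food for j, food in enumerate(foods) if pip[j][k] > threshold]
            for k in range(len(pip[0]))}


# Hypothetical toy draws: 20 MCMC iterations, 3 foods, 2 DPs.
foods = ["vegetables", "fruit", "processed meat"]
draws = [[[1, 0],                                    # vegetables: always on DP 0
          [1 if s < 19 else 0, 1 if s < 4 else 0],   # fruit: PIP 0.95 and 0.20
          [0, 1]]                                    # processed meat: always on DP 1
         for s in range(20)]
pip = posterior_inclusion_probabilities(draws)
patterns = sparse_dietary_patterns(pip, foods)
print(patterns)  # {0: ['vegetables'], 1: ['processed meat']}
```

Fruit sits exactly at PIP 0.95 for DP 0, so the strict "> 0.95" rule excludes it. With a looser threshold such as 0.9 it would join DP 0. This shows how the inclusion threshold drives both the sparsity and the interpretability of the resulting DPs.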
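The GGM entry rests on partial correlations: two dietary variables are linked in the network only if they remain correlated after conditioning on all the others. Given a correlation matrix R with precision matrix Omega = R^-1, the partial correlation is -Omega_ij / sqrt(Omega_ii * Omega_jj). The sketch below assumes that "rank-based" means Spearman correlations mapped to the Gaussian scale with 2 sin(pi * rho / 6), a standard Gaussian-copula relation. The cited R implementation may differ. The regularization parameter of a penalized fit such as the graphical lasso is replaced here by a hypothetical hard cut-off (`network_edges`), so this is a simplified illustration, not the penalized estimator itself.

```python
import math


def ranks(x):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def rank_correlation_matrix(columns):
    """Spearman correlations mapped to the Gaussian scale via 2*sin(pi*rho/6)."""
    rk = [ranks(c) for c in columns]
    p = len(columns)
    return [[1.0 if i == j else 2 * math.sin(math.pi * pearson(rk[i], rk[j]) / 6)
             for j in range(p)] for i in range(p)]


def invert(M):
    """Gauss-Jordan inversion with partial pivoting (adequate for small p)."""
    n = len(M)
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [row[n:] for row in A]


def partial_correlations(R):
    """rho_ij|rest = -Omega_ij / sqrt(Omega_ii * Omega_jj) with Omega = R^-1."""
    O = invert(R)
    p = len(R)
    return [[1.0 if i == j else -O[i][j] / math.sqrt(O[i][i] * O[j][j])
             for j in range(p)] for i in range(p)]


def network_edges(pc, cutoff=0.1):
    """Hypothetical hard cut-off standing in for a penalized (e.g. glasso) fit."""
    p = len(pc)
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(pc[i][j]) > cutoff]
```

As a check, consider a chain-structured correlation matrix `[[1, 0.6, 0.36], [0.6, 1, 0.6], [0.36, 0.6, 1]]`. Variables 0 and 2 have a marginal correlation of 0.36, but their partial correlation is zero. `network_edges(partial_correlations(R))` therefore returns `[(0, 1), (1, 2)]`, an example of conditional independence that a plain correlation matrix hides.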
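The RF-CTA entry cites two ideas: many trees grown on random subsets of dietary variables, and an internal (out-of-bag) estimate of the generalization error. The sketch below illustrates both in pure Python. It is a deliberate simplification: each tree is a depth-1 stump rather than a full classification tree, and all function names are hypothetical. Each stump is fit on a bootstrap sample using a random subset of `m_features` variables. It is then scored only on the participants left out of that sample, which is what makes the error estimate "internal". Real analyses would use a full implementation in R or other software, as noted in the table.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, features):
    """Best single split among `features`, scored by weighted Gini impurity."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive observed values
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no usable split: fall back to the majority class
        major = Counter(y).most_common(1)[0][0]
        return (features[0], float("inf"), major, major)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def predict_forest(trees, row):
    """Majority vote over all stumps."""
    return Counter(predict_stump(t, row) for t in trees).most_common(1)[0][0]


def random_forest_oob(X, y, n_trees=50, m_features=1, seed=0):
    """Grow stumps on bootstrap samples with random feature subsets and
    return (trees, out-of-bag error rate)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees, oob_votes = [], [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        features = rng.sample(range(p), m_features)  # random dietary variables
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], features)
        trees.append(stump)
        for i in set(range(n)) - set(idx):           # rows this stump never saw
            oob_votes[i][predict_stump(stump, X[i])] += 1
    scored = [i for i in range(n) if oob_votes[i]]
    wrong = sum(oob_votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return trees, (wrong / len(scored) if scored else float("nan"))
```

On hypothetical data where variable 0 separates two intake groups and variable 1 is noise, the ensemble classifies clear-cut participants correctly. Examples are `X = [[i, (i * 7) % 5] for i in range(20)]` and `y = ["low" if i < 10 else "high" for i in range(20)]`. The returned out-of-bag error rate shows how accurate the ensemble is expected to be on participants it was not trained on.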