Author manuscript; available in PMC: 2022 Jan 1.
Published in final edited form as: Crit Care Med. 2021 Jan 1;49(1):e63–e79. doi: 10.1097/CCM.0000000000004710

Table 2.

Summary of key steps and recommendations for setting up the data to perform latent class analysis.

Step Description Recommendation Presentation
Study Design & Data Set-up

Indicator Selection
Description: The indicators selected will dictate the nature of the clusters.
Recommendation:
- Select indicators based on the research question.
- Exclude indicators that are composites of other indicators in the model.
- Exclude outcome data as indicators.
Presentation:
- Present a clear rationale for indicator selection.

Data Processing
Description: Transforming data to minimize extreme scales is more likely to yield informative classes.
Recommendation:
- Categorical variables: consider collapsing categories that contain less than 10% of the sample.
- Non-normally distributed continuous data should be transformed so that they are approximately normally distributed and uniformly scaled.
Presentation:
- Clearly describe the procedures used for data transformation and collapsing of categories.
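
As an illustration of the data processing recommendations above, the following is a minimal sketch using only the Python standard library. The function names, the label `"Other"`, and the choice of a natural-log transform followed by z-scoring are assumptions made for this example; the table does not prescribe a specific transformation, and a log transform only suits strictly positive, right-skewed data.

```python
import math
from collections import Counter
from statistics import mean, stdev


def collapse_rare(values, min_share=0.10, other="Other"):
    """Merge every category holding less than min_share of the sample into one label."""
    counts = Counter(values)
    n = len(values)
    rare = {cat for cat, k in counts.items() if k / n < min_share}
    return [other if v in rare else v for v in values]


def log_zscore(values):
    """Natural-log transform, then standardize to mean 0 and SD 1 (sample SD)."""
    logged = [math.log(v) for v in values]  # requires strictly positive values
    m, s = mean(logged), stdev(logged)
    return [(x - m) / s for x in logged]


# Hypothetical example data: infection source for 20 patients.
source = ["lung"] * 12 + ["abdomen"] * 6 + ["urinary"] + ["skin"]
collapsed = collapse_rare(source)  # "urinary" and "skin" (5% each) are merged
scaled = log_zscore([1.0, 10.0, 100.0])
```

After collapsing, it is worth checking that the merged category itself is not still very small, and reporting which categories were combined.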

Local Independence
Description: Assumes that, within each class, the observed indicators are independent of one another.
Recommendation:
- Test the correlation of indicator variables in the complete dataset and within each class.
- Consider removing one or more indicators if there is collinearity.
- If a single pair is highly correlated, consider relaxing the local independence assumption.(31)
Presentation:
- Present the correlation coefficients of the most highly correlated indicators.
- Clearly describe any variables that were excluded from the analysis.
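
The correlation checks described above could be sketched as follows. This is an illustrative helper, not a method taken from the source: the function names, the example values, and the cutoff of |r| >= 0.5 are all assumptions (the table does not state a threshold), and Pearson correlation is used here only because it is the simplest choice.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined here; treated as 0 in this sketch
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(columns, classes=None, threshold=0.5):
    """Flag indicator pairs with |r| >= threshold, overall and within each class.

    columns: dict mapping indicator name -> list of values (one per subject).
    classes: optional list of class labels, one per subject.
    """
    n = len(next(iter(columns.values())))
    groups = {"all": list(range(n))}
    if classes is not None:
        for i, label in enumerate(classes):
            groups.setdefault(f"class {label}", []).append(i)
    flagged = []
    for group, rows in groups.items():
        for a, b in combinations(columns, 2):
            r = pearson([columns[a][i] for i in rows],
                        [columns[b][i] for i in rows])
            if abs(r) >= threshold:
                flagged.append((group, a, b, round(r, 3)))
    return sorted(flagged, key=lambda t: -abs(t[3]))


# Hypothetical indicator values for six subjects.
columns = {
    "il6": [1, 2, 3, 4, 5, 6],
    "il8": [1.1, 2.3, 2.9, 4.2, 5.1, 5.8],
    "bicarbonate": [24, 18, 25, 19, 23, 20],
}
overall = correlated_pairs(columns)
```

Passing the assigned class labels as `classes` repeats the check within each class, which is where the local independence assumption actually applies.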

Sample Size
Description: Power the sample to:
1. Determine the true number of classes.
2. Detect pertinent differences between the classes.
Recommendation:
- When N < 300, Monte Carlo simulation is recommended to determine the adequacy of the sample size.(38)
- Standard power calculations should be performed to determine the sample size needed to detect significant inter-class differences.
Presentation:
- Present a clear rationale for the sample size and any power calculations performed.
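
To make the two sample size recommendations above concrete, here is a rough sketch under stated assumptions. It is not the procedure from the cited reference: the scenario (two classes, six binary indicators, class-specific probabilities of 0.8 and 0.2, a 30% minority class), the use of BIC to choose between a one-class and a two-class model, and all function names are hypothetical choices for illustration. `n_per_class` uses the usual normal approximation for comparing two means.

```python
import math
import random
from statistics import NormalDist


def n_per_class(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per class for a two-sided comparison of means."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2)


def clamp(p, lo=1e-6, hi=1 - 1e-6):
    return min(max(p, lo), hi)


def simulate(n, share, theta, rng):
    """Draw n subjects from a two-class model with binary indicators."""
    rows = []
    for _ in range(n):
        k = 0 if rng.random() < share else 1
        rows.append([int(rng.random() < p) for p in theta[k]])
    return rows


def loglik_one_class(rows):
    """Maximized log-likelihood of the one-class (independence) model."""
    n, ll = len(rows), 0.0
    for col in zip(*rows):
        ones = sum(col)
        p = clamp(ones / n)
        ll += ones * math.log(p) + (n - ones) * math.log(1 - p)
    return ll


def loglik_two_class(rows, rng, iters=50):
    """EM for a two-class model from one random start; returns the log-likelihood
    evaluated at the parameters obtained after `iters` EM updates."""
    n, j = len(rows), len(rows[0])
    share = 0.5
    theta = [[rng.uniform(0.3, 0.7) for _ in range(j)] for _ in range(2)]
    ll = 0.0
    for _ in range(iters + 1):
        ll, post = 0.0, []
        for row in rows:  # E-step: posterior probability of class 0
            w = [share, 1 - share]
            for k in range(2):
                for x, p in zip(row, theta[k]):
                    w[k] *= p if x else 1 - p
            ll += math.log(w[0] + w[1])
            post.append(w[0] / (w[0] + w[1]))
        total = min(max(sum(post), 1e-9), n - 1e-9)
        share = clamp(total / n)  # M-step: update class share and probabilities
        for c in range(j):
            hit0 = sum(r * row[c] for r, row in zip(post, rows))
            hit1 = sum((1 - r) * row[c] for r, row in zip(post, rows))
            theta[0][c] = clamp(hit0 / total)
            theta[1][c] = clamp(hit1 / (n - total))
    return ll


def detection_rate(n, reps=20, seed=0):
    """Share of simulated datasets in which BIC prefers two classes over one."""
    rng = random.Random(seed)
    theta = [[0.8] * 6, [0.2] * 6]  # hypothetical class-specific probabilities
    wins = 0
    for _ in range(reps):
        rows = simulate(n, 0.3, theta, rng)
        j = len(rows[0])
        bic1 = -2 * loglik_one_class(rows) + j * math.log(n)
        ll2 = max(loglik_two_class(rows, rng) for _ in range(3))  # 3 random starts
        bic2 = -2 * ll2 + (2 * j + 1) * math.log(n)
        wins += bic2 < bic1
    return wins / reps
```

Calling `detection_rate(200)` estimates how often the two-class structure would be recovered at N = 200 under this assumed scenario, and `n_per_class(0.5)` gives the per-class size needed to detect a standardized difference of 0.5. In a real study the simulated parameters should reflect the expected class sizes and separation, and the model comparison should match the fit statistics planned for the analysis.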

Handling Missing Data
Description: Approaches for missing data:
1. Full information maximum likelihood
2. Multiple imputation
Recommendation:
- Full information maximum likelihood and multiple imputation are the recommended methods of dealing with missing data.
- For research biomarker values below the assay's lower level of detection (LLD), impute either the LLD or LLD/2, or use multiple imputation.(44)
Presentation:
- Present the methods used for handling missing data.
- Present differences in indicators and outcomes between missing and complete cases.
- Present a sensitivity analysis using the data with missing values (non-imputed data).
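
The single-value LLD substitutions and the comparison of missing versus complete cases described above might look like the sketch below. The marker `"<LLD"` for below-detection readings, the use of `None` for values that were never measured, the function names, and the example values are all assumptions for illustration. Full information maximum likelihood and multiple imputation need dedicated statistical software and are not shown.

```python
from statistics import mean

BELOW_LLD = "<LLD"  # hypothetical marker for a reading below the detection limit


def impute_below_lld(values, lld, method="half"):
    """Replace below-detection readings with LLD ("lld") or LLD/2 ("half").

    Values recorded as None (not measured) are left untouched.
    """
    if method not in ("lld", "half"):
        raise ValueError("method must be 'lld' or 'half'")
    fill = lld if method == "lld" else lld / 2
    return [fill if v == BELOW_LLD else v for v in values]


def compare_missing_vs_complete(indicator, other):
    """Mean of `other` among subjects with `indicator` missing versus observed."""
    missing = [o for i, o in zip(indicator, other) if i is None]
    complete = [o for i, o in zip(indicator, other) if i is not None]
    return {
        "missing": mean(missing) if missing else None,
        "complete": mean(complete) if complete else None,
    }


# Hypothetical data for five subjects.
il6 = [12.0, BELOW_LLD, 3.5, None, 8.0]
age = [60, 70, 50, 80, 55]

il6_half = impute_below_lld(il6, lld=1.0)               # below-LLD reading becomes 0.5
il6_lld = impute_below_lld(il6, lld=1.0, method="lld")  # below-LLD reading becomes 1.0
age_by_missingness = compare_missing_vs_complete(il6_half, age)
```

Keeping below-detection readings distinct from unmeasured values matters because only the former carry information (the true value lies below the LLD). Running the analysis under more than one substitution is one simple form of sensitivity analysis.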