Table 2.
Phase | Step | Description | Recommendation | Presentation
---|---|---|---|---
Study Design & Data Set-up | Indicator Selection | The indicators selected dictate the nature of the clusters. | - Select indicators based on the research question. - Exclude indicators that are composites of other indicators in the model. - Exclude outcome data as indicators. | - Present a clear rationale for indicator selection.
Study Design & Data Set-up | Data Processing | Transforming data to minimize extreme scales is more likely to yield informative classes. | - Categorical variables: consider collapsing categories that contain less than 10% of the sample. - Continuous variables that are not normally distributed should be transformed to approximate a normal distribution and placed on a uniform scale. | - Clearly describe the procedures used for data transformation and collapsing of categories.
Study Design & Data Set-up | Local Independence | Assumes that, within each class, the observed indicators are independent. | - Test the correlation of indicator variables in the complete dataset and within each class. - Consider removing one or more indicators if there is collinearity. - If only a single pair is highly correlated, consider relaxing the assumption.(31) | - Present the correlation coefficients of the most highly correlated indicators. - Clearly describe any variables that were excluded from the analysis.
Study Design & Data Set-up | Sample Size | Power the sample to: 1. Determine the true number of classes. 2. Detect pertinent differences between the classes. | - When N < 300, Monte Carlo simulation is recommended to determine the adequacy of the sample size.(38) - Standard power calculations should be performed to determine the sample size needed to detect significant inter-class differences. | - Present a clear rationale for the sample size and any power calculations performed.
Study Design & Data Set-up | Handling Missing Data | Approaches for missing data: 1. Full information maximum likelihood. 2. Multiple imputation. | - Full information maximum likelihood and multiple imputation are the recommended methods for handling missing data. - For research biomarker values below the assay's lower limit of detection (LLD), impute LLD or LLD/2, or use multiple imputation.(44) | - Present the methods used for handling missing data. - Present differences in indicators and outcomes between missing and complete cases. - Present a sensitivity analysis using the data with missing values (non-imputed data).
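
Three of the recommendations in Table 2 (collapsing sparse categories, screening indicators for high correlation, and simple substitution below the LLD) can be illustrated with a minimal Python sketch. It is not the procedure of any particular software package. The function names, the pooled label `"Other"`, the use of `None` to mark a value below the detection limit, and the choice of Pearson correlation are all assumptions made for illustration; the table itself does not specify them.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def collapse_rare_categories(values, min_fraction=0.10, other_label="Other"):
    """Pool every category held by fewer than min_fraction of the sample.

    Pooling into a single 'Other' label is an assumption; the table only
    says to consider collapsing categories below 10% of the sample.
    """
    counts = Counter(values)
    n = len(values)
    rare = {cat for cat, k in counts.items() if k / n < min_fraction}
    return [other_label if v in rare else v for v in values]


def impute_below_lld(values, lld, divisor=2.0):
    """Replace None (assumed here to mean 'below the LLD') with lld / divisor.

    divisor=2.0 gives LLD/2; divisor=1.0 gives LLD.
    """
    return [lld / divisor if v is None else v for v in values]


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Not guarded against constant columns, which would divide by zero.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def most_correlated_pair(indicators):
    """Return (name_a, name_b, r) for the indicator pair with the largest |r|.

    `indicators` maps a (hypothetical) indicator name to its list of values.
    """
    a, b = max(
        combinations(indicators, 2),
        key=lambda pair: abs(pearson_r(indicators[pair[0]], indicators[pair[1]])),
    )
    return a, b, pearson_r(indicators[a], indicators[b])


# Hypothetical data: category "C" holds 1 of 20 observations (5%).
groups = ["A"] * 12 + ["B"] * 7 + ["C"]
collapsed = collapse_rare_categories(groups)

# Hypothetical assay values with one reading below an LLD of 0.5.
imputed = impute_below_lld([1.2, None, 3.4], lld=0.5)

# Hypothetical indicators; marker_a and marker_b are nearly collinear.
pair = most_correlated_pair({
    "marker_a": [1.0, 2.0, 3.0, 4.0],
    "marker_b": [2.0, 4.0, 6.0, 8.5],
    "age": [50.0, 40.0, 60.0, 45.0],
})
```

In this sketch the screening runs on the complete dataset only. Repeating `most_correlated_pair` on the subset of observations assigned to each class would cover the within-class check that the Local Independence row also recommends.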