Table 1. Simulation parameters representing a range of clustering problems in clinical research.
Parameters were chosen to represent problems ranging from clinical trials to retrospective electronic health record studies. Single and mixed data types were simulated for all combinations of population characteristics, with multiple independent replications.
| Population Characteristics | Data Types | Replications |
|---|---|---|
| # patients | Single data types | 100 |
| 200, 800, 3200 | Continuous, binary, nominal, ordinal, categorical¹ | |
| # features | | |
| 9, 27, 81, 243 | Mixed data types | 30 |
| # clusters | Balanced², unbalanced continuous, unbalanced binary, unbalanced categorical³ | |
| 2, 6, 16 | | |
¹ A mixture of nominal and ordinal data.
² Mixed data were simulated with 33% each of binary, categorical, and continuous data.
³ In unbalanced mixtures, the listed data type makes up 78% of the data.
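
The design in Table 1 can be sketched in code to make the size of the simulation grid concrete. The following is a minimal illustration, not the authors' implementation. The function names (`population_grid`, `feature_split`, `total_datasets`) are hypothetical. Two points are assumptions rather than facts from the table: that every data type is crossed with every population configuration, and that the 78% share in unbalanced mixtures corresponds to 7/9 of the features, with the remainder split equally between the other two types.

```python
from itertools import product

# Values taken from Table 1.
N_PATIENTS = [200, 800, 3200]
N_FEATURES = [9, 27, 81, 243]
N_CLUSTERS = [2, 6, 16]

SINGLE_TYPES = ["continuous", "binary", "nominal", "ordinal", "categorical"]
MIXED_TYPES = [
    "balanced",
    "unbalanced continuous",
    "unbalanced binary",
    "unbalanced categorical",
]
REPLICATIONS = {"single": 100, "mixed": 30}


def population_grid():
    """Return all combinations of (patients, features, clusters)."""
    return list(product(N_PATIENTS, N_FEATURES, N_CLUSTERS))


def feature_split(n_features, dominant=None):
    """Allocate features to binary, categorical, and continuous types.

    Assumption (not stated in the table): a balanced mixture uses equal
    thirds; an unbalanced mixture gives the dominant type 7/9 of the
    features (about 78%) and splits the rest equally.
    """
    kinds = ["binary", "categorical", "continuous"]
    if dominant is None:
        return {k: n_features // 3 for k in kinds}
    minor = n_features // 9
    return {
        k: (n_features - 2 * minor if k == dominant else minor) for k in kinds
    }


def total_datasets():
    """Count simulated datasets, assuming a full crossing of data types
    with population configurations."""
    n_configs = len(population_grid())
    per_config = (
        len(SINGLE_TYPES) * REPLICATIONS["single"]
        + len(MIXED_TYPES) * REPLICATIONS["mixed"]
    )
    return n_configs * per_config
```

Under these assumptions, `population_grid()` yields 3 × 4 × 3 = 36 configurations, and `feature_split(27, "binary")` assigns 21 binary, 3 categorical, and 3 continuous features. Every feature count in the table is divisible by 9, so this split produces whole numbers.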