J Biomed Inform. Author manuscript; available in PMC: 2022 Jun 1.
Published in final edited form as: J Biomed Inform. 2021 Apr 20;118:103788. doi: 10.1016/j.jbi.2021.103788

Table 1. Simulation parameters representing the range of clustering problems in clinical research.

Parameters were chosen to represent problems ranging from clinical trials to retrospective electronic health record studies. Single and mixed data types were simulated for every combination of population characteristics, with multiple independent replications of each.

Population characteristic | Levels
Number of patients | 200, 800, 3200
Number of features | 9, 27, 81, 243
Number of clusters | 2, 6, 16

Data types | Variants | Replications
Single data types | Continuous, binary, nominal, ordinal, categorical¹ | 100
Mixed data types | Balanced², unbalanced continuous, unbalanced binary, unbalanced categorical³ | 30
¹ A mixture of nominal and ordinal data.

² Balanced mixtures contain 33% each of binary, categorical, and continuous features.

³ In unbalanced mixtures, 78% of the features are of the listed data type.
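The design in Table 1 can be checked with a short script. This is a minimal sketch, not the authors' simulation code: it only enumerates the factor levels listed in the table, counts the data sets they would imply if every data type were crossed with every population combination (a full factorial, which is an assumption), and turns the footnoted percentages into per-type feature counts. Reading "33%" as exactly 1/3 and "78%" as 7/9 (about 77.8%), with the remaining 2/9 split evenly between the other two types, is also an assumption; both fractions give whole numbers for every feature count in the table. The function names are hypothetical.

```python
from itertools import product

# Factor levels transcribed from Table 1.
N_PATIENTS = (200, 800, 3200)
N_FEATURES = (9, 27, 81, 243)
N_CLUSTERS = (2, 6, 16)

SINGLE_TYPES = ("continuous", "binary", "nominal", "ordinal", "categorical")
MIXED_TYPES = ("balanced", "unbalanced continuous",
               "unbalanced binary", "unbalanced categorical")
REPLICATIONS = {"single": 100, "mixed": 30}

# Component data types of every mixed data set (footnote 2).
COMPONENTS = ("binary", "categorical", "continuous")


def population_grid():
    """Every (patients, features, clusters) combination: 3 x 4 x 3 = 36."""
    return list(product(N_PATIENTS, N_FEATURES, N_CLUSTERS))


def count_datasets():
    """Number of simulated data sets implied by the table.

    ASSUMPTION: each data type is crossed with every population
    combination (full factorial) and replicated independently.
    """
    n_pop = len(population_grid())
    single = len(SINGLE_TYPES) * n_pop * REPLICATIONS["single"]
    mixed = len(MIXED_TYPES) * n_pop * REPLICATIONS["mixed"]
    return {"single": single, "mixed": mixed, "total": single + mixed}


def mixture_counts(n_features, mixture):
    """Features per component type for one mixed data set.

    ASSUMPTION: "33%" is read as exactly 1/3 and "78%" as 7/9,
    with the remaining 2/9 split evenly between the two minority types.
    """
    if mixture == "balanced":
        return {c: n_features // 3 for c in COMPONENTS}
    prefix, _, dominant = mixture.partition(" ")
    if prefix != "unbalanced" or dominant not in COMPONENTS:
        raise ValueError(f"unknown mixture: {mixture!r}")
    minor = n_features // 9  # 1/9 of the features for each minority type
    return {c: n_features - 2 * minor if c == dominant else minor
            for c in COMPONENTS}


if __name__ == "__main__":
    print(len(population_grid()))  # 36
    print(count_datasets())  # {'single': 18000, 'mixed': 4320, 'total': 22320}
    for n in N_FEATURES:
        print(n, mixture_counts(n, "unbalanced binary"))
```

Under these assumptions, a 9-feature unbalanced binary data set would contain 7 binary, 1 categorical, and 1 continuous feature, while a balanced 9-feature data set would contain 3 of each.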