2021 Jul 23;9(7):e19905. doi: 10.2196/19905

Table 3.

Clustering performance on external evaluation indexes based on various patient representations.

Representation	Training corpus	Training shuffle	Training window size	True label	Cluster 1 patients, n	Cluster 2 patients, n	Precision	Recall	F1 score
Embedding-based	Full	Yes	5	IS^a	6495	340	0.938	0.950	0.944^b
Embedding-based	Full	Yes	5	HS^c	427	970	0.740	0.694	0.717^d
Embedding-based	Stroke	Yes	5	IS	6530	305	0.928	0.955	0.942
Embedding-based	Stroke	Yes	5	HS	506	891	0.745	0.638	0.687
Embedding-based	Full	No	5	IS	6587	248	0.855	0.964	0.906
Embedding-based	Full	No	5	HS	1117	280	0.530	0.200	0.291
Embedding-based	Stroke	No	5	IS	6472	363	0.903	0.947	0.924
Embedding-based	Stroke	No	5	HS	699	698	0.658	0.500	0.568
Embedding-based	Full	No	255	IS	6305	530	0.927	0.922	0.925
Embedding-based	Full	No	255	HS	493	904	0.630	0.647	0.639
Embedding-based	Stroke	No	224	IS	6378	457	0.932	0.933	0.932
Embedding-based	Stroke	No	224	HS	467	930	0.671	0.666	0.668
Multi-hot^e	N/A^f	N/A	N/A	IS	5874	961	0.938	0.859	0.897
Multi-hot	N/A	N/A	N/A	HS	388	1009	0.512	0.722	0.599
Mixture^g	N/A	N/A	N/A	IS	5945	890	0.957	0.870	0.911
Mixture	N/A	N/A	N/A	HS	269	1128	0.559	0.807	0.661

^aIS: ischemic stroke.

^bHighest F1 score for cluster 1.

^cHS: hemorrhagic stroke.

^dHighest F1 score for cluster 2.

^eMulti-hot: representation method of the combinations of one-hot codes.

^fN/A: not applicable.

^gMixture: representation method of the combination of multi-hot codes for discrete features and real numbers for continuous values of age and laboratory tests.
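
The evaluation indexes in Table 3 appear to follow directly from the cluster counts. The sketch below is not the authors' code. It assumes that cluster 1 is treated as the IS cluster and cluster 2 as the HS cluster, which is consistent with footnotes b and d, and it uses a hypothetical helper named `cluster_scores` to recompute precision, recall, and F1 score for the first embedding-based configuration (full corpus, shuffled, window size 5).

```python
def cluster_scores(true_pos, false_neg, false_pos):
    """Return (precision, recall, F1), each rounded to 3 decimals.

    true_pos:  patients of the target type placed in the matching cluster
    false_neg: patients of the target type placed in the other cluster
    false_pos: patients of the other type placed in the matching cluster
    """
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 3), round(recall, 3), round(f1, 3)


# Counts taken from the first two rows of Table 3.
# Assumption: cluster 1 is matched to IS and cluster 2 to HS.
is_cluster1, is_cluster2 = 6495, 340
hs_cluster1, hs_cluster2 = 427, 970

# IS scored against cluster 1: HS patients in cluster 1 are false positives.
print(cluster_scores(is_cluster1, is_cluster2, hs_cluster1))  # (0.938, 0.95, 0.944)

# HS scored against cluster 2: IS patients in cluster 2 are false positives.
print(cluster_scores(hs_cluster2, hs_cluster1, is_cluster2))  # (0.74, 0.694, 0.717)
```

Under this assumption the recomputed values match the reported 0.938, 0.950, and 0.944 for IS and 0.740, 0.694, and 0.717 for HS. The other rows can be checked the same way by substituting their counts.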