2021 Jul 23;9(7):e19905. doi: 10.2196/19905

Table 3.

Clustering performance on interval evaluation indexes based on various patient representations.

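| Representation | Training: corpus used | Training: shuffle | Training: window size | True label | Cluster 1 patients, n | Cluster 2 patients, n | Precision | Recall | F1 score |
|---|---|---|---|---|---|---|---|---|---|
| Embedding-based | Full | Yes | 5 | IS [a] | 6495 | 340 | 0.938 | 0.950 | 0.944 [b] |
| | | | | HS [c] | 427 | 970 | 0.740 | 0.694 | 0.717 [d] |
| | Stroke | Yes | 5 | IS | 6530 | 305 | 0.928 | 0.955 | 0.942 |
| | | | | HS | 506 | 891 | 0.745 | 0.638 | 0.687 |
| | Full | No | 5 | IS | 6587 | 248 | 0.855 | 0.964 | 0.906 |
| | | | | HS | 1117 | 280 | 0.530 | 0.200 | 0.291 |
| | Stroke | No | 5 | IS | 6472 | 363 | 0.903 | 0.947 | 0.924 |
| | | | | HS | 699 | 698 | 0.658 | 0.500 | 0.568 |
| | Full | No | 255 | IS | 6305 | 530 | 0.927 | 0.922 | 0.925 |
| | | | | HS | 493 | 904 | 0.630 | 0.647 | 0.639 |
| | Stroke | No | 224 | IS | 6378 | 457 | 0.932 | 0.933 | 0.932 |
| | | | | HS | 467 | 930 | 0.671 | 0.666 | 0.668 |
| Multi-hot [e] | N/A [f] | N/A | N/A | IS | 5874 | 961 | 0.938 | 0.859 | 0.897 |
| | N/A | N/A | N/A | HS | 388 | 1009 | 0.512 | 0.722 | 0.599 |
| Mixture [g] | N/A | N/A | N/A | IS | 5945 | 890 | 0.957 | 0.870 | 0.911 |
| | N/A | N/A | N/A | HS | 269 | 1128 | 0.559 | 0.807 | 0.661 |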

[a] IS: ischemic stroke.

[b] Highest F1 score for cluster 1.

[c] HS: hemorrhagic stroke.

[d] Highest F1 score for cluster 2.

[e] Multi-hot: representation method of the combinations of one-hot codes.

[f] N/A: not applicable.

[g] Mixture: representation method of the combination of multi-hot codes for discrete features and real numbers for continuous values of age and laboratory tests.
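
The precision, recall, and F1 values in the table appear to follow directly from the per-cluster patient counts. The sketch below illustrates this for the first embedding-based row (full corpus, shuffled, window size 5). It assumes that cluster 1 is treated as the predicted IS group and cluster 2 as the predicted HS group; this mapping is inferred from the counts and is not stated in the table itself. The function name `precision_recall_f1` and the variable names are illustrative only.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true-positive, false-positive,
    and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from the first row of the table
# (embedding-based, full corpus, shuffle = yes, window size = 5).
is_in_c1, is_in_c2 = 6495, 340   # true IS patients in cluster 1 / cluster 2
hs_in_c1, hs_in_c2 = 427, 970    # true HS patients in cluster 1 / cluster 2

# Assumption: cluster 1 is read as "predicted IS", cluster 2 as "predicted HS".
is_scores = precision_recall_f1(tp=is_in_c1, fp=hs_in_c1, fn=is_in_c2)
hs_scores = precision_recall_f1(tp=hs_in_c2, fp=is_in_c2, fn=hs_in_c1)

print("IS:", [round(v, 3) for v in is_scores])
print("HS:", [round(v, 3) for v in hs_scores])
```

Under this assumed cluster-to-label mapping, the rounded results match the reported values for that row (0.938, 0.950, 0.944 for IS and 0.740, 0.694, 0.717 for HS). The same calculation can be repeated for any other row by substituting its counts.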