Table 3.
Clustering performance of various patient representations, evaluated against the true stroke-type labels (precision, recall, and F1 score).
| Representation | Training corpus | Shuffle | Window size | True label | Cluster 1 patients, n | Cluster 2 patients, n | Precision | Recall | F1 score |
|---|---|---|---|---|---|---|---|---|---|
| Embedding-based | Full | Yes | 5 | IS<sup>a</sup> | 6495 | 340 | 0.938 | 0.950 | 0.944<sup>b</sup> |
| | | | | HS<sup>c</sup> | 427 | 970 | 0.740 | 0.694 | 0.717<sup>d</sup> |
| | Stroke | Yes | 5 | IS | 6530 | 305 | 0.928 | 0.955 | 0.942 |
| | | | | HS | 506 | 891 | 0.745 | 0.638 | 0.687 |
| | Full | No | 5 | IS | 6587 | 248 | 0.855 | 0.964 | 0.906 |
| | | | | HS | 1117 | 280 | 0.530 | 0.200 | 0.291 |
| | Stroke | No | 5 | IS | 6472 | 363 | 0.903 | 0.947 | 0.924 |
| | | | | HS | 699 | 698 | 0.658 | 0.500 | 0.568 |
| | Full | No | 255 | IS | 6305 | 530 | 0.927 | 0.922 | 0.925 |
| | | | | HS | 493 | 904 | 0.630 | 0.647 | 0.639 |
| | Stroke | No | 224 | IS | 6378 | 457 | 0.932 | 0.933 | 0.932 |
| | | | | HS | 467 | 930 | 0.671 | 0.666 | 0.668 |
| Multi-hot<sup>e</sup> | N/A<sup>f</sup> | N/A | N/A | IS | 5874 | 961 | 0.938 | 0.859 | 0.897 |
| | N/A | N/A | N/A | HS | 388 | 1009 | 0.512 | 0.722 | 0.599 |
| Mixture<sup>g</sup> | N/A | N/A | N/A | IS | 5945 | 890 | 0.957 | 0.870 | 0.911 |
| | N/A | N/A | N/A | HS | 269 | 1128 | 0.559 | 0.807 | 0.661 |
<sup>a</sup>IS: ischemic stroke.
<sup>b</sup>Highest F1 score for cluster 1.
<sup>c</sup>HS: hemorrhagic stroke.
<sup>d</sup>Highest F1 score for cluster 2.
<sup>e</sup>Multi-hot: representation formed by combining one-hot codes.
<sup>f</sup>N/A: not applicable.
<sup>g</sup>Mixture: representation combining multi-hot codes for discrete features with real numbers for the continuous values of age and laboratory tests.
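The counts in each IS/HS row pair appear consistent with reading cluster 1 as the predicted ischemic stroke group and cluster 2 as the predicted hemorrhagic stroke group. For example, for the first embedding-based setting, IS precision = 6495 / (6495 + 427) ≈ 0.938 and IS recall = 6495 / (6495 + 340) ≈ 0.950. The following minimal sketch recomputes the evaluation indexes from the four counts of a row pair under that assumed cluster-to-label mapping. The function name `cluster_metrics` and its parameters are hypothetical, introduced only for illustration.

```python
def cluster_metrics(is_c1, is_c2, hs_c1, hs_c2):
    """Per-label precision, recall, and F1 score from cluster counts.

    Assumption (not stated in the table): cluster 1 is the predicted
    ischemic stroke (IS) group and cluster 2 the predicted hemorrhagic
    stroke (HS) group.
    is_c1, is_c2: IS patients assigned to cluster 1 / cluster 2
    hs_c1, hs_c2: HS patients assigned to cluster 1 / cluster 2
    """
    def prf(tp, fp, fn):
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        # Round to three decimals, matching the table's precision
        return round(precision, 3), round(recall, 3), round(f1, 3)

    return {
        # IS: true positives are IS patients in cluster 1;
        # false positives are HS patients wrongly placed in cluster 1
        "IS": prf(tp=is_c1, fp=hs_c1, fn=is_c2),
        # HS: true positives are HS patients in cluster 2;
        # false positives are IS patients wrongly placed in cluster 2
        "HS": prf(tp=hs_c2, fp=is_c2, fn=hs_c1),
    }


# Embedding-based, full corpus, shuffled, window size 5
print(cluster_metrics(6495, 340, 427, 970))
# → {'IS': (0.938, 0.95, 0.944), 'HS': (0.74, 0.694, 0.717)}
```

The same call with the multi-hot counts (5874, 961, 388, 1009) reproduces that row's values as well, which supports the assumed mapping. It also shows why the unshuffled full-corpus model with window size 5 scores poorly on HS: 1117 of the 1397 HS patients fall into cluster 1, which caps HS recall at about 0.200.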