PLoS One. 2020 Mar 10;15(3):e0229963. doi: 10.1371/journal.pone.0229963

Fig 6. Unsupervised report classification using fastText, ivis, and Gaussian Mixture Model clustering.

A. Two-dimensional ivis representation of 50-dimensional fastText embeddings of n = 495,179 unlabelled radiological reports from NHS GGC. The colour gradient reflects the posterior probability of membership in the Normal and Abnormal report clusters.

B. Scatterplot of predicted ivis embeddings for the n = 3,715 expert-labelled reports in the internal testing set. Blue and red points represent manually labelled Normal and Abnormal reports, respectively. The colour gradient shows contours of the posterior probability distributions obtained from a GMM trained on the two-dimensional ivis representations of the n = 495,179 unlabelled radiological reports.

C. Scatterplot of predicted ivis embeddings for the n = 456 expert-labelled reports in the MIMIC-CXR testing set. Point colours and posterior probability contours are as in B.

D. ROC curves of the unsupervised GMM classifier applied to 50-dimensional fastText embeddings of the internal (n = 3,715) and external (n = 456) manually labelled reports.

E-F. ROC curves of the unsupervised GMM classifier applied to two- and ten-dimensional ivis embeddings of the manually labelled internal (n = 3,715) and external (n = 456) reports.
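The clustering and evaluation steps described in the caption can be illustrated with a minimal sketch. It is not the authors' code: the two-dimensional points are synthetic stand-ins for ivis embeddings, the cluster positions and sizes are invented for illustration, and the rule used to decide which GMM component counts as "Abnormal" is an assumption that only works because the synthetic layout is known. The sketch fits a two-component Gaussian Mixture Model on unlabelled points, reads off posterior probabilities for a labelled test set, and scores them with ROC AUC.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)


def make_points(n_normal, n_abnormal):
    """Synthetic 2-D points standing in for ivis embeddings (not real data)."""
    normal = rng.normal(loc=[-2.0, 0.0], scale=0.7, size=(n_normal, 2))
    abnormal = rng.normal(loc=[2.0, 0.0], scale=0.7, size=(n_abnormal, 2))
    X = np.vstack([normal, abnormal])
    y = np.concatenate([np.zeros(n_normal, dtype=int),
                        np.ones(n_abnormal, dtype=int)])
    return X, y


# "Unlabelled" training set: the labels are generated but deliberately discarded.
X_unlabelled, _ = make_points(2000, 2000)
# Small labelled test set, analogous to the expert-labelled reports.
X_test, y_test = make_points(200, 200)

# Fit a two-component GMM without using any labels.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X_unlabelled)

# Posterior probability of each component for every test point.
posteriors = gmm.predict_proba(X_test)

# GMM component indices are arbitrary. Assumption for this toy data only:
# the component with the larger mean on the first axis is "Abnormal".
abnormal_component = int(np.argmax(gmm.means_[:, 0]))
abnormal_score = posteriors[:, abnormal_component]

# Use the posterior as a classifier score and evaluate it against the labels.
auc = roc_auc_score(y_test, abnormal_score)
print(f"ROC AUC on synthetic test set: {auc:.3f}")
```

With real data, the rows of `X_unlabelled` and `X_test` would be the embeddings of the reports, and assigning a clinical meaning to each component would need some external check, for example inspecting a handful of reports from each cluster.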