2021 Jul 20;32(1):725–736. doi: 10.1007/s00330-021-08132-0

Fig. 5.

Visualisation of word-level attention weights, including representative examples of false positive and false negative misclassification. Darker colour represents a higher contribution to the report representation used by the model for report classification. In a (true positive), the model assigned high weighting to several words in the sentence describing a ‘focus of restricted diffusion…consistent with an acute infarct’. In b (true negative), the model assigned the highest weighting to the words ‘normal’, ‘Intracranial’, and ‘appearances’. In c (false positive), the highest weighting was assigned to words describing a ‘well defined lesion’ which ‘remains unchanged in size’. However, our team of neuroradiologists marked this report as normal because the lesion likely represents a prominent perivascular space, a finding which our team consider normal unless excessively large. In d (false negative), the highest weighting was assigned to several instances of the phrase ‘normal intracranial appearances’. This example highlights a case where the neuroradiologist who reported the original scan reasonably deemed a finding insignificant, and used language accordingly, whereas our labelling team, in order to be as sensitive as possible, marked this report as abnormal. These representative examples demonstrate how our labelling framework errs towards the safest clinical decision. Additional examples of erroneous classification are available in the supplemental material.
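The shading convention described in this caption (darker colour for a higher attention weight) can be illustrated with a minimal sketch. This is not the authors' plotting code: the function name `shade_words`, the min-max scaling, the red colour, and the example weights are all assumptions made for illustration, and the caption does not state how the colours in the figure were computed.

```python
from html import escape


def shade_words(words, weights):
    """Return an HTML string in which each word's background opacity
    tracks its attention weight (hypothetical helper, not from the paper)."""
    if len(words) != len(weights):
        raise ValueError("words and weights must have the same length")
    if not words:
        return ""
    lo, hi = min(weights), max(weights)
    span = hi - lo
    parts = []
    for word, weight in zip(words, weights):
        # Assumption: min-max scale the weights to [0, 1] so the largest
        # weight gets the darkest shade; equal weights all map to 0.
        alpha = (weight - lo) / span if span > 0 else 0.0
        parts.append(
            f'<span style="background: rgba(200, 0, 0, {alpha:.2f})">'
            f"{escape(word)}</span>"
        )
    return " ".join(parts)


# Weights invented for illustration only; they are not values from the study.
words = ["normal", "intracranial", "appearances"]
weights = [0.5, 0.3, 0.2]
html_out = shade_words(words, weights)
```

Writing `html_out` to a file and opening it in a browser shows the highest-weighted word with the darkest background. Min-max scaling exaggerates small differences between weights, so a real visualisation might prefer the raw weights or a fixed colour scale.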