Figure 3. KNN classifier for the automatic labelling of data.
(a) Flowchart depicting the development of the classifier (top) and its use to classify data (bottom) (see also Supplementary Methods). (b) Examples of movie frames used to hand-score behaviour for training the classifier. (c) Contingency matrix of the hand-scored movie-frame data sets from the two researchers (JK and BD). Shades of grey indicate the frequency of (dis)agreement between the two researchers. Colours here and in e and f indicate behaviour types. Black numbers at right and bottom indicate the false-negative and false-positive rates (%), respectively, for each behavioural category with respect to the JK manual scoring. Grey numbers indicate the total number of frames manually annotated with each behaviour. (d) Including higher-order features with the raw data increases the classifier’s accuracy (accuracy was measured as the total fraction of frames scored correctly, that is, the unbalanced accuracy; see Supplementary Methods). Behaviours such as forward/backward running and complex motion are qualitatively contiguous; the ‘plausible’ accuracy metric ignores errors between such pairs of behaviours (see text). Cross-validated error bars are ±s.e.m. calculated across n=5 flies. (e) Contingency matrix comparing the JK hand-scored data set with the output of the classifier trained on the JK-scored data set. Numbers as in c. (f) Sequences of behaviour scores from both the JK and BD manual data sets and from the classifier for the same 120-s window (top). Magnification of a ~8-s subset along with the raw data used by the classifier (bottom).
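The accuracy measures named in d can be illustrated with a minimal sketch. This is not the authors' implementation: the behaviour labels, the toy frame sequences and the choice of which pair counts as 'plausible' are all hypothetical, and the function names are introduced here for illustration only. It assumes that unbalanced accuracy is the fraction of frames on which the reference and predicted labels match, and that the 'plausible' variant additionally accepts a mismatch when the two labels form a designated contiguous pair.

```python
from collections import Counter


def contingency_counts(reference, predicted):
    """Count frames for each (reference label, predicted label) pair."""
    return Counter(zip(reference, predicted))


def unbalanced_accuracy(counts):
    """Fraction of all frames on which the two label sequences agree."""
    total = sum(counts.values())
    correct = sum(n for (ref, pred), n in counts.items() if ref == pred)
    return correct / total


def plausible_accuracy(counts, plausible_pairs):
    """As unbalanced accuracy, but confusions within a plausible pair are not errors."""
    allowed = {frozenset(pair) for pair in plausible_pairs}
    total = sum(counts.values())
    accepted = sum(
        n
        for (ref, pred), n in counts.items()
        if ref == pred or frozenset((ref, pred)) in allowed
    )
    return accepted / total


# Hypothetical per-frame labels (assumption, not data from the study).
reference = ["stop", "stop", "fwd", "fwd", "fwd",
             "complex", "complex", "complex", "complex", "stop"]
predicted = ["stop", "fwd", "fwd", "fwd", "complex",
             "complex", "complex", "fwd", "complex", "stop"]

counts = contingency_counts(reference, predicted)
acc = unbalanced_accuracy(counts)  # 7 of 10 frames match
# Hypothetical plausible pair: forward running <-> complex motion.
plaus = plausible_accuracy(counts, [("fwd", "complex")])  # 9 of 10 accepted
```

In this toy example the stop/forward confusion still counts as an error under both measures, whereas the two forward/complex confusions are forgiven only by the 'plausible' measure.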