2020 Apr 15;11:268. doi: 10.3389/fpsyt.2020.00268

Table 2.

Summary of annotated electronic health record (EHR) documents used to train the named entity recognition model.

Variable	Number of annotated text spans
	Phase 1	Phase 2
History of violence	391	350
History of self-harm	559	397
Formal education	174	200
Medication	1,774	3,860
Benefits recipient	188	195
Drug/alcohol use disorder	190	130
(Parental) suicide	19	77
Psychiatric admission	332	260
Total:	3,627	5,469

Text spans are words or word combinations that refer to the concept of interest (the variable), as selected by the manual annotator. The model was trained in two phases: the first using GATE software and the second using Prodigy, an active learning-based annotation tool. The annotated documents shown in this table constituted the “gold-standard” training dataset used in model development. EHR, electronic health record.