. 2020 Apr 15;11:268. doi: 10.3389/fpsyt.2020.00268

Table 2.

Summary of the annotated electronic health record (EHR) documents used to train the named entity recognition model.

                           Number of annotated text spans
Variable                     Phase 1  Phase 2
History of violence              391      350
History of self-harm             559      397
Formal education                 174      200
Medication                     1,774    3,860
Benefits recipient               188      195
Drug/alcohol use disorder        190      130
(Parental) suicide                19       77
Psychiatric admission            332      260
Total                          3,627    5,469

Text spans are words or word combinations, selected by the manual annotator, that refer to the concept of interest (the variable). The model was trained in two phases: first with GATE software and then with Prodigy, an active learning-based annotation tool. The annotated documents summarized in this table constituted the “gold-standard” training dataset used for model development. EHR, electronic health record.
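As an illustration of the arithmetic behind the table, the Python sketch below re-enters the reported span counts, checks that they sum to the stated phase totals, and shows one way per-variable span counts could be tallied from annotation records. The record layout (a `text` field plus a list of `spans`, each with character offsets and a `label`) is an assumption modeled on common span-annotation exports such as Prodigy's, not the study's actual data format. The helper names and the labels `MEDICATION` and `SELF_HARM` are hypothetical.

```python
from collections import Counter

# Span counts per variable as reported in Table 2: (Phase 1, Phase 2).
TABLE_2 = {
    "History of violence": (391, 350),
    "History of self-harm": (559, 397),
    "Formal education": (174, 200),
    "Medication": (1774, 3860),
    "Benefits recipient": (188, 195),
    "Drug/alcohol use disorder": (190, 130),
    "(Parental) suicide": (19, 77),
    "Psychiatric admission": (332, 260),
}


def phase_totals(table):
    """Return the total number of annotated spans in each phase."""
    phase1 = sum(p1 for p1, _ in table.values())
    phase2 = sum(p2 for _, p2 in table.values())
    return phase1, phase2


def count_spans(records):
    """Tally annotated spans per label (hypothetical helper).

    Assumed record layout:
    {"text": str, "spans": [{"start": int, "end": int, "label": str}, ...]}
    Records without a "spans" key contribute nothing.
    """
    counts = Counter()
    for record in records:
        for span in record.get("spans", []):
            counts[span["label"]] += 1
    return counts


if __name__ == "__main__":
    p1, p2 = phase_totals(TABLE_2)
    print(f"Phase 1: {p1:,}, Phase 2: {p2:,}")  # Phase 1: 3,627, Phase 2: 5,469

    # Hypothetical annotated sentence; labels are illustrative only.
    text = "Started on sertraline 50 mg; prior overdose in 2015."
    record = {
        "text": text,
        "spans": [
            {"start": 11, "end": 21, "label": "MEDICATION"},  # "sertraline"
            {"start": 29, "end": 43, "label": "SELF_HARM"},   # "prior overdose"
        ],
    }
    # Recover each span's surface text from its character offsets.
    for span in record["spans"]:
        print(span["label"], repr(text[span["start"]:span["end"]]))
    print(count_spans([record]))
```

Storing character offsets rather than copied substrings keeps each span tied to its source document, so a span's text can always be re-extracted and checked, as the loop above does.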