Many important clinical characteristics of patients are recorded only in unstructured free-text progress notes.
In ophthalmology, little previous work has addressed extracting examination components from free-text progress notes, despite the importance of these findings for cohort identification and characterization.
Bidirectional Encoder Representations from Transformers (BERT) models have enabled a leap in performance in biomedical named entity recognition tasks.
However, training BERT models requires annotated corpora, which are difficult to produce at a sufficiently large scale.
We develop the first deep learning models to identify findings from the ophthalmology exam documented in unstructured ophthalmology progress notes in electronic health records.
We leverage routinely captured data from the electronic health records to develop a weakly supervised approach that amasses a large training corpus with minimal noise and without laborious manual annotation.
Our BERT-based models outperformed a baseline regular-expression-based model, and they performed better when evaluated on a manually annotated “ground truth” test set than when evaluated against the weakly supervised labels.
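As an illustration of how weak supervision from routinely captured data might work, the sketch below assigns token-level labels to a note by locating values already stored in structured exam fields. It is a minimal sketch under stated assumptions, not the labeling pipeline used in this work: the function `weak_label`, the label names `CORNEA` and `LENS`, the BIO tagging scheme, and the example note are all hypothetical.

```python
import re


def weak_label(note, structured_findings):
    """Tag note tokens with BIO labels wherever a structured value appears.

    `structured_findings` maps a hypothetical label name (e.g. "LENS") to
    the value recorded in a structured field (assumption: the same string
    also appears verbatim, ignoring case, in the free-text note).
    """
    # Whitespace tokenization, keeping character offsets for each token.
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", note)]
    tags = ["O"] * len(tokens)
    lowered = note.lower()

    for label, value in structured_findings.items():
        start = lowered.find(value.lower())
        if start == -1:
            continue  # value not documented in the note: no label emitted
        end = start + len(value)
        first = True
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            overlaps = tok_start < end and tok_end > start
            if overlaps and tags[i] == "O":
                tags[i] = ("B-" if first else "I-") + label
                first = False

    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


# Hypothetical note text and structured field values.
labeled = weak_label(
    "Cornea: clear OU. Lens: 2+ NS OD.",
    {"CORNEA": "clear", "LENS": "2+ NS"},
)
for token, tag in labeled:
    print(f"{token}\t{tag}")
```

Simple string matching of this kind can mislabel tokens (for example, a value that is a substring of an unrelated word), which is one source of the label noise that a weakly supervised corpus has to tolerate.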