Author manuscript; available in PMC: 2023 Nov 1.
Published in final edited form as: Int J Med Inform. 2022 Sep 16;167:104864. doi: 10.1016/j.ijmedinf.2022.104864
What is known?
  • Many important clinical characteristics of patients are sequestered in unstructured clinical free-text progress notes.

  • In the field of ophthalmology, there has been little previous work directed towards extracting ophthalmology examination components from free-text progress notes, despite the importance of these findings for cohort identification and characterization.

  • Bidirectional Encoder Representations from Transformers (BERT) models have enabled a leap in performance in biomedical named entity recognition tasks.

  • However, training BERT models requires annotated corpora which are difficult to produce on a sufficiently large scale.

What does this add?
  • We develop the first deep learning models to identify findings from the ophthalmology exam documented in unstructured ophthalmology progress notes in electronic health records.

  • We leverage routinely captured data from the electronic health records to develop a weakly supervised approach that amasses a large training corpus with minimal noise and without laborious manual annotation.

  • Our BERT-based models outperformed a regular-expression-based baseline model, and they performed better on a manually annotated “ground truth” test set than against the weakly supervised labels.
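
The weakly supervised idea in the points above can be illustrated with a small sketch. It is not the authors' pipeline: it assumes that each exam finding is also available as a structured field value, and that any verbatim, case-insensitive occurrence of that value in the note text can be tagged as an entity. The function name `weak_label`, the entity types `CORNEA` and `LENS`, and the example note are all hypothetical.

```python
import re
from typing import Dict, List, Tuple


def weak_label(note: str, structured_findings: Dict[str, str]) -> List[Tuple[str, str]]:
    """Assign BIO tags to whitespace tokens that overlap a structured finding.

    Assumption (not from the source): a finding counts as present when its
    structured value appears verbatim, ignoring case, in the note text.
    """
    # Whitespace tokens with their character offsets in the note.
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", note)]
    labels = ["O"] * len(tokens)

    for entity_type, value in structured_findings.items():
        if not value.strip():
            continue  # an empty field gives no evidence to label
        for match in re.finditer(re.escape(value), note, flags=re.IGNORECASE):
            # Every token that overlaps the matched character span.
            inside = [
                i for i, (_, start, end) in enumerate(tokens)
                if start < match.end() and end > match.start()
            ]
            for rank, i in enumerate(inside):
                labels[i] = ("B-" if rank == 0 else "I-") + entity_type

    return [(tok, lab) for (tok, _, _), lab in zip(tokens, labels)]


if __name__ == "__main__":
    # Hypothetical note and structured fields, for illustration only.
    note = "Cornea: trace guttae OU. Lens: 2+ nuclear sclerosis."
    fields = {"CORNEA": "trace guttae", "LENS": "2+ nuclear sclerosis"}
    for token, label in weak_label(note, fields):
        print(f"{token}\t{label}")
```

Labels produced this way are noisy by construction (a value may be paraphrased in the note, or may match in an unrelated sentence), which is one reason to evaluate a model trained on them against a manually annotated test set as well.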