2021 Jan 6;21:3. doi: 10.1186/s12911-020-01364-y

Fig. 1.


Workflow of genetic information extraction. (1) We extracted all sentences mentioning the “BRCA1” or “BRCA2” gene from patients’ clinical notes using an NLP system, MedTagger. (2) We characterized the universality of the extracted sentences using sf-ipf, a measure analogous to the tf-idf weighting commonly used in document-topic modeling. (3) We applied point-wise mutual information to automatically rank each word by its inequality score and identified topic-indicating words. (4) We developed rules for automatic context inference in an iterative process by examining sentences with sf-ipf < 0.05 and a random sample of the remaining sentences. (5) Experts conducted a manual evaluation.
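
To make step (3) concrete, the following is a minimal sketch of ranking words by point-wise mutual information with a sentence group. It is an illustration only, not the authors' implementation: the function name `pmi_by_word`, the example sentences, and the group labels `"family"` and `"patient"` are all hypothetical, and the caption does not specify how the inequality score is derived from PMI. The sketch uses the standard definition, PMI(w, c) = log( p(w, c) / (p(w) p(c)) ), with probabilities estimated from sentence-level counts.

```python
import math
from collections import Counter


def pmi_by_word(sentences, labels, target):
    """Return {word: PMI(word, target)} from sentence-level co-occurrence.

    Hypothetical helper for illustration; words that never occur in a
    `target` sentence are omitted because their PMI is undefined (log 0).
    """
    n = len(sentences)
    n_target = sum(1 for lab in labels if lab == target)
    word_count = Counter()   # number of sentences containing each word
    joint_count = Counter()  # number of target sentences containing each word
    for sent, lab in zip(sentences, labels):
        words = set(sent.lower().split())  # count each word once per sentence
        word_count.update(words)
        if lab == target:
            joint_count.update(words)

    scores = {}
    for word, joint in joint_count.items():
        p_joint = joint / n
        p_word = word_count[word] / n
        p_target = n_target / n
        scores[word] = math.log(p_joint / (p_word * p_target))
    return scores


# Made-up sentences and labels (assumptions, not data from the study).
sentences = [
    "mother had brca1 mutation",
    "sister tested positive brca2",
    "patient tested negative brca1",
    "patient brca2 result negative",
]
labels = ["family", "family", "patient", "patient"]

scores = pmi_by_word(sentences, labels, target="family")
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy example, words that appear only in the `"family"` sentences (such as "mother") receive a positive PMI, while words spread evenly across both groups (such as "tested") score zero, which is the sense in which a high-ranking word can indicate a topic.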