Author manuscript; available in PMC: 2021 Nov 1.
Published in final edited form as: Artif Intell Med. 2020 Nov 1;110:101977. doi: 10.1016/j.artmed.2020.101977

Figure 1:

Diagram of the workflow. Processing steps are shown in diamond boxes; narratives, concepts, and features are shown in rectangular boxes. Two major types of configurations are employed in this study: conventional machine learning classifiers and a knowledge-guided convolutional neural network (K-CNN). Features are built from free-text progress notes and pathology reports, as well as from structured clinical data. Word vectors and Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs) are generated from the clinical notes using natural language processing (NLP) techniques. Based on prior knowledge, a subset of disease-related CUIs is extracted. Different combinations of word vectors, CUIs, the CUI subset, and structured clinical data are fed into various machine learning classifiers for distant recurrence prediction. In parallel, word embeddings and CUI embeddings are generated using pre-trained embedding dictionaries. These embeddings, integrated with structured clinical data, are used to train and evaluate the K-CNN configuration for breast cancer distant recurrence prediction.
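The conventional-classifier branch of this workflow (filter CUIs against a knowledge-derived subset, then concatenate feature blocks in different combinations) can be illustrated with a minimal sketch. It assumes count vectors over a fixed vocabulary for both words and CUIs and enumerates every non-empty combination of feature blocks; the caption does not specify either choice. All function names, the `CUI_00x` placeholders, and the structured values are hypothetical and are not real UMLS identifiers or patient data. The K-CNN embedding branch is not shown.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def filter_cuis(note_cuis: Sequence[str], knowledge_cuis: Set[str]) -> List[str]:
    """Keep only CUIs found in the knowledge-derived subset, preserving order."""
    return [c for c in note_cuis if c in knowledge_cuis]


def bag_of_items(items: Sequence[str], vocab: Sequence[str]) -> List[float]:
    """Count vector over a fixed vocabulary (assumed representation)."""
    index = {term: i for i, term in enumerate(vocab)}
    vec = [0.0] * len(vocab)
    for item in items:
        if item in index:  # out-of-vocabulary items are ignored
            vec[index[item]] += 1.0
    return vec


def feature_combinations(
    blocks: Dict[str, List[float]]
) -> Dict[Tuple[str, ...], List[float]]:
    """Concatenate every non-empty combination of named feature blocks."""
    names = sorted(blocks)
    combos: Dict[Tuple[str, ...], List[float]] = {}
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            vec: List[float] = []
            for name in subset:
                vec.extend(blocks[name])
            combos[subset] = vec
    return combos


# Hypothetical toy inputs for one patient.
note_words = ["tumor", "metastasis", "tumor"]
note_cuis = ["CUI_001", "CUI_003", "CUI_003"]
knowledge_cuis = {"CUI_003"}  # placeholder disease-related subset

blocks = {
    "words": bag_of_items(note_words, ["metastasis", "tumor"]),
    "cuis": bag_of_items(note_cuis, ["CUI_001", "CUI_002", "CUI_003"]),
    "cui_subset": bag_of_items(
        filter_cuis(note_cuis, knowledge_cuis), sorted(knowledge_cuis)
    ),
    "structured": [1.0, 0.5],  # placeholder structured clinical features
}
combos = feature_combinations(blocks)
print(combos[("cui_subset", "structured")])
```

Each concatenated vector would then be passed to a classifier of choice; enumerating all 15 subsets of the four blocks is only one way to realize "different combinations," and the study may have evaluated a narrower set.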