2021 Mar 29;12:6. doi: 10.1186/s13326-021-00236-2

Fig. 1.


Summary of the proposed de-identification approach. (a) Corpus creation, annotation, and manual revision, further detailed in Fig. 2. (b) Selection of databases used to develop a randomizer script, which then creates the synthetic corpus. (c) Training and testing of different neural networks to select the best-performing model. (d) When a new report needs to be de-identified, the selected model labels the words that belong to one of the defined named entities; the randomizer script then creates a de-identified report with synthetic information
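
The replacement step in panel d can be illustrated with a minimal sketch. It is not the authors' randomizer script: the label names (`NAME`, `DATE`, `O`), the `SURROGATES` table standing in for the databases of panel b, and the function `deidentify` are all hypothetical, and the per-token labels are assumed to come from an already trained model.

```python
import random
from itertools import groupby

# Hypothetical surrogate values standing in for the databases of panel b.
SURROGATES = {
    "NAME": ["Ana Lopez", "Juan Perez"],
    "DATE": ["01/02/2003", "15/08/2010"],
}


def deidentify(tokens, labels, surrogates=SURROGATES, rng=None):
    """Replace each labeled span with a randomly chosen synthetic value.

    tokens and labels are parallel lists; the label "O" marks
    non-sensitive words (an assumed convention, not taken from the paper).
    """
    rng = rng or random.Random()
    out = []
    # Consecutive tokens sharing a label are treated as one entity, so two
    # adjacent entities of the same type would be merged by this sketch.
    for label, group in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        words = [token for token, _ in group]
        if label == "O":
            out.extend(words)  # keep non-sensitive text verbatim
        else:
            out.append(rng.choice(surrogates[label]))
    return " ".join(out)


tokens = ["Patient", "John", "Smith", "seen", "on", "03/04/2019"]
labels = ["O", "NAME", "NAME", "O", "O", "DATE"]
report = deidentify(tokens, labels, rng=random.Random(0))
```

Here `report` keeps the words "Patient", "seen", and "on", while the name and the date are swapped for values drawn from the surrogate table. Passing a seeded `random.Random` makes the substitution reproducible, which helps when testing.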