Author manuscript; available in PMC: 2025 Aug 25.
Published in final edited form as: Annu Rev Biomed Data Sci. 2025 Apr 1;8(1):251–274. doi: 10.1146/annurev-biodatasci-102224-074736

Figure 2.

Analysis of training corpora and domains. (a) Number of articles utilizing different types of training data. Percentages are calculated out of 82 articles; because individual papers may use multiple corpora, the percentages do not sum to 100%.

(b) Subcategorization of textual training data. Abbreviation: EHR, electronic health record.