Fig. 5. NLP-ML models can correctly classify tissue-associated biological processes and diseases based on their text descriptions.
A Model performance on a set of manually curated tissue-specific Gene Ontology biological process (GOBP) terms30. B Model performance on a set of manually curated tissue-specific Disease Ontology (DO) terms30. In both panels, the models are indicated along the y-axis, with the number of annotated GOBP/DO terms in parentheses next to the tissue name. Performance is shown on the x-axis as the logarithm of the ratio of the area under the precision-recall curve (auPRC) to the prior, where the prior is the fraction of all GOBP/DO terms specific to a particular tissue. This metric accounts for the variable number of annotated terms per tissue. Source data are provided as a Source Data file.
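
The metric described above can be illustrated with a minimal sketch. The function names, the use of average precision as the auPRC estimate, and the base-2 logarithm are assumptions made for illustration; the caption does not specify the logarithm base or the estimator used.

```python
import math


def average_precision(labels, scores):
    """Estimate auPRC as average precision (step-wise area under the PR curve).

    labels: 0/1 values, 1 meaning the term is annotated to the tissue.
    scores: model scores, higher meaning more likely tissue-specific.
    Tied scores are not treated specially in this sketch.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank terms from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this positive
    return total / n_pos


def log_auprc_over_prior(labels, scores, base=2):
    """Log of auPRC divided by the prior (fraction of positive terms).

    The base is an assumption; 0 corresponds to random expectation.
    """
    prior = sum(labels) / len(labels)
    return math.log(average_precision(labels, scores) / prior, base)


# Hypothetical example: 8 terms, 2 of which are specific to the tissue.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
value = log_auprc_over_prior(labels, scores)
```

Because the expected auPRC of a random ranking is approximately the prior, a value of 0 corresponds to chance performance, and each unit above 0 (in base 2) is a doubling over that expectation. This is why the metric is comparable across tissues with different numbers of annotated terms.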