Author manuscript; available in PMC: 2011 Aug 10.
Published in final edited form as: J Biomed Inform. 2009 Mar 24;42(5):824–830. doi: 10.1016/j.jbi.2009.03.009

Figure 5.

Separation of terms assembled for developing the Clinical Trial ontology from irrelevant ontology terms randomly selected from the Gene Ontology (GO). (A) Precision/recall curves for three data sets; each set contains the terms assembled by experts for developing the Clinical Trial ontology together with randomly selected ontology terms from GO. The overrepresentation of the terms in each data set was computed under the domain terms “Clinical Trial” or “Clinical-Trial”. (B) The p-values at which the highest F-measures were obtained in the three tests in (A). (C) Terms developed by experts, evaluated using a p-value of 0.0744 as the threshold for the enrichment test. The terms and their p-values are listed in Supplemental Table 1. About 22% of the terms are not overrepresented in the abstracts that contain the term “Clinical Trial” or “Clinical-Trial”. (D) Of the 316 terms developed by experts, 136 appear in the CDISC glossary. Of these 136 terms, 87% (119) are evaluated as overrepresented in the PubMed abstracts that contain “Clinical Trial” or “Clinical-Trial”. A hypergeometric test gives this distribution a p-value of 0.0003, indicating that our enrichment test on the 316 terms was indeed strongly biased toward terms involved in clinical trials.
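The hypergeometric test in panel D can be illustrated with a minimal sketch. It asks how likely it would be to see at least 119 overrepresented terms among the 136 CDISC glossary terms if those 136 were drawn at random from all 316 expert terms. The function name `hypergeom_upper_tail` is hypothetical, and the total number of overrepresented terms (246) is an assumption derived from the caption's "about 22%" figure; the exact count is in Supplemental Table 1, so the value computed here is only approximate and need not match the authors' calculation.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """Return P(X >= observed) for a hypergeometric variable X.

    X counts the successes among `draws` items sampled without
    replacement from `population` items, `successes` of which are
    successes.
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    hits = sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(observed, upper + 1)
    )
    return hits / total


# Values stated in the caption.
N_TERMS = 316       # terms developed by experts
N_CDISC = 136       # of these, terms found in the CDISC glossary
N_CDISC_OVER = 119  # CDISC terms evaluated as overrepresented

# ASSUMPTION: "about 22%" not overrepresented is read as roughly 78%
# overrepresented, i.e. round(0.78 * 316) = 246 terms. The exact count
# is not given in the caption.
N_OVER = round(0.78 * N_TERMS)

p_value = hypergeom_upper_tail(N_TERMS, N_OVER, N_CDISC, N_CDISC_OVER)
print(f"P(X >= {N_CDISC_OVER}) = {p_value:.2e}")
```

The sum uses exact integer binomial coefficients and divides only once at the end, which avoids floating-point underflow in the tail for counts of this size.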