2022 Mar 24;2022:baac019. doi: 10.1093/database/baac019

Figure 2.

Performance improvement of the text mining channel. As shown in the ROC curves, text mining performs markedly better in the new version of DISEASES (Dict2,FullText2021) than in the originally published one (Dict1,PubMed2013). To quantify the sources of improvement, we show two additional curves: one using the new dictionary on the latest abstract collection only (Dict2,PubMed2021) and another using the old dictionary on the same abstracts (Dict1,PubMed2021). Comparing the curves reveals that most of the improvement stems from the addition of full-text articles, but that the new disease and gene dictionaries also led to considerable improvement. By contrast, the growth in PubMed abstracts from 2013 to 2021 made only a minor difference. The inset shows a zoom of the high-confidence part of the plot.
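For readers unfamiliar with this kind of comparison, the following is a minimal sketch of how ROC curves for several configurations can be computed from scored disease–gene pairs and a set of gold-standard labels. It is not the benchmarking code used for DISEASES: the function names `roc_points` and `auc`, the configuration names, and all scores and labels are hypothetical toy values chosen purely for illustration.

```python
from typing import Dict, List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points obtained by
    sweeping the score threshold from high to low. Tied scores form one step."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all items sharing this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under a list of ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: 1 marks a gold-standard pair, 0 any other pair.
labels = [1, 1, 0, 1, 0, 0]
# Hypothetical scores assigned to the same six pairs by two configurations.
configs: Dict[str, List[float]] = {
    "config_a": [0.9, 0.8, 0.7, 0.6, 0.2, 0.1],
    "config_b": [0.9, 0.3, 0.7, 0.6, 0.8, 0.1],
}

for name, scores in configs.items():
    print(name, round(auc(roc_points(scores, labels)), 3))
```

Plotting the points returned for each configuration on shared axes gives a figure of the same kind as the one described here, and restricting the x-axis to small false positive rates corresponds to zooming in on the high-confidence region.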