2020 Jan 30;21:35. doi: 10.1186/s12859-020-3375-3

Table 1.

Statistics of entity ambiguity for the Bio-ID corpus

Properties                      Training set    Test set
# Mentions                      4440            1715
# Monosemous                    3031            1265
# Polysemous / Ambiguity Rate   1409 / 2.79     450 / 2.41

The left column lists four attributes: the number of unique protein/gene mention terms (# Mentions), the number of those mention terms with only one entity ID attested in the corpus (# Monosemous), the number with two or more IDs attested in the corpus (# Polysemous), and the average number of candidate IDs per polysemous mention term (Ambiguity Rate).
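
As an illustration of how these four attributes relate to one another, the following minimal Python sketch computes them from a list of (mention, entity ID) annotation pairs. It is not the authors' code: the function name `ambiguity_stats`, the input format, the toy mentions, and the placeholder IDs are all assumptions made for this example, and the sketch treats mention strings as case-sensitive, which may differ from how the corpus statistics were actually derived.

```python
from collections import defaultdict


def ambiguity_stats(annotations):
    """Compute ambiguity statistics from (mention, entity_id) pairs.

    Hypothetical helper: `annotations` is any iterable of
    (mention_text, entity_id) tuples, one per annotated occurrence.
    """
    # Collect the set of distinct entity IDs attested for each unique mention.
    ids_by_mention = defaultdict(set)
    for mention, entity_id in annotations:
        ids_by_mention[mention].add(entity_id)

    # A mention is polysemous if two or more distinct IDs are attested for it.
    polysemous_counts = [
        len(ids) for ids in ids_by_mention.values() if len(ids) >= 2
    ]

    n_mentions = len(ids_by_mention)
    n_polysemous = len(polysemous_counts)
    return {
        "mentions": n_mentions,
        "monosemous": n_mentions - n_polysemous,
        "polysemous": n_polysemous,
        # Average number of candidate IDs per polysemous mention.
        "ambiguity_rate": (
            sum(polysemous_counts) / n_polysemous if n_polysemous else 0.0
        ),
    }


# Toy annotations with placeholder IDs (not taken from the Bio-ID corpus).
toy = [
    ("p53", "ID:A"), ("p53", "ID:B"), ("p53", "ID:A"),
    ("actin", "ID:C"),
    ("Akt", "ID:D"), ("Akt", "ID:E"), ("Akt", "ID:F"),
]
stats = ambiguity_stats(toy)
print(stats)
```

In the toy data, "p53" has two attested IDs and "Akt" has three, so the ambiguity rate is (2 + 3) / 2 = 2.5. By construction the monosemous and polysemous counts sum to the number of unique mentions, which matches the table: 3031 + 1409 = 4440 for the training set and 1265 + 450 = 1715 for the test set.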