2020 Jan 30;21:35. doi: 10.1186/s12859-020-3375-3

Table 1.

Statistics of entity ambiguity for the Bio-ID corpus

Properties	Training set	Test set
# Mentions	4440	1715
# Monosemous	3031	1265
# Polysemous/Ambiguity Rate	1409 / 2.79	450 / 2.41

The left column lists four attributes: the number of unique protein/gene mention terms (# Mentions); the number of those mention terms with only one entity ID attested in the corpus (# Monosemous); the number with two or more IDs attested in the corpus (# Polysemous); and the average number of candidate IDs per polysemous mention term (Ambiguity Rate). Each cell in the last row gives the polysemous count followed by the ambiguity rate.
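As an illustration of how the four attributes relate to one another, the following minimal sketch computes them from a list of (mention, entity ID) annotation pairs. It is not the authors' code: the function name `ambiguity_stats`, the input layout, and the toy IDs are assumptions made for this example, and mention strings are compared exactly as given, since the table does not say whether mentions were normalized (for example, lowercased) before counting.

```python
from collections import defaultdict


def ambiguity_stats(annotations):
    """Compute Table 1 style statistics.

    `annotations` is an iterable of (mention_text, entity_id) pairs.
    This input format is an assumption for illustration only.
    """
    # Collect the set of distinct IDs attested for each unique mention term.
    ids_by_mention = defaultdict(set)
    for mention, entity_id in annotations:
        ids_by_mention[mention].add(entity_id)

    # A mention term is polysemous if two or more IDs are attested for it.
    polysemous = [ids for ids in ids_by_mention.values() if len(ids) >= 2]

    n_mentions = len(ids_by_mention)
    n_polysemous = len(polysemous)
    n_monosemous = n_mentions - n_polysemous

    # Ambiguity rate: mean number of candidate IDs per polysemous mention.
    if n_polysemous:
        rate = sum(len(ids) for ids in polysemous) / n_polysemous
    else:
        rate = 0.0

    return {
        "mentions": n_mentions,
        "monosemous": n_monosemous,
        "polysemous": n_polysemous,
        "ambiguity_rate": rate,
    }


# Toy data with hypothetical placeholder IDs (not taken from the Bio-ID corpus).
toy = [
    ("p53", "ID_A"), ("p53", "ID_B"), ("p53", "ID_C"),
    ("actin", "ID_D"), ("actin", "ID_D"),
    ("Rad51", "ID_E"), ("Rad51", "ID_F"),
]
stats = ambiguity_stats(toy)
print(stats)
```

By construction, the monosemous and polysemous counts sum to the number of unique mention terms, which matches the table (3031 + 1409 = 4440 for the training set and 1265 + 450 = 1715 for the test set).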