TABLE 2.
Strategies for measuring CASBERT performance: three query sets and eight retrieval methods, including BM25 as the gold standard.
| Q-E type | # of query-entities (PMR-CA) | # of query-entities (BioModels-CA) | Description |
|---|---|---|---|
| noPredicate | 338 | 834 | The original query-entity pairs extracted from the PMR and the BioModels-CA. |
| withPredicate | 534 | 1541 | The noPredicate set expanded by randomly adding terms from the composite annotation predicates to the associated existing query terms. |
| combine | 509 | 1777 | The combination of noPredicate and withPredicate, with the data used for QC model training removed. |

| Retrieval method | Terms used to generate query embedding | List of entity embeddings used |
|---|---|---|
| macro | whole query terms | $E_1$ ($w_p = 0$) |
| macroWP | whole query terms | $E_2$ ($0 < w_p < 1$) |
| micro | ontology class concept phrase | $E_1$ ($w_p = 0$) |
| microWP | ontology class concept phrase | $E_2$ ($0 < w_p < 1$) |
| mixed | whole query terms & ontology class concept phrases | $E_1$ ($w_p = 0$) |
| mixedWP | whole query terms & ontology class concept phrases | $E_2$ ($0 < w_p < 1$) |
| mixedCl | whole query terms & ontology class concept phrases | select between $L_1$ or $L_2$ |
| BM25 | n/a (retrieval uses a bag-of-words method, BM25) | n/a |
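
For readers unfamiliar with the BM25 row, the following is a minimal sketch of standard Okapi BM25 scoring over tokenized entity descriptions. It is not the configuration used in the evaluation: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, and the toy entity texts are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against a tokenized query with Okapi BM25.

    Assumes docs_tokens is a non-empty list of non-empty token lists.
    k1 and b are common default values, not values taken from the paper.
    """
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for doc in docs_tokens:
        doc_freq.update(set(doc))

    scores = []
    for doc in docs_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue  # terms absent from the document contribute nothing
            df = doc_freq[term]
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Term-frequency saturation with document-length normalization.
            tf = term_freq[term]
            denom = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scores.append(score)
    return scores


# Hypothetical entity descriptions and query, for illustration only.
entities = [
    ["concentration", "of", "glucose"],
    ["membrane", "potential"],
    ["glucose", "flux", "in", "cytosol"],
]
query = ["glucose", "concentration"]
scores = bm25_scores(query, entities)
ranking = sorted(range(len(entities)), key=lambda i: scores[i], reverse=True)
```

Because BM25 matches surface tokens only, an entity that shares no term with the query scores zero, whereas the embedding-based methods in the table compare queries and entities in vector space.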