Table 2.
Algorithm | Scoring | Global | WPM | Remarks |
---|---|---|---|---|
tf-idf | Term frequency | No | No | Frequent resources in the collection receive a low score. In ontologies, however, a common term is not necessarily less relevant: frequent terms can be a product of reuse by other ontologies |
BM25 | Term frequency | Yes | No | Suffers from the same issue as tf-idf, but the cumulative score ranks domain ontologies higher |
VSM | Vector similarity | No | No | Uses tf-idf to weight vectors and also considers the tf-idf of the query, aggravating the tf-idf drawback |
PageRank | Links between ontologies | Yes | No | Ranks based on popularity, which may lead to popular but less relevant resources being ranked higher |
CMM | Coverage of the set of queries | Yes | Yes | Ontologies with a large number of partial matches will be scored higher than ontologies with few exact matches |
SMM | Closeness between ontological resources | Yes | No | Although this algorithm can be useful when considering similarity among the matched resources of two or more query terms of a multi-keyword query, it performs poorly on single-word queries |
Note: Scoring summarizes the main scoring method of the algorithm. Global indicates whether the score attributed by the algorithm is per resource or per ontology. WPM (weights partial matches) shows whether the algorithm distinguishes between partial and exact matches.
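
The tf-idf drawback listed in the table can be illustrated with a minimal sketch. It assumes the textbook formulation (term frequency multiplied by log(N / df)), which may differ from the exact variant evaluated here. The toy collection, the ontology names, and the helper functions `idf` and `tf_idf` are hypothetical and serve only as an illustration.

```python
import math

def idf(term, documents):
    """Inverse document frequency: log(N / df), with df = documents containing the term."""
    df = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / df) if df else 0.0

def tf_idf(term, doc, documents):
    """Textbook tf-idf: relative term frequency in doc times idf over the collection."""
    tf = doc.count(term) / len(doc)
    return tf * idf(term, documents)

# Hypothetical toy collection: each ontology is reduced to a list of resource labels.
# "Person" is reused by every ontology; "knows" appears in only one.
ontologies = {
    "social": ["Person", "Agent", "name", "knows"],
    "org": ["Person", "Organization", "memberOf"],
    "bio": ["Person", "Gene", "Protein"],
    "geo": ["Person", "Place", "locatedIn"],
}
docs = list(ontologies.values())

# The widely reused term has idf = log(4 / 4) = 0, so its score collapses to zero.
reused_score = tf_idf("Person", ontologies["social"], docs)
# The rare term has idf = log(4 / 1), so it outranks the reused term.
rare_score = tf_idf("knows", ontologies["social"], docs)

print(f"Person: {reused_score:.3f}, knows: {rare_score:.3f}")
```

In this toy setting the reused term scores zero even though reuse across ontologies might signal that the resource is well established, which is the behaviour the Remarks column describes for tf-idf.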