[Preprint]. 2024 Feb 5:arXiv:2402.03484v1. [Version 1]

Figure 3:

Overview of the PubCLogs dataset construction process. For each coclicked article pair, the first article clicked is the seed article, and the related article clicked afterward is the similar article. For each token in the similar article's title, we sum the coclicks from queries that contained that token. We then apply a softmax function to these counts and use a predefined threshold, P, to select the most frequently queried title tokens of the similar article, which serve as the ground truth labels for the PMID pair.
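
The labeling step described in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `label_title_tokens`, the input layout (a list of query-token lists paired with coclick counts), and the choice to keep tokens whose softmax probability is at least P are all assumptions made for illustration. The caption does not say how tokens are normalized or whether the threshold applies to individual or cumulative probabilities.

```python
import math
from collections import Counter


def label_title_tokens(similar_title_tokens, query_coclicks, threshold_p):
    """Select title tokens of the similar article as labels (illustrative only).

    similar_title_tokens: tokens of the similar article's title.
    query_coclicks: list of (query_tokens, coclick_count) pairs (assumed layout).
    threshold_p: assumed to be a cutoff on each token's softmax probability.
    """
    # Deduplicate title tokens while preserving their order.
    tokens = list(dict.fromkeys(similar_title_tokens))
    if not tokens:
        return [], {}

    # Aggregate coclicks from every query that contains the title token.
    counts = Counter()
    for query_tokens, n_coclicks in query_coclicks:
        for tok in set(tokens) & set(query_tokens):
            counts[tok] += n_coclicks

    # Numerically stable softmax over the per-token click counts.
    values = [counts[t] for t in tokens]
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    probs = {t: e / total for t, e in zip(tokens, exps)}

    # Keep tokens whose probability reaches the threshold (assumption).
    labels = [t for t in tokens if probs[t] >= threshold_p]
    return labels, probs


# Hypothetical example data, not taken from PubCLogs.
title = ["covid", "vaccine", "efficacy"]
queries = [
    (["covid", "vaccine"], 3),
    (["vaccine"], 2),
    (["efficacy", "trial"], 1),
]
labels, probs = label_title_tokens(title, queries, threshold_p=0.1)
```

With these made-up counts (covid 3, vaccine 5, efficacy 1), the softmax concentrates most of the mass on "vaccine", and a cutoff of 0.1 keeps "covid" and "vaccine" as labels. Because softmax is applied to raw counts, large differences in click volume produce very peaked distributions; whether the original pipeline rescales counts first is not stated in the caption.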