2023 Oct 30;4(12):100865. doi: 10.1016/j.patter.2023.100865

© 2023 The Authors

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Search behavior depends on canonicalization

(A) Similarity metrics for all CheSS searches between each canonicalized query and its respective top 250 results. Compared to consistent canonicalization, the top 250 results from alternative canonicalizations were significantly more dissimilar to the query in structural, scaffold, string, and shared-token similarity. Structural similarity was measured by whole-molecule fingerprint Tanimoto similarity, scaffold similarity by scaffold fingerprint Tanimoto similarity, and string similarity by gestalt pattern matching. Asterisks indicate the level of statistical significance for two-sided independent t tests (ns, p < 1.0; ∗p < 0.05; ∗∗p < 0.01; ∗∗∗p < 0.001; ∗∗∗∗p < 0.0001).
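
The two kinds of metric named in (A) can be illustrated with a minimal sketch. This is not the authors' code. It assumes that gestalt pattern matching can be approximated with Python's standard-library `difflib.SequenceMatcher`, and it represents a fingerprint as a plain set of "on" bit indices. The function names `gestalt_similarity` and `tanimoto` and the bit sets are hypothetical; real fingerprints would come from a cheminformatics toolkit.

```python
from difflib import SequenceMatcher


def gestalt_similarity(a: str, b: str) -> float:
    """Gestalt-style similarity of two strings, in the range [0, 1]."""
    # autojunk=False turns off difflib's heuristic for long sequences.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bits."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # assumption: two empty fingerprints score 0
    return len(bits_a & bits_b) / len(union)


# Two valid SMILES strings for the same molecule (ethanol).
smiles_a = "CCO"
smiles_b = "OCC"
# The string similarity is below 1.0 even though the structure is identical.
print(gestalt_similarity(smiles_a, smiles_b))

# Hypothetical fingerprint bit indices for a query and one search hit.
fp_query = {1, 2, 3}
fp_hit = {2, 3, 4}
print(tanimoto(fp_query, fp_hit))
```

The ethanol pair shows why a string metric is sensitive to how a molecule is written, whereas a fingerprint metric depends only on which structural features are present.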

(B–D) The index rank of each canonicalization’s top 250 results for zidovudine compared with the index rank that the same molecules received in a fingerprint Tanimoto search. Black dots indicate molecules functionally similar to the query, as determined by the LLM-assisted patent search. Rank plots for all queries are listed in Figure S7.