[Preprint]. 2023 Oct 16:2023.10.13.562298. [Version 1] doi: 10.1101/2023.10.13.562298

Figure 1. The PARSE algorithm for interpretable protein function annotation.


Starting from top left, we first (A) build a reference database containing all residues associated with each functional group (here, enzymes from the Catalytic Site Atlas). Then, for a query protein to be annotated, we (B) embed the local environment around each residue using COLLAPSE (colored squares) and compute the pairwise cosine distance to the embedding of each residue in the reference database (colored circles). Database residues are then ranked by the minimum distance to any residue in the query and (C) an enrichment score is computed for each functional group relative to this ranked list. (D) Key residues for a given function are mapped to the query protein using the leading-edge subset of database residues which achieve scores greater than the maximum running enrichment score in the ranked list. Finally, to assess significance and reduce the influence of low-specificity functional labels, we (E) compute an empirical p-value based on a function-specific background score distribution.
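The steps in panels B through E can be illustrated with a minimal sketch, written in plain Python for readability. It is not the authors' implementation, and several details are assumptions the caption does not specify: the embeddings are passed in as plain lists of floats (COLLAPSE itself is not called), the enrichment score is taken to be the peak of an unweighted, GSEA-style running sum, the leading edge follows the usual GSEA convention of "hits ranked at or before the peak", and the function-specific background scores are assumed to be supplied by the caller. All function names are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_database(query_embs, db_embs):
    """Panel B: rank database residues by their minimum distance to any
    query residue, closest first. Returns (ranked indices, min distances)."""
    min_dist = [min(cosine_distance(q, d) for q in query_embs) for d in db_embs]
    order = sorted(range(len(db_embs)), key=lambda i: min_dist[i])
    return order, min_dist


def enrichment(order, labels, function):
    """Panels C and D: unweighted running-sum enrichment score (assumed
    GSEA-style) and the leading-edge database residues for one function."""
    n_hit = sum(1 for i in order if labels[i] == function)
    n_miss = len(order) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one hit and one miss in the ranked list")
    running, total = [], 0.0
    for i in order:
        # Step up when the residue belongs to the function, down otherwise.
        total += 1.0 / n_hit if labels[i] == function else -1.0 / n_miss
        running.append(total)
    peak = max(range(len(running)), key=lambda k: running[k])
    # Assumed convention: hits ranked at or before the peak of the running sum.
    leading_edge = [order[k] for k in range(peak + 1) if labels[order[k]] == function]
    return running[peak], leading_edge


def empirical_p_value(score, background_scores):
    """Panel E: fraction of background scores at least as large as the
    observed score, with a +1 pseudocount so the p-value is never zero."""
    n_ge = sum(1 for b in background_scores if b >= score)
    return (n_ge + 1) / (len(background_scores) + 1)


# Toy data: 2 query residues, 5 database residues labeled with functions A or B.
query_embs = [[1.0, 0.0], [0.0, 1.0]]
db_embs = [[1.0, 0.1], [0.1, 1.0], [-1.0, 0.0], [-1.0, -1.0], [0.9, 0.0]]
labels = ["A", "A", "B", "B", "B"]

order, min_dist = rank_database(query_embs, db_embs)
score, leading_edge = enrichment(order, labels, "A")
p_value = empirical_p_value(score, background_scores=[0.1, 0.2, 0.9])
```

In this toy example the two residues labeled A sit near the top of the ranked list, so the running sum peaks early and both residues fall in the leading edge. In practice the leading-edge database residues would then be mapped back to the query residues that produced their minimum distances, which this sketch leaves out.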
