Figure - PMC

2022 Dec 13;20(5):1002–1012. doi: 10.1016/j.gpb.2022.11.009

© 2022 The Authors

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Benchmark data preparation and evaluation

A. Experimentally identified epitope-containing regions were collected from the IEDB database. B. Identical protein sequences were merged and their verified epitope regions were aggregated. C. Redundancy among similar protein sequences was removed with CD-HIT. D. Proteins with the largest number of epitope-containing regions were retained. The curated dataset was divided into epitopes and non-epitopes according to epitope assay information. Epitope-containing regions supported by at least two PAs were defined as epitopes, so that no label rests on a single test result. Regions that were tested in at least two assays but were positive in none were stored as non-epitopes. All remaining regions, whose inconsistent test responses met neither criterion, were excluded. E. The length distribution of epitopes. F. The length distribution of non-epitopes. G. Taxonomic distribution in super-kingdoms and families at the protein level. H. Taxonomic distribution in super-kingdoms and families at the verified epitope level. PA, positive assay.
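
The labeling rule described in the caption can be sketched as a small function. This is only an illustrative reading of the caption, not the authors' code: the function name `label_region` and its two arguments (the number of positive assays and the total number of assays for a region) are hypothetical, and it assumes that a region with at least two positive assays counts as an epitope even if some other assays were negative.

```python
from typing import Optional


def label_region(n_positive: int, n_total: int) -> Optional[str]:
    """Label one epitope-containing region from its assay counts.

    Hypothetical helper illustrating the caption's rule:
      - at least two positive assays (PAs)      -> "epitope"
      - at least two assays, none positive      -> "non-epitope"
      - anything else (e.g. one PA, one assay)  -> None (excluded)
    """
    if n_positive < 0 or n_total < n_positive:
        raise ValueError("assay counts must satisfy 0 <= n_positive <= n_total")
    if n_positive >= 2:
        return "epitope"
    if n_total >= 2 and n_positive == 0:
        return "non-epitope"
    return None  # inconsistent or insufficient evidence: excluded


# Example: (positive assays, total assays) for a few made-up regions
examples = [(2, 3), (0, 2), (1, 4), (0, 1)]
labels = [label_region(p, t) for p, t in examples]
```

Returning `None` for excluded regions keeps the three outcomes distinct, so a caller can filter out the excluded regions before computing the length distributions shown in panels E and F.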