2024 Mar 5;20(3):e1011881. doi: 10.1371/journal.pcbi.1011881

Fig 1. Per-dataset prevalence of sequence liabilities for five open-source databases: Genbank, literature, NGS, patents, and therapeutics.

Note that the NGS and therapeutics datasets were paired, so their liability counts cannot be directly compared with those of the single-chain datasets. The Genbank, patents, and literature datasets contained unpaired heavy and light sequences. In the top portion (sequences), counts are given as a percentage of the total number of sequences in a dataset. In the lower portion (liabilities), the total count of liabilities in the dataset is given. In each case, we show the number of sequences or total liabilities remaining after applying individual flags or their combinations.