2022 Mar 3;23:69. doi: 10.1186/s13059-022-02624-y

Fig. 4.


Customized long-read-derived protein database for protein isoform detection. a–c Overlap of peptide (a), gene (b), and protein isoform group (c) identifications from GENCODE versus PacBio database searches. d Example of a “Subset” case, in which the sample-specific PacBio-Hybrid database search infers fewer expressed isoforms than the reference (GENCODE) database search. Based on the peptide evidence, the expressed protein isoform is ambiguous when relying on reference models but precise (PB.2555.5 identified) when using the long-read database. e Example of a “Partial Overlap” case, in which the sample expresses fewer isoforms than the reference but also expresses additional novel isoforms not accounted for in the reference models. f Example of a “Distinct” case, in which the sample expresses isoforms that are entirely distinct from the isoform models in the reference. Although the peptide maps to isoforms in both the reference and the sample, it most likely arises from the novel protein isoform annotated from the long-read data. In d–f, the PacBio-derived isoform label follows this format: <gene>|<PB accession>|<SQANTI Protein class>|<CPM>. The peptide sequences are displayed with their flanking amino acids (AA), which are not part of the identified sequence. CDS, protein-coding sequences
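
The pipe-delimited isoform label used in panels d–f can be split into its four fields programmatically. The following is a minimal Python sketch, not code from the study: the names `IsoformLabel` and `parse_isoform_label` are hypothetical, and the example label combines the accession PB.2555.5 from the caption with a placeholder gene name, protein class, and CPM value chosen only for illustration. It also assumes that none of the fields contains a `|` character and that CPM is written as a plain number.

```python
from typing import NamedTuple


class IsoformLabel(NamedTuple):
    """Fields of a '<gene>|<PB accession>|<SQANTI Protein class>|<CPM>' label."""
    gene: str
    pb_accession: str
    protein_class: str
    cpm: float


def parse_isoform_label(label: str) -> IsoformLabel:
    """Split a pipe-delimited isoform label into its four named fields.

    Assumes no field itself contains '|' and that CPM parses as a float.
    """
    fields = label.strip().split("|")
    if len(fields) != 4:
        raise ValueError(
            f"expected 4 pipe-separated fields, got {len(fields)}: {label!r}"
        )
    gene, pb_accession, protein_class, cpm = fields
    return IsoformLabel(gene, pb_accession, protein_class, float(cpm))


# Hypothetical label: only the accession PB.2555.5 appears in the caption;
# the gene name, protein class, and CPM value are placeholders.
example = parse_isoform_label("GENE1|PB.2555.5|pFSM|12.5")
print(example.pb_accession, example.cpm)
```

Returning a named tuple keeps the field order from the caption explicit while allowing access by name, for example `example.protein_class`.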