Figure - PMC

Anal Chem. 2023 May 17;95(21):8189–8196. doi: 10.1021/acs.analchem.2c05244

© 2023 The Authors. Published by American Chemical Society

Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/).

Overview of the pipeline for proteoform feature identification in TopFD.

(a) Preprocessing. Experimental centroided peaks that have a low intensity or appear in only one MS1 spectrum are removed. MS-Deconv is then used to deconvolute the MS1 spectra to obtain seed envelopes.

(b) Proteoform feature extraction. (1) The reported seed envelopes are ranked by the sum of the peak intensities of their theoretical envelopes, and the one with the highest intensity is selected. (2) To extract an envelope set, peaks in the seed theoretical envelope are matched with experimental peaks, and the matching is extended in both the forward and backward directions until no matching experimental peaks are found. (3) The retention time (RT) boundaries of the reported envelope set are refined if it contains peaks from neighboring envelope sets. (4) The charge state of the envelope set is evaluated and corrected if needed. (5) Once an envelope set is extracted, the neighboring charge states are explored to find other envelope sets in the envelope collection. (6) The experimental peaks included in the envelope collection are removed from the data. The six steps are then repeated for the next seed envelope, which is the one with the highest intensity in the remaining seed list.

(c) Postprocessing. The precursor masses of the reported envelope collections are first refined. Envelope collections are then merged if they have similar precursor masses and similar retention time ranges. Finally, an ECScore is computed for each envelope collection, and those with a low ECScore are removed.
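
The greedy loop in panel (b) can be illustrated with a minimal Python sketch. This is not TopFD's implementation: it covers only steps (1), (2), and (6), and omits RT boundary refinement, charge state correction, and the search over neighboring charge states. It assumes that "forward" and "backward" mean later and earlier MS1 scans. The names `Seed`, `match_peaks`, `extract_envelope_set`, and `extract_features`, the dictionary representation of spectra, and the fixed m/z tolerance are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seed:
    """Hypothetical seed envelope: where it was found and its theoretical peaks."""
    scan: int                 # index of the MS1 spectrum that produced the seed
    theo_mzs: tuple           # m/z values of the theoretical envelope
    theo_intensities: tuple   # intensities of the theoretical envelope

    @property
    def total_intensity(self):
        # Ranking key for step (1): sum of theoretical peak intensities.
        return sum(self.theo_intensities)


def match_peaks(spectrum, theo_mzs, tol):
    """Return experimental m/z values within `tol` of any theoretical peak."""
    return {mz for mz in spectrum if any(abs(mz - t) <= tol for t in theo_mzs)}


def extract_envelope_set(spectra, seed, tol):
    """Step (2): match at the seed scan, then extend forward and backward
    until a scan contains no matching experimental peaks."""
    hits = match_peaks(spectra[seed.scan], seed.theo_mzs, tol)
    if not hits:
        return {}
    matched = {seed.scan: hits}
    for step in (1, -1):                      # forward, then backward
        scan = seed.scan + step
        while 0 <= scan < len(spectra):
            hits = match_peaks(spectra[scan], seed.theo_mzs, tol)
            if not hits:
                break
            matched[scan] = hits
            scan += step
    return matched


def extract_features(spectra, seeds, tol=0.01):
    """Greedy loop over seeds, highest total theoretical intensity first.

    `spectra` is a list of {mz: intensity} dicts, one per MS1 scan.
    `tol` is an assumed absolute m/z tolerance, not a TopFD parameter.
    """
    remaining = [dict(s) for s in spectra]    # work on a copy of the data
    features = []
    for seed in sorted(seeds, key=lambda s: s.total_intensity, reverse=True):
        env_set = extract_envelope_set(remaining, seed, tol)
        if not env_set:
            continue                          # peaks already used by a stronger seed
        features.append((seed, env_set))
        for scan, mzs in env_set.items():     # step (6): remove used peaks
            for mz in mzs:
                del remaining[scan][mz]
    return features


spectra = [
    {500.0: 10.0, 500.5: 8.0},
    {500.0: 12.0, 500.5: 9.0},
    {700.0: 5.0},
    {500.0: 3.0},
]
seeds = [
    Seed(scan=0, theo_mzs=(500.0, 500.5), theo_intensities=(50.0, 40.0)),
    Seed(scan=1, theo_mzs=(500.0, 500.5), theo_intensities=(100.0, 80.0)),
    Seed(scan=2, theo_mzs=(700.0,), theo_intensities=(10.0,)),
]
features = extract_features(spectra, seeds)
```

In this toy input, the seed at scan 1 is processed first and claims the matching peaks in scans 0 and 1, so the weaker seed at scan 0 finds nothing left to match and is skipped. The peak in scan 3 is not reached because the extension stops at scan 2, which has no matching peak.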