J Proteome Res. 2024 Jun 27;23(8):3552–3559. doi: 10.1021/acs.jproteome.4c00188

A Handle on Mass Coincidence Errors in De Novo Sequencing of Antibodies by Bottom-up Proteomics

Douwe Schulte 1, Joost Snijder 1,*
PMCID: PMC11301774  PMID: 38932690

Abstract

Antibody sequences can be determined at 99% accuracy directly from the polypeptide product by using bottom-up proteomics techniques. Sequencing accuracy at the peptide level is limited by the isobaric residues leucine and isoleucine, incomplete fragmentation spectra in which the order of two or more residues remains ambiguous due to lacking fragment ions for the intermediate positions, and isobaric combinations of amino acids, of potentially different lengths, for example, GG = N and GA = Q. Here, we present several updates to Stitch (v1.5), which performs template-based assembly of de novo peptides to reconstruct antibody sequences. This version introduces a mass-based alignment algorithm that explicitly accounts for mass coincidence errors. In addition, it incorporates a postprocessing procedure to assign I/L residues based on secondary fragments (satellite ions, i.e., w-ions). Moreover, evidence for sequence assignments can now be directly evaluated with the addition of an integrated spectrum viewer. Lastly, input data from a wider selection of de novo peptide sequencing algorithms are allowed, now including Casanovo, PEAKS, Novor.Cloud, pNovo, and MaxNovo, in addition to flat text and FASTA. Combined, these changes make Stitch compatible with a larger range of data processing pipelines and improve its tolerance to peptide-level sequencing errors.

Keywords: mass spectrometry, de novo sequencing, sequence assembly, antibodies, alignment, isobaric

Introduction

Antibodies are essential components of the adaptive immune system that can recognize a vast array of antigens in the fight against pathogens or in autoimmune disease.1–4 The diversity of antigens recognized by antibodies is mirrored by the diversity of the antibody sequences. This diversity is generated by somatic recombination and hypermutation of the paired heavy and light chains, taking place in individual B cells. Studying these unique antibody sequences therefore is crucial to understanding the immune response in health and disease and to developing diagnostics, therapeutics, and affinity reagents for life science research based on antibody-mediated recognition and binding of target antigens.

Established methods for antibody sequencing target the coding mRNA in single B cells.5–12 While these methods have laid the foundation for our current understanding of the antibody response, they do not directly access the functional secreted products found in bodily fluids in the same way as they are probed in common serological assays (to determine the binding and neutralization titers following natural infection or immunization). Additionally, B cells reside in the spleen, bone marrow, and blood, of which only the latter population can be easily sampled in human subjects. In contrast, mass spectrometry-based methods can probe specific antibody sequences directly from the secreted polypeptide product, thereby circumventing the need to sample the antibody-producing B-cell clone and providing a direct glimpse into the so-called serum compartment of the immunoglobulin repertoire.13–26

Using a bottom-up proteomics approach, accurate peptide sequences can be determined and assembled into complete heavy- and light-chain sequences. We have recently developed the software tool Stitch to perform template-based assembly of peptide sequences against the coding gene segments of antibodies available in immunogenetics databases.27 Stitch has been shown to enable the accurate reconstruction of monoclonal antibody sequences, as well as the sequencing of isolated Fab fragments from patient serum, M-proteins in monoclonal gammopathies, antibody light chains from urine, and the profiling of whole IgG from COVID-19 patient sera.13–15,26 Sequence accuracies of ∼99% can be obtained, which is sufficient to reverse engineer functional antibody products.21,22,28,29 Remaining sequencing errors stem in large part from common mass coincidences of isobaric residues like leucine/isoleucine but also from incomplete fragmentation spectra in which the order of two or more residues remains ambiguous because fragment ions for the intermediate positions are lacking. Likewise, different combinations of amino acids, of potentially different lengths, can also coincide in mass (e.g., GG = N, GA = Q). In addition to limiting the accuracy of the input peptide sequences, these mass coincidence errors may also hamper proper assembly of the short peptides against the antibody template sequences.

Here, we discuss several recent updates to Stitch (v1.1 up to and including v1.5) in light of common MS-based de novo sequencing errors. These updates include a mass-based alignment algorithm (akin to Meta-SPS contig30), a postprocessing procedure to assign I/L residues based on secondary fragments, a built-in spectrum viewer, a graph-based analysis that tracks the connectivity between identified sequence variants, and the ability to use a wider selection of de novo peptide sequencing programs as input, including Casanovo, MaxNovo, Novor.Cloud, PEAKS, and pNovo.31–38 Combined, these changes make Stitch compatible with a larger range of data processing pipelines and improve its tolerance to peptide-level sequencing errors.

Methods

Mass Coincidence Rate

The mass coincidence rates over different precisions and mass ranges (Figure 2A,B) were calculated by generating all possible amino acid sequences from the canonical amino acids for each mass range (without rotations), including the following modifications: Carbamidomethyl (fixed on C), Oxidation (variable on WHM), Deamidated (variable on NQ), Glu → pyro-Glu (variable on N-term E), and Gln → pyro-Glu (variable on N-term Q). For each of these sets of possible sequences, the number of unique masses was calculated by going through the possible sets sorted on mass and greedily combining any set with the previous if within the given precision.
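The counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the code used for Figure 2: it uses standard monoisotopic residue masses with carbamidomethyl fixed on C, omits the variable modifications for brevity, and interprets "greedily combining" as merging a composition into the previous group when it lies within the precision of that group's last member. The function names `compositions` and `unique_mass_groups` are hypothetical.

```python
from itertools import combinations_with_replacement

# Standard monoisotopic residue masses in Da; C includes fixed carbamidomethyl.
# Variable modifications are left out of this sketch (an assumption).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 160.03065, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def compositions(max_mass):
    """Yield (mass, composition) for every residue multiset up to max_mass.

    Multisets ignore residue order, so rotations are not counted twice.
    """
    residues = sorted(RESIDUE_MASS)
    max_len = int(max_mass // min(RESIDUE_MASS.values()))
    for n in range(1, max_len + 1):
        for combo in combinations_with_replacement(residues, n):
            mass = sum(RESIDUE_MASS[r] for r in combo)
            if mass <= max_mass:
                yield mass, "".join(combo)


def unique_mass_groups(max_mass, precision):
    """Sort compositions on mass and greedily merge neighbours within precision."""
    groups = []
    for mass, comp in sorted(compositions(max_mass)):
        if groups and mass - groups[-1][-1][0] <= precision:
            groups[-1].append((mass, comp))  # indistinguishable from previous
        else:
            groups.append([(mass, comp)])    # a new distinguishable mass
    return groups


groups = unique_mass_groups(max_mass=200.0, precision=0.01)
total = sum(len(g) for g in groups)
print(f"{len(groups)} distinguishable masses for {total} compositions "
      f"(specificity {len(groups) / total:.2f})")
```

The ratio of groups to compositions corresponds to the specificity plotted in Figure 2B, under the simplifications stated above.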

Figure 2. Mass coincidence errors in de novo peptide sequencing. (A) All possible amino acid sequences up to 200 Da, including common modifications (carbamidomethyl fixed on C, pyro-Glu, deamidation on N and Q, and oxidation on M and H). (B) The fraction of sequences that can be told apart (specificity) is based on the precision in Da and the upper mass limit (not counting rotations). (C) Illustration of how the mass-based alignment provides a better handle on mass coincidence errors.

Smith-Waterman Comparison to Mass-Based Alignment

The comparison between Smith–Waterman alignment (SWA) and mass-based alignment (MBA) (Figure 3B) was made by running Stitch with different PEAKS ALC peptide cutoff scores. The other parameters used were enforce unique 0.9, cutoff score template matching 5, cutoff score recombination 5, with the default templates for Homo sapiens heavy and light chains, and common contaminants. The identity was determined using a version of MBA by comparing the consensus sequence as given by these runs to the known sequence. The coverage was determined by counting the fraction of positions that had at least one peptide matching as given in the Stitch FASTA export.

Figure 3. Mass-based alignment results. (A) Average normalized score (score/alignment length) as a function of PEAKS ALC. Note that the points for CutoffALC < 55 are missing for Smith–Waterman alignment because no peptides with these scores could be placed by this alignment algorithm. (B) Result of running Stitch with the Herceptin and F59 sample data with different read cutoff scores presented as accuracy, defined here as identity against the true sequence, in a solid line, and coverage, the fraction of positions covered by at least one peptide, drawn in a dotted line. Note that the results were generated running Stitch with a low cutoff score for template matching. (C) An excerpt of the alignment of the highest scoring light chain variable gene from the Stitch report for Herceptin using mass-based alignment and CutoffALC 85.

The average scores for each ALC bin (Figure 3A) were calculated using the Herceptin data at PEAKS ALC cutoff 50 from Figure 3B. The score was normalized by dividing the absolute score by the length of each peptide it matched on the template. These scores were grouped by ALC of the peptide, and the average and standard deviation were calculated.

Accuracy/Error Estimates

The likelihood of a given number of mistakes (Figure 4) was calculated following a simple binomial model, in which the probability $P(C, T)$ of obtaining $C$ correct amino acids in a sequence of length $T$, with a given accuracy $a$, equals $a^{C} \times (1 - a)^{T - C}$. When $C$ equals $T$, meaning no mistakes are made, this simplifies to $P = a^{T}$, which can then be solved for $a$ as $a = P^{1/T}$.
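As a worked example of the error-free case of this model, the short sketch below evaluates the probability of zero mistakes for a given per-residue accuracy and inverts it to find the accuracy needed for a target probability. The function names are hypothetical and chosen for illustration only.

```python
def p_error_free(accuracy: float, length: int) -> float:
    """Probability that all `length` residues are correct: a ** T."""
    return accuracy ** length


def required_accuracy(p_target: float, length: int) -> float:
    """Per-residue accuracy needed to reach `p_target`: P ** (1 / T)."""
    return p_target ** (1.0 / length)


T = 120  # length of an antibody variable domain, as used in the text
for a in (0.95, 0.99):
    print(f"accuracy {a}: P(no mistakes) = {p_error_free(a, T):.3f}")
print(f"accuracy needed for P = 0.99: {required_accuracy(0.99, T):.5f}")
```

For a sequence of 120 residues this gives P = 0.002 at accuracy 0.95 and P = 0.299 at accuracy 0.99, and shows that an accuracy above 0.9999 is needed to reach P = 0.99.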

Figure 4. Probability of the indicated number of errors in a sequence of 120 amino acids given the sequencing accuracy per amino acid.

Accuracy of I/L Assignments

The accuracy at I/L positions (Figure 5B) was determined by running Stitch with the same data as the SWA vs MBA comparison, with PEAKS cutoff ALC 95, enforce unique 0.9, template matching and recombine cutoff score 10, mass-based alignment, the default templates for Homo sapiens heavy and light chain, and common contaminants, the raw data, and by turning on XleDisambiguation on the input file. For each I/L location, the known true sequence, Stitch outcome, and germline sequence were compared.

Figure 5. Satellite ion usage to identify I/L. (A) Screenshot from the spectrum viewer of a CDRH3 peptide of Herceptin. (B) The fraction of correct I/L identifications using just the germline (previous version of Stitch) or the new satellite ions; alignment identity is used as a measure of accuracy. (C) An example spectrum with satellite ions annotated as Stitch would display it.

Results

Stitch assembles peptides from de novo sequencing programs for bottom-up LC-MS/MS data in the correct framework of the heavy and light chains of an antibody. An outline of the assembly and processing performed by Stitch, including the new features described here, is presented in Figure 1. The previously published version of Stitch was already able to handle PEAKS, FASTA, and plain text data, which is now extended to include the output from Casanovo, MaxNovo, Novor.Cloud, and pNovo.31–38 These peptides are aligned to the germline sequences of the corresponding gene segments (V/J/C) for the appropriate species and placed when the alignment score exceeds a user-defined cutoff. The consensus sequence for a single template is made from all placed peptides, weighted by the confidence and abundance of the individually placed peptides as reported by the peptide de novo sequencing programs. This procedure robustly places peptide reads in the correct framework of the full antibody sequence except for the hypervariable Complementarity Determining Region 3 of the heavy chain (CDRH3). CDRH3 is formed by the junction of three coding gene segments V, D, and J, of which D is exceptionally short and variable, such that germline sequences do not provide a functional template to guide the assembly. Instead, Stitch extends the flanking V- and J-segments with a user-defined number of wildcard “X” characters, such that overhanging reads from the V- and J-segments can be used to extend the consensus sequence into the CDRH3 region. The resulting overhanging V- and J-sequences are then aligned to find the overlap from both sides to reconstruct CDRH3. In contrast, CDR3 of the light chain is encoded only with V- and J-segments and can typically be reconstructed from the first-order template matching step alone. These consensus sequences of the recombined V- and J-segments are then used for a second template matching step to reconstruct the final consensus sequence of the antibody.
It should be noted that this procedure is only suitable for monoclonal antibodies (although it is tolerant to a moderate background of polyclonal sequences).13–15 The reconstruction of CDRH3 from disperse polyclonal mixtures currently requires manual curation of the candidate sequences. The template-based assembly (and recombination) results of Stitch are written out as an interactive report where all peptide alignments can be inspected, and overview statistics like total score/area are collected for every template sequence. To aid in the analysis of disperse polyclonal mixtures, this new version of Stitch introduces the so-called “variant graph”, which shows how co-occurring sequence variations in reference to the templates are connected by peptide reads.

Figure 1. Overview of the program Stitch.

We also introduced a fundamentally different alignment algorithm for the placement of input reads against template sequences. We previously used a modified version of the Smith–Waterman algorithm (SWA) using a matrix based on BLOSUM62.39–41 The SWA only accounts for mismatches between input reads and template sequences that stem from evolutionary processes like somatic hypermutation observed in mature antibody sequences compared to the germline precursors. In our experimental data, however, these mismatches are further confounded by errors inherent to an MS-based approach to peptide sequencing.

To illustrate the sequencing errors that MS is prone to, Figure 2A plots the residue masses of the 20 common amino acids (with common modifications) on a scale of 50 to 200 Da, also including the amino acid combinations that fall within this range. Considering that mass analyzers in current proteomics setups typically operate within a precision range of 0.01–0.50 Da, we can broadly distinguish between four categories of MS-based sequencing errors. The first category is the isobaric residues isoleucine and leucine, which simply have identical masses and cannot be distinguished without the presence of secondary fragment ions (as further discussed below). The second category of common MS-based errors includes larger isobaric sets of amino acids, where different combinations of residues amount to identical masses and cannot be distinguished. This second category stems from missing fragment ions in the MS/MS spectra. There are three subcategories of errors in this second category: (a) simple rotations within a set (e.g., AS = SA), (b) isobaric sets (e.g., AS = GT), and (c) isobaric sets of different lengths (e.g., N = GG, Q = GA). For sets of three residues or more, combinations of these subcategories may occur. The third category of common MS-based errors includes (modified) residues that are not strictly identical in mass but are so similar that they cannot be distinguished, given the experimental precision limits of the mass analyzer. For instance, the mass difference between K and Q is 0.036 Da, and that between oxidized M (Mox) and F is 0.033 Da. This third category of MS-based errors can be eliminated with the use of a high-precision mass analyzer (TOF, FT-MS), but it poses a significant limitation to de novo sequencing at lower precision (ion trap).
The fourth category of common MS-based errors does not necessarily stem from analytical limitations but rather from artifacts of the sample processing pipeline in the form of deamidation of Q and especially N, converting these residues to E and D, respectively.
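The four categories can be checked numerically from standard monoisotopic residue masses. The sketch below is illustrative only; the helper names `mass` and `distinguishable` are hypothetical, and the modification deltas for oxidation and deamidation are the commonly used monoisotopic values.

```python
# Standard monoisotopic residue masses (Da) for the residues used below.
MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "T": 101.04768,
    "L": 113.08406, "I": 113.08406, "N": 114.04293, "D": 115.02694,
    "Q": 128.05858, "K": 128.09496, "M": 131.04049, "F": 147.06841,
}
OXIDATION = 15.99491    # mass added by oxidation (e.g., on M)
DEAMIDATION = 0.98402   # mass added by deamidation (N -> D, Q -> E)


def mass(seq, modification=0.0):
    """Summed residue mass of a short sequence plus an optional modification."""
    return sum(MASS[r] for r in seq) + modification


def distinguishable(mass_a, mass_b, precision):
    """True if two masses differ by more than the analyzer precision."""
    return abs(mass_a - mass_b) > precision


# Category 1: isobaric residues.
print("I vs L  :", abs(mass("I") - mass("L")))
# Category 2: isobaric sets, including sets of different lengths.
for a, b in (("AS", "GT"), ("GG", "N"), ("GA", "Q")):
    print(f"{a} vs {b}:", round(abs(mass(a) - mass(b)), 4))
# Category 3: near-isobaric, resolved only at high precision.
near = (("K", mass("K"), "Q", mass("Q")),
        ("Mox", mass("M", OXIDATION), "F", mass("F")))
for name_a, m_a, name_b, m_b in near:
    print(f"{name_a} vs {name_b}: diff {abs(m_a - m_b):.3f} Da,",
          "resolved at 0.01 Da:", distinguishable(m_a, m_b, 0.01),
          "| at 0.5 Da:", distinguishable(m_a, m_b, 0.5))
# Category 4: deamidated N has the mass of D.
print("N(deamidated) vs D:", round(abs(mass("N", DEAMIDATION) - mass("D")), 4))
```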

Figure 2B illustrates how these common MS-based errors accumulate as a function of precision for progressively larger mass intervals (i.e., a higher number of missing fragments). A few obvious but fundamental requirements and limitations for de novo peptide sequencing can be observed from this “survival” analysis, showing what fraction of possible unique residues (or residue combinations) can be distinguished based on their mass, not counting rotations. First, it is of the utmost importance to achieve complete fragmentation as category 2 errors (isobaric subsets) readily accumulate over larger mass intervals. Second, category 3 errors (similar residue masses) can be eliminated with high precision mass analyzers. Important to note is that the required precision to eliminate category 3 errors is readily achieved on modern mass analyzers for small peptide fragments with low charge states but that this becomes a significant limitation for larger fragments with higher charge states in middle- and top-down approaches. Considering these common MS-based errors, we propose here a modification of the SWA to perform the template-based assembly for antibody sequencing.

One of the most important drawbacks in the context of these MS-based errors is that in SWA, a substitution can only be a single amino acid to another single amino acid while the category 2 errors affect multiple consecutive residues. The new mass-based alignment we devised considers these errors by aligning not only single amino acids but also larger sets, illustrated in Figure 2C. It does this by supplementing the alignment steps considered in SWA (i.e., match/mismatch/insertion/deletion) with two additional steps: rotation and isobaric. These last two steps can be of length up to N steps on both the template and peptide (e.g., match 2 amino acids on the template with 3 on the input peptide). With this addition, the category 2 errors, up to N amino acids, can easily be found. The score for each aligned set of amino acids is looked up in a premade two-dimensional array with the same modified BLOSUM62 base matrix in the top left and all category 2 errors given a nonzero score in the array based on their mass identity. It is important for the behavior of the alignment that these rotations and isobaric matches score lower than any direct match. Hence, rotations score 3 per amino acid involved and isobaric matches score 2 per the maximum number of amino acids involved. The alignment algorithm looks for any nonzero score in the set of steps of up to N residues, alongside the normal cases for SWA, and the highest scoring of the possible steps is chosen. This approach is similar to Meta-SPS contig, with the main difference that alignment and assembly are performed on the level of the output peptide sequences, rather than the MS/MS data itself.30
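The extra rotation and isobaric steps can be illustrated with a simplified local alignment. The sketch below is not the Stitch implementation: it replaces the modified BLOSUM62 matrix with placeholder match, mismatch, and gap scores, computes set scores on the fly instead of from a premade array, and uses an assumed mass tolerance. Only the rotation score (3 per residue) and the isobaric score (2 per the maximum number of residues) are taken from the text.

```python
# Standard monoisotopic residue masses (Da); C is unmodified in this sketch.
MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
MATCH, MISMATCH, GAP = 4, -1, -2  # placeholders, not the Stitch matrix
TOLERANCE = 0.001                 # assumed tolerance (Da) for "same mass"


def set_score(t_seg, p_seg):
    """Score a template segment against a peptide segment, 0 if unrelated."""
    if t_seg == p_seg:
        return 0  # identical residues are handled by the normal match step
    if len(t_seg) == len(p_seg) and sorted(t_seg) == sorted(p_seg):
        return 3 * len(t_seg)                   # rotation, e.g. AS vs SA
    diff = abs(sum(MASS[r] for r in t_seg) - sum(MASS[r] for r in p_seg))
    if diff <= TOLERANCE:
        return 2 * max(len(t_seg), len(p_seg))  # isobaric, e.g. GG vs N
    return 0


def mass_based_score(template, peptide, max_set=3):
    """Best local alignment score with set steps of up to max_set residues."""
    rows, cols = len(template) + 1, len(peptide) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = MATCH if template[i - 1] == peptide[j - 1] else MISMATCH
            candidates = [0, H[i - 1][j - 1] + sub,
                          H[i - 1][j] + GAP, H[i][j - 1] + GAP]
            # Additional steps: a template residues against b peptide residues.
            for a in range(1, min(max_set, i) + 1):
                for b in range(1, min(max_set, j) + 1):
                    s = set_score(template[i - a:i], peptide[j - b:j])
                    if s:
                        candidates.append(H[i - a][j - b] + s)
            H[i][j] = max(candidates)
            best = max(best, H[i][j])
    return best


print(mass_based_score("WGGK", "WNK", max_set=1))  # set steps limited to single residues
print(mass_based_score("WGGK", "WNK", max_set=2))  # GG = N found as isobaric step
```

With `max_set=1` only single-residue isobaric pairs such as I/L are recognized, so the comparison between the two calls shows how much score a peptide with a GG/N error gains once multi-residue steps are allowed.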

Use of the mass-based alignment (MBA) versus SWA improves the tolerance to mass coincidence errors during template-based assembly. This difference is illustrated in Figure 3 using PEAKS input data for the two monoclonal antibodies Herceptin and F59. PEAKS provides a peptide-level average local confidence (ALC) score between 0 and 99, where peptides with a lower ALC typically miss more fragment ions to support the given sequence. Figure 3A shows how the average length-normalized alignment score during the assembly develops as a function of the ALC cutoff for MBA vs SWA. By design, the alignment scores are consistently higher using MBA vs SWA, the margin illustrating just how common mass coincidence errors are in the input data. The higher degrees of freedom in MBA evidently allow for better placement of the input reads, while we see no signs of overfitting, even at the relatively low alignment score cutoff of 5 used in this analysis. Peptides with higher ALC (and fewer errors) yield higher alignment scores, as expected, but to achieve optimal accuracy in the final consensus sequences, there is an important trade-off to be made between selection of the highest-scoring input reads and achieving optimal coverage. As the coverage plummets with ALC cutoff scores >95, it follows that a substantial tolerance to mass coincidence errors is required to assemble the complete sequences (given the limitations of the experimental input data) by including lower ALC (<95) peptides (Figure 3B). Of note, SWA cannot align any peptides below ALC 55, while these may still be placed using MBA.

The accuracy of the final consensus sequences of Herceptin and F59 is plotted as a function of ALC cutoff in Figure 3B, illustrating that MBA outperforms SWA across the sampled range. The accuracy increases from 0.95 (SWA) to 0.99 (MBA). While this may seem like a moderate increase at first, it represents a better than 100-fold improvement in the probability to determine the sequence of a 120 amino acid long variable domain with zero mistakes (P = 0.002 for accuracy 0.95 vs P = 0.299 for accuracy 0.99). Of note, to achieve a probability of P > 0.99 to make zero mistakes in a sequence of this length, the accuracy needs to further improve to >0.9999 (see Figure 4).

In addition to the new MBA, Stitch is also updated to read and display fragmentation spectra in an interactive spectrum viewer (Figure 5A). This allows the user to manually inspect the underlying data for an assigned sequence. Moreover, accessing the fragmentation spectra in Stitch also allowed us to implement a new postprocessing procedure to differentiate between leucine/isoleucine based on secondary fragment ions that can be observed in some fragmentation modes, notably electron transfer high energy collision dissociation data.42 These diagnostic satellite ions (known as w/d-ions) are formed by secondary fragmentation of z- and a-ions, by transfer of the free electron to the side chain, and breaking of a bond to the first C atom in the side chain (Cβ). As the side chains are different between isoleucine and leucine, the possible mass shifts of the satellite ions are different, being 15 and 29 Da for isoleucine (loss of C1H3 and C2H5) and 43 Da for leucine (loss of C3H7). Stitch gives the option to use these satellite ions to find the most likely candidate for each I/L position. This improves the fraction of correctly identified I/L positions from 0.81 (when defaulting to the germline template sequence or to L in the case of newly introduced I/L mutations) to 0.94 when using the satellite ions, as seen in Figure 5B.
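The mass shifts given above translate into a simple lookup for diagnostic peaks. The following sketch is a hypothetical illustration and not the Stitch procedure: given the m/z of the primary fragment ion that contains the I/L residue at the cleavage site, it computes the expected satellite ion positions for both candidates and reports which one is supported by the observed peaks. The function names, the tolerance, and the decision rule are assumptions.

```python
# Monoisotopic masses (Da) of the side-chain fragments lost on forming
# the satellite ion, following the shifts listed in the text.
SIDE_CHAIN_LOSS = {
    "I": {"CH3": 15.02348, "C2H5": 29.03913},
    "L": {"C3H7": 43.05478},
}


def expected_satellites(fragment_mz, charge=1):
    """Expected satellite ion m/z values for I and for L."""
    return {
        residue: [fragment_mz - loss / charge for loss in losses.values()]
        for residue, losses in SIDE_CHAIN_LOSS.items()
    }


def assign_xle(fragment_mz, peaks, charge=1, tolerance=0.02):
    """Return 'I' or 'L' if only one candidate has a matching peak, else None."""
    support = {}
    for residue, mzs in expected_satellites(fragment_mz, charge).items():
        support[residue] = sum(
            any(abs(mz - peak) <= tolerance for peak in peaks) for mz in mzs
        )
    if support["I"] and not support["L"]:
        return "I"
    if support["L"] and not support["I"]:
        return "L"
    return None  # no evidence or conflicting evidence: keep the prior call


# Made-up peak lists for a singly charged fragment at m/z 500.0.
print(assign_xle(500.0, [470.961, 300.0]))
print(assign_xle(500.0, [456.945]))
```

When neither or both candidates are supported, the sketch returns `None`, which mirrors the fallback described in the text of defaulting to the germline residue.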

Discussion

These recent updates to Stitch provide a better handle on mass coincidence errors for de novo antibody sequencing by LC-MS/MS. These improvements extend to any MS-based de novo sequencing application beyond antibodies, and indeed, Stitch can perform the same tasks on an arbitrary set of template sequences.

Implementation of the mass-based alignment shed light on fundamental requirements and limitations for MS-based de novo sequencing and set out some important goals for the near future. To minimize peptide-level sequencing errors, it is crucially important to analyze at a precision of at least 0.02 Da, achieve complete fragmentation, and obtain secondary fragment ions to differentiate I/L. The latter two points call for use of complementary fragmentation techniques beyond CID. Currently, accuracies of 0.99 can be achieved on the consensus sequence level, but to achieve a milestone 99% confidence in error-free sequences, this needs to improve to >0.9999 for the case of an antibody variable domain of 120 amino acids. Based on our own work, this requires further improvement of I/L assignments, but even more so elimination of deamidation errors (primarily N/D) from the sample processing workflows. Moreover, the 0.99 accuracy is currently based on consensus sequences from overlapping peptides, and as we expand applications to complex polyclonal antibody mixtures, the depth of coverage will inevitably suffer, again calling for improved fragmentation to minimize sequencing errors at the peptide level. In addition to improved fragmentation, recent improvements in retention time prediction may also be leveraged in de novo peptide sequencing algorithms to improve peptide-level sequence accuracy.43,44

Stitch is now compatible with a wide range of current de novo sequencing algorithms, providing an improved strategy for sequence assembly with mass-based alignment and the opportunity to browse spectrum-level evidence for the determined sequence to aid in manual curation. It is a versatile tool for MS-based antibody sequencing and beyond and may provide a springboard to dive into the serum compartment of the antibody repertoire with proteomics techniques.

Acknowledgments

The authors thank Bastiaan de Graaf for the fruitful discussions regarding the design of Stitch and the mass-based alignment algorithm. We would also like to thank Lukas Käll for helpful input and feedback on the manuscript. This research was funded by the Dutch Research Council NWO Gravitation 2013 BOO, Institute for Chemical Immunology (ICI; 024.002.009), and the European Research Council Executive Agency HORIZON ERC-2022-STG (FLAVIR; 101077640).

Data Availability Statement

The source code of Stitch is available on GitHub: https://github.com/snijderlab/stitch. All Stitch HTML results related to this study are provided as Supporting Data. The raw data of the monoclonal antibodies Herceptin and F59 are available under identifier PXD023419 and doi 10.6084/m9.figshare.13194005, respectively.

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jproteome.4c00188.

  • Isobaric statistics; mass-based benchmark, and xln disambiguation (ZIP)

  • All Stitch HTML results related to this study (PDF)

The authors declare no competing financial interest.

Supplementary Material

pr4c00188_si_001.zip (142.9MB, zip)
pr4c00188_si_002.pdf (58KB, pdf)

References

  1. Casadevall A.; Pirofski L. A New Synthesis for Antibody-Mediated Immunity. Nat. Immunol 2012, 13 (1), 21–28. 10.1038/ni.2184.
  2. Davies D. R.; Metzger H. Structural Basis of Antibody Function. Annu. Rev. Immunol. 1983, 1 (1), 87–115. 10.1146/annurev.iy.01.040183.000511.
  3. Lleo A.; Invernizzi P.; Gao B.; Podda M.; Gershwin M. E. Definition of Human Autoimmunity — Autoantibodies versus Autoimmune Disease. Autoimmun Rev. 2010, 9 (5), A259–A266. 10.1016/j.autrev.2009.12.002.
  4. Naparstek Y.; Plotz P. H. The Role of Autoantibodies in Autoimmune Disease. Annu. Rev. Immunol. 1993, 11 (1), 79–104. 10.1146/annurev.iy.11.040193.000455.
  5. Avram O.; Kigel A.; Vaisman-Mentesh A.; Kligsberg S.; Rosenstein S.; Dror Y.; Pupko T.; Wine Y. PASA: Proteomic Analysis of Serum Antibodies Web Server. PLoS Comput. Biol. 2021, 17 (1), e1008607. 10.1371/journal.pcbi.1008607.
  6. Lavinder J. J.; Horton A. P.; Georgiou G.; Ippolito G. C. Next-Generation Sequencing and Protein Mass Spectrometry for the Comprehensive Analysis of Human Cellular and Serum Antibody Repertoires. Curr. Opin Chem. Biol. 2015, 24, 112–120. 10.1016/j.cbpa.2014.11.007.
  7. Lee J.; Boutz D. R.; Chromikova V.; Joyce M. G.; Vollmers C.; Leung K.; Horton A. P.; DeKosky B. J.; Lee C.-H.; Lavinder J. J.; Murrin E. M.; Chrysostomou C.; Hoi K. H.; Tsybovsky Y.; Thomas P. V.; Druz A.; Zhang B.; Zhang Y.; Wang L.; Kong W.-P.; Park D.; Popova L. I.; Dekker C. L.; Davis M. M.; Carter C. E.; Ross T. M.; Ellington A. D.; Wilson P. C.; Marcotte E. M.; Mascola J. R.; Ippolito G. C.; Krammer F.; Quake S. R.; Kwong P. D.; Georgiou G. Molecular-Level Analysis of the Serum Antibody Repertoire in Young Adults before and after Seasonal Influenza Vaccination. Nat. Med. 2016, 22 (12), 1456–1464. 10.1038/nm.4224.
  8. Lee J.; Paparoditis P.; Horton A. P.; Frühwirth A.; McDaniel J. R.; Jung J.; Boutz D. R.; Hussein D. A.; Tanno Y.; Pappas L.; Ippolito G. C.; Corti D.; Lanzavecchia A.; Georgiou G. Persistent Antibody Clonotypes Dominate the Serum Response to Influenza over Multiple Years and Repeated Vaccinations. Cell Host Microb. 2019, 25 (3), 367–376. 10.1016/j.chom.2019.01.010.
  9. Lindesmith L. C.; McDaniel J. R.; Changela A.; Verardi R.; Kerr S. A.; Costantini V.; Brewer-Jensen P. D.; Mallory M. L.; Voss W. N.; Boutz D. R.; Blazeck J. J.; Ippolito G. C.; Vinje J.; Kwong P. D.; Georgiou G.; Baric R. S. Sera Antibody Repertoire Analyses Reveal Mechanisms of Broad and Pandemic Strain Neutralizing Responses after Human Norovirus Vaccination. Immunity 2019, 50 (6), 1530–1541. 10.1016/j.immuni.2019.05.007.
  10. Boutz D. R.; Horton A. P.; Wine Y.; Lavinder J. J.; Georgiou G.; Marcotte E. M. Proteomic Identification of Monoclonal Antibodies from Serum. Anal. Chem. 2014, 86 (10), 4758–4766. 10.1021/ac4037679.
  11. Ellebrecht C. T.; Mukherjee E. M.; Zheng Q.; Choi E. J.; Reddy S. G.; Mao X.; Payne A. S. Autoreactive IgG and IgA B Cells Evolve through Distinct Subclass Switch Pathways in the Autoimmune Disease Pemphigus Vulgaris. Cell Rep 2018, 24 (9), 2370–2380. 10.1016/j.celrep.2018.07.093.
  12. Fridy P. C.; Li Y.; Keegan S.; Thompson M. K.; Nudelman I.; Scheid J. F.; Oeffinger M.; Nussenzweig M. C.; Fenyö D.; Chait B. T.; Rout M. P. A Robust Pipeline for Rapid Production of Versatile Nanobody Repertoires. Nat. Methods 2014, 11 (12), 1253–1260. 10.1038/nmeth.3170.
  13. Bondt A.; Hoek M.; Dingess K.; Tamara S.; de Graaf B.; Peng W.; den Boer M. A.; Damen M.; Zwart C.; Barendregt A.; van Rijswijck D. M. H.; Schulte D.; Grobben M.; Tejjani K.; van Rijswijk J.; Völlmy F.; Snijder J.; Fortini F.; Papi A.; Volta C. A.; Campo G.; Contoli M.; van Gils M. J.; Spadaro S.; Rizzo P.; Heck A. J. R. Into the Dark Serum Proteome: Personalized Features of IgG1 and IgA1 Repertoires in Severe COVID-19 Patients. Molecular & Cellular Proteomics 2024, 23 (1), 100690. 10.1016/j.mcpro.2023.100690.
  14. Peng W.; den Boer M. A.; Tamara S.; Mokiem N. J.; van der Lans S. P. A.; Bondt A.; Schulte D.; Haas P.-J.; Minnema M. C.; Rooijakkers S. H. M.; van Zuilen A. D.; Heck A. J. R.; Snijder J. Direct Mass Spectrometry-Based Detection and Antibody Sequencing of Monoclonal Gammopathy of Undetermined Significance from Patient Serum: A Case Study. J. Proteome Res. 2023, 22 (9), 3022–3028. 10.1021/acs.jproteome.3c00330.
  15. Bondt A.; Hoek M.; Tamara S.; de Graaf B.; Peng W.; Schulte D.; van Rijswijck D. M. H.; den Boer M. A.; Greisch J. F.; Varkila M. R. J.; Snijder J.; Cremer O. L.; Bonten M. J. M.; Heck A. J. R. Human Plasma IgG1 Repertoires Are Simple, Unique, and Dynamic. Cell Syst 2021, 12 (12), 1131–1143. 10.1016/j.cels.2021.08.008.
  16. Tran N. H.; Rahman M. Z.; He L.; Xin L.; Shan B.; Li M. Complete De Novo Assembly of Monoclonal Antibody Sequences. Sci. Rep 2016, 6 (1), 31730. 10.1038/srep31730.
  17. Sousa E.; Olland S.; Shih H. H.; Marquette K.; Martone R.; Lu Z.; Paulsen J.; Gill D.; He T. Primary Sequence Determination of a Monoclonal Antibody against α-Synuclein Using a Novel Mass Spectrometry-Based Approach. Int. J. Mass Spectrom. 2012, 312, 61–69. 10.1016/j.ijms.2011.05.005.
  18. Sen K. I.; Tang W. H.; Nayak S.; Kil Y. J.; Bern M.; Ozoglu B.; Ueberheide B.; Davis D.; Becker C. Automated Antibody De Novo Sequencing and Its Utility in Biopharmaceutical Discovery. J. Am. Soc. Mass Spectrom. 2017, 28 (5), 803–810. 10.1007/s13361-016-1580-0.
  19. Rickert K. W.; Grinberg L.; Woods R. M.; Wilson S.; Bowen M. A.; Baca M. Combining Phage Display with de Novo Protein Sequencing for Reverse Engineering of Monoclonal Antibodies. MAbs 2016, 8 (3), 501–512. 10.1080/19420862.2016.1145865.
  20. Savidor A.; Barzilay R.; Elinger D.; Yarden Y.; Lindzen M.; Gabashvili A.; Adiv Tal O.; Levin Y. Database-Independent Protein Sequencing (DiPS) Enables Full-Length de Novo Protein and Antibody Sequence Determination. Molecular & Cellular Proteomics 2017, 16 (6), 1151–1161. 10.1074/mcp.O116.065417.
  21. Peng W.; Pronker M. F.; Snijder J. Mass Spectrometry-Based De Novo Sequencing of Monoclonal Antibodies Using Multiple Proteases and a Dual Fragmentation Scheme. J. Proteome Res. 2021, 20 (7), 3559–3566. 10.1021/acs.jproteome.1c00169.
  22. Guthals A.; Gan Y.; Murray L.; Chen Y.; Stinson J.; Nakamura G.; Lill J. R.; Sandoval W.; Bandeira N. De Novo MS/MS Sequencing of Native Human Antibodies. J. Proteome Res. 2017, 16 (1), 45–54. 10.1021/acs.jproteome.6b00608.
  23. Cheung W. C.; Beausoleil S. A.; Zhang X.; Sato S.; Schieferl S. M.; Wieler J. S.; Beaudet J. G.; Ramenani R. K.; Popova L.; Comb M. J.; Rush J.; Polakiewicz R. D. A Proteomics Approach for the Identification and Cloning of Monoclonal Antibodies from Serum. Nat. Biotechnol. 2012, 30 (5), 447–452. 10.1038/nbt.2167.
  24. Castellana N. E.; McCutcheon K.; Pham V. C.; Harden K.; Nguyen A.; Young J.; Adams C.; Schroeder K.; Arnott D.; Bafna V.; Grogan J. L.; Lill J. R. Resurrection of a Clinical Antibody: Template Proteogenomic de Novo Proteomic Sequencing and Reverse Engineering of an Anti-lymphotoxin-α Antibody. Proteomics 2011, 11 (3), 395–405. 10.1002/pmic.201000487.
  25. Bandeira N.; Pham V.; Pevzner P.; Arnott D.; Lill J. R. Automated de Novo Protein Sequencing of Monoclonal Antibodies. Nat. Biotechnol. 2008, 26 (12), 1336–1338. 10.1038/nbt1208-1336.
  26. Peng W.; Giesbers K. C.; Šiborová M.; Beugelink J. W.; Pronker M. F.; Schulte D.; Hilkens J.; Janssen B. J.; Strijbis K.; Snijder J. Reverse-Engineering the Anti-MUC1 Antibody 139H2 by Mass Spectrometry–Based de Novo Sequencing. Life Sci. Alliance 2024, 7 (6), e202302366. 10.26508/lsa.202302366.
  27. Schulte D.; Peng W.; Snijder J. Template-Based Assembly of Proteomic Short Reads For De Novo Antibody Sequencing and Repertoire Profiling. Anal. Chem. 2022, 94 (29), 10391–10399. 10.1021/acs.analchem.2c01300.
  28. Peng W.; Giesbers K. C. A. P.; Šiborová M.; Beugelink J. W.; Pronker M. F.; Schulte D.; Hilkens J.; Janssen B. J. C.; Strijbis K.; Snijder J. Reverse Engineering the Anti-MUC1 Hybridoma Antibody 139H2 by Mass Spectrometry-Based de Novo Sequencing. bioRxiv 2023, 10.1101/2023.07.05.547778. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Ivanov D. G.; Ivetic N.; Du Y.; Nguyen S. N.; Le S. H.; Favre D.; Nazy I.; Kaltashov I. A. Reverse Engineering of a Pathogenic Antibody Reveals the Molecular Mechanism of Vaccine-Induced Immune Thrombotic Thrombocytopenia. J. Am. Chem. Soc. 2023, 145 (46), 25203–25213. 10.1021/jacs.3c07846. [DOI] [PubMed] [Google Scholar]
  30. Guthals A.; Clauser K. R.; Bandeira N. Shotgun Protein Sequencing with Meta-Contig Assembly. Molecular & Cellular Proteomics 2012, 11 (10), 1084–1096. 10.1074/mcp.M111.015768. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Yilmaz M.; Fondrie W. E.; Bittremieux W.; Oh S.; Noble W. S. De Novo Mass Spectrometry Peptide Sequencing with a Transformer Model. PMLR, June 28, 2022; pp 25514–25522. https://proceedings.mlr.press/v162/yilmaz22a.html (accessed 2024-01-30).
  32. Yilmaz M.; Fondrie W. E.; Bittremieux W.; Nelson R.; Ananth V.; Oh S.; Noble W. S.; Allen P. G. Sequence-to-Sequence Translation from Mass Spectra to Peptides with a Transformer Model. bioRxiv 2023, 10.1101/2023.01.03.522621.
  33. Gutenbrunner P.; Kyriakidou P.; Welker F.; Cox J. Spectrum Graph-Based de-Novo Sequencing Algorithm MaxNovo Achieves High Peptide Identification Rates in Collisional Dissociation MS/MS Spectra. bioRxiv 2021, 10.1101/2021.09.04.458985.
  34. Ma B. Novor: Real-Time Peptide de Novo Sequencing Software. J. Am. Soc. Mass Spectrom. 2015, 26 (11), 1885–1894. 10.1007/s13361-015-1204-0.
  35. Tran N. H.; Zhang X.; Xin L.; Shan B.; Li M. De Novo Peptide Sequencing by Deep Learning. Proc. Natl. Acad. Sci. U. S. A. 2017, 114 (31), 8247–8252. 10.1073/pnas.1705691114.
  36. Chi H.; Chen H.; He K.; Wu L.; Yang B.; Sun R.-X.; Liu J.; Zeng W.-F.; Song C.-Q.; He S.-M.; Dong M.-Q. pNovo+: De Novo Peptide Sequencing Using Complementary HCD and ETD Tandem Mass Spectra. J. Proteome Res. 2013, 12 (2), 615–625. 10.1021/pr3006843.
  37. Chi H.; Sun R.-X.; Yang B.; Song C.-Q.; Wang L.-H.; Liu C.; Fu Y.; Yuan Z.-F.; Wang H.-P.; He S.-M.; Dong M.-Q. pNovo: De Novo Peptide Sequencing and Identification Using HCD Spectra. J. Proteome Res. 2010, 9 (5), 2713–2724. 10.1021/pr100182k.
  38. Yang H.; Chi H.; Zhou W.-J.; Zeng W.-F.; He K.; Liu C.; Sun R.-X.; He S.-M. Open-pNovo: De Novo Peptide Sequencing with Thousands of Protein Modifications. J. Proteome Res. 2017, 16 (2), 645–654. 10.1021/acs.jproteome.6b00716.
  39. Needleman S. B.; Wunsch C. D. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins. J. Mol. Biol. 1970, 48 (3), 443–453. 10.1016/0022-2836(70)90057-4.
  40. Smith T. F.; Waterman M. S. Comparison of Biosequences. Adv. Appl. Math. 1981, 2 (4), 482–489. 10.1016/0196-8858(81)90046-4.
  41. Henikoff S.; Henikoff J. G. Amino Acid Substitution Matrices from Protein Blocks. Proc. Natl. Acad. Sci. U. S. A. 1992, 89 (22), 10915–10919. 10.1073/pnas.89.22.10915.
  42. Johnson R. S.; Martin S. A.; Biemann K. Collision-Induced Fragmentation of (M + H)+ Ions of Peptides. Side Chain Specific Sequence Ions. Int. J. Mass Spectrom. Ion Processes 1988, 86, 137–154. 10.1016/0168-1176(88)80060-0.
  43. Bouwmeester R.; Gabriels R.; Hulstaert N.; Martens L.; Degroeve S. DeepLC Can Predict Retention Times for Peptides That Carry As-yet Unseen Modifications. Nat. Methods 2021, 18 (11), 1363–1369. 10.1038/s41592-021-01301-5.
  44. Sinitcyn P.; Hamzeiy H.; Salinas Soto F.; Itzhak D.; McCarthy F.; Wichmann C.; Steger M.; Ohmayer U.; Distler U.; Kaspar-Schoenefeld S.; Prianichnikov N.; Yılmaz Ş.; Rudolph J. D.; Tenzer S.; Perez-Riverol Y.; Nagaraj N.; Humphrey S. J.; Cox J. MaxDIA Enables Library-Based and Library-Free Data-Independent Acquisition Proteomics. Nat. Biotechnol. 2021, 39 (12), 1563–1573. 10.1038/s41587-021-00968-7.

Associated Data

Supplementary Materials

pr4c00188_si_001.zip (142.9MB, zip)
pr4c00188_si_002.pdf (58KB, pdf)

Data Availability Statement

The source code of Stitch is available on GitHub: https://github.com/snijderlab/stitch. All Stitch HTML results related to this study are provided as Supporting Data. The raw data for the monoclonal antibodies Herceptin and F59 are available under ProteomeXchange identifier PXD023419 and DOI 10.6084/m9.figshare.13194005, respectively.


Articles from Journal of Proteome Research are provided here courtesy of American Chemical Society