PLoS Comput Biol. 2022 Nov 10;18(11):e1010663. doi: 10.1371/journal.pcbi.1010663

BIPS—A code base for designing and coding of a Phage ImmunoPrecipitation Oligo Library

Sigal Leviatan 1,2,#, Iris N Kalka 1,2,#, Thomas Vogl 1,2, Shelley Klompus 1,2, Adina Weinberger 1,2, Eran Segal 1,2,*
Editor: Dina Schneidman3
PMCID: PMC9681064  PMID: 36355866

Abstract

BIPS (Build Phage ImmunoPrecipitation Sequencing library) is software that converts a list of proteins into a custom DNA oligonucleotide library for the PhIP-Seq system. The tool creates constant-length oligonucleotides with internal barcodes, while maintaining the original length of the peptide. This allows the use of large libraries, of hundreds of thousands of oligonucleotides, while saving on sequencing costs and maintaining accurate identification of oligonucleotide reads. BIPS is available under the GNU public license from: https://github.com/kalkairis/BuildPhIPSeqLibrary.


This is a PLOS Computational Biology Software paper.

Introduction

Phage immunoprecipitation sequencing (PhIP-Seq), based on phage-displayed synthetic oligonucleotide libraries [1], has become a useful tool for scanning the reaction of the human immune system to large protein collections [2–5]. PhIP-Seq allows testing for antibody reaction to hundreds of thousands of antigens in parallel, in contrast to peptide arrays and enzyme-linked immunosorbent assays (ELISAs), which have limited capacities of at most a few thousand or a few tens of thousands of antigens [6]. However, creating a library of that many peptides is computationally challenging.

Current state-of-the-art software for peptide library design includes Pepsyn [7], which uses representative selection for tiling protein sequences. Other previous works [2,8] did not release public software, but describe crucial steps in constructing peptide libraries. While Pepsyn is a reliable tool, it does not allow for the representation of multiple variants of the same peptide sequence.

Here, we present BIPS (BuildPhIPSeqLibrary), software for rapid and simple construction of an oligonucleotide library from any input set of proteins (Fig 1A). BIPS is intended for large peptide libraries. Through the use of synonymous mutations, which was previously suggested [2,8], it enables the complete and unambiguous representation of multiple variants, while allowing peptide identification without the need for full-length sequencing.

Fig 1. Library construction and barcoding schematic representation.

a, Flow of inputs from peptide sequences, through software modules (arrows) including intermediate results. (1) protein sequences splitting into constant-length peptides with overlap. (2) reverse translation adhering to codon usage frequencies. (3) barcoding sequences’ tails. b, Three possible identification methodologies of oligos. In brown below are minimal sequencings required for correct identification. (1) External barcodes decrease the possible lengths of coding peptides via replacing the 3’ end with a short barcode. (2) No barcoding, requires sequencing of entire oligonucleotides. (3) Inline barcoding, requires intermediate length sequencing, and enables full-length coding peptides. c, Two possible inline barcodes of the same 5’ sequence (LL), with a hamming-distance of three and a six-nucleotide barcode (3x2 nucleotides). First oligonucleotide is barcoded without restrictions. Second oligonucleotide is restricted for three iterations, and permitted in the fourth. Figures were created with BioRender.com.

We tested BIPS on simulated data and on all available infectious and allergic disease data from the Immune Epitope Database (IEDB). Through these, we showed the computational limits of BIPS.

Design and implementation

Design

An overview of BIPS is presented in Fig 1A, and is detailed below.

Construction of PhIP-Seq libraries begins with a large set of proteins provided by users. Usually these will be full-length or partial proteins, from one or many sources. Note that the advantages of a PhIP-Seq library are more pronounced when exact epitopes are unknown and a scanning of relatively long peptides or full proteins is required.

In constructing a PhIP-Seq library the set of proteins must be cut into constant length peptides, so as not to create biases in PCR-amplification of the library and its cloning into phages. To this end, shorter input proteins or peptides are terminated by a stop codon, and padded with a random amino-acid sequence, as implemented in Pepsyn [7].

The PhIP-Seq library is intended to present all potential linear epitopes (potential binding sites of the immune system's antibodies) of the given set of proteins. In order to cover all of them, proteins are cut with a positive overlap. This overlap should be long enough so that linear epitopes, usually 6–20 amino acids long [9], appear in full in at least one of the peptides.
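The tiling step can be illustrated with a minimal sketch. The function name `tile_protein` is hypothetical, and the handling of the last window (shifting it back so that it ends at the C-terminus) is an assumption, since the text does not say how BIPS treats the protein end. The default values mirror the `oligo_aa_length = 64` and `oligo_aa_overlap = 20` parameters reported in the Results.

```python
def tile_protein(seq, length=64, overlap=20):
    """Cut `seq` into peptides of `length` amino acids.

    Adjacent peptides share `overlap` residues. Returns (start, peptide) pairs.
    """
    if overlap >= length:
        raise ValueError("overlap must be smaller than the peptide length")
    if len(seq) <= length:
        # Short inputs yield a single peptide; padding happens at a later stage.
        return [(0, seq)]
    step = length - overlap
    starts = list(range(0, len(seq) - length + 1, step))
    # Assumption: cover the C-terminal end with one last full-length window.
    if starts[-1] + length < len(seq):
        starts.append(len(seq) - length)
    return [(s, seq[s:s + length]) for s in starts]


protein = ("ACDEFGHIKLMNPQRSTVWY" * 8)[:150]
for start, peptide in tile_protein(protein):
    print(start, len(peptide))
```

With these settings, any stretch of up to 20 residues falls entirely inside at least one peptide, which is the property the overlap is meant to guarantee.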

BIPS produces a list of oligonucleotide sequences (oligos), to be synthesized and cloned into phages, that will infect and replicate in the appropriate bacteria. The common use is of bacteriophage T7 within Escherichia coli. All peptides must be coded using the host bacteria’s codon usage table, without including restricted sequences. Furthermore, in order to make the replication as efficient and as uniform as possible, it is preferable to also imitate the codon usage frequencies of the host bacteria (Fig 2A).
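As a rough illustration of frequency-aware reverse translation, the sketch below samples one codon per amino acid in proportion to a weight table and retries when a restricted sequence appears. The codon weights are illustrative placeholders covering only five amino acids, not the published Escherichia coli values. The EcoRI site `GAATTC` is just an example of a restricted sequence, and `reverse_translate` is a hypothetical name rather than a BIPS function.

```python
import random

# Toy codon table for five amino acids. The weights are illustrative
# placeholders, NOT the published E. coli codon usage frequencies.
CODON_WEIGHTS = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.7, "AAG": 0.3},
    "L": {"CTG": 0.5, "CTT": 0.1, "CTC": 0.1, "CTA": 0.05, "TTA": 0.15, "TTG": 0.1},
    "E": {"GAA": 0.7, "GAG": 0.3},
    "F": {"TTT": 0.6, "TTC": 0.4},
}
RESTRICTED = ("GAATTC",)  # EcoRI recognition site, used here only as an example


def reverse_translate(peptide, rng, max_tries=100):
    """Sample codons by weight; retry while a restricted sequence is present."""
    for _ in range(max_tries):
        oligo = "".join(
            rng.choices(list(CODON_WEIGHTS[aa]),
                        weights=list(CODON_WEIGHTS[aa].values()))[0]
            for aa in peptide
        )
        if not any(site in oligo for site in RESTRICTED):
            return oligo
    return None  # no coding free of restricted sequences was found


# "EF" can be coded as GAA + TTC, which would form the restricted site.
print(reverse_translate("MKLEF", random.Random(0)))
```

Re-sampling the whole peptide is the simplest choice for a sketch. A real tool might instead re-code only the codons that overlap the offending site.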

Fig 2. Codon frequency in BIPS output.

a, Codon frequencies (summing to one per amino acid). In green the published frequencies for Escherichia coli. In red and yellow are coding frequencies of example outputs from IEDB’s infectious diseases and allergic diseases. b, c, Simulations of average number of barcodes for random oligos. Colors are defined by barcode-lengths and number of barcode parts. Simulations were performed on constant barcode length (b) or constant barcode parts’ lengths (c).

After immunoprecipitation, the peptides displayed by the bound phages are identified by sequencing. In libraries that contain many similar peptides, for example due to deep mutagenesis [3], mutation scanning or homologous proteins, partially sequencing an oligo might not be sufficient for identification, and alternative methods must be explored. One method is to sequence entire oligos; however, this is costly and prone to read errors. Another method is to attach pre-planned barcodes at the 3’ end of each oligo, preceded by a stop codon, so that the barcode is not translated into protein. Such barcodes are relatively short, thus requiring cheaper sequencing, and are designed to correct read errors via their pre-planned hamming-distance [10] from each other. These advantages come at a cost: the length of the barcode is added to the length of the coding oligo, adding to the costs of library synthesis. Worse, this addition limits the length of the actual coding oligo, as DNA synthesis technology is still limited. BIPS creates internal barcoding sequences by leveraging the multiple possibilities in coding amino acids into oligos, while maintaining a requested hamming-distance between pairs. Barcodes are created at the coding stage as part of the oligo; thus they can be located at the 5’ end of the oligo, eliminating the misidentification problem caused in cases of multiple inserts (S2 Text).

If a constructed barcode does not adhere to the hamming-distance requirement, or the coded peptide includes restricted sequences, BIPS re-codes the barcode until it either obtains a valid barcode or reaches a pre-defined number of iterations. The concept of utilizing synonymous codons to allow unambiguous mapping has previously been discussed [2,8]; here, we advance this approach by using it to ensure a hamming distance between sequences. Coding many identical (or nearly identical) sequences in a single library will clearly increase the use of relatively rare codons; however, we have shown [4] that this does not cause a significant reduction in differential expression.

When oligo libraries are large (hundreds of thousands of oligos), it is computationally infeasible to check the hamming distance of each newly proposed oligo against the library already created. We therefore devised a divide-and-conquer method to ensure the hamming distance. Given a required number of errors to correct, n (usually one or two), and the respective hamming distance of 2n+1 (three or five, respectively), we split the barcode into 2n+1 parts and demand uniqueness for each of the parts. Uniqueness is less computationally complex to check, and ensures at least one difference per part, thus at least 2n+1 differences overall (i.e., a hamming distance of at least 2n+1). Such internal barcodes are longer than pre-planned barcodes; however, they allow the full length of the oligo sequence to be used as coded peptide. Compared with not barcoding at all, this methodology removes the need to sequence the entire oligo in order to identify it (Fig 1B and 1C).
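The guarantee can be written out explicitly. The notation below ($x_i$ for the $i$-th part of barcode $x$, $d_H$ for hamming distance, $r$ for a sequenced read) is introduced here for illustration and assumes substitution errors only:

```latex
% Two distinct barcodes x and y, each split into m = 2n+1 parts,
% with every part unique across the library (x_i \neq y_i for all i):
d_H(x, y) \;=\; \sum_{i=1}^{2n+1} d_H(x_i, y_i) \;\ge\; \sum_{i=1}^{2n+1} 1 \;=\; 2n+1 .

% A read r that differs from its true barcode x in at most n positions
% is still closer to x than to any other barcode y (triangle inequality):
d_H(r, y) \;\ge\; d_H(x, y) - d_H(r, x) \;\ge\; (2n+1) - n \;=\; n+1 \;>\; n \;\ge\; d_H(r, x) .
```

Part uniqueness is therefore sufficient, although not necessary, for the required distance. Some pairs of barcodes that are far enough apart overall will still be rejected because they share one part.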

At the final stage of preparation, a constant prefix and suffix are added to each finalized oligo sequence. These include the restriction sites and primer binding region for library amplification. In order to produce the protein, a stop codon must appear downstream of the peptide. We suggest including it within the suffix (at the 3’ end), but this is not mandatory (for example, when C-terminal tags should be added or M13 phage display should be used). It is important to define in advance all relevant restricted sequences (including these restriction sites), so that they cannot be formed when coding the peptides into oligos, including in the barcodes.

We present BIPS, a configurable software to build PhIP-Seq libraries. It receives sets of proteins (or partial proteins) as inputs, creating a library of internally barcoded oligos, spanning the entire input protein set. Final oligos are marked with all the original proteins and locations they were derived from, and mapped to all proteins containing them (S3 Text), allowing for the analysis of the bound peptides and their source proteins. There is still a lot of room for exploration via PhIP-Seq of antibody repertoires, and BIPS facilitates achieving this goal.

Software implementation

All code was written in Python 3, and tested on multiple platforms (CentOS, IOS and Windows all in the Anaconda environment). The code consists of two main programs, the first converts multiple files of proteins to a single file of oligos for synthesis, and the second maps oligos to original files and peptides. The latter program takes a relatively long time, however it can be run offline. Both programs can utilize multithreading to reduce run-time, if available.

All functions in the code have accompanying unit-tests. To ensure all dependencies and code versions are in order, it is advisable to run all tests on the platform in use.

The program converting protein files to oligos takes as input from the configuration the path to a directory, where all protein files should be located. It is possible to add new protein files and re-run the program, adding their accompanying oligos to the library. However, it is not possible to change an existing input file without erasing the entire output directory first, and re-starting the library build. This limitation exists to maintain consistency between the input files and the complete library. At the end of the run a configured output directory will contain all output files created during the run. The different output files are described in the README.md file of the Data/Output directory, the most important of which is order_file.txt, a ready to order file containing only nucleotide sequences.

The mapping program runs on two files: oligos_sequence.csv and sequences_ids.csv, both of which are created in the library constructing program. The program goes over all oligos and all protein sequences and maps the oligos to all possible perfect matches. This program produces a single mapped oligo file in the output directory under the name mapped_oligos_sequence.csv.
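The perfect-match mapping can be sketched at the peptide level as follows. This is a simplified illustration that works on in-memory dictionaries rather than on the CSV files named above, whose column layout is not described here. The function name `map_peptides` and the identifiers in the usage example are hypothetical.

```python
def map_peptides(peptides, proteins):
    """Map each peptide to every (protein_id, start) where it occurs exactly.

    `peptides` and `proteins` are dicts of id -> amino-acid sequence.
    """
    mapping = {}
    for pep_id, pep in peptides.items():
        hits = []
        for prot_id, seq in proteins.items():
            start = seq.find(pep)
            while start != -1:  # collect all occurrences, including overlapping ones
                hits.append((prot_id, start))
                start = seq.find(pep, start + 1)
        mapping[pep_id] = hits
    return mapping


proteins = {"protA": "MKLEFMKLEF", "protB": "AAAMKLEF", "protC": "GGGG"}
peptides = {"oligo_1": "MKLEF", "oligo_2": "GGG"}
print(map_peptides(peptides, proteins))
```

The nested loop scales with the product of library size and protein count, which is in line with the note that this program takes a relatively long time.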

After producing the library and performing PhIP-Seq experiments, FASTQ files are returned from the sequencing machines. We supply example code to identify the barcode. We note that this example is set up for the default configuration and is meant as a roadmap for identifying sequencing results rather than for direct use.
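One possible way to identify a read's barcode while tolerating read errors is sketched below. It is not the example code shipped with BIPS. It relies on the observation that, with 2n+1 unique parts and at most n substitution errors, at least n+1 parts of the read are error-free and can be looked up directly. All names (`build_index`, `identify`) are hypothetical, and indels are not handled.

```python
def split(seq, part_lengths):
    """Cut `seq` into consecutive parts of the given lengths."""
    parts, pos = [], 0
    for n in part_lengths:
        parts.append(seq[pos:pos + n])
        pos += n
    return parts


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_index(barcodes, part_lengths):
    """One dict per barcode part, mapping the part's sequence to its oligo id."""
    index = [dict() for _ in part_lengths]
    for oligo_id, barcode in barcodes.items():
        for i, part in enumerate(split(barcode, part_lengths)):
            index[i][part] = oligo_id
    return index


def identify(read, barcodes, index, part_lengths, max_errors):
    """Return the oligo id whose barcode is within `max_errors` of `read`, else None."""
    if len(read) != sum(part_lengths):
        return None
    # Any part read without error points directly at its oligo.
    candidates = {index[i].get(p) for i, p in enumerate(split(read, part_lengths))}
    candidates.discard(None)
    hits = [c for c in candidates if hamming(read, barcodes[c]) <= max_errors]
    return hits[0] if len(hits) == 1 else None


barcodes = {"oligo_1": "AACCGG", "oligo_2": "TTGGAA"}
parts = [2, 2, 2]
index = build_index(barcodes, parts)
print(identify("AACCGT", barcodes, index, parts, max_errors=1))
```

Reads with more errors than the barcode design can correct are returned as unidentified rather than being assigned to the nearest oligo.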

Barcode length and error correction

Our divide-and-conquer method provides a computationally cheap way of ensuring the required hamming distance between every pair of barcoded oligos.

The idea is to ensure the correction of up to n errors by using barcodes at hamming distance (2n+1) from each other. This is done by dividing the barcode into (2n+1) parts, as equally as possible, and never having two oligos with an identical sub-sequence at any of the barcode parts. Since checking for identity is easy (using a hash function, or inherently in data structures like the Python set), it is computationally easy to ensure that each part has at least one difference from any other oligo, and together at least (2n+1) differences.
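A minimal sketch of the set-based uniqueness check follows, using the 3x2-nucleotide layout from Fig 1C as the example. The class name `BarcodeRegistry` and its methods are hypothetical and are not taken from the BIPS code base.

```python
class BarcodeRegistry:
    """Accept a barcode only if each of its parts is new at that part's position."""

    def __init__(self, part_lengths):
        self.part_lengths = list(part_lengths)
        self.seen = [set() for _ in self.part_lengths]  # one set per barcode part

    def _split(self, barcode):
        parts, pos = [], 0
        for n in self.part_lengths:
            parts.append(barcode[pos:pos + n])
            pos += n
        return parts

    def try_add(self, barcode):
        """Register the barcode and return True, or return False on any collision."""
        if len(barcode) != sum(self.part_lengths):
            raise ValueError("barcode length does not match the part lengths")
        parts = self._split(barcode)
        if any(p in s for p, s in zip(parts, self.seen)):
            return False  # the caller would re-code and try again
        for p, s in zip(parts, self.seen):
            s.add(p)
        return True


registry = BarcodeRegistry([2, 2, 2])   # hamming distance 3, corrects 1 error
print(registry.try_add("AACCGG"))       # first barcode is unrestricted
print(registry.try_add("AATTTT"))       # rejected: first part "AA" already used
print(registry.try_add("TTGGAA"))       # accepted: all three parts are new
```

Each check costs one set lookup per part, independent of the number of oligos already in the library.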

For example, with a barcode of 75 bp divided into 5 sections (correcting up to 2 read errors) and a random protein library, millions of oligos can be barcoded in this way (at most $4^{15} \approx 10^9$, i.e. a billion, since each 15 bp part must be unique). In practice, due to the uneven amino acid probabilities and the amount of redundancy in the specific library, some sequences may fail to code. For example, there is only one way to code “MMMMM”, so if two peptides beginning with “MMMMM” exist in the library and the first part of the barcode is 15 base pairs long (5 amino acids), the second peptide will not be codable.
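The arithmetic behind these limits can be checked with a few lines. The sketch counts synonymous codings under the standard genetic code. It ignores restricted sequences and codon-frequency preferences, both of which reduce the usable number further. The helper name `n_codings` is hypothetical.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code.
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def n_codings(peptide):
    """Number of distinct DNA sequences that encode `peptide`."""
    return prod(DEGENERACY[aa] for aa in peptide)


print(4 ** 15)             # upper bound on distinct 15 bp barcode parts
print(n_codings("MMMMM"))  # a single coding: a second copy cannot be barcoded
print(n_codings("LLLLL"))  # leucine-rich prefixes leave far more room
```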

In essence, three parameters need to be balanced: the length of the barcode (and thus the length one needs to sequence after immunoprecipitation), the number of errors one wants to correct, and the size and richness of the library. If the library contents are not diverse (for example, a database that includes many versions of the same protein, such as the many wheat protein variants that may appear in an allergens database), then a longer barcode will be needed to allow enough coding options.

Configuration

A single configuration file defines all parameters: the peptide length and overlap (in amino acids), the location and lengths of barcode parts, the codon table and frequencies, restricted sequences, a prefix and suffix for all final oligonucleotides, and the input and output directories.
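To make the interplay between these parameters concrete, the sketch below groups some of them in a small data class with basic consistency checks. Only `oligo_aa_length` and `oligo_aa_overlap` are names that appear in this paper. Every other field name, the default values for the restricted sequences, prefix and suffix, and the checks themselves are assumptions made for illustration and do not reflect the actual BIPS configuration file.

```python
from dataclasses import dataclass


@dataclass
class LibraryConfig:
    oligo_aa_length: int = 64            # peptide length in amino acids
    oligo_aa_overlap: int = 20           # overlap between adjacent peptides
    barcode_part_lengths: tuple = (15, 15, 15, 15, 15)  # hypothetical name, in bp
    restricted_sequences: tuple = ("GAATTC",)           # example value only
    prefix: str = ""                     # placeholder for the amplification prefix
    suffix: str = ""                     # placeholder for the amplification suffix

    @property
    def correctable_errors(self):
        """n such that the barcode has 2n+1 parts."""
        return (len(self.barcode_part_lengths) - 1) // 2

    def validate(self):
        if not 0 <= self.oligo_aa_overlap < self.oligo_aa_length:
            raise ValueError("overlap must be non-negative and below the peptide length")
        if len(self.barcode_part_lengths) % 2 == 0:
            raise ValueError("the number of barcode parts should be odd (2n+1)")
        if sum(self.barcode_part_lengths) > 3 * self.oligo_aa_length:
            raise ValueError("barcode is longer than the coding sequence")


config = LibraryConfig()
config.validate()
print(config.correctable_errors)
```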

Results

We tested the performance of BIPS on both simulated and real data.

Barcode construction simulations

We simulated library and barcode construction to evaluate upper bounds on the number of possible barcodes in a library. To that end, we first created random amino-acid sequences by sequentially sampling each amino acid according to E. coli amino-acid frequencies. From those we constructed barcodes for each sequence. Upon the first case of barcode construction failure (meaning that at least one of the barcode parts was non-unique) we halted the simulation. If there were no barcode construction failures, we halted each single simulation upon reaching $10^7$ sequences.
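A scaled-down sketch of such a simulation is given below. It departs from the procedure described above in several labeled ways: amino acids and codons are drawn uniformly rather than from E. coli frequencies, restricted sequences are ignored, and the cap on sequences is far below $10^7$ so that it runs quickly. The function name `simulate` and its parameters are hypothetical.

```python
import itertools
import random

# Standard genetic code with codons enumerated in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for bases, aa in zip(itertools.product(BASES, repeat=3), AMINO):
    if aa != "*":
        CODONS.setdefault(aa, []).append("".join(bases))


def simulate(part_lengths, max_seqs, max_iter, rng):
    """Return how many random barcodes were accepted before the first failure."""
    total_bp = sum(part_lengths)
    if total_bp % 3:
        raise ValueError("barcode length must be a whole number of codons")
    seen = [set() for _ in part_lengths]
    amino_acids = sorted(CODONS)
    for count in range(max_seqs):
        # Simplification: uniform amino-acid sampling instead of E. coli frequencies.
        peptide = rng.choices(amino_acids, k=total_bp // 3)
        for _ in range(max_iter):
            dna = "".join(rng.choice(CODONS[aa]) for aa in peptide)
            parts, pos = [], 0
            for n in part_lengths:
                parts.append(dna[pos:pos + n])
                pos += n
            if all(p not in s for p, s in zip(parts, seen)):
                for p, s in zip(parts, seen):
                    s.add(p)
                break
        else:
            return count  # no unique coding found within max_iter attempts
    return max_seqs


# 5 aa barcode (15 bp) in three 5 bp parts, i.e. correcting one error.
print(simulate([5, 5, 5], max_seqs=1000, max_iter=20, rng=random.Random(1)))
```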

We note that all simulations, although providing insights about barcoding limitations, do not represent real life protein sequences. That is, in common proteins, the amino-acid order is non-random, and furthermore they often include recurring motifs.

For the first set of simulations (Fig 2B), for each total barcode length we created barcodes split into 3, 5 and 7 parts. These in turn allow correction of 1, 2 or 3 errors respectively. We performed the simulations for total barcode lengths of 5aa-10aa (15–30 base-pairs). A barcode of total length $l$ base pairs was split into $m = 2n+1$ barcode parts by assigning $\lfloor l/m \rfloor + \mathbb{1}(i < (l \bmod m))$ base pairs to barcode part $i$ (with $i = 0, \dots, m-1$). We note that some of the splits do not necessarily start or end at an amino-acid boundary (e.g., for a 6aa barcode of 5 parts, the parts will be of [4, 4, 4, 3, 3] base pairs).
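The split rule translates directly into code. The function name `split_lengths` is hypothetical, and part indices are zero-based, which is the reading that reproduces the 6aa example above.

```python
def split_lengths(l, m):
    """Lengths of m barcode parts summing to l bp, differing by at most one."""
    return [l // m + (1 if i < l % m else 0) for i in range(m)]


print(split_lengths(18, 5))  # 6 aa barcode, 5 parts -> [4, 4, 4, 3, 3]
print(split_lengths(75, 5))  # default-style barcode -> [15, 15, 15, 15, 15]
```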

For the second set of simulations (Fig 2C), each simulation maintained a constant length of each barcode part. We performed the simulations for barcode part lengths ranging from 1aa to 4aa. For each such simulation we constructed barcodes made of 1, 3 or 5 such parts. Note that in this case, unlike in the first set of simulations, each barcode part includes complete amino-acid sequences and does not contain partial codons. Furthermore, in this simulation the barcode’s total length depends on the number of barcode parts and is not constant.

Performance on real data

For understanding BIPS performance on real data we obtained two separate datasets from IEDB, infectious diseases and allergic diseases (S1 Text). We ran BIPS separately on each of these inputs, creating two separate libraries (with default parameters including oligo_aa_length = 64, oligo_aa_overlap = 20, and a 5-part barcode at the 5’ end of 15bp per part). Codon frequencies were calculated on the oligo sequences without the amplification prefix and suffix. The frequencies were compared to theoretical frequencies [11] (Fig 2A).
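The codon-frequency comparison can be reproduced along the following lines. This is a sketch that assumes the oligos are already stripped of prefix and suffix and are in frame. It uses the standard genetic code and normalizes counts so that the frequencies of each amino acid's codons sum to one, as in Fig 2A. The function name `codon_frequencies` is hypothetical.

```python
import itertools
from collections import Counter, defaultdict

# Standard genetic code with codons enumerated in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(bases): aa for bases, aa in zip(itertools.product(BASES, repeat=3), AMINO)
}


def codon_frequencies(oligos):
    """Per-amino-acid codon frequencies observed in in-frame coding sequences."""
    counts = Counter()
    for oligo in oligos:
        usable = len(oligo) - len(oligo) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            counts[oligo[i:i + 3]] += 1
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TO_AA[codon]] += n
    return {codon: n / totals[CODON_TO_AA[codon]] for codon, n in counts.items()}


freqs = codon_frequencies(["AAAAAGAAA", "ATGCTG"])
for codon in sorted(freqs):
    print(codon, CODON_TO_AA[codon], round(freqs[codon], 3))
```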

For IEDB’s infectious diseases: of the 1,249 full protein sequences two could not be used (uniprot A0A2D4C4Z9 contained a non-amino acid letter “X”, uniprot Q8I4R2 had no amino acid sequence). The remaining 1,247 proteins were cut into a total of 24,456 peptides.

Of the 24,456 peptides, 51 failed in conversion to a nucleotide sequence, because of restriction sites. After the initial coding 400 had to be recoded because of barcode collisions, of them 180 failed to recode. This run took 7 minutes on CentOS.

For the IEDB’s allergic diseases: Of the 240 proteins two could not be used (no amino acid sequence for uniprot M0U687 and Q9URH1). The remaining 238 proteins were cut into 1,745 peptides.

Of the 1,745 peptides, 1 failed in conversion to a nucleotide sequence, because of restriction sites. After the initial coding 10 had to be re-coded because of barcode collisions, all succeeded. This run took 10 seconds on CentOS.

Availability and future directions

BIPS is open-source and freely available under the GNU public license; it is maintained on GitHub, enabling bug-reporting and community collaboration. It can be found at https://github.com/kalkairis/BuildPhIPSeqLibrary. The entire software was developed in Python 3, and includes internal tests and use-cases for the benefit of users.

We plan to use BIPS in order to design a new PhIP-Seq library for a new cohort within our group. In the future we intend to develop a publicly available software to analyze results from PhIP-Seq sequencings.

Dependencies

Code uses standard Python 3 libraries, in addition to: pandas (1.3.5), numpy (1.21.5), regex (2022.1.18). Code has been tested on Windows, IOS and CentOS.

Supporting information

S1 Text. Detailed information on the test data used for checking this software library.

(DOCX)

S2 Text. Explanation of the misidentification problems, which might occur when adding a barcode at the 3’ end.

(DOCX)

S3 Text. Description of extra software supplied in order to identify the oligo source of each barcode sequenced, and all potential protein sources it may have been derived from.

(DOCX)

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. Larman HB, Zhao Z, Laserson U, Li MZ, Ciccia A, Gakidis MAM, et al. Autoantigen discovery with a synthetic human peptidome. Nat Biotechnol. 2011;29: 535–541. doi: 10.1038/nbt.1856
  • 2. Xu GJ, Kula T, Xu Q, Li MZ, Vernon SD, Ndung’u T, et al. Viral immunology. Comprehensive serological profiling of human populations using a synthetic human virome. Science. 2015;348: aaa0698. doi: 10.1126/science.aaa0698
  • 3. Klompus S, Leviatan S, Vogl T, Mazor RD, Kalka IN, Stoler-Barak L, et al. Cross-reactive antibodies against human coronaviruses and the animal coronavirome suggest diagnostics for future zoonotic spillovers. Sci Immunol. 2021;6. doi: 10.1126/sciimmunol.abe9950
  • 4. Vogl T, Klompus S, Leviatan S, Kalka IN, Weinberger A, Wijmenga C, et al. Population-wide diversity and stability of serum antibody epitope repertoires against human microbiota. Nat Med. 2021;27: 1442–1450. doi: 10.1038/s41591-021-01409-3
  • 5. Singh H, Ansari HR, Raghava GPS. Improved method for linear B-cell epitope prediction using antigen’s primary sequence. PLoS ONE. 2013;8: e62216. doi: 10.1371/journal.pone.0062216
  • 6. Paull ML, Bozekowski JD, Daugherty PS. Mapping antibody binding using multiplexed epitope substitution analysis. J Immunol Methods. 2021;499: 113178. doi: 10.1016/j.jim.2021.113178
  • 7. Mohan D, Wansley DL, Sie BM, Noon MS, Baer AN, Laserson U, et al. PhIP-Seq characterization of serum antibodies using oligonucleotide-encoded peptidomes. Nat Protoc. 2018;13: 1958–1978. doi: 10.1038/s41596-018-0025-6
  • 8. Shrock E, Fujimura E, Kula T, Timms RT, Lee I-H, Leng Y, et al. Viral epitope profiling of COVID-19 patients reveals cross-reactivity and correlates of severity. Science. 2020;370. doi: 10.1126/science.abd4250
  • 9. El-Manzalawy Y, Dobbs D, Honavar V. Predicting flexible length linear B-cell epitopes. Comput Syst Bioinformatics Conf. 2008;7: 121–132.
  • 10. Hamming RW. Error Detecting and Error Correcting Codes. Bell System Technical Journal. 1950;29: 147–160. doi: 10.1002/j.1538-7305.1950.tb00463.x
  • 11. Nakamura Y, Gojobori T, Ikemura T. Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Res. 2000;28: 292. doi: 10.1093/nar/28.1.292
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010663.r001

Decision Letter 0

Dina Schneidman

3 Jul 2022

Dear Ms Leviatan,

Thank you very much for submitting your manuscript "BIPS - a Code Base for Designing and Coding of a Phage Immuno-Precipitation Oligo Library" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Dina Schneidman

Software Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: BIPS is a comprehensive tool to generate a Phage Immuno-precipitation library from Protein/peptide sequences of interest. The tool generates a library with desired oligo length and overlap.

Novelty of the code lies in the embedded barcode. In commonly used methods the barcode is at the 5p end of the sequences that reduces the effective length of the oligo or on the 3p end that makes paired end sequencing necessary.

This tool also ensures appropriate hamming distance between oligos.

This tool generates a barcode of desired length within the sequence with an elegant use of codon optimization table. Internal barcoding reduces sequencing length requirement.

I have some minor comments:

1. Does the codon optimization reduce the probability of expression of that oligo, if there are large number of similar sequences?

2. Would it be possible to use the code without using internal barcoding strategy?

Reviewer #2: Leviatan et al have developed an informatics pipeline for converting protein sequences into DNA sequences encoding peptides to tile across the desired proteins (BIPS). The DNA sequences are then to be cloned en masse into phage display for the PhIP-Seq assay. PhIP-Seq has emerged as a powerful antibody profiling technology, so open source tools for library design will certainly be welcomed by users – especially since there remains a lack of established computational tools and standards. However, there are several unfortunate shortcomings of this manuscript that must be addressed before it is suitable for publication.

Major Issues

• Perhaps the most critical problem with this paper is that the open-source pipeline currently in use for PhIP-Seq library design, pepsyn (Mohan et al. Nat Prot), is not even referenced. Since it is not referenced, BIPS cannot be directly compared to the de facto standard.

• A second critical issue is that codon-based barcoding has been incorporated into the design of multiple previously published PhIP-Seq libraries. It has been utilized in at least two contexts of which I am aware. In the first, Xu et al (Science) used codon barcoding to distinguish alanine scanning variants of public epitopes. In the second example, Shrock et al (Science) used codon barcoding to distinguish duplicate peptides that enhance assay robustness. Xu is referenced but not in the context of codon barcoding, and Shrock is not referenced at all. Since these prior examples are not referenced, the approach used by BIPS cannot be compared or contrasted.

• The overall rationale for embedding codon barcodes is not articulated. It seemed to me that the main reason the authors would use codon-barcoding is for disambiguation of similar library members. However, for typical libraries and typical read lengths (say 50 nucleotides at minimum, but typically longer reads at incremental additional cost), the number of ambiguous library members would be extremely small. Of course, any number of ambiguous sequences is undesirable, but rescuing these sequences might not be the most compelling rationale. I suggest the authors emphasize additional reasons why codon barcoding is useful. Mutation scanning, deep mutagenesis, and peptide duplication are what I think of as the most important examples, but there are likely others. Reducing read length requirements might also be compelling to some users, but more specifics should be provided on the actual benefit for standard libraries and standard read lengths.

• The authors are inconsistent with the PhIP-Seq abbreviation. PhageIP-Seq is used occasionally, but it is unclear if this is intentional or in error. The most commonly used long form seems to be Phage ImmunoPrecipitation Sequencing (with no hyphen like what is in the title).

Minor Issues

• In the first paragraph, it is mentioned that peptide arrays have at most a few thousand antigens. There are certainly examples of more than a few thousand antigen arrays.

• Padding shorter peptides with random amino-acid sequences downstream of a stop codon is also a feature highlighted in the pepsyn software so should be referenced as such.

• Typo “DNWA”

• It is mentioned that the 3’ suffix must start with a stop codon. However, many users may want to have an epitope tag or some other fusion peptide C-terminal to the peptide. Requiring a stop codon would also make the design incompatible with M13 phage display, which may also be of interest to some of your readers.

• It is not clear how restriction sites are avoided during the peptide coding process. Could you provide additional information?

• I had a hard time evaluating Fig 2b/c since the font was so small.

• I believe the Y-axis title of Fig 2b has a typo.

• The term “infectious diseases” is used where I think it is meant “Infectious organisms”.

• In the paragraph above “Code Availability” section, FASTA files are referred to, but I believe it is meant “FASTQ”.

• Define “IRIs”.

• At the end of the Test Case section, the number 24,456 appears to have been incorrectly copied from above. The replacement should be 1,745 if I am not mistaken.

• In the “Finding the Source of a Barcode” section, the authors suggest doing sequencing read mapping by alignment with errors allowed. However, certain oligo synthesis errors may prematurely truncate the peptide or be otherwise deleterious. So the pros and cons of taking this approach should be presented.

• Fig. 2 only referred to only in Supplementary/Methods?

• A typical peptide length and corresponding barcode length should be given with numbers. The qualitative descriptions are nice to read but some numbers to get an order of magnitude would be helpful.

• The sentence “Figures were created with Biorender.com” seems to be in the wrong place

• It seems in the text that the software has three options for barcoding, but after checking the code, it seems only be able to do the third. Please clarify.

• The example and default configurations should be mentioned in the methods text

• hamming distance (2n+1) … please explain n for the unfamiliar reader.

• Please clarify the meaning of this sentence: Due to computational limitations we performed each single simulation up to $10^7$ sequences

• “Multiple inserts” should be defined for the unfamiliar reader.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No: I did not attempt to confirm

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010663.r003

Decision Letter 1

Dina Schneidman

18 Sep 2022

Dear Ms Leviatan,

We are pleased to inform you that your manuscript 'BIPS - a Code Base for Designing and Coding of a Phage ImmunoPrecipitation Oligo Library' has been provisionally accepted for publication in PLOS Computational Biology.

Please address the Reviewer 2 comment in the final version.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Dina Schneidman

Software Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have comprehensively addressed my concerns. I believe they have also addressed those of the other reviewer. I commend them for this effort.

Reviewer #2: The authors have addressed the issues I raised. However, in testing their code, I discovered a flaw that should be fixed. The peptide-coding nucleotide sequences are selected to avoid certain sequences, such as those that will be used for downstream cloning. Currently, this happens before the addition of prefix and suffix sequences. It is therefore possible (and quite likely, in fact) that the avoided sequences are inadvertently introduced at the junction of the prefix and the peptide-coding sequence and/or at the junction between the peptide-coding sequence and the suffix. This can of course lead to truncation or complete loss of peptides during the library cloning process.
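The junction problem described above can be sketched as follows. This is a minimal illustration, not the BIPS implementation: the function names, the prefix, coding, and suffix sequences are all hypothetical, and the only fixed fact assumed is that GAATTC is the EcoRI recognition site. The sketch scans the fully assembled oligo rather than the coding sequence alone, and reports sites that straddle either junction. It checks only the given strand, so non-palindromic sites would also need a reverse-complement check.

```python
def find_forbidden(oligo, forbidden):
    """Return (site, start) for every occurrence of a forbidden site."""
    hits = []
    for site in forbidden:
        start = oligo.find(site)
        while start != -1:
            hits.append((site, start))
            start = oligo.find(site, start + 1)
    return hits


def junction_hits(prefix, coding, suffix, forbidden):
    """Forbidden sites spanning the prefix/coding or coding/suffix junction."""
    oligo = prefix + coding + suffix
    j1 = len(prefix)                # first base of the coding sequence
    j2 = len(prefix) + len(coding)  # first base of the suffix
    hits = []
    for site, start in find_forbidden(oligo, forbidden):
        end = start + len(site)
        # A site straddles a junction if the junction lies strictly inside it.
        if start < j1 < end or start < j2 < end:
            hits.append((site, start))
    return hits


# Hypothetical sequences, for illustration only.
forbidden = ["GAATTC"]   # EcoRI recognition site
prefix = "ACGTGA"
coding = "ATTCGGCTA"
suffix = "TTGACC"

print(find_forbidden(coding, forbidden))                 # coding sequence alone
print(junction_hits(prefix, coding, suffix, forbidden))  # assembled oligo
```

In this toy case the coding sequence alone passes the check, but the site appears once the prefix is attached, which is the situation the reviewer warns about. Running the check on the assembled oligo is one way to catch it.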

Minor point: in response to one of my comments, the authors write:

"We stand corrected. ELISAs are usually limited to a few hundreds or thousands of oligos, but other peptide arrays can indeed get to tens of thousands."

I'm not sure what this is supposed to mean about ELISAs, since they don't use oligos and they typically are not done for more than a very small number of antigens (1 to 10). My point in the original critique was about peptide arrays. The authors said these are limited to tens of thousands, but I have seen some with many more (e.g. those produced with micromirrors and photochemistry). But it is true that most peptide or protein arrays are in the 10s of thousands of antigens or less.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: None

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010663.r004

Acceptance letter

Dina Schneidman

3 Nov 2022

PCOMPBIOL-D-22-00602R1

BIPS - a Code Base for Designing and Coding of a Phage ImmunoPrecipitation Oligo Library

Dear Dr Leviatan,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofia Freund

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Text. Detailed information on the test data used for checking this software library.

    (DOCX)

    S2 Text. Explanation of the misidentification problems, which might occur when adding a barcode at the 3’ end.

    (DOCX)

    S3 Text. Description of extra software supplied in order to identify the oligo source of each barcode sequenced, and all potential protein sources it may have been derived from.

    (DOCX)

    Attachment

    Submitted filename: Point_by_point.docx

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting Information files.


    Articles from PLOS Computational Biology are provided here courtesy of PLOS
