Abstract
Summary
B-cell receptor (BCR) and T-cell receptor (TCR) repertoires are generated through somatic DNA rearrangements and form the molecular basis of antigen recognition in the immune system. Next-generation sequencing (NGS) of DNA, together with the falling cost of sequencing as these technologies continue to develop, has made sequencing assays an affordable way to characterize the repertoire of adaptive immune receptors (sometimes termed the ‘immunome’). Many new workflows have been developed to take advantage of NGS, and the resulting immunome datasets have been placed in the public domain. The scale of these NGS datasets has made it challenging to search through the complementarity-determining region 3 (CDR3), which is responsible for imparting specific antibody-antigen interactions. Thus, there is an increasing demand for sequence analysis tools capable of searching through CDR3s from immunome data collections containing millions of sequences. To address this need, we created a software package called ClonoMatch that facilitates rapid searches in bulk immunome data for BCR or TCR sequences based on their CDR3 sequence or V3J clonotype.
Availability and implementation
Documentation, software support and the codebase are all available at https://github.com/crowelab/clonomatch. This software is distributed under the GPL v3 license.
1 Introduction
The B-cell receptor (BCR) and T-cell receptor (TCR) repertoires are collections of immune cell surface proteins produced by the adaptive immune system of vertebrates, which regulate adaptive immune cell interactions and aid in neutralization and removal of pathogens and infected or aberrant cells. BCRs and TCRs both require two proteins (heavy and light chain or alpha and beta chain, respectively) to form a full receptor. BCRs and TCRs are formed through the process of V(D)J recombination, which in BCRs is followed by a process known as affinity maturation (absent in TCRs). These processes increase the potential size of a BCR sequence repertoire to roughly 10^13 (or roughly 10^18 for the TCR sequence repertoire; Murphy, 2012).
While the entire sequence of both heavy and light chains is important for the overall function of BCRs or TCRs, it is the complementarity-determining region 3 (CDR3) of the heavy chain that is often the strongest contributor to the ability of an antibody (secreted BCR) to bind its respective antigen (Xu and Davis, 2000). Similarly, it is the CDR3 of TCRs that drives much of the binding to processed antigen. Since the CDR3 is largely responsible for antibody or TCR function, sequence searching for an antibody or TCR sequence with similar function should focus on this region.
In recent years, advances in next-generation DNA sequencing technology have made it possible to obtain billions of antibody or TCR sequences from a single experimental run. Several research groups have used these technologies to generate very large immune repertoire datasets that are in the public domain (Briney et al., 2019; Soto et al., 2019). There are several public repositories specializing in the curation, dissemination and analysis of immune repertoire sequencing data (Corrie et al., 2018; Guo et al., 2019; Kovaltsuk et al., 2018). Given the increase in the amount of publicly available immune repertoire sequencing data, there is now a need for tools that make it easier to quickly and efficiently locate antibody or TCR sequences of interest within large collections of data.
To match sequences based on their CDR3 sequence, we developed ClonoMatch for searching through collections of immunome data for a given sequence of interest. The user has the option of carrying out a sequence search using just the CDR3 amino acid sequence or the V3J clonotype, which combines the V and J germline genes (ignoring allelic distinctions) with the CDR3 amino acid sequence. ClonoMatch returns all antibody or TCR sequences from the collection that fall within a user-defined sequence identity threshold. ClonoMatch is a versatile tool that allows users to download and search publicly available immunome data or to search through their own immunome data.
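The V3J clonotype definition above can be illustrated with a minimal Python sketch. This is not code from ClonoMatch; the names `V3J`, `strip_allele` and `v3j_clonotype` are hypothetical, and the sketch assumes gene calls follow the common IMGT-style convention in which the allele follows an asterisk (for example, IGHV1-69*01).

```python
from typing import NamedTuple


class V3J(NamedTuple):
    """Hypothetical container for a V3J clonotype key."""
    v_gene: str
    cdr3_aa: str
    j_gene: str


def strip_allele(gene_call: str) -> str:
    """Drop the allele suffix, e.g. 'IGHV1-69*01' -> 'IGHV1-69'.

    Assumes the allele is whatever follows the first '*'.
    """
    return gene_call.split("*", 1)[0].strip()


def v3j_clonotype(v_call: str, cdr3_aa: str, j_call: str) -> V3J:
    """Build a clonotype key from V gene, CDR3 amino acids and J gene,
    ignoring allelic distinctions and letter case in the CDR3."""
    return V3J(strip_allele(v_call), cdr3_aa.strip().upper(), strip_allele(j_call))
```

Under these assumptions, two sequences that differ only in their V or J allele calls map to the same key, so they would be counted as a single V3J clonotype.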
2 Implementation
ClonoMatch is implemented in JavaScript on a MERN stack, with Python 3 used on the server side. The collection of antibody and TCR sequences was obtained from published studies (Briney et al., 2019; Soto et al., 2019, 2020a,b) and public repositories (Clark et al., 2016; Kovaltsuk et al., 2018). All sequences were processed using PyIR (Soto et al., 2020a,b) with default settings and imported into a MongoDB database, resulting in roughly 962 million unique V(D)J sequences, 156 million unique CDR3 amino acid sequences and 208 million V3J clonotypes. To carry out a sequence search, the user only needs to specify a CDR3 amino acid sequence. Additionally, users have the option of selecting a V gene and a J gene of interest from the pull-down menu (Fig. 1). ClonoMatch uses the query sequence to search through preformatted BLAST (Altschul et al., 1990) databases composed of CDR3 amino acid sequences and grouped by V gene, J gene and CDR3 length. ClonoMatch returns antibody or TCR sequences matching the query sequence based on user-defined sequence identity and alignment coverage thresholds (Fig. 1). The user has the option of visualizing results directly online or downloading results as a CSV or JSON file.
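The idea of grouping CDR3 sequences by V gene, J gene and CDR3 length before searching can be sketched in a few lines of Python. This is an illustration only and not the ClonoMatch implementation: it replaces the BLAST search with a simple position-by-position identity over equal-length sequences (so alignment coverage is always complete and is not modelled), and the function names and the default threshold are assumptions made for the example.

```python
from collections import defaultdict


def build_index(records):
    """Group CDR3 amino acid sequences by (V gene, J gene, CDR3 length).

    `records` is an iterable of (v_gene, j_gene, cdr3_aa) tuples in which
    allele suffixes are assumed to have been removed already.
    """
    index = defaultdict(set)
    for v_gene, j_gene, cdr3 in records:
        index[(v_gene, j_gene, len(cdr3))].add(cdr3)
    return index


def percent_identity(a, b):
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def search(index, v_gene, j_gene, query_cdr3, min_identity=70.0):
    """Return (cdr3, identity) hits from the one group matching the query,
    sorted by decreasing identity and then alphabetically."""
    if not query_cdr3:
        raise ValueError("query CDR3 must not be empty")
    group = index.get((v_gene, j_gene, len(query_cdr3)), ())
    hits = [(cdr3, percent_identity(query_cdr3, cdr3)) for cdr3 in group]
    hits = [hit for hit in hits if hit[1] >= min_identity]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

Restricting each query to a single group means that only sequences sharing the same V gene, J gene and CDR3 length are ever compared, which keeps the number of comparisons small relative to the size of the full collection.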
Fig. 1.
The ClonoMatch server is available at https://clonomatch.accre.vanderbilt.edu/
3 Conclusions
We developed ClonoMatch to allow users to search through large antibody or TCR sequencing datasets. This tool was designed for users interested in querying publicly available datasets or their own repertoire datasets. Because the code is open source, users can add their own functionality to ClonoMatch.
Acknowledgements
The authors thank users at the Vanderbilt Vaccine Center for their helpful feedback. The authors would also like to thank ACCRE for their technical support.
Funding
This work was supported by the National Institutes of Health [U01 AI150739] and a grant from the Human Vaccines Project.
Conflict of Interest: J.E.C. has served as a consultant for GSK Vaccines, Sanofi-Aventis U.S., Pfizer, Novavax, Lilly and Luna Biologics, is a member of the Scientific Advisory Boards of CompuVax and Meissa Vaccines and is Founder of IDBiologics.
Contributor Information
Taylor Jones, Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN 37232, USA.
Samuel B Day, Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN 37232, USA.
Luke Myers, Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN 37232, USA.
James E Crowe, Jr, Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN 37232, USA; Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37232, USA; Department of Pathology, Microbiology, and Immunology, Vanderbilt University, Nashville, TN 37232, USA.
Cinque Soto, Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN 37232, USA; Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37232, USA.
References
- Altschul S.F. et al. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
- Briney B. et al. (2019) Commonality despite exceptional diversity in the baseline human antibody repertoire. Nature, 566, 393–397.
- Clark K. et al. (2016) GenBank. Nucleic Acids Res., 44, D67–D72.
- Corrie B.D. et al. (2018) iReceptor: a platform for querying and analyzing antibody/B-cell and T-cell receptor repertoire data across federated repositories. Immunol. Rev., 284, 24–41.
- Guo Y. et al. (2019) cAb-Rep: a database of curated antibody repertoires for exploring antibody diversity and predicting antibody prevalence. Front. Immunol., 10, 2365.
- Kovaltsuk A. et al. (2018) Observed antibody space: a resource for data mining next-generation sequencing of antibody repertoires. J. Immunol., 201, 2502–2509.
- Murphy K. (2012) Janeway's Immunobiology. New York and London: Garland Science, Taylor and Francis.
- Soto C. et al. (2019) High frequency of shared clonotypes in human B cell receptor repertoires. Nature, 566, 398–402.
- Soto C. et al. (2020a) High frequency of shared clonotypes in human T cell receptor repertoires. Cell Rep., 32, 107882.
- Soto C. et al. (2020b) PyIR: a scalable wrapper for processing billions of immunoglobulin and T cell receptor sequences using IgBLAST. BMC Bioinformatics, 21, 314.
- Xu J.L., Davis M.M. (2000) Diversity in the CDR3 region of V(H) is sufficient for most antibody specificities. Immunity, 13, 37–45.

