Abstract
The ABCD (for AntiBodies Chemically Defined) database is a repository of sequenced antibodies, integrating curated information about the antibody and its antigen with cross-links to standardized databases of chemical and protein entities. It is freely available to the academic community, accessible through the ExPASy server (https://web.expasy.org/abcd/). The ABCD database aims to improve reproducibility in academic research by providing a unique, unambiguous identifier associated with each antibody sequence. It also allows users to determine rapidly whether a sequenced antibody is available for a given antigen.
INTRODUCTION
Antibodies are one of the most widespread tools used in biological sciences. However, they are currently deemed one of the major culprits in the reproducibility crisis plaguing biomedical research (1). Problems include batch-to-batch variability; poorly characterized and/or non-validated antibodies that sometimes do not recognize the presumptive target, or recognize more than one target; lack of explicitly described procedures adapted to each antibody; decreasing scrutiny of results by scientists; and misleading antibody nomenclature. The 2 million antibodies available on the market might represent as few as 250'000 actual clones (1).
Standardized guidelines for antibody validation have been proposed to reduce reproducibility issues. These guidelines delineate a working framework to define antibody specificity and functionality for different research applications (2). In order to apply these guidelines, it is of course necessary that each antibody is identified easily and unambiguously.
Although the scientific community is well aware of this serious problem, few concerted solutions have appeared until now. The most advanced initiatives for centralizing information on antibodies are probably the portals Antibodypedia (3) and Antibody Registry [http://antibodyregistry.org/], but both still rely largely on information provided by commercial vendors (such as antibody clone names). They also include an overwhelming majority of unsequenced or polyclonal antibodies, whose identity is difficult to establish clearly.
One solution to this problem is to employ only sequenced antibodies, which are unambiguously defined by their primary amino-acid sequence (4,5). In this way, researchers can be sure to be using exactly the same binding reagent. While it seems unlikely that systematic characterization of millions of antibodies will be achieved, the goal seems more attainable for the estimated 20 000 currently described chemically defined (i.e. sequenced) monoclonal antibodies. The IMGT database (created decades ago by Marie-Paule Lefranc and colleagues (6)) is an invaluable knowledge resource on sequences of immunoglobulins, but it is primarily aimed at studying the diversity of immune molecules, rather than their binding specificity.
Our goal is to provide the academic community with wider access to recombinant, chemically defined antibodies (7). To this end, the recently launched ABCD database lists publicly available sequenced antibodies, and provides for each antibody a unique identifier and a link to its antigenic target.
OVERVIEW OF DATABASE CONTENT
The ABCD database is, to our knowledge, the first effort to provide freely accessible, curated information on chemically defined antibodies (i.e. antibodies with a known primary amino-acid sequence) connected with their antigenic target, which can be either a protein (linked to a UniProtKB unique identifier (UID) [(8), https://www.uniprot.org/]) or a chemical entity (linked to a ChEBI UID [(9), https://www.ebi.ac.uk/chebi/]).
Each ABCD entry corresponds to a unique primary amino-acid sequence, defined by a unique ABCD identifier. For each entry, information about the antigen and about the antibody is provided (Figure 1).
Figure 1.
Examples of antibody entries. Each entry has a unique identifier with the format ABCD_[A-Z][A-Z][0-9][0-9][0-9]. The Antibody table contains names and synonyms of the antibody, a published reference (with a link to PubMed, in the case of scientific papers, or to the WIPO database, in the case of a patent), and technical applications. The antibody sequence (see Figure 2 legend) is available through the Cross-references and Publications links provided, or upon request. (A) Target is a Protein: the Antigen table contains the name and species of the target, a link to the UniProtKB UID, and information on the epitope when available. (B) Target is a Chemical: the Antigen table contains the target name, a link to the ChEBI UID, and information on the epitope when available.
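As an illustration, the identifier format given in the Figure 1 legend (the prefix ABCD_, two uppercase letters, three digits) can be checked with a short regular expression. The Python sketch below is not part of the ABCD software; the function names `is_valid_abcd_id` and `find_abcd_ids` are hypothetical, and the check only tests the format, not whether an entry with that identifier exists.

```python
import re
from typing import List

# Format from the Figure 1 legend: "ABCD_" + two uppercase letters + three digits.
ABCD_ID_PATTERN = re.compile(r"ABCD_[A-Z]{2}[0-9]{3}")


def is_valid_abcd_id(identifier: str) -> bool:
    """Return True if the whole string matches the ABCD identifier format."""
    return ABCD_ID_PATTERN.fullmatch(identifier) is not None


def find_abcd_ids(text: str) -> List[str]:
    """Extract identifier-like tokens from free text (e.g. a methods section)."""
    # Word boundaries prevent matches inside longer tokens such as "ABCD_AI1790".
    return re.findall(r"\bABCD_[A-Z]{2}[0-9]{3}\b", text)
```

Such a check could be used, for instance, to spot mistyped identifiers in a manuscript before submission; ABCD_AI179 (the entry shown in Figure 2) passes, whereas a lowercase variant does not.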
Regarding the antibody, in addition to its ABCD identifier, the following information is given:
- recommended name (most frequently, the name provided in the referenced publication) and a list of synonyms;
- technical applications for which the antibody has been used (by no means an exhaustive inventory, as it lists only the applications described in the referenced publications);
- at least one bibliographic reference (either a published scientific article, with a PubMed UID or a Digital Object Identifier (DOI), or a patent, with a link to the WIPO database) in which the antibody sequence is provided. Note that this is not meant to be a comprehensive list of all the publications describing a given antibody;
- cross-references to other databases (listed in Table 1).
Table 1.
List of databases and websites used as source of information or cross-reference
| Database | Link | Data use | Ref. |
|---|---|---|---|
| abYsis | www.bioinf.org.uk/abysis2.7/ | Source for Kabat sequences | (16) |
| Addgene | www.addgene.org | Source for antibody sequences inside vectors | (17) |
| Cellosaurus | web.expasy.org/cellosaurus/ | X-ref for hybridomas | (18) |
| ChEBI | www.ebi.ac.uk/chebi/ | X-ref for chemical targets | (9) |
| DigIt | circe.med.uniroma1.it/digit/ | Source for sequences of annotated variable domains | (19) |
| IMGT/mAb-DB | imgt.org/mAb-DB/ | Source for therapeutic antibody sequences | (6) |
| InterPro | www.ebi.ac.uk/interpro/ | X-ref for domains | (20) |
| NCBI Taxonomy | www.ncbi.nlm.nih.gov/Taxonomy/ | X-ref for species taxonomy | (21) |
| PROSITE | prosite.expasy.org | X-ref for domains | (22) |
| PubMed | www.ncbi.nlm.nih.gov/pubmed/ | X-ref for publications; source for published sequences | (21) |
| RAN | recombinant-antibodies.org | Source for Recombinant Antibody Network antibodies | (12) |
| RCSB/PDB | www.rcsb.org/pdb/ | X-ref for 3D structures; source for published sequences | (23) |
| UniProt | www.uniprot.org | X-ref for protein targets | (8) |
| WIPO Patents | patentscope.wipo.int | X-ref for patent publications | — |
Regarding the antigen, the following is given:
- type of target (protein or chemical);
- name of the antigen (and, in the case of a protein, also the species against which the antibody was produced);
- link to the UniProtKB (for a protein) or ChEBI (for a chemical) database;
- when available, information about the epitope recognized (for example, a domain or a specific amino-acid subsequence).
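The antibody and antigen fields listed above can be summarized as a simple record structure. The following Python sketch is only an illustration of that content model, not the actual ABCD schema: all class and field names are hypothetical, and the example values are placeholders rather than data from a real entry.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Antigen:
    target_type: str                # "protein" or "chemical"
    name: str
    target_uid: str                 # UniProtKB UID (protein) or ChEBI UID (chemical)
    species: Optional[str] = None   # given for protein targets
    epitope: Optional[str] = None   # given only when available

    def __post_init__(self) -> None:
        if self.target_type not in ("protein", "chemical"):
            raise ValueError("target_type must be 'protein' or 'chemical'")


@dataclass
class AntibodyEntry:
    abcd_id: str
    recommended_name: str
    antigen: Antigen
    references: List[str]           # PubMed UID, DOI or patent identifier
    synonyms: List[str] = field(default_factory=list)
    applications: List[str] = field(default_factory=list)
    cross_references: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Each entry carries at least one reference providing the sequence.
        if not self.references:
            raise ValueError("at least one bibliographic reference is required")


# Placeholder values only; this is not a real ABCD entry.
example_entry = AntibodyEntry(
    abcd_id="ABCD_XX000",
    recommended_name="example-mAb",
    antigen=Antigen("protein", "example target", "UNIPROT_PLACEHOLDER",
                    species="Homo sapiens"),
    references=["PMID_PLACEHOLDER"],
)
```

Making the reference list mandatory while leaving synonyms, applications and epitope optional mirrors the description above, in which only the bibliographic source of the sequence is required for every entry.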
The antibody amino-acid sequence can be obtained through the links to the publications and the databases used as source (this is extensively explained in our FAQ section, with links and examples of how to obtain any given sequence). Alternatively, the information is also available upon request by email (via our Contact form). The stored information corresponds to the sequence of the variable region of both the heavy and light chains (or, in the case of camelid antibodies or nanobodies, the sequence of the unique variable chain) (Figure 2). When needed, the heavy and light chain boundaries were defined by alignment with germline sequences using the VBASE2 server (10).
Figure 2.
Antibody sequence information. (A) An immunoglobulin consists of constant (C, in gray) and variable (V, in blue and green) chains. The paratope (or specific binding site) of an antibody is located at the variable moiety of the light (VL) and heavy (VH) chains. (B) The ABCD database stores as sequence information the amino-acid sequence of both VL and VH chains (the example given corresponds to the sequence of entry ABCD_AI179, the anti-cMyc 9E10 clone).
The ABCD database is populated with data coming from (see Table 1 for a list of source databases): (i) sequences published in scientific articles or patents; (ii) 3D structural data; (iii) a few publications and repositories of large-scale phage display or hybridoma sequencing projects (11–15). We only include sequenced antibodies with a known and defined target. However, the source of such information is of variable quality, and we encourage users to verify (and to publish) the reactivity of each antibody that they use.
DATABASE DESIGN AND IMPLEMENTATION
The ABCD database is developed by the Geneva Antibody Facility team (https://www.unige.ch/medecine/antibodies/), in collaboration with the CALIPHO and Swiss-Prot groups at the Swiss Institute of Bioinformatics (https://www.sib.swiss/). The database is available at the ExPASy web server (https://web.expasy.org/abcd/).
Data is indexed for full-text search using the Apache Lucy search engine library in Perl (https://lucy.apache.org/). This is a ‘loose C’ port of the Apache Lucene™ search engine library for Java. The query interface and entry display are implemented on the ExPASy server using Perl CGI scripts.
The ABCD database website consists of a simple, user-friendly interface. Each antibody page is dynamically linked to external resources and databases (see Table 1). Entries can be searched by antibody name, antigen name, antigen species, UniProtKB or ChEBI UIDs, epitope information and reference UID (PubMed, DOI or Patent), via a full-text search field.
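To illustrate how a single full-text field can match any of these attributes, the sketch below builds a toy inverted index in Python. It is not the Apache Lucy implementation used by the database and omits everything a real search engine provides (ranking, stemming, phrase queries); the records, field names and function names are all made up for the example.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(entries: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Map each token found in any field to the set of entry identifiers."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for entry_id, fields in entries.items():
        for value in fields.values():
            for token in tokenize(value):
                index[token].add(entry_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the entries containing every query token, in any field."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return sorted(hits)


# Made-up records for illustration; not real ABCD entries.
toy_entries = {
    "ABCD_XX001": {"antibody": "example-mAb-1", "antigen": "example kinase",
                   "species": "Homo sapiens"},
    "ABCD_XX002": {"antibody": "example-mAb-2", "antigen": "example hapten"},
}
toy_index = build_index(toy_entries)
```

With these toy records, `search(toy_index, "homo sapiens")` returns only the first entry, while `search(toy_index, "example")` returns both, because every field is indexed under the same token space.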
The current release (v 4.0) contains 10'525 entries, referencing 9'076 proteins (1'642 unique UniProtKB UIDs) and 1'203 chemicals (261 unique ChEBI UIDs).
CONCLUSION AND PERSPECTIVES
We believe that this initiative is a valuable step in setting up a centralized repository of sequenced antibodies, allowing the unique and unambiguous identification of binding reagents for research and publication purposes.
Depositing or publishing the sequence information of any given antibody should be a required step during any antibody characterization procedure; careful and thorough validation is still obligatory, but knowing the precise identity of a given reagent would allow others to repeat the exact same experiment.
All entries in the ABCD database are manually curated and, hence, the database growth is linear and slow. Using computational approaches is not a desirable strategy: defining the identity of a given antibody's target is a cumbersome process involving extensive literature mining, which is not easily automated. One approach to allow for a faster inclusion of entries is to promote the submission of sequences by colleagues around the world, originating from large-scale discovery projects or from the sequencing of hybridomas or purified antibodies.
FUNDING
ProCare Foundation. Funding for open access charge: Swiss National Science Foundation (31003A-172951).
Conflict of interest statement. None declared.
REFERENCES
- 1. Baker M. Reproducibility crisis: Blame it on the antibodies. Nature. 2015; 521:274–276.
- 2. Uhlen M., Bandrowski A., Carr S., Edwards A., Ellenberg J., Lundberg E., Rimm D.L., Rodriguez H., Hiltke T., Snyder M. et al. A proposal for validation of antibodies. Nat. Methods. 2016; 13:823–827.
- 3. Bjorling E., Uhlen M. Antibodypedia, a portal for sharing antibody and antigen validation data. Mol. Cell Proteomics. 2008; 7:2028–2037.
- 4. Bradbury A., Pluckthun A. Reproducibility: standardize antibodies used in research. Nature. 2015; 518:27–29.
- 5. Weller M.G. Quality issues of research antibodies. Anal. Chem. Insights. 2016; 11:21–27.
- 6. Lefranc M.P., Giudicelli V., Duroux P., Jabado-Michaloud J., Folch G., Aouinti S., Carillon E., Duvergey H., Houles A., Paysan-Lafosse T. et al. IMGT(R), the international ImMunoGeneTics information system(R) 25 years on. Nucleic Acids Res. 2015; 43:D413–D422.
- 7. Cosson P., Hartley O. Recombinant antibodies for Academia: A practical approach. Chimia (Aarau). 2016; 70:893–897.
- 8. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2017; 45:D158–D169.
- 9. Hastings J., Owen G., Dekker A., Ennis M., Kale N., Muthukrishnan V., Turner S., Swainston N., Mendes P., Steinbeck C. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res. 2016; 44:D1214–D1219.
- 10. Retter I., Althaus H.H., Munch R., Muller W. VBASE2, an integrative V gene database. Nucleic Acids Res. 2005; 33:D671–D674.
- 11. Andrews N.P., Boeckman J.X., Manning C.F., Nguyen J.T., Bechtold H., Dumitras C., Gong B., Nguyen K., van der List D., Murray K.D. et al. A toolbox of IgG subclass-switched recombinant monoclonal antibodies for enhanced multiplex immunolabeling of brain. Elife. 2019; 8:e43322.
- 12. Hornsby M., Paduch M., Miersch S., Saaf A., Matsuguchi T., Lee B., Wypisniak K., Doak A., King D., Usatyuk S. et al. A high Through-put platform for recombinant antibodies to folded proteins. Mol. Cell Proteomics. 2015; 14:2833–2847.
- 13. Jain T., Sun T., Durand S., Hall A., Houston N.R., Nett J.H., Sharkey B., Bobrowicz B., Caffry I., Yu Y. et al. Biophysical properties of the clinical-stage antibody landscape. Proc. Natl. Acad. Sci. U.S.A. 2017; 114:944–949.
- 14. Schoenherr R.M., Saul R.G., Whiteaker J.R., Yan P., Whiteley G.R., Paulovich A.G. Anti-peptide monoclonal antibodies generated for immuno-multiple reaction monitoring-mass spectrometry assays have a high probability of supporting Western blot and ELISA. Mol. Cell Proteomics. 2015; 14:382–398.
- 15. Schofield D.J., Pope A.R., Clementel V., Buckell J., Chapple S., Clarke K.F., Conquer J.S., Crofts A.M., Crowther S.R., Dyson M.R. et al. Application of phage display to high throughput antibody generation and characterization. Genome Biol. 2007; 8:R254.
- 16. Swindells M.B., Porter C.T., Couch M., Hurst J., Abhinandan K.R., Nielsen J.H., Macindoe G., Hetherington J., Martin A.C. abYsis: Integrated antibody sequence and Structure-Management, analysis, and prediction. J. Mol. Biol. 2017; 429:356–364.
- 17. Kamens J. The Addgene repository: an international nonprofit plasmid and data resource. Nucleic Acids Res. 2015; 43:D1152–D1157.
- 18. Bairoch A. The cellosaurus, a Cell-Line knowledge resource. J. Biomol. Tech. 2018; 29:25–38.
- 19. Chailyan A., Tramontano A., Marcatili P. A database of immunoglobulins with integrated tools: DIGIT. Nucleic Acids Res. 2012; 40:D1230–D1234.
- 20. Mitchell A.L., Attwood T.K., Babbitt P.C., Blum M., Bork P., Bridge A., Brown S.D., Chang H.Y., El-Gebali S., Fraser M.I. et al. InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Res. 2019; 47:D351–D360.
- 21. Sayers E.W., Agarwala R., Bolton E.E., Brister J.R., Canese K., Clark K., Connor R., Fiorini N., Funk K., Hefferon T. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2019; 47:D23–D28.
- 22. Sigrist C.J., de Castro E., Cerutti L., Cuche B.A., Hulo N., Bridge A., Bougueleret L., Xenarios I. New and continuing developments at PROSITE. Nucleic Acids Res. 2013; 41:D344–D347.
- 23. Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., Shindyalov I.N., Bourne P.E. The Protein Data Bank. Nucleic Acids Res. 2000; 28:235–242.


