Patterns. 2022 Jul 8;3(7):100535. doi: 10.1016/j.patter.2022.100535

Artificial intelligence for antibody reading comprehension: AntiBERTa

Yoonjoo Choi
PMCID: PMC9278504  PMID: 35845838

Abstract

Utilizing publicly available antibody big data resources, Leem et al. (2022) developed an antibody-specific language model, AntiBERTa, to understand the “language” of antibodies. Case studies reveal that AntiBERTa can be an extremely useful tool for antibody engineering.


Main text

The immune system has many components that protect the body from harmful foreign substances. Among them, the B cell is a central player in the adaptive immune system. B cells carry receptor molecules (B cell receptors) that can be secreted upon activation. These secreted molecules, antibodies, are astronomically diverse (estimated at 10¹³–10¹⁵).

The high diversity of antibodies is Janus-faced. On one hand, it is the main reason the immune system can respond to nearly any type of antigen. According to Antibodypedia [1], 4.6 million monoclonal antibodies are currently available, targeting 19,000 genes. This diversity has also made antibodies highly successful as biotherapeutics: in 2021, the FDA approved its 100th therapeutic antibody [2]. The COVID-19 pandemic is currently accelerating the development of therapeutic antibodies, and several new antibodies for treating SARS-CoV-2-infected patients are in the pipeline.

On the other hand, such rich diversity is not always an advantage. Although antibodies are among the most extensively studied molecules (perhaps the most) and the antibody-related biopharmaceutical industry continues to mature, much remains to be learned about them, as evidenced by the growing number of papers published with the keyword “antibody.” Because the repertoire is so vast, it is simply impossible to explore it in its entirety experimentally. Computational approaches using artificial intelligence (AI) have therefore become essential for antibody research.

Advances in AI cannot be separated from big data. Recent developments in next-generation sequencing (NGS) technology now make it possible to sequence antibody repertoires at large scale. The Observed Antibody Space (OAS) database [3,4] compiles repertoire data from published studies and databases. Since the release of OAS, many practical applications have been developed, including AI-based computational antibody humanization [5,6].

These antibody repertoire big data resources also provide an in-depth view of, and biological insights into, antibodies [7]. In this timely study, Leem et al. present an antibody-specific language model [8]. AntiBERTa (antibody-specific bidirectional encoder representation from transformers) is a 12-layer transformer model pre-trained on the OAS database. Language models already exist for general proteins, such as ProtBERT [9], as do other antibody-specific language models, such as DeepAb [10] and Sapiens [6]. Compared with these existing methods, however, AntiBERTa is more versatile and more specific, with deeper layers.
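AntiBERTa belongs to the BERT family, whose models are typically pre-trained with a masked-language-modeling objective: a fraction of input tokens is hidden, and the model learns to recover them from the surrounding context. As a minimal sketch of that corruption step applied to an antibody sequence, the Python below assumes the standard BERT recipe (select 15% of positions; replace 80% of those with a mask token, 10% with a random amino acid, and leave 10% unchanged). The function `mask_sequence`, the `<mask>` token, the rates, and the example sequence are illustrative assumptions, not details taken from Leem et al.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "<mask>"  # hypothetical mask token, for illustration only


def mask_sequence(seq, mask_rate=0.15, seed=None):
    """Return (tokens, labels) for one masked-language-modeling example.

    tokens: list of residues, with some selected positions corrupted.
    labels: the original residue at each selected position, None elsewhere
            (only selected positions would contribute to the training loss).
    The 15% / 80-10-10 scheme is the standard BERT recipe, assumed here.
    """
    rng = random.Random(seed)
    tokens = list(seq)
    labels = [None] * len(tokens)
    # Select at least one position for non-empty input, never more than exist.
    n_select = min(len(tokens), max(1, round(mask_rate * len(tokens))))
    for i in rng.sample(range(len(tokens)), n_select):
        labels[i] = tokens[i]
        r = rng.random()
        if r < 0.8:
            tokens[i] = MASK  # ~80%: hide the residue
        elif r < 0.9:
            tokens[i] = rng.choice(AMINO_ACIDS)  # ~10%: random residue
        # remaining ~10%: keep the original residue unchanged
    return tokens, labels


if __name__ == "__main__":
    # Hypothetical heavy-chain fragment, used only as example input.
    heavy = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMS"
    tokens, labels = mask_sequence(heavy, seed=0)
    for i, (tok, lab) in enumerate(zip(tokens, labels)):
        if lab is not None:
            print(i, lab, "->", tok)
```

Only the selected positions (those with a non-None label) would enter the loss; the tokenizer and the transformer itself are outside the scope of this sketch.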

Remarkably, AntiBERTa cleanly separates memory and naive B cells, whereas other models give less distinct separations; that is, the deep, antibody-specific model does learn the language of antibodies and can identify the B cell origin of a sequence. One direct application could be estimating antibody humanness and immunogenicity in order to develop safer therapeutic antibodies. It is well known that antibodies with high human content tend to be less immunogenic. Just as it separates memory and naive B cells, AntiBERTa successfully classifies antibodies by species origin (murine, chimeric, humanized, and human).

The antibody-specific model generally describes antibodies better than the general protein model. The authors found that residue pairs with high self-attention scores reveal long-range structural interactions that the general protein model did not identify. This insight leads naturally to predicting paratopes, the antigen-binding sites. In several case studies, the authors showed that AntiBERTa successfully identifies paratope residues that lie outside the complementarity-determining regions (CDRs).
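The attention analysis described above can be illustrated with a small sketch: given one attention map (an L × L matrix of weights from a single layer and head), rank the residue pairs that are far apart in sequence but attend strongly to each other, as candidates for long-range structural contacts. The function `top_long_range_pairs`, the symmetrization by averaging, the minimum sequence separation of 6 residues, and the toy matrix are all hypothetical choices for illustration; this is not the procedure of Leem et al., and extracting real attention maps from AntiBERTa is not shown.

```python
from typing import List, Tuple


def top_long_range_pairs(attn: List[List[float]], k: int = 3,
                         min_sep: int = 6) -> List[Tuple[int, int, float]]:
    """Return the k highest-scoring pairs (i, j), i < j, with j - i >= min_sep.

    attn[i][j] is the attention weight from residue i to residue j.
    Attention is not symmetric, so each pair is scored by the mean of
    attn[i][j] and attn[j][i]. min_sep is an illustrative cutoff that
    filters out sequence-local pairs.
    """
    n = len(attn)
    scored = []
    for i in range(n):
        for j in range(i + min_sep, n):
            score = (attn[i][j] + attn[j][i]) / 2
            scored.append((i, j, score))
    # Highest score first; ties broken by position so the output is deterministic.
    scored.sort(key=lambda p: (-p[2], p[0], p[1]))
    return scored[:k]


if __name__ == "__main__":
    # Toy 10-residue attention map (not row-normalized): a flat background
    # plus one strong long-range pair (2, 9) and one strong local pair (4, 5).
    n = 10
    attn = [[0.1] * n for _ in range(n)]
    attn[2][9] = attn[9][2] = 0.9
    attn[4][5] = attn[5][4] = 0.8
    print(top_long_range_pairs(attn, k=1))
```

Run as a script, this prints `[(2, 9, 0.9)]`: the strong local pair (4, 5) is excluded by the separation filter even though it scores well above the background. Candidate pairs found this way would still need to be checked against known structures.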

While the authors demonstrated that AntiBERTa outperforms other methods and provided convincing rationales, the work also leaves something to be desired. As the authors note in the main manuscript, AntiBERTa could be applied directly to antibody structure prediction and humanization (or both at the same time), but they left these as potential applications. We hope to see practical tools based on the AntiBERTa model in the near future.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grants funded by the Korean government (MSIT) (NRF-2020R1A5A2031185 and NRF-2020M3A9G3080281).

Declaration of interests

The author declares no competing interests.

References

  • 1. Björling E., Uhlén M. Antibodypedia, a portal for sharing antibody and antigen validation data. Mol. Cell. Proteomics. 2008;7:2028–2037. doi: 10.1074/mcp.m800264-mcp200.
  • 2. Mullard A. FDA approves 100th monoclonal antibody product. Nat. Rev. Drug Discov. 2021;20:491–495. doi: 10.1038/d41573-021-00079-7.
  • 3. Kovaltsuk A., Leem J., Kelm S., Snowden J., Deane C.M., Krawczyk K. Observed antibody space: a resource for data mining next-generation sequencing of antibody repertoires. J. Immunol. 2018;201:2502–2509. doi: 10.4049/jimmunol.1800708.
  • 4. Olsen T.H., Boyles F., Deane C.M. Observed Antibody Space: a diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences. Protein Sci. 2022;31:141–146. doi: 10.1002/pro.4205.
  • 5. Marks C., Hummer A.M., Chin M., Deane C.M. Humanization of antibodies using a machine learning approach on large-scale repertoire data. Bioinformatics. 2021;37:4041–4047. doi: 10.1093/bioinformatics/btab434.
  • 6. Prihoda D., Maamary J., Waight A., Juan V., Fayadat-Dilman L., Svozil D., Bitton D.A. BioPhi: a platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning. mAbs. 2022;14:2020203. doi: 10.1080/19420862.2021.2020203.
  • 7. Marks C., Deane C.M. How repertoire data are changing antibody science. J. Biol. Chem. 2020;295:9823–9837. doi: 10.1074/jbc.rev120.010181.
  • 8. Leem J., Mitchell L.S., Farmery J.H., Barton J., Galson J.D. Deciphering the Language of Antibodies Using Self-Supervised Learning. Patterns. 2022;3:100513. doi: 10.1016/j.patter.2022.100513.
  • 9. Elnaggar A., Heinzinger M., Dallago C., Rehawi G., Wang Y., Jones L., Gibbs T., Feher T., Angerer C., Steinegger M., et al. ProtTrans: towards cracking the language of Life's code through self-supervised deep learning and high performance computing. IEEE Trans. Pattern Anal. Mach. Intell. 2021:1. doi: 10.1109/TPAMI.2021.3095381.
  • 10. Ruffolo J.A., Sulam J., Gray J.J. Antibody structure prediction using interpretable deep learning. Patterns. 2022;3:100406. doi: 10.1016/j.patter.2021.100406.
