Abstract
Antibodies are proteins that recognize the molecular surfaces of potentially noxious molecules to mount an adaptive immune response or, in the case of autoimmune diseases, molecules that are part of healthy cells and tissues. Due to their binding versatility, antibodies are currently the largest class of biotherapeutics, with five monoclonal antibodies ranked in the top 10 blockbuster drugs. Computational advances in protein modelling and design can have a tangible impact on antibody-based therapeutic development. Antibody-specific computational protocols currently benefit from an increasing volume of data provided by next generation sequencing and application to related drug modalities based on traditional antibodies, such as nanobodies. Here we present a structured overview of available databases, methods and emerging trends in computational antibody analysis and contextualize them towards the engineering of candidate antibody therapeutics.
Keywords: homology modelling, therapeutic antibodies, docking, antibody–antigen complexes, databases
Introduction
Antibodies are immune system proteins that recognize the surfaces of foreign molecules (antigens) for subsequent elimination from the organism during an adaptive immune response [1] or self-antigens from healthy tissues in autoimmune diseases [2]. The antibodies have evolved to be versatile binders, capable of recognizing a wide variety of molecular surfaces [3]. Because of such favorable binding properties, antibodies have been harnessed for therapeutic purposes and are currently the largest class of biotherapeutics. Five of the current top-selling blockbusters are monoclonal antibodies: adalimumab and infliximab (anti-TNFα), rituximab (anti-CD20), bevacizumab (anti-VEGF), trastuzumab (anti-HER2/neu) and their market presence is still expanding [4].
Continued exploitation of antibodies for therapeutic purposes relies on more efficient ways to develop these molecules. Computational approaches hold promise in advancing the field by providing faster results than arduous experimental approaches that are the current standard in antibody discovery [5]. Established structural bioinformatics methods such as homology modelling [6, 7], protein–protein docking [8, 9] or protein interface prediction [10] are already used for rational antibody design [11–13]. There are also pharmaceutically focused computational approaches that aid in assessing the immunogenicity [14] and biophysical properties [15] of antibodies. The increasing number of structural [16], sequence [17] and experimental data [18–21] on antibodies being deposited in the public domain provides the necessary foundation for improving such data-driven methods. Specifically, the recent decade saw the advent of next generation sequencing (NGS) of B-cell receptor (antibody) repertoires [22]. NGS provides a snapshot of millions of antibody sequences sampled from the theoretically possible 1012–1015 antibody sequences in a human repertoire [23, 24]. Discerning the biases of antibody repertoires can provide insights into the natural diversity of the immune system [25]. Among others, such natural preferences can be used as a reference to assess the biophysical properties of therapeutic antibodies [13] or to develop naturally focused surface display libraries [26]. The accumulated methodology of computational antibody protocols could potentially be applied to novel antibody formats with intrinsically better biophysical properties, such as nanobodies [27]. Altogether, computational antibody analysis methods have matured enough to allow for wider applications in therapeutic development.
In this review we give a structured overview of the currently available databases, algorithms and resources for computational analysis and design of antibodies and the prediction of their binding mode to antigens. We provide context to these methods in the form of current efforts for therapeutic antibody design and delineate the emerging trends in the field.
Antibody structure, function and therapeutic formats
Antibodies, or Immunoglobulins (Ig), are produced in jawed vertebrates by B-cells. Each of the estimated 5 × 109 B-cells in an organism [23] produces a distinct B-cell receptor (membrane-bound) or antibody (in soluble form) through somatic recombination of variable (V), diversity (D), joining (J) and constant (C) gene segments [28, 29]. The process of V(D) J recombination results in a Heavy (H) chain, assembled from V, D, J and C gene segments from the H chain locus, and a Light (L) chain, assembled from V, J and C gene segments from one of the L chain loci. In humans, the H and L chains can naturally assemble into five isotypes: IgG, IgD, IgE (all three monomers), IgA (dimer) and IgM (pentamer) [30]. The most biologically frequent format, IgG, consists of one crystallizable (Fc) and two antigen binding (Fab) fragments and is illustrated in Figure 1A.
Figure 1.
Antibody structure and binding. (A) Antibodies in soluble form often adopt the IgG isotype, a Y-shaped molecule consisting of two heavy chains (blue and amber) and two light chains (green and magenta). Each IgG molecule can be subdivided into an Fc and two Fab fragments through papain cleavage of the (hinge) region between these. At each end of a Fab fragment is a variable domain (VH/VL) involved in antigen binding. (B) Structure of an antibody VH(blue)/VL(magenta) in complex with cognate antigen (grey). The antibody paratope (light green) and antigen epitope (light brown) are highlighted. (C) Structure of an antibody VH(blue)/VL(magenta) highlighting the six hypervariable loops that make up the paratope; CDRH1 (white), CDRH2 (red), CDRH3 (amber), CDRL1 (green), CDRL2 (light blue), CDRL3 (yellow). (D) Comparison of antibody VH/VL domain (grey) and nanobody (red) structures. Nanobodies are devoid of the light chain, thus all the binding is mediated by the VH-homologous portion including its three CDR loops (CDRH1–3).
The Fab contains the H and L chain variable segments (VH and VL) that bind their cognate surface on the antigen, the epitope (Figure 1B). The VH and VL each harbor three hypervariable loops that define the complementarity determining regions (CDRs). The CDRs contain the majority of the antigen-binding residues, or paratope (Figure 1B–C). Upon antigen exposure, the antibody-producing B-cells undergo a natural process of affinity maturation, based on somatic hypermutation [31]. This process introduces mutations primarily in the CDR regions, which develop a specific and high-affinity binder. Together with the diversity introduced by V(D) J recombination, somatic hypermutation can produce an estimated 1012–1015 possible antibody sequences [23, 24]. The large number of diverse antibodies in an organism increases the probability of recognizing an arbitrary foreign antigen, thus initiating an immune response.
Despite the intrinsic sequence diversity in the CDRs, all hypervariable loops with the exception of CDRH3 adopt a constrained set of conformations termed canonical classes [32]. CDRH3 possesses the highest sequence and structural diversity of the six CDRs [33] and is very important for antigen recognition [34, 35]. Due to their central role in antigen recognition and binding, CDR loops undergo the most extensive engineering during development of monoclonal antibody (mAb) therapeutics [36, 37].
Standard mAb therapeutics have limited tissue penetration as a result of their large molecular weight (~150 kDa). As such, significant efforts, mostly based on modern protein engineering techniques, have been placed in the development of non-standard antibodies with superior properties. Some of these include Fab domains and other modular formats with single chain Fvs (scFv) (linked VH and VL domains) as the main component, e.g. scFv, (scFv)2, diabody and minibody (reviewed by Holliger and Hudson [38], Farajnia et al. [39] and Kwon et al. [40]). In addition, bi-specific and polyspecific antibody formats, which can engage two or more antigens at the same time, have been developed, e.g. combining L and H chains from two different mAbs or fusing the V domains from two different mAbs to create an antibody with dual specificity. Bi- and tri-specific antibody formats aimed at solid tumors have been recently reviewed [41].
Single-domain antibody formats, called VHH or nanobodies, are found in camelids and sharks (Figure 1D). Single domain antibodies have attracted attention because of their smaller size and better biophysical properties with respect to antibodies (higher stability and lower suspected immunogenicity) [27]. Despite being half the size of a standard antibody variable domain (Figure 1D), nanobodies retain similar binding affinity and specificity as standard mAbs. Therefore, these molecules are of increasing therapeutic interest with the first nanobody therapeutic (caplacizumab) approved in 2018 [42].
Antibody databases
Computational approaches to analyze and design antibodies rely on the availability of suitable datasets. Resources exist that curate the therapeutic antibody information, such as TABS (https://tabs.craic.com/) and SAbDab-Therapeutic-Antibodies [13]. Most other resources can be classified based on whether the content is sequence, structure or experimental information, with some databases being a combination of the three (Table 1). Most of these repositories collect both antibody and nanobody data; however, there are also some databases specializing only in the latter, e.g. sdAb-DB [58].
Table 1.
Databases containing information on antibody and nanobody structure and sequence. Most of the databases are free for academic use. In cases where the authors made it clear that a commercial version is available, this is indicated next to the database name. In some cases, such as IMGT or SKEMPI, conditions for non-commercial reuse are defined. In such cases, the authors of the respective databases should be contacted for details on commercial re-use of their material. Example contents of the databases are summarized in Supplementary Section 1. An up-to date list of antibody-related database resources is maintained at http://naturalantibody.com/tools
Database Name | Link | Description | Reference |
---|---|---|---|
TABS (commercial use) | https://tabs.craic.com/users/sign_in | Database of therapeutic antibodies | n/a |
SAbDab-therapeutic antibodies | http://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/Therapeutic.html | Database of therapeutic antibodies linked to structures | [13] |
Andrew Martin’s Antibody Resources | http://www.bioinf.org.uk/abs/ | Resources on antibody-related analytics | n/a |
AAAAA | https://www.bioc.uzh.ch/plueckthun/antibody/index.html | Resources on antibody-related analytics | [43] |
AbMiner | https://discover.nci.nih.gov/abminer/ | Database of available monoclonal antibodies | [44] |
IgPdb | http://cgi.cse.unsw.edu.au/~ihmmune/ IgPdb/information.php |
Database of inferred allelic variants for immunoglobulins | [45] |
IMGT® | http://www.imgt.org/ | Leading antibody genetics database | [46] |
Abysis (commercial license available) | http://www.abysis.org/ | Sequence and structural data on antibodies | [47] |
DIGIT | http://circe.med.uniroma1.it/digit/help.php | Antibody sequence database | [48] |
Ireceptor | http://ireceptor.irmacs.sfu.ca/ | NGS sequence data on B-cell receptors | [49] |
Observed Antibody Space | http://antibodymap.org/oas | NGS sequence data on B-cell receptors/antibodies | [17] |
SystimsDB (commercial license available) | https://www.systimsdb.ethz.ch | NGS sequence data on B-cell and T-cell receptors | n/a |
PCLICK | http://mspc.bii.a-star.edu.sg/minhn/cluster_pclick.html | Clusters of antibody–antigen interactions | [50] |
PyIgClassify (commercial license available) | http://dunbrack2.fccc.edu/PyIgClassify/ | Database of CDR canonical classes | [51] |
Structural Antibody Database | http://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/Welcome.php | Self-updatable database of antibody/nanobody structures | [16] |
AbDb | http://www.bioinf.org.uk/abs/abdb/ | Database of antibody structures | [52] |
Immune Epitope Database | http://iedb.org | Manually curated epitope data | [18] |
AntigenDB | http://crdd.osdd.net/raghava/antigendb/ | Antigen database | [53] |
PDBBind | http://www.pdbbind.org.cn/ | Affinity data on proteins in the PDB | [54] |
Ab-Bind | https://github.com/sarahsirin/AB-Bind-Database | Mutational antibody data related to binding affinities | [19] |
SKEMPI (non-commercial use) | https://life.bsc.es/pid/skempi2/ | Not-antibody specific interaction database | [55, 56] |
Non-redundant Nanobody database | https://www.sciencedirect.com/science/ article/pii/S2352340919301052 |
Non-redundant structures of nanobodies | [57] |
Single Domain Antibody Database | http://sdab-db.ca/ | Sequence and structural data on nanobodies | [58] |
Institute for analysis and collection of nanobodies | http://ican.ils.seu.edu.cn/ | Sequences and structural data on nanobodies | [59] |
Sequence databases
The leading database of germline antibody sequences is the International Immunogenetics Information System (IMGT) [46] and it is widely used to derive gene assignments for recombined antibodies. Most other resources typically store the recombined sequences of the variable regions (VH and VL). Such databases can be divided into those that specialize in single sequence depositions, e.g. DIGIT [48], Abysis [47] or bulk raw reads produced by NGS experiments, e.g. iReceptor [49], Observed Antibody Space [17]. Tools such as DIGIT and Abysis source their data from the European Nucleotide Archive (ENA) [60] and the National Center for Biotechnology Information (NCBI) [61]; their sequence volumes are of the order of 105 and contain multiple artificially engineered sequences. These data typically originate from single molecule depositions, often derived by Sanger sequencing, and thus can be regarded as of high quality. Repositories containing NGS data derive their contents from the raw sequence reads deposited in multiple repositories, including ENA and NCBI. Due to the high-throughput nature of NGS, sequence volumes are of the order of 108 and are associated with non-trivial error rates [62]. The raw sequences are annotated with antibody-specific information such as CDRs, numbering schemes and wherever available experimental data on the immune state of the donor at the point of sequence collection. Certain resources, such as Observed Antibody Space, address this by offering an annotation of the predicted sequence errors [62]. The NGS databases typically offer only the unpaired heavy and light chains; however, as the paired NGS technology becomes more mainstream, it is to be expected that such data will also become publicly available [63, 64].
Structure databases
The Protein Data Bank (PDB) is the main global repository of 3D structure information for proteins [65]. Resources that mine the PDB for antibody fragments such as canonical classes (PyIgClassify [51]), antibody–antigen interaction data (PCLICK [50]) or their entire structures (IMGT/3D-Structure-DB [66], Structural Antibody Database (SabDAb [16]), Abysis [47] and AbDb [52]) exist. According to SAbDab, of the approximately 150 000 structures deposited in the PDB to date as many as 3500 are identified as containing at least one antibody (or nanobody) chain. SAbDab specifically allows for a bulk download of its weekly updatable database, providing an up-to-date resource for applications such as antibody modelling or docking. SabDab and Abysis allow retrieval of particular structures given a query antibody sequence (SAbDab) or by using more advanced features such as canonical classes of the CDRs (Abysis). Other resources such as the Immune Epitope Database (IEDB) [18] link structural information to experimentally derived epitope data.
Experimental databases
Sequence and structure data can be further enriched with antibody-specific experimental information. Data on epitopes targeted by antibodies can be readily downloaded from the IEDB that now links such information to epitope-specific antibody sequences [67]. One of the crucial pieces of information to characterize antibody–epitope interactions is the binding affinity. Such data is contained in resources such as SAbDab and PDBBind [54]. Other, more specialized, resources exist such as Ab-Bind [19], which hold data on 1101 mutations across 32 antibody complexes, and SKEMPI [55], which curates binding energy data for available structures but is not limited to antibody information only.
Computational characterization of antibodies
The increasing availability of antibody-specific sequence, structure and experimental data allows development of bioinformatics tools facilitating antibody engineering (Table 2). Routine bioinformatics methods such as homology modelling and protein–protein docking can be harnessed to guide the engineering of therapeutic antibodies [5]. Antibody-based therapeutics are developed via well-established processes that can be broadly categorized into Lead Identification and Lead Optimization. During Lead Identification animal immunization or surface display technologies are used to generate a large number of ‘hit’ molecules, which need to be further triaged. Following various rounds of further screening and design during Lead Optimization, a small number of high affinity lead candidates are selected. During Lead Identification and Optimization, molecules are assessed for unfavorable characteristics such as immunogenicity or poor biophysical properties. This assessment of ‘developability’ risk is of key importance before undergoing clinical trials and to ensure the successful development of a lead candidate into a stable, manufacturable, safe and efficacious therapeutic. Computational methods, such as homology modelling, docking or interface prediction can be used during the Lead Identification and Optimization phases to generate 3D models of the antibodies and predict or identify the key residues involved in antigen binding.
Table 2.
Computational antibody tools. The algorithms or software packages were grouped by the core area of their application: Antibody Annotation/Numbering, Structural Antibody Modelling, Antibody–Antigen Interface Prediction, Antibody Design and Pharmaceutically-specific applications. We provide the name for each method together with a reference and weblink to the software if available. Some resources are currently not maintained, in which case we suggest contacting the authors directly. A web-based version of this table where the resources listed below and newly released ones are curated is maintained at http://natuarlantibody.com/tools
A. Antibody Annotation/Numbering | Role | Link | Reference |
---|---|---|---|
IgBLAST | Raw data processing | https://www.ncbi.nlm.nih.gov/igblast/ | [68] |
IMGT V-Quest | Raw data processing | http://www.imgt.org/IMGTindex/V-QUEST.php | [69] |
MiXCR | Raw data processing | https://mixcr.readthedocs.io/en/master/ | [70] |
Immcantation | Raw data processing | https://immcantation.readthedocs.io | [71, 72] |
IgRec | Raw data processing | https://yana-safonova.github.io/ig_repertoire_constructor/ | [73] |
ImmuneDiversity | Raw data processing | https://bitbucket.org/ImmunediveRsity/immunediversity/ | [74] |
IMSEQ | Raw data processing | http://www.imtools.org/ | [75] |
Partis | Raw data processing | https://github.com/psathyrella/partis | [76] |
IGoR | Raw data processing | https://github.com/qmarcou/IGoR | [77] |
Vidjil | Raw data processing | http://www.vidjil.org/ | [78, 79] |
ImmuneDB | Raw data processing | https://immunedb.readthedocs.io/en/latest/ | [80] |
AbRSA | Numbering | http://cao.labshare.cn/AbRSA/ | [81] |
Abnum | Numbering | http://www.bioinf.org.uk/abs/abnum/ | [82] |
ANARCI | Numbering | http://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/ANARCI.php | [83] |
B. Structural Antibody Modelling | Role | Link | Reference |
AbodyBuilder | Full FV modelling | http://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/Modelling.php | [84] |
LYRA | Full FV modelling | http://www.cbs.dtu.dk/services/LYRA/index.php | [85] |
PIGS | Full FV modelling | https://cassandra.med.uniroma1.it/pigspro/ | [86] |
Kotai Antibody Builder | Full FV modelling | http://kotaiab.org/ | [87] |
RosettaAntibody | Full FV modelling | http://rosie.rosettacommons.org/antibody | [88, 89] |
BIOVIA | Full FV modelling | https://www.3dsbiovia.com/ | [90] |
MoFvAb | Full Fv Modelling | - | [91] |
WAM | Full Fv Modelling | - | [92] |
BioLuminate | Full Fv Modelling | https://www.schrodinger.com/products/bioluminate | [93] |
MOE | Full Fv Modelling | https://www.chemcomp.com/ | [94] |
ABGEN | Full Fv Modelling | - | [95] |
AbPredict | Full FV modelling | http://abpredict.weizmann.ac.il/bin/steps | [96] |
SmrtMolAntibody | Full FV modelling | https://www.macromoltek.com/ | [97] |
PEARS | Ab-specific side chain prediction | http://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/PEARS.php | [98] |
H3LoopPred | Antibody specific loop prediction | - | [99] |
SCWRL | Side Chain Prediction | http://dunbrack.fccc.edu/scwrl4/ | [100] |
BetaSCPWeb | Side Chain Prediction | http://voronoi.hanyang.ac.kr/betascpweb/ | [101] |
SPHINX | Antibody specific ab initio loop prediction | http://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/Sphinx.php | [102] |
FREAD | Database-search loop prediction | http://opig.stats.ox.ac.uk/webapps/fread/php/ | [103] |
PLOP | Ab initio loop prediction | http://www.jacobsonlab.org/plop_manual/plop_overview.htm | [104] |
Chothia Canonical Assignment | CDR Canonical structure prediction | http://www.bioinf.org.uk/abs/chothia.html | Based on [105] |
SCALOP | CDR Canonical structure prediction | http://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/SCALOP.php | [106] |
Roche VH/VL orientation | VH/VL orientation | - | [107] |
Rosetta VH/VL orientation | VH/VL orientation | Rosetta Suite | [108] |
AbAngle | VH/VL orientation | http://opig.stats.ox.ac.uk/webapps/abangle/index.html | [109] |
Antibody i-Patch | Paratope Prediction | http://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/ABipatch.php | [110] |
Paratome | Paratope Prediction | http://ofranservices.biu.ac.il/site/services/paratome/ | [111] |
ProABC | Paratope Prediction | http://circe.med.uniroma1.it/proABC/ | [112] |
Parapred | Paratope Prediction | https://github.com/eliberis/parapred | [113] |
AntibodyInterface Prediction |
Paratope Prediction | https://github.com/sebastiandaberdaku/AntibodyInterfacePrediction | [114] |
AG-FAST-Parapred | Paratope Prediction | - | [115] |
ISMBLab-PPI | Protein contact prediction, applied to paratopes | http://ismblab.genomics.sinica.edu.tw/predict.php?pred=PPI | [3] |
Rapberger et al. 2007 | Ab-specific epitope prediction | - | [116] |
PEASE | Ab-specific epitope prediction | http://ofranservices.biu.ac.il/site/services/epitope/index.html | [117, 118] |
EpiPred | Ab-specific Epitope Prediction | http://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/EpiPred.php | [119] |
Jespersen et al. | Ab-specific Epitope Prediction | - | [120] |
EpiScope | Ab-specific Epitope Prediction | - | [121] |
MabTope | Ab-specific Epitope Prediction | - | [122] |
ASEP | Ab-specific Epitope Prediction | - | [123] |
BEPAR | Ab-specific Epitope Prediction | - | [124] |
ABEPAR | Ab-specific Epitope Prediction | - | [125] |
ClusPro | Ab-specific docking | https://cluspro.bu.edu/login.php | [8, 126] |
surFit | Ab-specific docking | https://sysimm.ifrec.osaka-u.ac.jp/docking/main/ | [127] |
SnugDock | Ab-specific docking | http://rosie.graylab.jhu.edu/snug_dock | [9, 89] |
FRODOCK | Ab-specific docking | http://frodock.chaconlab.org/ | [128] |
DockSorter (ab-specific scoring) | Ab-specific docking scoring | http://www.stats.ox.ac.uk/~krawczyk/dockingsupp.html | [110] |
Hex | Docking, not antibody specific | http://hex.loria.fr/ | [129] |
ZDOCK | Docking, not antibody specific. | http://zdock.umassmed.edu/ | [130] |
HADDOCK | Docking, not antibody specific | https://haddock.science.uu.nl/services/HADDOCK2.2/ | [131, 132] |
ATTRACT | Docking, not antibody specific | http://www.attract.ph.tum.de/services/ATTRACT/attract.html | [133] |
GRAMM-X | Docking, not antibody specific | http://vakser.compbio.ku.edu/resources/gramm/grammx/ | [134] |
pyDockWeb (pyDock, FTDock) | Docking, not antibody specific | https://life.bsc.es/pid/pydockweb | [135] |
Swarmdock | Docking, not antibody specific | https://bmm.crick.ac.uk/~svc-bmm-swarmdock/ | [136] |
PatchDock | Docking, not antibody specific | https://bioinfo3d.cs.tau.ac.il/PatchDock/ | [137, 138] |
D. Antibody Design | Role | Link | Reference |
OPTCDR | Design Protocol | http://www.maranasgroup.com/submission/OptCDR_2.htm | [139] |
OPTMaven | Design Protocol | https://github.com/maranasgroup/OptMAVEn_2.0 | [140, 141] |
RosettaAntibodyDesign | Design Protocol | https://www.rosettacommons.org/docs/latest/application_documentation/antibody/RosettaAntibodyDesign | [142] |
AbDesign | Design Protocol | https://www.rosettacommons.org/node/9206 | [12, 143] |
Humanness Score | Humanization | http://www.bioinf.org.uk/abs/shab/ | [14] |
Humanizer | Humanization | https://drive.google.com/file/d/1seCQYMlMG4_oC1-0EjiDhZHnMf9D-1R5/view?usp=sharing | [141] |
Tabhu | Humanization | http://circe.med.uniroma1.it/tabhu/ | [144] |
Human String Content | Humanization | - | [145] |
Human String Content | Humanization | - | [145] |
T20 Score | Humanization | https://dm.lakepharma.com/bioinformatics/ | [146] |
CODah | Humanization | - | [147] |
Developability Index | Developability | - | [148] |
Delayed HIC retention prediction | Developability | - | [149] |
Therapeutic Antibody Profiler | Developability | http://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/TAP.php | [13] |
Lonza | Developability | - | [15] |
Antibody numbering
The first step in antibody computational analysis is to map the antibody sequences onto a standardized reference framework (Table 2A). Raw nucleotide sequences of variable regions can be translated into amino acids by aligning them to germline sequences, thus identifying the V, D and J regions. This can be achieved by programs such as IgBLAST [68] or IMGT V-Quest [69] and multiple other tools aimed at processing raw antibody data (Table 2A, reviewed in [150]). Similarities between antibody amino acid sequences further allow for the creation of a standardized reference framework, or numbering scheme, giving each variable region amino acid an identifier [151]. The numbering schemes contextualize each position within the structure of an antibody, allowing for rapid delineation of CDR and framework regions. Since the seminal work to define a standard numbering scheme for antibodies was carried out by Kabat in 1970 [152], the Chothia [32] and IMGT [153] schemes have been adopted as the main alternatives. Additional numbering schemes such as Contact [154], North [155], WolfGuy [107] and Aho [43] exist but these are less prevalent. Kabat and IMGT definitions are based on sequence alignments identifying conserved positions in the variable region [152, 153] whereas Chothia takes into account the 3D structure of the CDR loops. The antibody numbering scheme developed by the Chemical Computing Group (CCG) combines several antibody numbering schemes and offers a broader definition of CDR boundaries based on Martin and collaborators’ CDR definitions (http://www.bioinf.org.uk/abs/#martinnum). To the best of our knowledge, there are three freely available software packages to perform numbering of antibodies, ANARCI [83], Abnum [82] and AbRSA[81], to act as the first step in computational antibody analysis such as homology modelling.
Antibody modelling
Structural antibody modelling creates a 3D structure from its sequence alone, based on existing knowledge of antibody structures in particular and protein structures in general. The high degree of antibody sequence and structure conservation in the framework region and the five canonical loops leads to an overall high accuracy of antibody homology modelling [7].
Antibody modelling generally follows a five-step process (Figure 2A). The first step is selection of a suitable framework template that can harbor the CDR loops. This is typically achieved by finding close sequence matches to the H and L chains in available databases [16]. The second step involves accurate determination of the relative orientation of the VH and VL domains, which is crucial to determine the correct shape of the paratope [107, 109]. Specific algorithms have been developed for this and incorporated into available software packages such as AbAngle [109]. The third step involves modelling of the CDR loops. Knowledge-based methods are currently capable of providing accurate predictions for the five canonical loops, but CDRH3 remains a challenge [156]. Antibody-specific knowledge-based approaches are fast and accurate if a template is available [103]. If there is no suitable template, as can be often the case with CDRH3, more computationally expensive ab initio approaches can be employed that generate a large set of novel loops. The biggest challenge in such ab initio modelling remains selection of best loop models among those generated [102]. Hybrid methods such as Sphinx [102] combine knowledge-based and ab initio approaches to provide better all-round predictions irrespective of the presence or absence of a priori structural information. The fourth step involves building and refining of the side-chains [98]. Here, protein-generic approaches such as SCWRL [100] can be employed although it has been demonstrated that an antibody-focused approach, such as PEARS, could yield better results [98]. The final antibody model can be further refined by optimizing the energetic packing of the molecule, through packages such as Rosetta [89].
Figure 2.
Computational antibody methods schematic. (A) Antibody modelling produces three dimensional coordinates from the sequence of an antibody. Framework templates are identified and the VH/VL domains can be oriented with respect to each other if the two regions originate from different molecules. CDRs are modelled onto the framework followed by side-chain prediction and refinement of the entire structure by energy minimization. (B) Antibody interface prediction identifies the residues on the antibody (paratope) that are in contact with the antigen (epitope). This is a special case of molecular docking in which the antibody–antigen docking aims to recapitulate the complex between the antibody and the antigen. (C) Antibody design optimizes the binding of an antibody against an epitope of choice through a series of modelling, docking and energy minimization steps. In ab initio design, novel paratopes are generated computationally and their structural stability and binding propensity against the cognate epitope assessed by energy functions. Hotspot grafting involves transferring known interaction motifs from the antigen partner protein to an antibody template. (D) Antibodies need to be immunologically safe and have favorable biophysical properties in order to be administered to humans. Humanization involves modifying an animal-derived sequence to resemble one with a higher degree of human amino acid content without affecting its affinity and specificity. Developability-specific applications annotate regions on the surface that might lead to poor solubility or aggregation altogether. (E) Entire antibody repertoires can be used to draw information on the mechanics of the adaptive immune system. Identification of antigen-specific sequences post-vaccination can identify antibodies that could bestow passive immunity. The dynamic state of the repertoire can be analyzed to identify diseases in the organism. The diversity of antibodies can be harnessed to create surface display libraries recapitulating naturally evolved preferences and advantages.
Available tools that employ the methods described above are summarized in Table 2B, including those specific for individual steps in the modelling process. The modelling protocols are currently available via free-to-use web-servers, e.g. PIGS [86], AbodyBuilder [84]; as commercial packages, e.g. Biovia from Accelrys (https://www.3dsbiovia.com/), SmrtMolAntibody from Macromoltek (https://www.macromoltek.com/), MOE from CCG (https://www.chemcomp.com/) and BioLuminate from Schrodinger Inc. (https://www.schrodinger.com/products/bioluminate); or for local installation, e.g. AbPredict [96], Rosetta [89]. The tools vary radically in run-times, with tools such as AbodyBuilder capable of producing a model in around 60 s, to Rosetta-based frameworks that can take up to several hours. Despite different run times, the tools produce comparable results as exemplified by the Antibody Modelling Assessment II [7], a benchmarking experiment in which blinded predictions using some of the aforementioned tools were conducted. AMA II reported that the overall accuracy of modelling the entire antibody FV is 1.1 Å Root Mean Square Deviation (RMSD) on average, with the most challenging region being the CDRH3, which is modelled to >5 Å RMSD in some targets. Such results cannot rival the accuracy of experimentally derived structures, but a model with 1.0 Å RMSD, especially across the CDR region, can be used as a rapid proxy to delineate structural features of the molecule. Modelled structures can be used at the Lead Identification stage to select surface exposed paratope residues for mutations [110] or to characterize the binding with respect to the cognate epitope [119]. Accurate structural information can be used during the Lead Optimization stage to assess various developability indicators, such as hydrophobicity [13] that rely on accurate models of the molecular surface of the paratope and epitope.
Interface prediction and antibody–antigen docking
Understanding the epitope–paratope interactions at the atomic level is key to rational development of effective therapeutics. The ‘gold standard’ for obtaining this information is by experimentally determining the 3D structure of the antibody–antigen complex using X-ray crystallography. Other structural methods such as cryo-electron microscopy (cryoEM) or nuclear magnetic resonance (NMR) can be used but the size of the complexes makes it challenging for the latter. Experimental methods can be very time and resource consuming with success not being guaranteed. Thus, computational methods that predict antibody–antigen contact surfaces could be a rapid alternative during therapeutic discovery efforts. These methods can be categorized into those that predict the paratope, the epitope or the entire antibody–antigen complex (Figure 2B and Table 2C).
About half of the 40–50 residues in the CDRs are in direct contact with the antigen, forming the paratope [157–159]. Analyses of high resolution crystal structures of antigen-antibody complexes show that the framework residues can bury a substantial amount of surface area upon complex formation [159, 160]. Computational predictors of paratopes address this problem (Table 2C) and they could have an impact in constraining and guiding mutational choices for rational affinity engineering of therapeutics during Lead Optimization. They can also provide valuable information to guide the modelling of antibody–antigen complexes during Lead Identification. For instance, statistical approaches such as Antibody i-Patch [110] assign a score to each residue with respect to its propensity to be part of the paratope, with high-scoring residues offering potential candidates for mutagenesis. Since not all paratope residues are constrained to the CDRs, tools such as Paratome [111, 159] can be used to identify positions in the framework region that might contribute to antigen recognition as well. Recently, Antibody i-Patch and Paratome were outperformed by machine learning approaches such as the random forest-based proABC [112], support-vector machine-based AntibodyInterfacePrediction [114] and the deep learning-based, Parapred [113] and AG-Fast-Parapred [115]. To the best of our knowledge AntibodyInterfacePrediction and AG-Fast-Parapred are currently the best performing paratope prediction methods as compared to previously available methods; however, they were not compared against one another. AG-Fast-Parapred predictions are obtained with reference to the antigen; therefore, as the authors suggest, the method might be applicable to epitope prediction as well.
Accurate delineation of an epitope is an important step in characterizing the function of an antibody [161] (Figure 2B). From a therapeutic perspective, knowledge of the epitope can be used for rational design in targeting an immunogenic region for vaccine development [162]. From a legal perspective, characterization of the antibody–antigen interaction is of importance when filing therapeutic antibody patents [163]. To achieve such goals, epitopes can be identified by various experimental methods [163] or be predicted by computational protocols [164]. Methods for computational epitope prediction can be divided into predictors of linear epitopes, which focus on identifying contiguous stretches of primary amino acid sequence, and conformational epitope predictors, which aim to identify the 3D configuration of the epitope. The majority of epitopes are conformational in nature therefore predictors that use structural antigen information offer more accurate results than linear methods [165, 166]. Many epitope prediction methods do not include information on the antibody, thus focusing on identifying generic immunogenic molecular surfaces [167]. However, arbitrary molecular surfaces appear to be indistinguishable from epitope regions [167–169] suggesting that predictions should be performed with reference to a particular antibody [116, 118–120]. We summarize the linear and conformational epitope predictors in Supplementary Information and those following the new paradigm of including antibody in providing epitope predictions (antibody-specific predictors) in Table 2C. Antibody-specific epitope prediction was first addressed by Rapberger and co-workers in 2007 [116] and subsequently by methods such as ASEP [123], BEPAR [124], ABEpar [125], EpiPred [119], PEASE [117, 118], MabTope [122] and Jespersen et al. [120]. The most recent approaches, such as those by MabTope and Jespersen et al., perform antibody-specific epitope predictions in conjunction with protein-protein docking to offer information on the paratope-epitope pairings.
Paratope and epitope prediction can offer useful information on antibody–antigen recognition, which can be exploited for therapeutic design but these methods do not provide information about the specific interactions involved in antibody–antigen binding. This issue is addressed by antibody–antigen docking, a specialized application of the broader field of molecular docking [170] (Figure 2B). Molecular docking aims to predict the biological complex starting from the unbound proteins. It typically involves two steps; the sampling step, during which thousands of possible complex conformations are generated (e.g. antibody-specific ClusPro [8, 126], SnugDock [9, 89] and general protein HADDOCK [131, 132], ZDOCK [130]) and the scoring step, where the conformations are ranked according to a specific scoring function (e.g. antibody-specific DockSorter [110] and general protein ZRANK [171], FireDock [172], SIPPER[173]) to discriminate models that are closer to the native conformation. According to the sampling strategy used during the simulation, docking methods can be classified into two categories. The first class includes algorithms that perform a global search around the whole interfaces of the components without taking into account previous information about the binding region (ab initio docking). On the other hand, experimental or predicted information about the binding interface is often available and can be used to drive the sampling during docking (information-driven, local or integrative docking) [174]. Both classes can benefit from available information during the scoring step to select models that are consistent with the available information about the interaction. Additionally, inputs from experimental studies such as hydrogen-deuterium exchange (HDX) coupled with mass spectrometry and mutational analyses can help refine the computational models of antibody–antigen complexes [175, 176].
Another important aspect to be considered in the study of the biomolecular interactions regards the conformational changes that the molecules undergo upon binding. Most docking algorithms do not take into account conformational changes of the components, performing only ‘rigid-body docking’. Examples of widely used rigid-body docking software are ClusPro [8, 126], ZDOCK [130] and PatchDock [137, 138]. Since in most cases flexibility of the molecule is a crucial factor to be considered [177], approaches that tackle this problem have been developed over the years. Examples of such methods are for example Swarmdock [136], HADDOCK [132] and SnugDock [9].
All of the aforementioned methods allow the user to provide information about the binding interface using different strategies to implement the methodologies during the simulation. This feature is particularly relevant in the case of antibody–antigen docking as CDRs in particular offer a reasonable proxy of the binding interface. In fact, some docking methods such as ClusPro and PatchDock are able to automatically define the antibody CDRs in order to use this information during the docking process. The most challenging aspect is identification of the epitope since, despite the great efforts of the community in developing accurate epitope prediction methods, existing systems still do not provide reliable predictions, limiting their applicability in molecular docking.
HADDOCK is one of the few methods that can encode a variety of experimental and predicted information into restraints throughout the entire docking process to both drive and score the generated models following a data-driven strategy. Restraints can be derived from various experimental sources such as NMR chemical shifts perturbations, HDX and chemical cross-linking detected by mass spectrometry and mutagenesis data. In the case of antibodies, it has recently been demonstrated that HADDOCK is able to already provide high quality models when only a loose definition of the epitope and the hypervariable loops of antibodies are used to drive the docking [178]. Despite the availability of experimental data and their use to drive the docking and/or score the generated models, accurate prediction of biomolecular complexes remains a real challenge with much room for improvement. Current docking methods still cannot rival the reliability of X-ray crystallography-derived structures and their performance is regularly assessed by the Critical Assessment of Predicted Interactions (CAPRI) [179]. Here, scientists are typically provided sequences of the interacting partners (or in rare cases the structures of the unbound components) and are tasked with predicting the native complex. CAPRI rounds over the years have catalyzed and demonstrated improvements in protein–protein docking methodology. Targets consisting of antibody–antigen complexes are regularly included. Therefore, as methods for antibody–antigen interaction prediction improve, it is to be expected that triangulation of results from paratope prediction, epitope prediction and antibody–antigen docking methods could provide a relatively fast and cost-effective route to obtaining reliable information on which to base rational antibody design decisions.
Computational methods for therapeutic antibody discovery
Antibody design
Antibody modelling and interface prediction/analysis tools can be used to create novel molecules ab initio during Lead Identification or as auxiliary tools during Lead Optimization (Figure 2C and Table 2D). The availability of an antigen structure opens up the possibility to develop a novel antibody binder computationally [180]. The seminal work on the subject was published by Lippow and co-workers, who computationally improved the binding of an antibody against its target, starting from an existing structural complex [181]. The authors performed comprehensive computational mutagenesis of the CDRs and assessed the binding of the novel designs using the CHARMM energy function [182]. Selected molecules had better affinity for the target, demonstrating that in some scenarios computational approaches alone can be used for affinity maturation.
Since then, four methods have been made available: OptCDR [139], OptMAVEn[140], AbDesign [143] and RosettaAntibodyDesign [142]. These protocols can be broadly categorized as ab initio since they aim to design novel paratopes through four sequential steps: CDR generation, modelling, antibody–antigen docking and binding energy evaluation. OptCDR and RosettaAntibodyDesign generate CDR conformations by sampling known canonical classes and modelling the CDRH3 loop. In contrast, OptMAVEn and AbDesign generate molecules by modular design in a process akin to that of V(D) J recombination. The new CDRs are grafted onto a framework and the structures are energy-minimized by well-established energy functions such as RosettaEnergy [183] or CHARMM [182]. The affinity of each variant is further optimized by docking the antibody onto the target antigen and scored by assessing the interaction energy between antibody and antigen. Ab inito methods such as these are still emerging and although some of them demonstrated the validity of their constructs experimentally there exists for them to be validated across multiple projects in industrial setting to assess their utility.
The four methods outlined above facilitate the re-design of CDRs to improve antibody stability and affinity through a combination of conformational and free energy change optimization upon modification of specific residues. In contrast, Liu and co-workers validated an approach in which binding site motifs from existing protein–protein complexes were transferred directly onto an antibody in a process termed ‘hot-spot grafting’ [11]. A further approach to data mining existing structures to improve antibody affinity is ‘re-epitoping’, pioneered by Ofran and collaborators [184]. Here, existing antibodies are tested for complementarity to a target epitope and the best candidates are used to computationally construct focused surface display libraries. This protocol is exemplary in showing how the computational constructs can guide the traditional discovery methods to accelerate the discovery of therapeutic lead candidates.
The methods outlined above offer the potential for discovering specific and selective binders computationally, reducing the experimental effort during the Lead Identification stage. Such binders need to be further developed during Lead Optimization stage by assessing their immunogenicity and overall ‘developability’ potential through understanding of their biophysical properties.
Immunogenicity prediction
A large proportion of currently developed antibodies are discovered by animal immunizations. Molecules raised in animals, such as mice, carry the risk of inducing an immunological response in humans in the form of anti-drug antibodies (ADAs). To avoid such issues, animal-derived antibodies undergo a process called humanization [185, 186]. During this process the CDRs from the (typically) mice-derived antibodies are grafted onto human frameworks, or alternatively, the mice-derived frameworks are engineered to resemble human ones. Traditionally, humanization involves comparing the animal-derived sequence with approximately 1000 human germline sequences before selecting the appropriate template. Germline sequences however only offer a limited view of overall mutational antibody diversity, which can be addressed by computational humanization, comparing the animal-derived therapeutic to the distribution of amino acids in human antibody sequences (Figure 2D, Table 2E).
This was addressed by Tabhu [144], a web-server that compares a query therapeutic sequence to thousands of recombined variable region sequences from DIGIT [48] and serves as a reference in humanization. Although this takes into account antibody sequence diversity, humanization is a complex process where simple pairwise homology and alignments might be insufficient. Thus, statistical approaches to assess the ‘humanness’ of the query sequence have been developed (Figure 2D and Table 2E). One of the earliest examples is the Humanness score by Andrew Martin’s group [14] in which the authors contrasted the distribution of amino acids in antibody sequences in humans and mice. This allowed them to develop a statistical score indicating whether a query sequence is close in its amino acid content to the human distribution and provided a global metric based on the entire antibody variable region. Since immunogenicity is mediated by short peptides on the molecule, Lazar and co-workers developed the Human String Content (HSC) score that takes into account short (9-mer) sequences along the variable region to indicate regions requiring modification to conform with the human amino acid distribution [145]. Humanness score and HSC are primarily sequence distance-based approaches and it has been demonstrated that more sophisticated methods that take the positional correlations between residues into account could be superior [187, 188]. These methods remain sequence-based and do not explicitly use the structure of the antibody to be humanized, although HSC uses a contact-based score derived a priori. Since the immunogenic portions of the antibody would naturally be found on the surface, structural modelling can readily identify solvent-exposed positions aiding in a process called re-surfacing [189]. Choi and co-workers elegantly demonstrated how deimmunized functional antibody molecules can be created through structure-based design and simultaneous integration of HSC [147].
The generation of immune responses against a biotherapeutic requires multiple critical steps beyond reproducing human antibody sequence diversity [190]. Indeed, humanized and even fully human antibodies can elicit immune responses among the patients receiving such medicines and generate ADAs against them. Generation of ADAs is a multi-factorial issue and depends upon patients genetic background and disease history as well as quality attributes of the protein therapeutics, particularly, presence of aggregates and other degradants even in very minute quantities [190, 191]. A crucial first step towards ADA generation is the binding of short biotherapeutic-derived peptide fragments to major histocompatibility complex class II (MHC II) molecules. Several computational approaches have been developed to identify potential MHC I and MHC II binding T-cell epitopes as well as conformational B-cell epitopes [192]. Prediction of T-cell epitopes is addressed by machine learning approaches, particularly, neural networks-based methods that often rely on evaluating the binding affinity of a given short peptide towards MHC-I or II [192, 193]. In addition to such predictions, publicly available databases such as IEDB [18] provide free access to experimentally validated immunogenic peptide and protein sequences along with tools for their analyses. In silico predictions of potential MHC II binding T-cell immune epitopes in the amino acid sequences can be used as part of immunogenicity risk assessment and mitigation during the Lead Identification and Optimization stages along with other measures of humanness of the lead candidate sequences. In this regard, the immunogenicity scale developed by Epivax Inc. can be particularly useful [194] for initial triaging of the potential lead candidates and for formulating potential de-immunization or risk mitigation strategies. Kumar and co-workers have observed an overlap between potential immune epitopes and aggregation prone regions (APRs) around the CDRs of therapeutic antibodies [195, 196]. In addition to offering a potential mechanistic understanding of how protein aggregates can break immune tolerance, the existing overlaps between immune epitopes and APRs in CDRs of biotherapeutics open up exciting opportunities for simultaneous optimization of potency, solubility and safety of antibody-based biotherapeutics via rational structure-based design. Altogether, computational methods that facilitate deimmunization might offer a faster and cost-efficient way of pre-selecting molecules with better immunogenicity properties during the Lead Optimization stage. We must emphasize that connection between computational predictions of immune epitopes and ADAs generated against biotherapeutics remains largely untested. Therefore, it remains to be seen whether computational deimmunization strategies truly work in clinic.
Biophysical properties
Together with immunogenicity, developing a working therapeutic also relies on favorable biophysical properties of the molecule. This includes properties such as colloidal stability of the antibody solution, concentration dependent viscosity behaviors and physicochemical degradation [197–201]. Good solubility is crucial [202, 203] to avoid aggregation that can potentially lead to loss of activity, degradation of antibodies or immunogenicity, as discussed previously. From a general perspective, protein aggregation remains a major unsolved problem in biochemistry. Aggregation has two aspects, namely, mechanistic and kinetic. Mechanistic aspects focus on protein instability and on identifying potential APRs, mainly hydrophobic patches on the protein surface, which can potentially nucleate aggregation. A number of groups have reviewed the applicability of various algorithms (Figure 2D, Table 2E) available to predict APRs in biotherapeutics [204, 205]. Wang and co-workers have examined molecular sequences of commercially available mAb drug products and shown that they contain multiple well defined aggregation prone motifs often located in their CDRs [206]. These CDR-located APRs also contribute significantly towards antigen binding [160], which help us rationalize how antibodies may lose potency upon aggregation and suggest potential strategies for selecting APRs for disruption without impacting biological activity. Recently, Rawat and co-workers have collected experimental data on aggregation kinetics available in literature and used machine learning to identify aggregation rate enhancer and mitigatory mutations in proteins [207]. Several generic predictors of solubility and APRs in proteins have been developed [208, 209] and though these have been successfully applied to antibodies [206], antibody-specific protocols addressing these issues also exist [204, 210]. Lauer and co/workers carried out a 2-year long measurement of biophysical properties for 12 antibodies [148] from which they derived a score, the developability index (DI), and demonstrated that it correlated well with the favorable biophysical properties of their antibodies. The DI combines the computed hydrophobicity, SAP score [211] and net charge of the molecule into a statistical score indicating APRs. Identification of hydrophobic regions is an important step in aggregation prediction that ideally requires a crystal structure of the antibody or a reliable homology model. This was addressed by Jain and colleagues who developed a surface accessible area predictor that can be applied to an antibody sequence to further create a propensity score that could be correlated with aggregation risk [149]. Measures such as the DI and the aggregation propensity risk score rely on the hydrophobicity scales and charge annotations, demonstrating that there is useful information in these parameters alone. An extended set of physico-chemical parameters was used by Obrezanova and colleagues [15] to create an Adaptive Boosting model for aggregation prediction. The model was trained and validated on a dataset of 500 antibodies with calculated biophysical properties.
The aforementioned methods relied on proprietary datasets of calculated aggregation propensity, data from clinical stage or marketed biotherapeutics in order to develop fast computational methods to perform pre-selection of candidates with more favorable developability properties during the Lead Optimization stage. An alternative approach is to use natural antibody sequences under the assumption that they have favorable biophysical properties [13]. In this study Raybould et al. stipulate five computational guidelines to define favorable biophysical properties in antibody therapeutics. Among these a structure-based hydrophobicity score is calculated and the value is then compared to the distribution of the same score in naturally sourced NGS sequences. Score values diverging significantly from the natural distribution are highlighted and the associated sequences flagged for developability risk. This work demonstrates a new paradigm in employing the vast amount of naturally-sourced NGS data to guide therapeutic antibody development.
Developing trends
Data-mining NGS
Development of computational methods to aid antibody engineering relies on successful exploration and exploitation of new data sources. In this respect, the field currently benefits from a steady stream of new data from NGS of B-cell Receptors (BCRs, to be used as proxy for antibodies) [212, 213] and there is an increasing number of resources available for downloading such data [17], which are being used to analyze therapeutic antibodies [13]. Current bioinformatic analyses of NGS repertoires focus on large-scale decoding of the immune responses, with several potential applications for therapeutic design [22, 25, 214].
One of the main applications of computational analysis of NGS outputs is the identification of antigen-specific BCR sequences after immunization (Figure 2E). Upon administering an immunogen to an organism, antigen-specific antibodies are raised, thus polarizing the immune repertoire. Sequencing a sample of the repertoire and identifying newly abundant sequence-similar BCRs is being used as a simple bioinformatic method to identify new antibodies specific for the antigen of interest. Clustering by V, J genes and CDRH3 sequence to identify large sequence-similar groups can identify antigen-specific cells after vaccinating humans with Hepatitis B [215]. A similar approach based on identifying highly abundant sequences was used to select antigen-specific molecules in immunized mice [216]. Such simple models however can still identify sequences that are not antigen-specific or miss less abundant antigen-specific sequences [215]. The selection of false positives can be tackled by more complex statistical models as highlighted by Fowler and co-workers [217]. Identification of antigen-specific antibodies after immunization can readily inform vaccine design since such antibodies can be used to confer passive immunity [218].
Identification of antigen-specific sequences can be naturally extended to detection of the immune state altogether (Figure 2E). Since the immune system is a dynamic reflection of the overall health of the organism, some antigen-specific signatures in the repertoire could be indicative of particular diseases [219]. It was demonstrated that statistical classifiers can identify immune profiles of patients with chronic lymphocytic leukemia [220], multiple sclerosis [221] or influenza [222] from NGS data alone. Further development of a larger variety of such models could result in versatile diagnostic tools for multiple conditions on the basis of an individual’s sequenced BCR repertoire [220].
Detection of antigen-specific sequences or the immune state could be improved by defining sequence- and structure-based rules governing adaptive immune responses through large scale analysis of antibody repertoires [25]. From a sequence perspective, it was demonstrated recently that despite the vast numbers of diverse sequences in a typical human antibody repertoire, a non-trivial amount of these is shared between individuals [23, 223]. Human antibody repertoires can maintain their fundamental sequence diversity despite the removal of as many as 50–90% of sequences [214]. Human antibody repertoires also appear to be constrained structurally, and this includes regions of the antibody that display great variability such as the CDRH3 loop [224]. Furthermore, many therapeutic CDRH3 loops can be found in naturally sourced NGS datasets, indicating a certain degree of convergence between antibodies raised experimentally and naturally occurring ones [225]. Studying such convergence of immune repertoires might reveal strategic preferences that have arisen through natural evolution. Such constraints can be readily adapted to library construction and used to identify binding antibodies (Figure 2E) [226]. This has already been demonstrated to some extent through analysis of antibodies derived from 600 donors [24] followed by construction of a library based on natural positional preferences [26]. Deriving binders from more naturally focused libraries might produce binders with more favorable biophysical and immunogenic properties.
At this stage, further development of bioinformatic methods for NGS analysis will depend as much on improving the algorithms as on the quality of the data itself. In addition, most of the NGS datasets produced to date do not offer paired H and L chain sequences. Further development of single-cell technology to provide paired NGS data [64, 227] will expand our ability to query the immune system by computational methods, paving the way for better antibody-based therapeutics.
Alternative antibody formats—nanobodies
New approaches to develop antibody-based therapeutics are increasingly focused on different molecular formats. One of the most promising is the H chain only antibody, or nanobody that are naturally occurring in camelids (llamas, alpacas and camels) [27] and sharks [228, 229] (reviewed by Muyldermans [230] and Bannas et al. [27]). In line with its first therapeutic approval in 2018 (caplacizumab), increasing levels of interest in these molecules is demonstrated by recent nanobody-specific databases and analyses [57, 58, 231, 232].
The nanobody contains just three highly variable loops CDRH1, CDRH2 and CDRH3, which form an extended structural paratope located at one side of the folded protein domain. The absence of the L chain means that the nanobody’s CDRs are distinct from those of antibodies in both sequence and structure, and as a result nanobodies are able to bind antibody-inaccessible epitopes in enzyme active sites, viral capsids and G protein coupled receptors [233, 234]. Recent computational analyses of large sets of antibody and nanobody sequences and structures demonstrate that non-trivial systematic differences between these molecules exist [231, 232]. Specifically, nanobodies were found to exhibit less sequence and structure variation across their framework regions, and similar levels of sequence variation as classical antibodies within the CDRH1 and CDRH2 loops [232]. However, nanobodies translate this same level of sequence variation into increased structural diversity across the CDRH1 and CDRH2 loops, which are not classifiable by established canonical rules, presenting additional challenges to computational structural modelling tools compared to classical antibodies [232, 235]. Furthermore, the nanobody CDRH3 loop is on average three to four residues longer than its antibody counterpart, and significantly more diverse in terms of both primary sequence and tertiary structure configuration, defying the canonical antibody VH-domain-based rules and enabling nanobody CDRH3 loops to exhibit finger-like protrusions that extend into epitope cavities on their cognate antigens [230, 232, 235, 236].
Perhaps more significant in terms of challenges to computational modelling tools is that nanobody paratopes contain on average nearly three additional residues, and moreover the paratope is drawn from a wider set of aligned sequence positions than classical antibody VH domain paratopes, comparable to the set used by the entire VH-VL paratope [230, 232]. Given that the VL domain is much less structurally variable, this suggests that the application of computational structural modelling tools to nanobody–antigen interactions will require innovation. In addition, analysis of large sets of nanobody–antigen co-crystal structures reveals that nanobody paratopes are made up of a much greater diversity of structural subunits, further increasing the modelling challenge [231]. CDRH3 loop residues that are highly variable in both sequence and structure dominate antigen-contacting residues within nanobodies, suggesting that the nanobody-antigen interface will be difficult to model using tools that have been developed in the context of classical antibody VH domains. Because of such differences it is not clear whether the current methods for antibody modelling, docking etc. are directly transferable to nanobodies. Thus, a systematic benchmark of existing antibody computational tools would be highly informative to establish the extent to which these are applicable to nanobodies, and the innovations that are necessary to drive computational nanobody development.
Conclusions
Antibodies continue to dominate the field of biotherapeutics with an increasing number of new clinical approvals each year. Current approaches to bring these molecules to the market have remained experimentally focused, with animal immunization and surface display technologies accounting for the majority of molecules developed to date. The increasing amount of antibody-specific data in the public domain facilitates the maturation of computational antibody design methods, resulting in a growing uptake as part of standard pharmaceutical discovery processes.
Computational methods are unlikely to replace the entire discovery process. Indeed, their largest added value will continue to be in providing time and cost-efficient ways of guiding experimental methods. Structural modelling can offer insight on exposed residues to be used for mutagenesis to either optimize binding, reduce immunogenicity or provide information on hydrophobicity patches related to detrimental biophysical properties. Predicting interface information can provide an initial guide for experimental epitope mapping efforts or offer a starting point for a therapeutic campaign by providing the basis for focused surface display libraries to design a novel antibody binder for a given epitope. Exploiting the vast amount of data generated by NGS will facilitate the derivation of more reliable ‘humanness’ and ‘developability’ profiles with which to guide antibody therapeutic discovery.
Existing computational antibody design knowledge and tools may benefit emerging biotherapeutic modalities akin to antibodies, such as nanobodies. However, despite the similarity between antibodies and nanobodies, systematic benchmarking will still be needed to determine whether development of nanobodies can benefit from computational antibody protocols in their current form or whether they need to be adjusted accordingly. Holistically, benchmarking of bioinformatic antibody methods on a par with existing protein-generic initiatives such as CASP, CAPRI or CAMEO [237] will benefit the entire computational antibody field. Antibody-specific benchmarking challenges will emphasize the shortcomings and advantages of each method and enable improvements to be developed in a focused manner, specifically with regard to their utility in therapeutic development process.
Further progress in the development of antibody-specific computational tools will be associated with access to more and diverse data in the public domain. It will become increasingly important that these data adhere to information management and reusability best practices. Such efforts are exemplified by AIRR community, which aims to standardize the increasing amount of antibody NGS depositions and their metadata [213], and from a broader perspective by the adoption of scientific data management principles such as FAIR [238]. Organizations involved in the discovery and development of antibody therapeutics have a unique opportunity to catalyze the development of the computational antibody methods by participating in data sharing and benchmarking efforts. Publishing proprietary data, which has no or little commercial value, generated in the process of developing a candidate therapeutic may yield a higher return in the form of better computational methods.
As the importance of antibodies as therapeutics grows, faster and more accurate computational methods are set to become even more tightly integrated into therapeutic development processes, thus accelerating the delivery of new medicines to patients.
Key Points
Antibodies are the largest group of biopharmaceuticals.
Established bioinformatics methods such as protein structural homology modelling and molecular docking can be readily applied to the specific case of antibodies in a therapeutic setting.
Increasing amount of data from next generation sequencing holds the potential to improve the computational methods and thus its applicability to therapeutic design.
Computational antibody methods might be transferable to other immunoglobulin formats such as nanobodies, which have more potential in certain therapeutic areas.
Systematic benchmarking of the tools can emphasize the advantages of computational methods and where these can be used to support therapeutic pipelines.
Supplementary Material
Richard A. Norman is a Consultant at Pistoia Alliance Inc., USA, and Managing Director at Norman Consulting, Norway. His interests lie in structural and computational biology, therapeutic discovery and innovation in the life sciences.
Francesco Ambrosetti is a PhD candidate at Utrecht University. He is interested in integrative modeling of antibody–antigen complexes.
Alexandre M.J.J. Bonvin is professor of Computational Structural Biology at Utrecht University in the Netherlands. His group is developing the integrative modelling platform HADDOCK and associated computational services.
Lucy J. Colwell is a lecturer in Chemistry at the University of Cambridge. Her group studies the relationship between protein sequence and structure/function.
Sebastian Kelm is an employee of UCB Celltech. He is interested in the three-dimensional modelling of protein structures and interactions with a view towards designing medicines that impact patients’ lives.
Sandeep Kumar works for Boehringer Ingelheim. He is advocating for Biopharmaceutical Informatics, a strategic combination of Bioinformatics, Computational Biophysics and Computer Science to enable discovery and development of biologic medicines.
Konrad Krawczyk works for NaturalAntibody. He is interested in computational techniques to streamline discovery of antibody-based biotherapeutics.
Funding
Pistoia Alliance (AbVance project to R.A.N.); European Union Horizon 2020 BioExcel (project no. 675728 and 823830 to F.A. and A.M.J.J.B.); EOSC-hub (project no. 777536 to F.A. and A.M.J.J.B.) projects; Simons Foundation (to L.J.C.).
References
- 1. Kindt TJ, Goldsby RA, Osborne BA, et al. Kuby Immunology. New York, USA: W. H. Freeman and Co., 2007, ISBN 9780716785903 [Google Scholar]
- 2. Kelly-Scumpia KM, Scumpia PO, Weinstein JS, et al. B cells enhance early innate immune responses during bacterial sepsis. J Exp Med 2011;208(8):1673–82. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Peng H-P, Lee KH, Jian J-W, et al. Origins of specificity and affinity in antibody–protein interactions. Proc Natl Acad Sci U S A 2014;111:E2656–65. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4. Kaplon H, Reichert JM. Antibodies to watch in. MAbs 2018;2018:1–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. Krawczyk K, Dunbar J, Deane CM. Computational tools for aiding rational antibody design. Methods Mol Biol 2017;1529:399–416. [DOI] [PubMed] [Google Scholar]
- 6. Fiser A, Šali A. Modeller: generation and refinement of homology-based protein structure models. Methods Enzymol 2003;374:461–91. [DOI] [PubMed] [Google Scholar]
- 7. Almagro JC, Teplyakov A, Luo J, et al. Second antibody modeling assessment (AMA-II). Proteins Struct Funct Bioinforma 2014;82:1553–62. [DOI] [PubMed] [Google Scholar]
- 8. Brenke R, Hall DR, Chuang GY, et al. Application of asymmetric statistical potentials to antibody–protein docking. Bioinformatics 2012;28:2608–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Sircar A, Gray JJ. SnugDock: paratope structural optimization during antibody–antigen docking compensates for errors in antibody homology models. PLoS Comput Biol 2010;6: e1000644. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Esmaielbeiki R, Krawczyk K, Knapp B, et al. Progress and challenges in predicting protein interfaces. Brief Bioinform 2015;17:1–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Liu X, Taylor RD, Griffin L, et al. Computational design of an epitope-specific Keap1 binding antibody using hotspot residues grafting and CDR loop swapping. Sci Rep 2017;7:41306. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Baran D, Pszolla MG, Lapidoth GD, et al. Principles for computational design of binding antibodies. Proc Natl Acad Sci U S A 2017;114(41):10900–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Raybould MIJ, Marks C, Krawczyk K, et al. Five computational developability guidelines for therapeutic antibody profiling. Proc Natl Acad Sci U S A 2019;116(10):4025–30. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. Abhinandan KR, Martin ACR. Analyzing the ‘Degree of Humanness’ of antibody sequences. J Mol Biol 2007;369:852–62. [DOI] [PubMed] [Google Scholar]
- 15. Obrezanova O, Arnell A, De La Cuesta RG, et al. Aggregation risk prediction for antibodies and its application to biotherapeutic development. MAbs 2015;7:352–63. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16. Dunbar J, Krawczyk K, Leem J, et al. SAbDab: the structural antibody database. Nucleic Acids Res 2013;42:1140–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17. Kovaltsuk A, Leem J, Kelm S, et al. Observed antibody space: a resource for data mining next-generation sequencing of antibody repertoires. J Immunol 2018;201(8):2502–9. [DOI] [PubMed] [Google Scholar]
- 18. Vita R, Zarebski L, Greenbaum JA, et al. The immune epitope database 2.0. Nucleic Acids Res 2010;38:D854–62. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19. Sirin S, Apgar JR, Bennett EM, et al. AB-Bind: antibody binding mutational database for computational affinity predictions. Protein Sci 2016;25(2):393–409. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Koenig P, Lee CV, Walters BT, et al. Mutational landscape of antibody variable domains reveals a switch modulating the interdomain conformational dynamics and antigen binding. Proc Natl Acad Sci U S A 2017;114(4):E486–95. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. Jain T, Sun T, Durand S, et al. Biophysical properties of the clinical-stage antibody landscape. Proc Natl Acad Sci 2017;114(5):944–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Miho E, Yermanos A, Weber CR, et al. Computational strategies for dissecting the high-dimensional complexity of adaptive immune repertoires. Front Immunol 2018;9:224. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23. Briney B, Inderbitzin A, Joyce C, et al. Commonality despite exceptional diversity in the baseline human antibody repertoire. Nature 2019;566:393–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Glanville J, Zhai W, Berka J, et al. Precise determination of the diversity of a combinatorial antibody library gives insight into the human immunoglobulin repertoire. Proc. Natl Acad Sci U S A 2009;106:20216–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25. Brown AJ, Snapkov I, Akbar R, et al. Augmenting adaptive immunity: progress and challenges in the quantitative engineering and analysis of adaptive immune receptor repertoires. Mol Syst Des Eng 2019. doi: 10.1039/C9ME00071B, eprint arXiv:1904.04105 [DOI]
- 26. Zhai W, Glanville J, Fuhrmann M, et al. Synthetic antibodies designed on natural sequence landscapes. J Mol Biol 2011;412(1):55–71. [DOI] [PubMed] [Google Scholar]
- 27. Bannas P, Hambach J, Koch-Nolte F. Nanobodies and nanobody-based human heavy chain antibodies as antitumor therapeutics. Front Immunol 2017;8:1603. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28. Tonegawa S. Somatic generation of antibody diversity. Nature 1983;302:575–81. [DOI] [PubMed] [Google Scholar]
- 29. Hesslein DGT, Schatz DG. Factors and forces controlling V(D) J recombination. Adv Immunol 2001;78:169–232. [DOI] [PubMed] [Google Scholar]
- 30. Storb U. Somatic hypermutation and class switch recombination. Encycl Immunobiol 2016;3:186–94. [Google Scholar]
- 31. Peters A, Storb U. Somatic hypermutation of immunoglobulin genes is linked to transcription initiation. Immunity 1996;4(1):57–65. [DOI] [PubMed] [Google Scholar]
- 32. Chothia C, Lesk AM. Canonical structures for the hypervariable regions of immunoglobulins. J Mol Biol 1987;196:901–17. [DOI] [PubMed] [Google Scholar]
- 33. Regep C, Georges G, Shi J, et al. The H3 loop of antibodies shows unique structural characteristics. Proteins Struct Funct Bioinforma 2017;85:1311–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34. Tsuchiya Y, Mizuguchi K. The diversity of H3 loops determines the antigen-binding tendencies of antibody CDR loops. Protein Sci 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35. Xu JL, Davis MM. Diversity in the CDR3 region of V H is sufficient for most antibody specificities. Immunity 2000;13:37–45. [DOI] [PubMed] [Google Scholar]
- 36. Knappik A, Ge L, Honegger A, et al. Fully synthetic human combinatorial antibody libraries (HuCAL) based on modular consensus frameworks and CDRs randomized with trinucleotides. J Mol Biol 2000. [DOI] [PubMed] [Google Scholar]
- 37. De Kruif J, Boel E, Logtenberg T. Selection and application of human single chain Fv antibody fragments from a semi-synthetic phage antibody display library with designed CDR3 regions. J Mol Biol 1995;248(1):97–105. [DOI] [PubMed] [Google Scholar]
- 38. Holliger P, Hudson PJ. Engineered antibody fragments and the rise of single domains. Nat Biotechnol 2005;23(9):1126–36. [DOI] [PubMed] [Google Scholar]
- 39. Farajnia S, Ahmadzadeh V, Tanomand A, et al. Development trends for generation of single-chain antibody fragments. Immunopharmacol Immunotoxicol 2014;36(5):297–308. [DOI] [PubMed] [Google Scholar]
- 40. Kwon N-Y, Kim Y, Lee J-O. Structural diversity and flexibility of diabodies. Methods 2019;154:136–42. [DOI] [PubMed] [Google Scholar]
- 41. Runcie K, Budman DR, John V, et al. Bi-specific and tri-specific antibodies—the next big thing in solid tumor therapeutics. Mol Med 2018;24(1):50. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42. Duggan S. Caplacizumab: first global approval. Drugs 2018;78:1639–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43. Honegger A, Pluckthun A. Yet another numbering scheme for immunoglobulin variable domains: an automatic modeling and analysis tool. J Mol Biol 2001;309:657–70. [DOI] [PubMed] [Google Scholar]
- 44. Major SM, Nishizuka S, Morita D, et al. AbMiner: a bioinformatic resource on available monoclonal antibodies and corresponding gene identifiers for genomic, proteomic, and immunologic studies. BMC Bioinformatics 2006;7:192. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45. Ohlin M, Scheepers C, Corcoran M, et al. Inferred allelic variants of immunoglobulin receptor genes: a system for their evaluation, documentation, and naming. Front Immunol 2019;10:435. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46. Lefranc MP, Giudicelli V, Duroux P, et al. IMGT R, the international ImMunoGeneTics information system R 25 years on. Nucleic Acids Res 2015;43:D413–22. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47. Swindells MB, Porter CT, Couch M, et al. abYsis: integrated antibody sequence and structure—management, analysis, and prediction. J Mol Biol 2017;429:356–64. [DOI] [PubMed] [Google Scholar]
- 48. Chailyan A, Tramontano A, Marcatili P. A database of immunoglobulins with integrated tools: DIGIT. Nucleic Acids Res 2012;40:D1230–4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49. Corrie BD, Marthandan N, Zimonja B, et al. iReceptor: a platform for querying and analyzing antibody/B-cell and T-cell receptor repertoire data across federated repositories. Immunol Rev 2018;284(1):24–41. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50. Nguyen MN, Pradhan MR, Verma C, et al. The interfacial character of antibody paratopes: analysis of antibody–antigen structures. Bioinformatics 2017;33:2971–6. [DOI] [PubMed] [Google Scholar]
- 51. Adolf-Bryfogle J, Xu Q, North B, et al. PyIgClassify: a database of antibody CDR structural classifications. Nucleic Acids Res 2015;43:D432–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52. Ferdous S, Martin ACR. AbDb: antibody structure database—a database of PDB-derived antibody structures. Database 2018;2018: bay040. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53. Ansari HR, Flower DR, Raghava GPS. AntigenDB: an immunoinformatics database of pathogen antigens. Nucleic Acids Res 2010;38:D847–53. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54. Wang R, Fang X, Lu Y, et al. The PDBbind database: methodologies and updates. J Med Chem 2005;48:4111–9. [DOI] [PubMed] [Google Scholar]
- 55. Moal IH, Fernández-Recio J. SKEMPI: a structural kinetic and energetic database of mutant protein interactions and its use in empirical models. Bioinformatics 2012;28(20):2600–7. [DOI] [PubMed] [Google Scholar]
- 56. Jankauskaite J, Jiménez-García B, Dapkunas J, et al. SKEMPI 2.0: an updated benchmark of changes in protein–protein binding energy, kinetics and thermodynamics upon mutation. Bioinformatics 2019;35(3):462–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57. Zavrtanik U, Hadži S. A non-redundant data set of nanobody-antigen crystal structures. Data Br 2019;103754. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58. Wilton EE, Opyr MP, Kailasam S, et al. sdAb-DB: the single domain antibody database. ACS Synth Biol 2018;7(11):2480–4. [DOI] [PubMed] [Google Scholar]
- 59. Zuo J, Li J, Zhang R, et al. Institute collection and analysis of Nanobodies (iCAN): a comprehensive database and analysis platform for nanobodies. BMC Genomics 2017;18:797. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60. Leinonen R, Akhtar R, Birney E, et al. The European nucleotide archive. Nucleic Acids Res 2011;39. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61. Resource NCBI. Coordinators. Database Resources of the National Center for Biotechnology Information. Nucleic Acids Res 2017;45:D12–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62. Kovaltsuk A, Krawczyk K, Kelm S, et al. Filtering next-generation sequencing of the Ig gene repertoire data using antibody structural information. J Immunol 2018;201(12):3694–704. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63. DeKosky BJ, Lungu OI, Park D, et al. Large-scale sequence and structural comparisons of human naive and antigen-experienced antibody repertoires. Proc Natl Acad Sci U S A 2016;113(19):E2636–45. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64. DeKosky BJ, Ippolito GC, Deschner RP, et al. High-throughput sequencing of the paired human immunoglobulin heavy and light chain repertoire. Nat Biotechnol 2013;31:166–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65. Berman HM, Westbrook J, Feng Z, et al. The protein data bank. Nucleic Acids Res 2000;28:235–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66. Ehrenmann F, Kaas Q, Lefranc M. IMGT/3Dstructure-DB and IMGT/DomainGapAlign: a database and a tool for immunoglobulins or antibodies, T cell receptors, MHC, IgSF and MhcSF. Nucleic Acids Res 2010;38:D301–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67. Mahajan S, Vita R, Shackelford D, et al. Epitope specific antibodies and T cell receptors in the immune epitope database. Front Immunol 2018;9:2688. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68. Ye J, Ma N, Madden TL, et al. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res 2013;41:W34–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69. Brochet X, Lefranc MP, Giudicelli V. IMGT/V-QUEST: the highly customized and integrated system for IG and TR standardized V-J and V-D-J sequence analysis. Nucleic Acids Res 2008;36:W503–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70. Bolotin DA, Poslavsky S, Mitrophanov I, et al. MiXCR: software for comprehensive adaptive immunity profiling. Nat Methods 2015;12. [DOI] [PubMed] [Google Scholar]
- 71. Gupta NT, Vander Heiden JA, Uduman M, et al. Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 2015;31(20):3356–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72. Vander Heiden JA, Yaari G, Uduman M, et al. PRESTO: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics 2014;30(13):1930–2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73. Shlemov A, Bankevich S, Bzikadze A, et al. Reconstructing antibody repertoires from error-prone immunosequencing reads. J Immunol 2017;199(9):3369–80. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74. Cortina-Ceballos B, Godoy-Lozano EE, Sámano-Sánchez H, et al. Reconstructing and mining the B cell repertoire with ImmunediveRsity. MAbs 2015;7(3):516–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75. Kuchenbecker L, Nienen M, Hecht J, et al. IMSEQ—a fast and error aware approach to immunogenetic sequence analysis. Bioinformatics 2015;31(18):2963–71. [DOI] [PubMed] [Google Scholar]
- 76. Ralph DK, Matsen FA. Consistency of VDJ rearrangement and substitution parameters enables accurate B cell receptor sequence annotation. PLoS Comput Biol 2016;12(1): e1004409. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77. Marcou Q, Mora T, Walczak AM. High-throughput immune repertoire analysis with IGoR. Nat Commun 2018;9(1):561. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78. Giraud M, Salson M, Duez M, et al. Fast multiclonal clusterization of V(D) J recombinations from high-throughput sequencing. BMC Genomics 2014;15:409. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79. Duez M, Giraud M, Herbert R, et al. Vidjil: a web platform for analysis of high-throughput repertoire sequencing. PLoS One 2016;11(11): e0166126. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80. Rosenfeld AM, Meng W, Luning Prak ET, et al. ImmuneDB, a novel tool for the analysis, storage, and dissemination of immune repertoire sequencing data. Front Immunol 2018;9:2107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81. Li L, Chen S, Miao Z, et al. AbRSA: a robust tool for antibody numbering. Protein Sci 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82. Abhinandan KR, Martin AC. Analysis and improvements to Kabat and structurally correct numbering of antibody variable domains. Mol Immunol 2008;14:3832–9. [DOI] [PubMed] [Google Scholar]
- 83. Dunbar J, Deane CM. ANARCI: Antigen receptor numbering and receptor classification. Bioinformatics 2015; btv552. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84. Leem J, Dunbar J, Georges G, et al. ABodyBuilder: automated antibody structure prediction with data–driven accuracy estimation. MAbs 2016;8:1259–68. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85. Klausen MS, Anderson MV, Jespersen MC, et al. LYRA, a webserver for lymphocyte receptor structural modeling. Nucleic Acids Res 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86. Marcatili P, Rosi A, Tramontano A. PIGS: automatic prediction of antibody structures. Bioinformatics 2008;24:1953–4. [DOI] [PubMed] [Google Scholar]
- 87. Yamashita K, Ikeda K, Amada K, et al. Kotai antibody builder: automated high-resolution structural modeling of antibodies. Bioinformatics 2014;30:3279–80. [DOI] [PubMed] [Google Scholar]
- 88. Sivasubramanian A, Sircar A, Chaudhury S, et al. Toward high-resolution homology modeling of antibody Fv regions and application to antibody–antigen docking. Proteins 2009;74:497–514. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89. Weitzner BD, Jeliazkov JR, Lyskov S, et al. Modeling and docking of antibody structures with Rosetta. Nat Protoc 2017;12:401–16. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90. Kemmish H, Fasnacht M, Yan L. Fully automated antibody structure prediction using BIOVIA tools: validation study. PLoS One 2017;12(5): e0177923. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 91. Bujotzek A, Fuchs A, Qu C, et al. MoFvAb: modeling the Fv region of antibodies. MAbs 2015;7(5):838–52. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 92. Whitelegg NR, Rees AR. WAM: an improved algorithm for modelling antibodies on the WEB. Protein Eng 2000;12:819–24. [DOI] [PubMed] [Google Scholar]
- 93. Zhu K, Day T, Warshaviak D, et al. Antibody structure determination using a combination of homology modeling, energy-based refinement, and loop prediction. Proteins Struct Funct Bioinforma 2014;82:1646–55. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94. Inc. CCG . Molecular Operating Environment (MOE), 2016.08. 1010 Sherbooke St. West, Suite #910, Montr. QC, Canada, H3A 2R7, 2016. [Google Scholar]
- 95. Mandal C, Kingery BD, Anchin JM, et al. ABGEN: a knowledge-based automated approach for antibody structure modeling. Nat Biotechnol 1996;14:323–8. [DOI] [PubMed] [Google Scholar]
- 96. Lapidoth G, Parker J, Prilusky J, et al. AbPredict 2: a server for accurate and unstrained structure prediction of antibody variable domains. Bioinformatics 2018;35(9):1591–3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 97. Berrondo M, Kaufmann S, Berrondo M. Automated Aufbau of antibody structures from given sequences using Macromoltek’s SmrtMolAntibody. Proteins Struct Funct Bioinforma 2014;82(8):1636–45. [DOI] [PubMed] [Google Scholar]
- 98. Leem J, Georges G, Shi J, et al. Antibody side chain conformations are position-dependent. Proteins Struct Funct Bioinforma 2018;86(4):383–92. [DOI] [PubMed] [Google Scholar]
- 99. Messih MA, Lepore R, Marcatili P, et al. Improving the accuracy of the structure prediction of the third hypervariable loop of the heavy chains of antibodies. Bioinformatics 2014;30(19):2733–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 100. Krivov GG, Shapovalov MV, Dunbrack RL. Improved prediction of protein side-chain conformations with SCWRL4. Proteins Struct Funct Bioinforma 2009;77(4):778–95. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 101. Ryu J, Lee M, Cha J, et al. BetaSCPWeb: side-chain prediction for protein structures using Voronoi diagrams and geometry prioritization. Nucleic Acids Res 2016;44:W416–23. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 102. Marks C, Nowak J, Klostermann S, et al. Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction. Bioinformatics 2017; 33:1346–53 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 103. Choi Y, Deane CM. FREAD revisited: accurate loop structure prediction using a database search algorithm. Proteins 2010;78:1431–40. [DOI] [PubMed] [Google Scholar]
- 104. Jacobson MP, Pincus DL, Rapp CS, et al. A hierarchical approach to all-atom protein loop prediction. Proteins Struct Funct Genet 2004;55:351–67. [DOI] [PubMed] [Google Scholar]
- 105. Martin ACR, Thornton JM. Structural families in loops of homologous proteins: automatic classification, modelling and application to antibodies. J Mol Biol 1996;263:800–15. [DOI] [PubMed] [Google Scholar]
- 106. Wong WK, Georges G, Ros F, et al. SCALOP: sequence-based antibody canonical loop structure annotation. Bioinformatics 2018;35(10):1774–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 107. Bujotzek A, Dunbar J, Lipsmeier F, et al. Prediction of VH-VL domain orientation for antibody variable domain modeling. Proteins Struct Funct Bioinforma 2015. [DOI] [PubMed] [Google Scholar]
- 108. Marze NA, Lyskov S, Gray JJ. Improved prediction of antibody VL-VH orientation. Protein Eng Des Sel 2016;29(10):409–18. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 109. Dunbar J, Fuchs A, Shi J, et al. ABangle: characterising the VH-VL orientation in antibodies. Protein Eng Des Sel 2013;26:611–20. [DOI] [PubMed] [Google Scholar]
- 110. Krawczyk K, Baker T, Shi J, et al. Antibody i-Patch prediction of the antibody binding site improves rigid local antibody–antigen docking. Protein Eng Des Sel 2013;26:621–9. [DOI] [PubMed] [Google Scholar]
- 111. Kunik V, Ashkenazi S, Ofran Y. Paratome: an online tool for systematic identification of antigen-binding regions in antibodies based on sequence or structure. Nucleic Acids Res 2012;40. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 112. Olimpieri PP, Chailyan A, Tramontano A, et al. Prediction of site-specific interactions in antibody–antigen complexes: the proABC method and server. Bioinformatics 2013;29:2285–91. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 113. Liberis E, Velickovic P, Sormanni P, et al. Parapred: antibody paratope prediction using convolutional and recurrent neural networks. Bioinformatics 2018;34(17):2944–50. [DOI] [PubMed] [Google Scholar]
- 114. Daberdaku S, Ferrari C. Antibody interface prediction with 3D Zernike descriptors and SVM. Bioinformatics 2018;35(11):1870–6. [DOI] [PubMed] [Google Scholar]
- 115. Deac A, VeliČković P, Sormanni P. Attentive cross-modal paratope prediction. J Comput Biol 2018; doi: 10.1093/bioinformatics/bty918. [Epub ahead of print] [DOI] [PubMed] [Google Scholar]
- 116. Rapberger R, Lukas A, Mayer B. Identification of discontinuous antigenic determinants on proteins based on shape complementarities. J Mol Recognit 2007;20:113–21. [DOI] [PubMed] [Google Scholar]
- 117. Sela-Culang I, Ashkenazi S, Peters B, et al. PEASE: predicting B-cell epitopes utilizing antibody sequence. Bioinformatics 2015;31(8):1313–5. [DOI] [PubMed] [Google Scholar]
- 118. Sela-Culang I, Benhnia MREI, Matho MH, et al. Using a combined computational-experimental approach to predict antibody-specific B cell epitopes. Structure 2014;22:646–57. [DOI] [PubMed] [Google Scholar]
- 119. Krawczyk K, Liu X, Baker T, et al. Improving B-cell epitope prediction and its application to global antibody–antigen docking. Bioinformatics 2014;30(16):2288–94. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 120. Jespersen MC, Mahajan S, Peters B, et al. Antibody specific B-cell epitope predictions: leveraging information from antibody–antigen protein complexes. Front Immunol 2019;10:298. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 121. Hua CK, Gacerez AT, Sentman CL, et al. Computationally-driven identification of antibody epitopes. Elife 2017; 6. pii: e2902 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 122. Bourquard T, Musnier A, Puard V, et al. MAbTope: a method for improved epitope mapping. J Immunol 2018;201(10):3096–105. [DOI] [PubMed] [Google Scholar]
- 123. Soga S, Kuroda D, Shirai H, et al. Use of amino acid composition to predict epitope residues of individual antibodies. Protein Eng Des Sel 2010;23:441–8. [DOI] [PubMed] [Google Scholar]
- 124. Zhao L, Li J. Mining for the antibody–antigen interacting associations that predict the B cell epitopes. BMC Struct Biol 2010;10:S6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 125. Zhao L, Wong L, Li J. Antibody-specified B-cell epitope prediction in line with the principle of context-awareness. IEEE/ACM Trans Comput Biol Bioinform 2011;8:1483–94. [DOI] [PubMed] [Google Scholar]
- 126. Kozakov D, Hall DR, Xia B, et al. The ClusPro web server for protein–protein docking. Nat Protoc 2017;12(2):255–78. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 127. Shimba N, Kamiya N, Nakamura H. Model building of antibody–antigen complex structures using GBSA scores. J Chem Inf Model 2016;6(10):2005–12. [DOI] [PubMed] [Google Scholar]
- 128. Ramírez-Aportela E, López-Blanco JR, Chacón P. FRODOCK 2.0: fast protein–protein docking server. Bioinformatics 2016;32(15):2386–8. [DOI] [PubMed] [Google Scholar]
- 129. Macindoe G, Mavridis L, Venkatraman V, et al. HexServer: an FFT-based protein docking server powered by graphics processors. Nucleic Acids Res 2010;38:W445–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 130. Chen R, Li L, Weng Z. ZDOCK: an initial-stage protein docking algorithm. Proteins 2003;1:80–7. [DOI] [PubMed] [Google Scholar]
- 131. Dominguez C, Boelens R, Bonvin AMJJ. HADDOCK: a protein–protein docking approach based on biochemical or biophysical information. J Am Chem Soc 2003;125:1731–7. [DOI] [PubMed] [Google Scholar]
- 132. De Vries SJ, Van Dijk ADJ, Krzeminski M, et al. HADDOCK versus HADDOCK: new features and performance of HADDOCK2.0 on the CAPRI targets. Proteins Struct Funct Genet 2007;69(4):726–33. [DOI] [PubMed] [Google Scholar]
- 133. De Vries SJ, Schindler CEM, Chauvot De Beauchêne I, Zacharias M. A web interface for easy flexible protein–protein docking with ATTRACT. Biophys J 2015;108(3):462–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 134. Tovchigrechko A, Vakser IA. GRAMM-X public web server for protein–protein docking. Nucleic Acids Res 2006;34:W310–4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 135. Jiménez-García B, Pons C, Fernández-Recio J. pyDockWEB: a web server for rigid-body protein–protein docking using electrostatics and desolvation scoring. Bioinformatics 2013;29(13):1698–9. [DOI] [PubMed] [Google Scholar]
- 136. Torchala M, Moal IH, Chaleil RAG, et al. SwarmDock: a server for flexible protein–protein docking. Bioinformatics 2013;29(6):807–9. [DOI] [PubMed] [Google Scholar]
- 137. Duhovny D, Nussinov R, Wolfson HJ. Efficient Unbound Docking of Rigid Molecules. Gusf. al., Ed. Proc. 2’nd Work. Algorithms Bioinformatics (WABI) Rome, Italy, Lect. Notes Comput. Sci. 2002; 2452:185–200 [Google Scholar]
- 138. Schneidman-Duhovny D, Inbar Y, Nussinov R, et al. PatchDock and SymmDock: servers for rigid and symmetric docking. Nucl Acids Res 2005;33:W363–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 139. Pantazes RJ, Maranas CD. OptCDR: a general computational method for the design of antibody complementarity determining regions for targeted epitope binding. Protein Eng Des Sel 2010;11:849–58. [DOI] [PubMed] [Google Scholar]
- 140. Li T, Pantazes RJ, Maranas CD. OptMAVEn—a new framework for the de novo design of antibody variable region models targeting specific antigen epitopes. PLoS One 2014;9(8): e105954. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 141. Chowdhury R, Allan MF, Maranas CD. OptMAVEn-2.0: de novo design of variable antibody regions against targeted antigen epitopes. Antibodies 2018;7(3):23. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 142. Adolf-Bryfogle J, Kalyuzhniy O, Kubitz M, et al. RosettaAntibodyDesign (RAbD): a general framework for computational antibody design. PLoS Comput Biol 2018;14(4): e1006112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 143. Lapidoth GD, Baran D, Pszolla GM, et al. AbDesign: an algorithm for combinatorial backbone design guided by natural conformations and sequences. Proteins Struct Funct Bioinforma 2015;83(8):1385–406. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 144. Olimpieri PP, Marcatili P, Tramontano A. Tabhu: tools for antibody humanization. Bioinformatics 2014;31:434–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 145. Lazar GA, Desjarlais JR, Jacinto J, et al. A molecular immunology approach to antibody humanization and functional optimization. Mol Immunol 2007;44(8):1986–98. [DOI] [PubMed] [Google Scholar]
- 146. Gao SH, Huang K, Tu H, et al. Monoclonal antibody humanness score and its applications. BMC Biotechnol 2013;13:55. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 147. Choi Y, Hua C, Sentman CL, et al. Antibody humanization by structure-based computational protein design. MAbs 2015;7:1045–57. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 148. Lauer TM, Agrawal NJ, Chennamsetty N, et al. Developability index: a rapid in silico tool for the screening of antibody aggregation propensity. J Pharm Sci 2012;101:102–15. [DOI] [PubMed] [Google Scholar]
- 149. Jain T, Boland T, Lilov A, et al. Prediction of delayed retention of antibodies in hydrophobic interaction chromatography from sequence using machine learning. Bioinformatics 2017;33(23):3758–66. [DOI] [PubMed] [Google Scholar]
- 150. López-Santibáñez-Jácome L, Avendaño-Vázquez SE, Flores-Jasso CF. The pipeline repertoire for Ig-Seq analysis. Front Immunol 2019;10:899. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 151. Dondelinger M, Filée P, Sauvage E, et al. Understanding the significance and implications of antibody numbering and antigen-binding surface/residue definition. Front Immunol 2018;9:2278. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 152. Wu TT, Kabat EA. An analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity. J Exp Med 1970;132:211–50. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 153. Lefranc MP. IMGT unique numbering for the variable (V), constant (C), and groove (G) domains of IG, TR, MH, IgSF, and MhSF. Cold Spring Harb Protoc 2011;6:633–42. [DOI] [PubMed] [Google Scholar]
- 154. MacCallum RM, Martin ACR, Thornton JM. Antibody–antigen interactions: contact analysis and binding site topography. J Mol Biol 1996;262:732–45. [DOI] [PubMed] [Google Scholar]
- 155. North B, Lehmann A, Dunbrack RL, Jr. A new clustering of antibody CDR loop conformations. J Mol Biol 2011;2:228–56. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 156. Marks C, Deane CM. Antibody H3 structure prediction. Comput Struct Biotechnol J 2017;15:222–31. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 157. Stave JW, Lindpaintner K. Antibody and antigen contact residues define epitope and paratope size and structure. J Immunol 2013;191(3):1428–35. [DOI] [PubMed] [Google Scholar]
- 158. Sela-Culang I, Kunik V, Ofran Y. The structural basis of antibody–antigen recognition. Front Immunol 2013;4:302. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 159. Kunik V, Peters B, Ofran Y. Structural consensus among antibodies defines the antigen binding site. PLoS Comput Biol 2012;8: e1002388. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 160. Wang X, Singh SK, Kumar S. Potential aggregation-prone regions in complementarity-determining regions of antibodies and their contribution towards antigen recognition: a computational analysis. Pharm Res 2010;27(8):1512–29. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 161. Kringelum JV, Lundegaard C, Lund O, et al. Reliable B cell epitope predictions: impacts of method development and improved benchmarking. PLoS Comput Biol 2012;8: e1002829. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 162. Kazi A, Chuah C, Majeed ABA, et al. Current progress of immunoinformatics approach harnessed for cellular- and antibody-dependent vaccine design. Pathog Glob Health 2018;112(3):123–31. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 163. Deng X, Storz U, Doranz BJ. Enhancing antibody patent protection using epitope mapping information. MAbs 2017;10(2):204–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 164. Potocnakova L, Bhide M, Pulzova LB. An introduction to B-cell epitope mapping and in silico epitope prediction. J Immunol Res 2016;2016:6760830. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 165. Haste Andersen P, Nielsen M, Lund O. Prediction of residues in discontinuous B-cell epitopes using protein 3D structures. Protein Sci 2006;15:2558–67. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 166. Gao J, Kurgan L. Computational prediction of B cell epitopes from antigen sequences. Methods Mol Biol 2014;1184:197–215. [DOI] [PubMed] [Google Scholar]
- 167. Kunik V, Ofran Y. The indistinguishability of epitopes from protein surface is explained by the distinct binding preferences of each of the six antigen-binding loops. Protein Eng Des Sel 2013;26(10):599–609. [DOI] [PubMed] [Google Scholar]
- 168. Kringelum JV, Nielsen M, Padkjær SB, et al. Structural analysis of B-cell epitopes in antibody: protein complexes. Mol Immunol 2013;53:24–34. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 169. Greenbaum JA, Andersen PH, Blythe M, et al. Towards a consensus on datasets and evaluation metrics for developing B-cell epitope prediction tools. J Mol Recognit 2007;20(2):75–82. [DOI] [PubMed] [Google Scholar]
- 170. Pagadala NS, Syed K, Tuszynski J. Software for molecular docking: a review. Biophys Rev 2017;9(2):91–102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 171. Pierce B, Weng Z. ZRANK: reranking protein docking predictions with an optimized energy function. Proteins Struct Funct Genet 2007;67(4):1078–86. [DOI] [PubMed] [Google Scholar]
- 172. Andrusier N, Nussinov R, Wolfson HJ. FireDock: fast interaction refinement in molecular docking. Proteins Struct Funct Genet 2007;69(1):139–59. [DOI] [PubMed] [Google Scholar]
- 173. Pons C, Talavera D, De La Cruz X, et al. Scoring by intermolecular pairwise propensities of exposed residues (SIPPER): a new efficient potential for protein–protein docking. J Chem Inf Model 2011;51(2):370–7. [DOI] [PubMed] [Google Scholar]
- 174. Rodrigues JP, Bonvin AMJJ. Integrative computational modeling of protein interactions. FEBS J 2014;281:1988–2003. [DOI] [PubMed] [Google Scholar]
- 175. Sevy AM, Healey JF, Deng W, et al. Epitope mapping of inhibitory antibodies targeting the C2 domain of coagulation factor VIII by hydrogen-deuterium exchange mass spectrometry. J Thromb Haemost 2013;11:2128–36. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 176. Coales SJ, Tuske SJ, Tomasso JC, et al. Epitope mapping by amide hydrogen/deuterium exchange coupled with immobilization of antibody, on-line proteolysis, liquid chromatography and mass spectrometry. Rapid Commun Mass Spectrom 2009;23:639–47. [DOI] [PubMed] [Google Scholar]
- 177. Kotev M, Soliva R, Orozco M. Challenges of docking in large, flexible and promiscuous binding sites. Bioorganic Med Chem 2016;24(20):4961–9. [DOI] [PubMed] [Google Scholar]
- 178. Ambrosetti F, Jiménez-García B, Roel-Touris J, et al. Information-driven modelling of antibody–antigen complexes. SSRN Electron J 2019; doi: 10.2139/ssrn.3362436 [DOI] [PubMed] [Google Scholar]
- 179. Lensink MF, Velankar S, Wodak SJ. Modeling protein–protein and protein–peptide complexes: CAPRI 6th edition. Proteins Struct Funct Bioinforma 2017;85(3):359–77. [DOI] [PubMed] [Google Scholar]
- 180. Kuroda D, Tsumoto K. Antibody affinity maturation by computational design. Methods Mol Biol 2018;1827:15–34. [DOI] [PubMed] [Google Scholar]
- 181. Lippow SM, Wittrup KD, Tidor B. Computational design of antibody-affinity improvement beyond in vivo maturation. Nat Biotechnol 2007;25:1171–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 182. MacKerel AD, Jr, Brooks CL, III, Nilsson L, et al. CHARMM: the energy function and its parameterization with an overview of the program. Encycl Comput Chem 1998;1:271–7. [Google Scholar]
- 183. Leaver-Fay A, Tyka M, Lewis SM, et al. Rosetta3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 184. Nimrod G, Fischman S, Austin M, et al. Computational design of epitope-specific functional antibodies. Cell Rep 2018;51:156–62. [DOI] [PubMed] [Google Scholar]
- 185. Jones PT, Dear PH, Foote J, et al. Replacing the complementarity-determining regions in a human antibody with those from a mouse. Nature 1986;321(6069):522–5. [DOI] [PubMed] [Google Scholar]
- 186. Almagro JC, Fransson J. Humanization of antibodies. Front Biosci 2008;13:1619–33. [DOI] [PubMed] [Google Scholar]
- 187. Clavero-Álvarez A, Di Mambro T, Perez-Gaviro S, et al. Humanization of antibodies using a statistical inference approach. Sci Rep 2018;8(1):14820. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 188. Seeliger D. Development of scoring functions for antibody sequence assessment and optimization. PLoS One 2013;8(10): e76909. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 189. Roguska MA, Pedersen JT, Keddy CA, et al. Humanization of murine monoclonal antibodies through variable domain resurfacing. Proc Natl Acad Sci U S A 1994;91(3):969–73. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 190. Jiskoot W, Kijanka G, Randolph TW, et al. Mouse models for assessing protein immunogenicity: lessons and challenges. J Pharm Sci 2016;105:1567–75. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 191. Singh SK. Impact of product-related factors on immunogenicity of biotherapeutics. J Pharm Sci 2011. [DOI] [PubMed] [Google Scholar]
- 192. Soria-Guerra RE, Nieto-Gomez R, Govea-Alonso DO, et al. An overview of bioinformatics tools for epitope prediction: implications on vaccine development. J Biomed Inform 2015. [DOI] [PubMed] [Google Scholar]
- 193. Sidhom J-W, Pardoll D, Baras A. AI-MHC: an allele-integrated deep learning framework for improving Class I & Class II HLA-binding predictions. bioRxiv 2018; doi: 10.1101/318881 [DOI]
- 194. Jawa V, Cousens LP, Awwad M, et al. T-cell dependent immunogenicity of protein therapeutics: preclinical assessment and mitigation. Clin Immunol 2013;149(3):534–55. [DOI] [PubMed] [Google Scholar]
- 195. Kumar S, Singh SK, Wang X, et al. Coupling of aggregation and immunogenicity in biotherapeutics: T- and B-cell immune epitopes may contain aggregation-prone regions. Pharm Res 2011;28(5):949–61. [DOI] [PubMed] [Google Scholar]
- 196. Kumar S, Mitchell MA, Rup B, et al. Relationship between potential aggregation-prone regions and HLA-DR-binding T-cell immune epitopes: implications for rational design of novel and follow-on therapeutic antibodies. J Pharm Sci 2012;101(8):2686–701. [DOI] [PubMed] [Google Scholar]
- 197. Kumar S, Plotnikov NV, Rouse JC, et al. Biopharmaceutical informatics: supporting biologic drug development via molecular modelling and informatics. J Pharm Pharmacol 2017;70(5):595–608. [DOI] [PubMed] [Google Scholar]
- 198. Tomar DS, Singh SK, Li L, et al. In silico prediction of diffusion interaction parameter (k D), a key indicator of antibody solution behaviors. Pharm Res 2018;35(10):193. [DOI] [PubMed] [Google Scholar]
- 199. Tomar DS, Li L, Broulidakis MP, et al. In-silico prediction of concentration-dependent viscosity curves for monoclonal antibody solutions. MAbs 2017;9(3):476–89. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 200. Plotnikov NV, Singh SK, Rouse JC, et al. Quantifying the risks of asparagine deamidation and aspartate isomerization in biopharmaceuticals by computing reaction free-rnergy surfaces. J Phys Chem B 2017;121(4):719–30. [DOI] [PubMed] [Google Scholar]
- 201. Tomar DS, Kumar S, Singh SK, et al. Molecular basis of high viscosity in concentrated antibody solutions: strategies for high concentration drug product development. MAbs 2016;8(2):216–28. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 202. Sormanni P, Aprile FA, Vendruscolo M. The CamSol method of rational design of protein mutants with enhanced solubility. J Mol Biol 2015;427(2):478–90. [DOI] [PubMed] [Google Scholar]
- 203. Wolf Pérez AM, Sormanni P, Andersen JS, et al. In vitro and in silico assessment of the developability of a designed monoclonal antibody library. MAbs 2019;11(2):388–400. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 204. Agrawal NJ, Kumar S, Wang X, et al. Aggregation in protein-based biotherapeutics: computational studies and tools to identify aggregation-prone regions. J Pharm Sci 2011;100(12):5081–95. [DOI] [PubMed] [Google Scholar]
- 205. Buck PM, Kumar S, Wang X, et al. Computational methods to predict therapeutic protein aggregation. Methods Mol Biol 2012;899:425–51. [DOI] [PubMed] [Google Scholar]
- 206. Wang X, Das TK, Singh SK, et al. Potential aggregation prone regions in biotherapeutics: a survey of commercial monoclonal antibodies. MAbs 2009;1(3):254–67. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 207. Rawat P, Kumar S, Michael GM. An in-silico method for identifying aggregation rate enhancer and mitigator mutations in proteins. Int J Biol Macromol 2018;18(Pt A):1157–67. [DOI] [PubMed] [Google Scholar]
- 208. Hebditch M, Carballo-Amador MA, Charonis S, et al. Protein-Sol: a web tool for predicting protein solubility from sequence. Bioinformatics 2017;33(19):3098–100. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 209. Zambrano R, Jamroz M, Szczasiuk A, et al. AGGRESCAN3D (A3D): server for prediction of aggregation properties of protein structures. Nucleic Acids Res 2015;43:W306–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 210. Wang W, Nema S, Teagarden D. Protein aggregation—pathways and influencing factors. Int J Pharm 2010;390(2):89–99. [DOI] [PubMed] [Google Scholar]
- 211. Chennamsetty N, Voynov V, Kayser V, et al. Prediction of aggregation prone regions of therapeutic proteins. J Phys Chem B 2010;114:6614–24. [DOI] [PubMed] [Google Scholar]
- 212. Georgiou G, Ippolito GC, Beausang J, et al. The promise and challenge of high-throughput sequencing of the antibody repertoire. Nat Biotechnol 2014;32:158–68. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 213. Rubelt F, Busse CE, Bukhari SAC, et al. Adaptive Immune Receptor Repertoire Community recommendations for sharing immune-repertoire sequencing data. Nat Immunol 2017;18:1274–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 214. Miho E, Roškar R, Greiff V, et al. Large-scale network analysis reveals the sequence space architecture of antibody repertoires. Nat Commun 2019;10:1321. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 215. Galson J, Trück J, Fowler A, et al. Analysis of B cell repertoire dynamics following hepatitis B vaccination in humans, and enrichment of vaccine-specific antibody sequences. EBioMedicine 2015;2:2070–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 216. Reddy ST, Ge X, Miklos AE, et al. Monoclonal antibodies isolated without screening by analyzing the variable-gene repertoire of plasma cells. Nat Biotechnol 2010;28:965–9. [DOI] [PubMed] [Google Scholar]
- 217. Fowler A, Galson JD, Trück J, et al. Inferring B cell specificity for vaccines using a mixture model. bioRxiv 2018; doi: 10.1101/464792 [DOI] [PMC free article] [PubMed]
- 218. Keller MA, Stiehm ER. Passive immunity in prevention and treatment of infectious diseases. Clin Microbiol Rev 2000;13(4):602–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 219. Chaussabel D. Assessment of immune status using blood transcriptomics and potential implications for global health. Semin Immunol 2015;27(1):58–66. [DOI] [PubMed] [Google Scholar]
- 220. Greiff V, Bhat P, Cook SC, et al. A bioinformatic framework for immune repertoire diversity profiling enables detection of immunological status. Genome Med 2015;7:49. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 221. Ostmeyer J, Christley S, Rounds WH, et al. Statistical classifiers for diagnosing disease from immune repertoires: a case study using multiple sclerosis. BMC Bioinformatics 2017;18:401. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 222. Arora R, Kapllinsky J, Li A, et al. Repertoire-based diagnostics using statistical biophysics. bioRxiv 2019; doi: 10.1101/519108 [DOI]
- 223. Soto C, Bombardi RG, Branchizio A, et al. High frequency of shared clonotypes in human B cell receptor repertoires. Nature 2019;566:398. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 224. Krawczyk K, Kelm S, Kovaltsuk A, et al. Structurally mapping antibody repertoires. Front Immunol 2018;9:1698. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 225. Krawczyk K, Raybould M, Kovaltsuk A, et al. Looking for therapeutic antibodies in next generation sequencing repositories. bioRxiv 2019; doi: 10.1101/572958 [DOI] [PMC free article] [PubMed]
- 226. Perelson AS, Oster GF. Theoretical studies of clonal selection: minimal antibody repertoire size and reliability of self-non-self discrimination. J Theor Biol 1979;81:645–70. [DOI] [PubMed] [Google Scholar]
- 227. Dekosky BJ, Kojima T, Rodin A, et al. In-depth determination and analysis of the human paired heavy- and light-chain antibody repertoire. Nat Med 2014;21:1–8. [DOI] [PubMed] [Google Scholar]
- 228. Feige MJ, Grawert MA, Marcinowski M, et al. The structural analysis of shark IgNAR antibodies reveals evolutionary principles of immunoglobulins. Proc Natl Acad Sci 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 229. Griffiths K, Dolezal O, Parisi K, et al. Shark variable new antigen receptor (VNAR) single domain antibody fragments: stability and diagnostic applications. Antibodies 2013. [Google Scholar]
- 230. Muyldermans S. Nanobodies: natural single-domain antibodies. Annu Rev Biochem. 2013;82:775–97. [DOI] [PubMed] [Google Scholar]
- 231. Mitchell LS, Colwell LJ. Analysis of nanobody paratopes reveals greater diversity than classical antibodies. Protein Eng Des Sel 2018;31(7–8):267–75. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 232. Mitchell LS, Colwell LJ. Comparative analysis of nanobody sequence and structure data. Proteins Struct Funct Bioinforma 2018;86(7):697–706. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 233. Staus DP, Strachan RT, Manglik A, et al. Allosteric nanobodies reveal the dynamic range and diverse mechanisms of G-protein-coupled receptor activation. Nature 2016;535(7612):448–52. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 234. Steyaert J, Kobilka BK. Nanobody stabilization of G protein-coupled receptor conformational states. Curr Opin Struct Biol 2011;21(4):567–72. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 235. Sircar A, Sanni KA, Shi J, et al. Analysis and modeling of the variable region of camelid single-domain antibodies. J Immunol 2011;186:6357–67. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 236. Rasmussen SGF, Choi HJ, Fung JJ, et al. Structure of a nanobody-stabilized active state of the β2adrenoceptor. Nature 2011;469(7329):175–80. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 237. Haas J, Barbato A, Behringer D, et al. Continuous Automated Model EvaluatiOn (CAMEO) complementing the critical assessment of structure prediction in CASP12. Proteins Struct Funct Bioinforma 2018;86(Suppl 1):387–98. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 238. Wilkinson MD, Dumontier M, Aalbersberg IJ, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 2016;3:160018. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.