. 2024 Mar 5;20(3):e1011881. doi: 10.1371/journal.pcbi.1011881

Table 1. Antibody Liability Reference.

Liabilities are identified within the IMGT-defined regions in IMGT-numbered sequences.

Name	Short name/Tag	Severity	Motif	Description	Citations
Deamidation (high)	DeAmdH	High	N[GS] in CDRs	Deamidation of Asparagine occurs in the following motifs: NG (Asparagine followed by Glycine) and NS (Asparagine followed by Serine). Such motifs are known to be associated with deamidation (a type of degradation) and can result in reduced "shelf-life".	[7,16]
Fragmentation (high)	FragH	High	DP in CDRs	Fragmentation occurs as cleavage at the interface between Aspartate and Proline. It is an example of a common motif that is susceptible to hydrolysis in response to pH.	[17]
Isomerization	Isom	High	D[DGHST] in CDRs	Isomerization of Aspartate occurs in the following motifs: DD (Aspartate followed by Aspartate), DG (Aspartate followed by Glycine), DH (Aspartate followed by Histidine), DS (Aspartate followed by Serine) and DT (Aspartate followed by Threonine). Such motifs are known to be connected to isomerization (a type of degradation) and can cause a shorter "shelf-life" of antibodies.	[18]
Missing Cys (C)	mCys	High	C not present at IMGT position 23 or 104	A Cysteine is missing at IMGT position 23 or 104. Unpaired cysteines in certain antibody sequence regions may alter an antibody’s structure, apparent surface charges, or hydrophobicity.	[19]
Extra Cys (C)	xCys	High	C present at a position other than IMGT 23 or 104	An extra Cysteine is present at a position other than IMGT 23 or 104. Unpaired cysteines in certain antibody sequence regions can change an antibody’s structure, apparent surface charges, or hydrophobicity.	[20]
N-linked glycosylation (NXS/T, X not P)	Ngly	High	N[^P][ST] in variable fragment	N-linked glycosylation occurs as an addition of a sugar molecule. Reduced conformational stability and shorter "shelf-life" of antibody products are connected to asparagine linked glycosylation. Incidence of glycosylation in the CDRs can also directly impair antigen recognition and therefore lead to lower efficacy.	[21]
Deamidation (medium)	DeAmdM	Medium	N[AHNT] in CDRs	Occurs in the following motifs: NA (Asparagine followed by Alanine), NH (Asparagine followed by Histidine), NN (Asparagine followed by Asparagine) and NT (Asparagine followed by Threonine). This type of deamidation is less common than deamidation at the NG and NS motifs.	[7,16]
Hydrolysis	Hydro	Medium	NP in CDRs	Hydrolysis gives rise to the DP motif as a result of the deamidation of Asparagine (N) to Aspartate (D).	[17]
Fragmentation (medium)	FragM	Medium	TS in CDRs	Occurs as pH-dependent cleavage at the Threonine-Serine interface.	[17]
Trp (W) oxidation	TrpOx	Medium	W in CDRs	Tryptophan oxidation is a post-translational modification (PTM).	[22]
Met (M) oxidation	MetOx	Medium	M in CDRs	Methionine oxidation occurs in the CDRs. Reduced binding affinity and quicker degradation of the antibody product are linked to oxidation in these particular spots.	[23]
Deamidation (low)	DeAmdL	Low	[STK]N in CDRs	Occurs in the following motifs: SN (Serine followed by Asparagine), TN (Threonine followed by Asparagine), and KN (Lysine followed by Asparagine). This type of deamidation is less common than the others.	[7,16]
Integrin binding	IntBind	Low	GPR|RGD|RYD|LDV|DGE|KGD|NGR in variable fragment	Binding motifs for the following integrins: αVβ3 (RGD|RYD|KGD|NGR), α4β1 (LDV), α2β1 (DGE), and CD11c/CD18 (GPR). Eight human integrins act as RGD receptors: α5β1, α8β1, αVβ1, αVβ3, αVβ5, αVβ6, αVβ8, and αIIbβ3.	[24]
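
The motif column of Table 1 maps directly onto regular expressions, so a scan can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation. The function names `scan_motifs` and `cysteine_flags`, the 0-based half-open CDR spans, and the integer-keyed IMGT position dictionary are all assumptions made for this example. In practice the CDR boundaries and IMGT positions would come from a numbering tool, and IMGT insertion codes (which this sketch ignores) would need handling. The sketch also assumes that a CDR-scoped motif counts only when it lies entirely inside one CDR.

```python
import re

# Short name -> (severity, regex, scope), copied from Table 1.
# Scope "cdr" = motif must lie in a CDR; "fv" = anywhere in the variable fragment.
LIABILITIES = {
    "DeAmdH": ("High", r"N[GS]", "cdr"),
    "FragH": ("High", r"DP", "cdr"),
    "Isom": ("High", r"D[DGHST]", "cdr"),
    "Ngly": ("High", r"N[^P][ST]", "fv"),
    "DeAmdM": ("Medium", r"N[AHNT]", "cdr"),
    "Hydro": ("Medium", r"NP", "cdr"),
    "FragM": ("Medium", r"TS", "cdr"),
    "TrpOx": ("Medium", r"W", "cdr"),
    "MetOx": ("Medium", r"M", "cdr"),
    "DeAmdL": ("Low", r"[STK]N", "cdr"),
    "IntBind": ("Low", r"GPR|RGD|RYD|LDV|DGE|KGD|NGR", "fv"),
}


def scan_motifs(seq, cdr_spans):
    """Return (tag, severity, start, matched_text) hits sorted by position.

    seq: variable-fragment amino-acid string (hypothetical input format).
    cdr_spans: 0-based half-open (start, end) index pairs, one per CDR.
    """
    hits = []
    for tag, (severity, pattern, scope) in LIABILITIES.items():
        # A lookahead lets overlapping matches (e.g. "NN" in "NNN") all be found.
        for m in re.finditer(f"(?=({pattern}))", seq):
            start, end = m.start(1), m.end(1)
            # Assumption: the whole motif must fall inside a single CDR.
            inside_cdr = any(s <= start and end <= e for s, e in cdr_spans)
            if scope == "fv" or inside_cdr:
                hits.append((tag, severity, start, m.group(1)))
    return sorted(hits, key=lambda h: (h[2], h[0]))


def cysteine_flags(numbered):
    """Flag mCys/xCys from a dict of integer IMGT position -> residue letter."""
    flags = []
    # mCys: Cysteine absent at IMGT position 23 or 104.
    if any(numbered.get(pos) != "C" for pos in (23, 104)):
        flags.append("mCys")
    # xCys: Cysteine present at any other position.
    if any(res == "C" for pos, res in numbered.items() if pos not in (23, 104)):
        flags.append("xCys")
    return flags
```

For a toy call such as `scan_motifs("AANGTAAARGDAAWAA", [(1, 5)])`, the NG and NGT motifs are reported because they sit inside the assumed CDR span (or are scoped to the whole variable fragment), RGD is reported regardless of position, and the Tryptophan outside the CDR is not.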

We used antibody sequences from therapeutics, patents, GenBank, literature, and a large paired next-generation sequencing (NGS) dataset. The therapeutics and patents can be thought of as representatives of the clinical spectrum of sequences [25,26]. GenBank and literature are a mix of antibodies developed for scientific/therapeutic purposes [27,28]. The NGS dataset is a sample of the natural diversity [29]. We extracted unique heavy and light chain sequences from each source and stratified them by detected organism (human or non-human closest germline). In the NGS and therapeutics datasets, the heavy and light chains were already paired. All other datasets contained unpaired heavy or light chain sequences.
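
The deduplication and stratification step described above can be pictured as grouping sequences into sets keyed by source, chain type, and organism. This is only a sketch under assumed inputs: the record layout `(source, chain, organism, sequence)` and the function name `stratify_unique` are hypothetical, and the organism label is taken as already assigned from the closest germline.

```python
from collections import defaultdict


def stratify_unique(records):
    """Group unique sequences by (source, chain, organism).

    records: iterable of (source, chain, organism, sequence) tuples,
    a hypothetical layout chosen for this illustration.
    """
    groups = defaultdict(set)
    for source, chain, organism, sequence in records:
        # Using a set drops duplicate sequences within each stratum.
        groups[(source, chain, organism)].add(sequence)
    return groups
```

Counting the members of each set then gives the number of unique heavy or light chains per source and organism.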