Table 2. Different Descriptor Spaces (DS) Used to Represent the 2D Structure of Insulin Analogs^a,b
| name | type of descriptors | examples | descriptor space dimension |
|---|---|---|---|
| DS1 | overall numeric molecular descriptors calculated from the entire sequence of the insulin analogs | size, charge, hydrophobicity | 15 |
| DS2 | physicochemical molecular descriptors of the fatty acid side chain (acylation group) | surface area, LogP, molecular weight | 7 |
| DS3 | NLP embedding approach ESM-1b (ref 30), encoding the entire backbone sequence (insulin and attached amino acids/sequences) | GIVEQCCTSICSL | 1280 |
| DS4 | SMILES representation of the fatty acid side chain, embedded with Mol2Vec (ref 31) | NC(C)C(=O)O | 100 |
^a Abbreviations: NLP, Natural Language Processing; ESM, Evolutionary Scale Modeling; SMILES, Simplified Molecular-Input Line-Entry System.
^b For more details on descriptors, see Supporting Information Table S3, Table S4, and Figure S3.
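
To make the DS1 row concrete, the following is a minimal sketch of how whole-sequence numeric descriptors of the kind listed as examples (size, charge, hydrophobicity) could be computed. It is not the authors' descriptor set: the function name `sequence_descriptors`, the choice of the Kyte-Doolittle hydropathy scale, and the simplified charge count (Lys/Arg as +1, Asp/Glu as -1, ignoring histidine, termini, and pH) are assumptions made for illustration. The input is the example sequence fragment shown in the DS3 row.

```python
# Illustrative sketch only: three DS1-style descriptors from a peptide sequence.
# The descriptor choices below are assumptions, not the 15 descriptors of the paper.

# Kyte-Doolittle hydropathy scale (one value per standard amino acid).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_descriptors(seq: str) -> dict:
    """Return size, a crude net charge, and mean hydropathy for a sequence."""
    seq = seq.upper()
    length = len(seq)
    # Crude net charge: +1 per Lys/Arg, -1 per Asp/Glu (assumption: His and
    # the chain termini are ignored, and no pH dependence is modeled).
    positive = sum(seq.count(residue) for residue in "KR")
    negative = sum(seq.count(residue) for residue in "DE")
    net_charge = positive - negative
    # Mean hydropathy over all residues (often called a GRAVY score).
    mean_hydropathy = sum(KYTE_DOOLITTLE[residue] for residue in seq) / length
    return {
        "length": length,
        "net_charge": net_charge,
        "mean_hydropathy": mean_hydropathy,
    }


# Example sequence fragment taken from the DS3 row of Table 2.
example = sequence_descriptors("GIVEQCCTSICSL")
print(example)
```

Each analog would contribute one such fixed-length vector; the 1280- and 100-dimensional spaces (DS3, DS4) come from pretrained embedding models instead and are not reproduced here.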