2023 Jun 22;8(26):23566–23578. doi: 10.1021/acsomega.3c01218

Table 2. Different Descriptor Spaces (DS) Used to Represent the 2D Structure of Insulin Analogs [a,b]

| Name | Type of descriptors | Examples | Descriptor space dimension |
|------|---------------------|----------|----------------------------|
| DS1 | Overall numeric molecular descriptors calculated from the entire sequence of the insulin analogs | size, charge, hydrophobicity | 15 |
| DS2 | Physicochemical molecular descriptors of the fatty acid side chain (acylation group) | surface area, LogP, molecular weight | 7 |
| DS3 | NLP embedding approach ESM-1b (ref 30), encoding the entire backbone sequence (insulin and attached amino acids/sequences) | GIVEQCCTSICSL | 1280 |
| DS4 | SMILES representation of the fatty acid side chain; Mol2Vec (ref 31) used for embedding of SMILES | NC(C)C(=O)O | 100 |
[a] Abbreviations: NLP, Natural Language Processing; ESM, Evolutionary Scale Modeling; SMILES, Simplified Molecular-Input Line-Entry System.

[b] For more details on descriptors, see Supporting Information Table S3, Table S4, and Figure S3.
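
To make the DS1 row concrete, the following is a minimal sketch of how a few sequence-level numeric descriptors (size, charge, hydrophobicity) could be computed from an amino acid sequence. It is an illustration only and not the authors' pipeline: the function name `sequence_descriptors`, the choice of the Kyte-Doolittle hydropathy scale, and the simplified charge rule (K/R counted as +1, D/E as -1, histidine and termini ignored) are all assumptions, and the 15 descriptors actually used for DS1 are those listed in the Supporting Information.

```python
# Illustrative sketch only: three simple sequence-level descriptors in the
# spirit of DS1 (size, charge, hydrophobicity). Not the authors' method.

# Kyte-Doolittle hydropathy scale (assumed here; the paper's scale may differ).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charges near neutral pH (assumption: His and the
# chain termini are ignored).
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def sequence_descriptors(seq: str) -> dict:
    """Return length, approximate net charge, and mean hydropathy of a sequence."""
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")

    length = len(seq)
    net_charge = sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in seq)
    mean_hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / length
    return {
        "length": length,
        "net_charge": net_charge,
        "mean_hydropathy": mean_hydropathy,
    }


# The example fragment shown in the DS3 row of Table 2.
example = sequence_descriptors("GIVEQCCTSICSL")
print(example)
```

Each descriptor becomes one coordinate of the feature vector, so a DS1-style representation is simply a fixed-length list of such values per analog, in contrast to the learned 1280- and 100-dimensional embeddings of DS3 and DS4.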