Table 1:
Study | Dataset source | Curation strategy | Training data size | Algorithm | Ref. |
---|---|---|---|---|---|
Novel excipient candidates | DrugBank, Drugs@FDA* | High-throughput screening for drug-excipient interactions, cheminformatics calculations with RDKit, and molecular dynamics studies | 2.1 million drug-excipient pairings | Random forest | 29 |
Nanoparticle protein corona | Literature, UniProt | Data mining for nanoparticle properties and classification, physicochemical descriptions of the protein corona | 56 papers with 178 independent proteins | Random forest | 14 |
Biological activity prediction | Literature, in-house screening and imaging | Structure-activity relationship, image analysis | 960 SNAs with 17,000 MALDI-MS data points; 1620 samples for immune responses and 301 samples for organ burdens; 1301 micrometastases for image analysis | Random forest, XGBoost, support vector machine | 95, 96, 97 |
*Excipients can be found under the Inactive Ingredient Search for Approved Drug Products.