Patent summarization and functional determination pipeline
Chemical SMILES strings are converted to PubChem CIDs by matching on same connectivity, and the patent IDs associated with each CID are retrieved. The patent IDs are then used to obtain each patent's title, abstract, and description from Google Scholar. This patent text is passed to GPT-3.5-turbo with a prompt to produce summarized functional labels. The labels are then passed to GPT-4, together with pre-defined labels, to determine functional similarity between the query and the molecule in question. Figure created with
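
The stages in this caption can be sketched as a chain of pluggable steps. This is a minimal illustration, not the authors' implementation: every function and parameter name below (`label_molecule`, `functional_match`, `smiles_to_cids`, `cid_to_patent_ids`, `fetch_patent`, `summarize`, `judge`) is hypothetical, and the PubChem lookup, patent retrieval, and both language-model calls are passed in as callables so that no specific API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class PatentText:
    """Text fields retrieved for one patent (hypothetical container)."""
    patent_id: str
    title: str
    abstract: str
    description: str


def label_molecule(
    smiles: str,
    smiles_to_cids: Callable[[str], Iterable[int]],      # SMILES -> PubChem CIDs
    cid_to_patent_ids: Callable[[int], Iterable[str]],   # CID -> patent IDs
    fetch_patent: Callable[[str], PatentText],           # patent ID -> text
    summarize: Callable[[PatentText], Iterable[str]],    # text -> labels (LLM step)
) -> List[str]:
    """Collect de-duplicated functional labels for one molecule."""
    labels = set()
    seen_patents = set()
    for cid in smiles_to_cids(smiles):
        for patent_id in cid_to_patent_ids(cid):
            if patent_id in seen_patents:
                continue  # the same patent may be linked to several CIDs
            seen_patents.add(patent_id)
            patent = fetch_patent(patent_id)
            # Normalize case and whitespace so duplicate labels merge.
            for label in summarize(patent):
                cleaned = label.strip().lower()
                if cleaned:
                    labels.add(cleaned)
    return sorted(labels)


def functional_match(
    query_label: str,
    molecule_labels: List[str],
    judge: Callable[[str, List[str]], bool],  # second LLM step, assumed boolean
) -> bool:
    """Ask the judge whether the molecule's labels match the query label."""
    if not molecule_labels:
        return False  # nothing to compare against
    return judge(query_label, molecule_labels)
```

In practice each callable would wrap a network request or a model call; swapping in stub functions makes the control flow testable without any external service.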