Abstract
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a key chemical information resource, developed and maintained by the U.S. National Institutes of Health. The present chapter describes how to find potential multi-target ligands from PubChem that would be tested in further experiments. While the protocol presented here uses PubChem’s web-based interfaces to allow users to follow it interactively, it can also be implemented in computer software by using programmatic access interfaces to PubChem (such as PUG-REST or E-Utilities).
Keywords: PubChem, multi-target ligand, virtual screening, FLink, Entrez, PUG-REST, E-Utilities
1. Introduction
PubChem [1–3] is a public repository for information on chemical substances and their biological activities (hereafter simply called “bioactivities”), developed and maintained by the U.S. National Institutes of Health. PubChem provides this collected chemical information free of charge to the scientific community, serving as a key information resource for the biomedical research communities in areas including cheminformatics, chemical biology, and medicinal chemistry.
Various aspects of PubChem, including data contents and organization, search and analysis tools, data download, and other related services, are described elsewhere [1–3] and only a brief introduction is given here. PubChem data is organized into three inter-linked databases: Substance, Compound, and BioAssay. The Substance database serves as an archive of chemical substance descriptions contributed by individual data sources. The Compound database stores unique chemical structures extracted from the Substance database through a process called structure standardization [1]. The BioAssay database contains the descriptions and substance testing results of biological assay experiments (hereafter simply called “assays”). Each PubChem record is distinguished by a unique, numerical identifier known as an SID (Substance), CID (Compound), or AID (BioAssay). As of January 2017, PubChem contains more than 227 million substance descriptions, 93 million unique chemical structures, and 231 million bioactivities from 1.2 million assays, covering 10 thousand target protein sequences and 20 thousand gene targets.
This chapter describes how to use PubChem to identify potential multi-target ligands for subsequent in silico or in vitro screening. The conceptual workflow for this task is depicted in Figure 1. While this workflow focuses on identifying dual-target ligands, it can be adapted to other cases (for example, multi-target ligands with more than two targets or selective ligands that bind to one target but not another target).
Figure 1.

Conceptual workflow for identifying potential multi-target ligands for subsequent screening.
The workflow begins with searching the BioAssay database for assays that were performed against each of the targets A and B. Then, compounds that tested active in these assays are retrieved to identify those active against both targets. These “known” multi-target ligands are subsequently used as query molecules to search the Compound database for compounds that are structurally similar to them. This is based on the assumption that structurally similar molecules are likely to have similar biological activities (the so-called “similarity principle” or “similar property principle” [4]). [Evaluation of molecular similarity in PubChem is described in Notes 1 through 4.]
Some of the compounds returned from the similarity search may have already been tested against either of the targets A and B, and this information can be used to prioritize the compounds. For example, if compounds are already known to be active against both targets, it is not necessary to test them against the same targets again, although some of them may be included as reference compounds in further screening to check the consistency of new screening data with the existing assay data. If compounds are known actives against one of the two targets but have not been tested against the other, these compounds may be considered as high-priority compounds for subsequent tests. If compounds are known inactives against either of the two targets, they may be regarded as low-priority compounds or excluded from consideration for further screening.
The development of an actual protocol that implements the conceptual workflow shown in Figure 1 requires some additional considerations, such as the availability of necessary database tools and services and the data throughputs that they can handle.
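The prioritization rules in the conceptual workflow can be expressed as a small decision function. The following is a minimal sketch in Python, assuming that the tested and active compounds for each target are already available as sets of CIDs and that every active compound is also in the tested set. The function name, the labels, and the example CIDs are illustrative only and are not part of PubChem.

```python
def classify(cid, tested_a, active_a, tested_b, active_b):
    """Return a priority label for one compound, following the workflow text.

    Assumption: each active set is a subset of the matching tested set.
    """
    inactive_a = cid in tested_a and cid not in active_a
    inactive_b = cid in tested_b and cid not in active_b
    if inactive_a or inactive_b:
        # Known inactive against at least one target.
        return "low priority"
    if cid in active_a and cid in active_b:
        # Already known to be active against both targets.
        return "reference"
    if cid in active_a or cid in active_b:
        # Active against one target, never tested against the other.
        return "high priority"
    # No assay data for either target.
    return "untested"


# Made-up CIDs for illustration.
tested_a, active_a = {1, 2, 3}, {1, 2}
tested_b, active_b = {1, 3, 4}, {1, 4}
labels = {cid: classify(cid, tested_a, active_a, tested_b, active_b)
          for cid in (1, 2, 3, 4, 5)}
```

Compounds labeled “untested” are the similarity-search hits with no existing assay data for either target, which remain candidates for screening.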
The protocol described in this chapter aims to retrieve potential dual-target kinase inhibitors for the platelet-derived growth factor receptor (PDGFR) [5–8] and vascular endothelial growth factor receptor (VEGFR) [7–10], both of which are important anticancer drug targets. This protocol uses PubChem’s web-based tools and services to allow users to follow it interactively. However, the protocol can also be implemented in computer software by using programmatic access interfaces to PubChem (such as PUG-REST or E-Utilities) [11].
2. Materials
This section describes PubChem tools and services that will be used in the present chapter. These tools are available to the public free of charge.
2.1. Entrez for text search of PubChem
Entrez [12–15] is the primary search and retrieval system used for PubChem’s three primary databases and other major databases at the National Center for Biotechnology Information (NCBI). Entrez supports text searching using simple Boolean queries (i.e., queries combined with Boolean operators, such as “AND”, “OR”, and “NOT”).
Multiple entry points exist for initiating an Entrez search against PubChem databases, as summarized in Figure 2. One of them is the PubChem home page (https://pubchem.ncbi.nlm.nih.gov) (Figure 3), which also provides launch points to various PubChem services, tools, help documents, and more. Alternatively, one can start from the web page of one of the three PubChem databases (Figure 4).
Figure 2.

Entry points to search the PubChem databases through Entrez, which is a primary search system used for PubChem’s three primary databases and other major NCBI databases.
Figure 3.

Partial screenshot of the PubChem home page. One can initiate a text search by typing a query in the search box and clicking the “Go” button. The PubChem home page also provides launch points for PubChem tools and services.
Figure 4.

Partial screenshot of the PubChem Compound database. The layouts of the other two PubChem databases (Substance and BioAssay) are similar to that of the Compound database.
It is also possible to search the PubChem databases from the NCBI home page (the upper panel of Figure 5). If the user does not specify a database to search, Entrez by default searches all Entrez databases for a “global query” and lists the number of returned records in each database on the “global query” result page (the lower panel of Figure 5). By selecting one of the three PubChem databases from this page, one can see the query result for that database.
Figure 5.

Partial screenshot of the NCBI home page and the global query result page. Unless a specific database is selected from the drop-down menu on the NCBI homepage (upper panel), Entrez searches all Entrez databases for a “global query” provided in the search box and presents the number of returned records in each database on the “global query” result page (lower panel).
One can perform an Entrez search by providing a text query in the search box available from one of the entry points in Figure 2. If the query is a phrase or a name with non-alphanumeric characters, it should be enclosed in double quotes. Various indices can be individually searched by suffixing a text query with an appropriate “Entrez index” enclosed in square brackets (for example, the query “2-[4-(2-methylpropyl)phenyl]propanoic acid”[iupacname]). The Entrez indices available for each database can be found from the Advanced Search Builder (to be discussed later), and commonly used ones are listed in Tables 1 and 2. Numeric range searches of appropriate index fields can be performed using a “:” delimiter (for example, the query 3:6[heavyatomcount] to search for compounds with heavy (non-hydrogen) atom counts from 3 to 6).
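The query syntax described above can also be assembled in a script. The sketch below, written in Python, formats fielded and range queries and builds a request URL for the ESearch utility of E-Utilities. The helper names are hypothetical, and the quoting rule (quote anything that is not purely alphanumeric) is a simplifying assumption rather than the exact rule used by Entrez.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def entrez_term(value, index):
    """Format one fielded term, quoting phrases and special characters."""
    text = str(value)
    if not text.isalnum():
        text = f'"{text}"'
    return f"{text}[{index}]"


def entrez_range(low, high, index):
    """Format a numeric range query using the ':' delimiter."""
    return f"{low}:{high}[{index}]"


def esearch_url(db, term, retmax=20):
    """Build (but do not send) an ESearch request URL for an Entrez database."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


name_query = entrez_term("2-[4-(2-methylpropyl)phenyl]propanoic acid", "iupacname")
range_query = entrez_range(3, 6, "heavyatomcount")
url = esearch_url("pccompound", range_query)
```

The URL is only constructed here; sending it (for example, with `urllib.request.urlopen`) is left to the reader so that the request rate can be kept within NCBI usage guidelines.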
Table 1.
Selected Entrez indices used in the PubChem BioAssay database.
| Entrez index | Description |
|---|---|
| GeneSymbol | Search for assays targeting a gene represented by the query gene symbol or proteins encoded by that gene. |
| GenBank Accession | Search for assays targeting a gene represented by the query GenBank accession or proteins encoded by that gene. |
| UniProt Accession | Search for assays targeting a protein represented by the query UniProt Accession. |
| ProteinTargetName | Search for assays by target name. |
| ProteinTargetGI | Search for small molecule assays by target GI (GenInfo Identifier). |
| RNATargetGI | Search for RNAi assays by target GI. |
| TargetCount | The number of targets tested against in an assay. |
| Assay Name | Search for assays whose title contains the query string. |
| Assay Description | Search for assays that contain the query string in their assay descriptions. |
| Active SID count | Retrieve assays in which a given number of substances are tested active. |
| Total SID count | Retrieve assays in which a given number of substances are tested. |
Table 2.
Selected Entrez indices used in the PubChem Compound database.
| Entrez index | Description |
|---|---|
| CompleteSynonym | Search for compounds whose name exactly matches the query. |
| Synonym | Search for compounds whose synonyms contain the query string (i.e., partially matches the query). |
| TotalFormalCharge | Search for compounds with a given total formal charge. |
| InChI | Search for compounds whose InChI string is the same as the query. |
| InChIKey | Search for compounds whose InChIKey is the same as the query. |
| MolecularWeight | Mass of a molecule calculated using the average mass of each element weighted for its natural isotopic abundance. |
| ExactMass | Mass of an ion or molecule containing most likely isotopic composition for a single random molecule. |
| MonoisotopicMass | Mass of a molecule calculated using the mass of the most abundant isotope of each element. |
| Element | Retrieve compounds that contain a given element. |
| MeSH | Retrieve compounds associated with a given MeSH term. |
| PharmAction | Retrieve compounds with a given MeSH pharmacological action. |
The Entrez search system also provides a variety of Entrez filters, which allow one to subset PubChem records according to the presence or absence of a particular piece of information. For example, the query “has_pharm[filter]” against the Compound database returns all compounds that have pharmacological action annotations. The query “pccompound_pcassay[filter]” against the Compound database retrieves all compounds that are tested in any assay experiments archived in the BioAssay database. Commonly used Entrez filters are summarized in Tables 3 and 4.
Table 3.
Selected Entrez filters used in the PubChem BioAssay database.
| Entrez filter | Description |
|---|---|
| screening | Retrieve assays that are classified as primary screenings. |
| confirmatory | Retrieve assays that are classified as confirmatory assays. |
| summary | Retrieve assays that are classified as summary assays. |
| pcassay_protein_target | Retrieve assays with protein targets provided. |
| pcassay_gene_target | Retrieve assays with information on target genes provided. |
| cellbased | Retrieve cell-based assays. |
| Biochemical | Retrieve biochemical assays. |
| multitarget | Restrict searches to only assays with multiple targets. |
| active_concentration | Retrieve assays with “active concentration” attribute provided. |
Table 4.
Selected Entrez filters used in the PubChem Compound database.
| Entrez filter | Description |
|---|---|
| lipinski rule of 5 | Retrieve compounds that satisfy all requirements of Lipinski’s rule of 5. |
| has_mesh | Retrieve compounds with MeSH annotations. |
| has_pharm | Retrieve compounds with known pharmacological actions. |
| has_patent | Retrieve compounds that are mentioned in patent documents. |
| pccompound_structure | Retrieve compounds with experimental 3-D structures. |
| pccompound_pcassay | Retrieve compounds tested in assays archived in PubChem BioAssay. |
| pccompound_pcassay_active | Retrieve compounds tested active in any assay archived in PubChem BioAssay. |
| pccompound_pcassay_activityconcmicromolar | Retrieve compounds with an activity concentration at or below 1 μM. |
| pccompound_pcassay_activityconcnanomolar | Retrieve compounds with an activity concentration at or below 1 nM. |
The databases in the Entrez system are inter-linked through “Entrez links,” which allow one to readily retrieve records in one database that are associated with those in another database. Many Entrez filters are derived from Entrez links and enable a quick retrieval of records in one database that have links to a particular database. The names of these filters typically have the form “database1_database2”, often followed by a string that represents the type of links, as in “pccompound_pcassay” or “pccompound_pcassay_active” (Tables 3 and 4). In this chapter, Entrez links will be exploited through a web-based tool called FLink (to be described later in Section 2.5) [16].
2.2. DocSum page
If an Entrez search returns multiple records, they are displayed in a document summary (DocSum) page. The DocSum page from a search against the BioAssay database is shown in Figure 6 as an example. The DocSum pages for the other two PubChem databases (Compound and Substance) have a similar layout.
Figure 6.

Partial screenshot of the DocSum page that displays the returned records from a search of the PubChem BioAssay database for the query “PDGFR[GeneSymbol] AND 1[TargetCount]”. Additional controls at the right-hand column of the DocSum page allow users to refine the search results, download the retrieved bioassay records, and find associated records in Entrez databases.
The DocSum page presents a data-specific summary for each record with the link to a web page that contains detailed information on that record. [This web page with detailed information, described further in Section 2.3, is called the Compound Summary, Substance Record, or BioAssay Record page, depending on the type of the record.] In addition, for each record, the DocSum page provides links to associated records in the same or other databases. For example, each assay record in Figure 6 is presented with links to active compounds, PubMed citations, related BioAssays by target, and so on.
At the right-hand column of the DocSum page, additional controls are provided for further analysis of the query result list. As shown in Figure 6, for example, the search results can be filtered by assay target, bioactivity, experiment type, and depositor category. The BioAssay Download icon allows users to download the assay data through the PubChem Assay Download service. The drop-down menu under the “Find related data” section allows users to retrieve associated records in the BioAssay database and other Entrez databases (through an Entrez link).
2.3. Compound Summary, Substance Record, and Assay Record pages
If a search against one of the three PubChem databases returns a single record, detailed information on that record is displayed on a webpage called the Compound Summary, Substance Record, or BioAssay Record page, depending on the record type. The Compound Summary page provides a comprehensive overview of all information available for a given chemical, collected from different data sources. The Substance Record page for a substance shows information provided by only the data contributor of that substance. The Assay Record page contains assay descriptions and bioactivity data provided by the data contributor as well as other related annotated information collected by PubChem.
Figure 7 shows a partial screenshot of the Compound Summary page of CID 5329102 (sunitinib). At the top of the Compound Summary page, some commonly requested chemical information is presented. The Table of Contents on the left-side column allows users to jump to a particular section or subsection that contains desired information. In this chapter, compounds structurally similar to a small set of known bioactive compounds will be retrieved through the “Similar Compounds” and “Similar Conformers” links (available under the “Related Compounds” subsection of the “Related Records” section of the Compound Summary pages). See Note 5 for the definitions of “Similar Compounds” and “Similar Conformers” of a compound in PubChem.
Figure 7.

Partial screenshot of the PubChem Compound Summary page for sunitinib (CID 5329102).
2.4. Advanced Search Builder for formulating complex queries
The Advanced Search Builder page (Figure 8) helps users formulate complex queries. This page can be accessed by clicking the “Advanced” link on the PubChem homepage (Figure 3) and the Entrez page of each PubChem database (Figure 4). From the drop-down menus on the Advanced Search Builder, one can see what search indices are available for a specific database. In addition, the Advanced Search Builder displays all the previous queries in a tabular format, which helps one to combine them using the Boolean operators “AND”, “OR”, and “NOT”. The Boolean operators should be provided in capital letters.
Figure 8.

Partial screenshot of the PubChem Compound Advanced Search Builder. The histories for the searches performed in the respective subsections of the Methods section are indicated with their subsection numbers (in the format of “§3.x”) next to the history table.
2.5. FLink
FLink (Figure 9) [16] is a web-based tool used to get a ranked list of records in a destination database that are associated with a group of records in a source database. The retrieved records in the destination database are ranked by the number of records to which they are associated through specified “Entrez links”. The records in the destination database as well as one-to-one correspondence between the records in the two databases can be downloaded as a comma-separated value (CSV) file. FLink can accept a maximum of 100,000 items as input and can display a maximum of 100,000 items as output in any given destination database. An input or output larger than this limit will be truncated.
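The ranking that FLink performs can be illustrated with a few lines of Python. This is only a sketch of the idea (counting, for each destination record, how many input records link to it), not FLink’s actual implementation; the function name and the AID and CID values are made up for illustration.

```python
from collections import Counter


def rank_linked_records(links):
    """Rank destination records by the number of source records linked to them.

    `links` maps each source identifier (e.g., an AID) to the destination
    identifiers (e.g., CIDs) it is linked to.
    """
    counts = Counter()
    for destinations in links.values():
        # Count each destination at most once per source record.
        counts.update(set(destinations))
    # Highest link count first; ties are broken by identifier.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical assay-to-compound links.
links = {101: [1, 2, 3], 102: [2, 3], 103: [3]}
ranking = rank_linked_records(links)
```

In this toy example, compound 3 is ranked first because it is linked to all three input assays.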
Figure 9.


Screenshots of the FLink tool that illustrate how to retrieve compounds tested in input assays.
Currently, FLink supports PubChem’s three databases as well as Gene, Protein, Structure, BioSystems, Conserved Domain Database (CDD), and PubMed. Any of these databases can be used as a source database or destination database. For example, using FLink, one can retrieve a list of known active compounds against a protein or gene target, or find bioassay records that have tested a set of compounds. In this chapter, this tool is used to retrieve compounds tested in a particular assay.
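For readers who prefer programmatic access, the same Entrez links can be followed with the ELink utility of E-Utilities. The Python sketch below only builds the request URL. The function name is hypothetical, the AIDs are placeholders, and the response (XML by default) would still need to be fetched and parsed.

```python
from urllib.parse import urlencode

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(aids, linkname="pcassay_pccompound_active"):
    """Build an ELink URL that maps assays (AIDs) to linked compounds (CIDs)."""
    params = [
        ("dbfrom", "pcassay"),      # source database
        ("db", "pccompound"),       # destination database
        ("linkname", linkname),     # Entrez link to follow
        ("id", ",".join(str(aid) for aid in aids)),
    ]
    return f"{ELINK}?{urlencode(params)}"


# Placeholder AIDs; replace with the assays returned by an Entrez search.
url = elink_url([1, 2])
all_tested_url = elink_url([1, 2], linkname="pcassay_pccompound")
```

Using “pcassay_pccompound” instead of “pcassay_pccompound_active” requests all tested compounds rather than only those tested active.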
2.6. Structure Download Service
The PubChem Structure Download service (https://pubchem.ncbi.nlm.nih.gov/pc_fetch/pc_fetch.cgi) (Figure 10) is used to download a subset of substance or compound records in PubChem (Notes 6 and 7). The supported file formats include: text and binary ASN.1 (PubChem’s native data format), Structure-Data File (SDF) [17], Portable Network Graphics (PNG), and Extensible Markup Language (XML). In addition, Simplified Molecular-Input Line-Entry System (SMILES) [18–20] and International Chemical Identifier (InChI) [21,22] for the input compounds or substances can be downloaded in a text file (Note 8). The files may be optionally compressed in standard gzip (.gz) or bzip2 (.bz2) formats.
Figure 10.

Screenshots of the PubChem Structure Download tool.
The input identifiers (CIDs or SIDs) may be provided through the web form or uploaded from a local file. Alternatively, the results of a previous search (stored as an Entrez history) may be used as input identifiers to the Structure Download Service. In addition, the records presented on the DocSum page from an Entrez or PubChem-specific search may be fed to the Structure Download Service, by clicking the download link available on the top-right side of the DocSum page.
3. Methods
This section provides step-by-step instructions on how to get a list of potential dual-target ligands against PDGFR and VEGFR using PubChem’s web-based tools and services. First, known active and inactive compounds against the two targets are retrieved (Sections 3.1 and 3.2, respectively) and used to build filters that determine the priority of compounds for further screening (Section 3.3). Compounds similar to known dual-target ligands are retrieved (Section 3.4), prioritized using the compound filters developed in previous steps (Section 3.5), and downloaded to the user’s computer (Section 3.6). This protocol generates dozens of compound sets, which are summarized in Tables 5, 6, and 7. The number of compounds in these sets reflects data contents in PubChem as of November 2016, and they may be different from what users would get when they follow this protocol by themselves (see Note 9).
Table 5.
Description for the compound sets associated with assays targeting each of the PDGFR and VEGFR, along with the number of compounds in each set (retrieved from Sections 3.1 and 3.2).
| Description | # CIDs (X = PDGFR) | # CIDs (X = VEGFR) |
|---|---|---|
| Compounds tested against target X. | 1486 | 5086 |
| Compounds declared to be active against target X in any assays considered. | 595 | 2713 |
| Compounds with activity concentrations of ≤ 1 μM against target X. | 424 | 2148 |
| Compounds that are declared to be active or have activity concentrations of ≤ 1 μM against target X (i.e., the union of the two preceding sets). | 607 | 2771 |
| Compounds tested against target X that are not declared to be active nor have activity concentrations of ≤ 1 μM (i.e., the tested compounds that are not in the preceding set). | 879 | 2315 |
Table 6.
Description for the compound filters generated in Section 3.3, along with the number of compounds in the respective filter.
| Description | # CIDs |
|---|---|
| Known actives against both targets (i.e., the intersection of the active sets for PDGFR and VEGFR). These compounds may be included in subsequent experiments as reference compounds or for confirmatory purposes. | 347 |
| Known actives against one of the two targets, whose activity against the other target has not been tested [e.g., (active against PDGFR NOT tested against VEGFR) OR (active against VEGFR NOT tested against PDGFR)]. These compounds have high priority in subsequent experiments for multi-target ligand discovery, because they are already known to be active against one target. | 2539 |
| Those which are known to be inactive against at least one of the targets (e.g., the union of the inactive sets for PDGFR and VEGFR). It is reasonable that these low-priority compounds should be excluded from further consideration because they are already known to be inactive against one target. | 2497 |
Table 7.
Description for the prioritized potential dual-target ligand sets, generated in Sections 3.4 and 3.5, along with their number of compounds.
| Description | # CIDs |
|---|---|
| Known highly-active dual-target ligands (with activity concentrations of ≤ 1 nM) used as “queries” for retrieval of structurally similar molecules. | 4 |
| Potential dual-target ligands that are structurally similar to the query molecules. | 6203 |
| Potential dual-target ligands that are known actives against both targets. These compounds may be included in subsequent experiments as reference compounds or for confirmatory purposes. | 36 |
| Potential dual-target ligands that are known actives against one of the two targets, whose activity against the other target has not been tested. These compounds have high priority in subsequent experiments for multi-target ligand discovery, because they are already known to be active against one target. | 13 |
| Potential dual-target ligands that are known inactives against at least one of the targets. It is reasonable that these low-priority compounds should be excluded from further consideration because they are already known to be inactive against one target. | 3 |
3.1. Retrieving compounds tested in assays against PDGFR
In this step, bioassays performed against PDGFR are retrieved from Entrez searches, and then the compounds tested in these assays are subsequently retrieved using the NCBI’s FLink tool (Section 2.5) and the Advanced Search Builder (Section 2.4).
1. Go to the PubChem homepage (https://pubchem.ncbi.nlm.nih.gov), and click the “BioAssay” tab above the text search box. Alternatively, you may initiate the search from the PubChem BioAssay page (https://www.ncbi.nlm.nih.gov/pcassay/) or the NCBI home page. See Section 2.1 and Figure 2 for multiple entry points to Entrez search.
2. Perform a search for assays tested against PDGFR by typing “PDGFR[genesymbol] AND 1[TargetCount]” in the search box and clicking the “GO” button (see Notes 10 and 11). The search results will be displayed on a DocSum page and also stored as an Entrez history.
3. Go to the NCBI FLink homepage (https://www.ncbi.nlm.nih.gov/Structure/flink/flink.cgi).
4. Expand the drop-down menu and select “PubChem BioAssay” as the database to start with (see Figure 9(a)).
5. Click the “Input From Entrez History” tab and select from the drop-down menu the search history for the previous query “PDGFR[genesymbol] AND 1[TargetCount]” (see Figure 9(b)). Then, click the “Submit” button. This will load the retrieved assays into the FLink tool.
6. Retrieve compounds associated with the loaded AIDs, using the “pcassay_pccompound” link, which returns all compounds tested in the input assays. This compound set corresponds to the compounds tested against target X in Table 5.
   a. Click the “LinkTo” icon to retrieve compounds associated with the loaded assays (Figure 9(c)). From the drop-down menu, select the “pcassay_pccompound” link and click the “Submit” button.
   b. Click the “Show” icon to display the returned compounds on a DocSum page (Figure 9(d)). This operation stores the returned compounds as an Entrez history, which will be used later.
7. Repeat Step 6 using the “pcassay_pccompound_active” link, which returns all compounds that are tested active in any of the input assays (see Notes 12 and 13 for the definition of active and inactive compounds in PubChem assays). This compound set corresponds to the compounds declared to be active against target X in Table 5.
8. Go to the PubChem BioAssay Advanced Search page (https://www.ncbi.nlm.nih.gov/pcassay/advanced), and click the search history for the query “PDGFR[GeneSymbol] AND 1[TargetCount]”. This will take you to the DocSum page that displays the returned assays.
9. Select “PubChem Compound” as the database under the “Find related data” section on the right column of the DocSum page, and then select “Compounds, activity concentration at/below 1 μM” as the option (see Notes 13 and 14). This returns compounds that have activity concentrations of ≤ 1 μM against PDGFR.
10. Go to the PubChem Compound Advanced Search Builder (https://www.ncbi.nlm.nih.gov/pccompound/advanced). The searches performed in the previous steps are displayed under the history section as shown in Figure 8. From the histories, find the search numbers for the compound sets retrieved in Steps 7 and 9, and take the union of them. In the screenshot shown in Figure 8, these are #5 and #7, and the query “#5 OR #7” returns compounds that belong to either of the two sets. The resulting set contains the compounds that are declared to be active or have activity concentrations of ≤ 1 μM against PDGFR (see Table 5 and Note 13).
11. Using the PubChem Compound Advanced Search Builder, retrieve the compounds tested against PDGFR that are not contained in the set from Step 10 (that is, compounds that are neither declared to be active nor with activity concentrations of ≤ 1 μM). As shown in Figure 8, this can be done by querying “#3 NOT #9”. The resulting set contains the compounds regarded as inactive against PDGFR (see Table 5 and Note 13).
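The set operations in Steps 10 and 11 imply simple relationships among the counts reported in Table 5, which can serve as a sanity check when the protocol is repeated. The Python sketch below copies the counts from Table 5; the dictionary keys and function names are labels introduced here for illustration.

```python
# Compound counts copied from Table 5 (PubChem contents as of November 2016).
TABLE5 = {
    "PDGFR": {"tested": 1486, "declared_active": 595, "potent": 424,
              "active": 607, "inactive": 879},
    "VEGFR": {"tested": 5086, "declared_active": 2713, "potent": 2148,
              "active": 2771, "inactive": 2315},
}


def overlap_of_active_sets(counts):
    """Inclusion-exclusion: |A and C| = |A| + |C| - |A or C|."""
    return counts["declared_active"] + counts["potent"] - counts["active"]


def inactive_count_is_consistent(counts):
    """The 'NOT' query in Step 11 should leave tested minus active compounds."""
    return counts["tested"] - counts["active"] == counts["inactive"]


overlaps = {target: overlap_of_active_sets(c) for target, c in TABLE5.items()}
consistent = all(inactive_count_is_consistent(c) for c in TABLE5.values())
```

With the Table 5 values, 412 PDGFR compounds and 2090 VEGFR compounds are both declared active and have activity concentrations of ≤ 1 μM, and the inactive counts equal the tested counts minus the active counts for both targets.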
3.2. Retrieving compounds tested in assays against VEGFR
This section retrieves active and inactive compounds against VEGFR in the same way as the previous section.
1. Repeat steps 1 through 11 of Section 3.1, beginning with the query “VEGFR[genesymbol] AND 1[TargetCount]”. This will result in the five corresponding compound sets for VEGFR. The numbers of compounds contained in these sets are listed in Table 5.
3.3. Generating compound filters
This step generates three compound filters from the compound sets created in Sections 3.1 and 3.2. Descriptions of these filters are given in Table 6, along with the number of CIDs contained in each filter. These compound filters are essentially compound sets that will be used later to prioritize compounds for further screening.
1. Go to the PubChem Compound Advanced Search Builder (https://www.ncbi.nlm.nih.gov/pccompound/advanced).
2. From the History section, find the query numbers for the active compound sets for PDGFR and VEGFR (generated in Step 10 of Section 3.1 and the corresponding step of Section 3.2), and take the intersection between them (e.g., “#9 AND #19” in Figure 8). The resulting compounds constitute the first filter in Table 6.
3. Find the query numbers for the tested and active compound sets for each target, and use them to retrieve the compounds that are active against one target and that are not tested against the other target. In the example shown in Figure 8, the query is “(#9 NOT #13) OR (#19 NOT #3)”. The results become the second filter in Table 6.
4. Find the query numbers for the inactive compound sets for PDGFR and VEGFR (generated in Step 11 of Section 3.1 and the corresponding step of Section 3.2), and take the union between them. The corresponding query in Figure 8 is “#10 OR #20”. The resulting compounds become the third filter in Table 6.
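The three history queries used in this section correspond to ordinary set operations. The following Python sketch mirrors them for two generic targets A and B, assuming the tested and active compound sets are available as sets of CIDs; the function name and the example CIDs are illustrative only.

```python
def build_filters(tested_a, active_a, tested_b, active_b):
    """Return the three compound filters as sets of CIDs."""
    inactive_a = tested_a - active_a          # like "#3 NOT #9"
    inactive_b = tested_b - active_b
    both_active = active_a & active_b         # like "#9 AND #19"
    # Active against one target and never tested against the other,
    # like "(#9 NOT #13) OR (#19 NOT #3)".
    one_active = (active_a - tested_b) | (active_b - tested_a)
    any_inactive = inactive_a | inactive_b    # like "#10 OR #20"
    return both_active, one_active, any_inactive


# Made-up CIDs for illustration.
filters = build_filters(tested_a={1, 2, 3}, active_a={1, 2},
                        tested_b={1, 3, 4}, active_b={1, 4})
```

The history numbers in the comments refer to the example shown in Figure 8 and will differ in another session.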
3.4. Retrieving potential multi-target ligands for PDGFR and VEGFR
This step identifies potential multi-target ligands for PDGFR and VEGFR that are structurally similar to one or a few compounds known to be active for both. In theory, the known actives against both targets (the first filter in Table 6) may be a good starting point. However, in this example of dual-target ligands for PDGFR and VEGFR, this filter contains more than 300 compounds, which is too many to manually retrieve their structural analogues using the web-based interface provided by PubChem. Therefore, in this step, a smaller subset is first retrieved by using a tighter activity concentration threshold (≤ 1 nM), and then these compounds are used as a starting point to retrieve structurally similar molecules.
1. Go to the PubChem BioAssay Advanced Search page (https://www.ncbi.nlm.nih.gov/pcassay/advanced), and click the search history for the query “PDGFR[GeneSymbol] AND 1[TargetCount]” to go to the DocSum page that displays the returned assays.
2. Select “PubChem Compound” as the database under the “Find related data” section on the right column of the DocSum page, and then select “Compounds, activity concentration at/below 1 nM” as the option (see Note 14). This returns compounds that have activity concentrations of ≤ 1 nM against PDGFR.
3. Repeat steps 1 and 2 with the query “VEGFR[GeneSymbol] AND 1[TargetCount]” in order to get compounds with activity concentrations of ≤ 1 nM against VEGFR.
4. Go to the PubChem Compound Advanced Search Builder, and take the intersection between the results from steps 1 through 3. Currently (as of November 2016), this returns four compounds (CID 5329102, CID 9933475, CID 10361267, and CID 42642645). These compounds are the query molecules listed in Table 7.
5. Retrieve a pre-computed list of compounds that are structurally similar to one of the returned compounds (see Notes 5 and 15).
   a. Select one of the returned compounds to go to the Compound Summary page for that compound.
   b. Jump to the “Related Records” section by clicking “Related Records” on the Table of Contents of the Compound Summary page.
   c. Right-click the “Similar Compounds” link under the “Related Compounds” subsection and select “Open in a new tab”. The results displayed on the new page will be stored as an Entrez history.
   d. Repeat step c for the “Similar Conformers” link.
6. Repeat Step 5 for all the other compounds.
7. Take the union of the returned results from Steps 5 and 6. This results in 6023 compounds that are structurally similar to one of the four compounds. This set of potential dual-target ligands is listed in Table 7.
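A scripted counterpart to Steps 5 through 7 might use the 2-D similarity search of PUG-REST. The Python sketch below builds the request URLs for the four query CIDs and merges hit lists. It rests on assumptions: that a Tanimoto threshold of 90 is appropriate, and that an on-the-fly 2-D search is an acceptable stand-in for the pre-computed “Similar Compounds” list. It does not cover the 3-D “Similar Conformers” list, and the function names are hypothetical.

```python
PUG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Query molecules identified in Step 4.
QUERY_CIDS = [5329102, 9933475, 10361267, 42642645]


def similarity_url(cid, threshold=90):
    """Build a PUG-REST 2-D similarity search URL (threshold is an assumption)."""
    return (f"{PUG}/compound/fastsimilarity_2d/cid/{cid}/cids/JSON"
            f"?Threshold={threshold}")


def merge_hits(hit_lists):
    """Take the union of several CID lists, as in Step 7."""
    merged = set()
    for hits in hit_lists:
        merged.update(hits)
    return sorted(merged)


urls = [similarity_url(cid) for cid in QUERY_CIDS]
```

Each URL would return a JSON document with a list of CIDs; passing those lists to `merge_hits` gives the combined set of structural neighbors.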
3.5. Prioritizing potential multi-target ligands using the compound filters.
In this section, the compound filters generated in Section 3.3 are applied to the list of compounds retrieved from Section 3.4.
Go to the PubChem Compound Advanced Search page (https://www.ncbi.nlm.nih.gov/pcassay/advanced), and take the overlap between and . This set, designated as , contains known actives against both targets. It is not necessary to consider these compounds in subsequent screenings, but they may be included as reference compounds or for comparison purposes.
Take the overlap between and to get a list of high-priority compounds to screen, designated as . Since these compounds are already known to be active against one target, they need to be tested only against the other target.
Take the overlap between and to get a list of low-priority compounds to screen, designated as . Because these compounds are already known to be inactive against at least one of the targets, they cannot be dual-target ligands.
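The three overlaps above amount to intersecting the candidate list with each compound filter. A minimal local sketch follows, assuming the filters from Section 3.3 have been exported as plain CID sets. The argument names and dictionary keys are hypothetical labels chosen for this example, not the set names used in the tables.

```python
from typing import Dict, Set


def prioritize(candidates: Set[int],
               active_both: Set[int],
               active_one: Set[int],
               inactive_any: Set[int]) -> Dict[str, Set[int]]:
    """Intersect the candidate CIDs with each (hypothetically named) filter."""
    return {
        "reference": candidates & active_both,       # known dual actives
        "high_priority": candidates & active_one,    # test against the other target
        "low_priority": candidates & inactive_any,   # known inactive somewhere
    }


# Made-up CIDs, for illustration only.
groups = prioritize(
    candidates={1, 2, 3, 4, 5},
    active_both={1},
    active_one={2, 3},
    inactive_any={4, 9},
)
print(groups)
```

Candidates that fall in none of the filters (CID 5 in this toy input) have no recorded activity data for either target and are left out of all three groups.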
3.6. Downloading potential multi-target ligands to screen.
In this section, the data for compounds in the , , , and sets are downloaded in the SDF format to a local machine for use in a third-party program.
Go to the DocSum page for the set, via the search history presented on the Advanced Search Builder.
Click the Structure Download icon on the top-right corner of the DocSum page, which directs you to the Structure Download Service.
Select “SDF” as a file format and “Gzip” as a compression type from the drop-down menus.
If 3-D structure-related information is necessary, check the “Retrieve 3D records/images” box, and specify the number of 3-D conformers per CID to download (which is set to 1 by default).
Click the “Download” button to download the record.
Repeat steps 1–5 for the , , and sets.
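As an alternative to the interactive Structure Download Service, a small CID list can be fetched through PUG-REST. The sketch below only builds the request URL. The helper name is made up, and it assumes the standard PUG-REST pattern `compound/cid/<list>/SDF` with the optional `record_type=3d` parameter. Compression and the number of conformers per CID are not handled here.

```python
from typing import Iterable
from urllib.parse import urlencode

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def sdf_url(cids: Iterable[int], three_d: bool = False) -> str:
    """Build a PUG-REST URL that returns the given CIDs as one SDF file."""
    cid_list = ",".join(str(cid) for cid in cids)
    url = f"{PUG_REST}/compound/cid/{cid_list}/SDF"
    if three_d:
        # Ask for computed 3-D records instead of the default 2-D records.
        url += "?" + urlencode({"record_type": "3d"})
    return url


# Two of the CIDs reported in Section 3.4.
print(sdf_url([5329102, 9933475]))
print(sdf_url([5329102], three_d=True))
```

The resulting URL can be opened in a browser or passed to `urllib.request.urlretrieve`. For long CID lists, the URL length becomes a limit, so the web-based service or the FTP site (see Note 7) is the safer route.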
4. Notes
Many molecular similarity methods have been developed to quantify the structural similarity between molecules, as reviewed in many articles [23–28]. PubChem uses two similarity methods: a subgraph fingerprint-based 2-D similarity method (Note 2) and a Gaussian-shape overlay-based 3-D similarity method (Notes 3 and 4). These two methods are considered to complement each other because chemical structure similarity that is not recognized by one method is often easily identified by the other method [29–34].
The PubChem 2-D similarity method uses the 881-bit-long PubChem binary fingerprints [35], in conjunction with the Tanimoto coefficient [36–38]:
$$\mathrm{Tanimoto} = \frac{N_{AB}}{N_A + N_B - N_{AB}} \tag{1}$$
where $N_A$ and $N_B$ are the counts of bits set in the fingerprints for molecules A and B, respectively, and $N_{AB}$ is the count of bits set in common. The Tanimoto coefficient ranges from 0 (for no similarity) to 1 (for identical molecules).
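The Tanimoto coefficient is the number of bits set in both fingerprints divided by the number of bits set in either one. A minimal sketch, assuming each fingerprint is held as a Python integer used as a bit string (real PubChem fingerprints are 881 bits long; the 8-bit values here are toy inputs):

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient between two bit-string fingerprints stored as ints."""
    n_a = bin(fp_a).count("1")           # bits set in A
    n_b = bin(fp_b).count("1")           # bits set in B
    n_ab = bin(fp_a & fp_b).count("1")   # bits set in both
    denominator = n_a + n_b - n_ab
    # Assumption of this sketch: two empty fingerprints score 0.0.
    return n_ab / denominator if denominator else 0.0


# Toy fingerprints: 4 bits each, 2 in common, so 2 / (4 + 4 - 2) = 1/3.
score = tanimoto(0b11110000, 0b11001100)
print(round(score, 3))
```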
The PubChem 3-D similarity method is based on the Gaussian-shape overlay method by Grant and coworkers [39–42], implemented in the Rapid Overlay of Chemical Structures (ROCS) [43,44]. In ROCS, two different aspects of molecular similarity are considered: steric shape similarity and feature similarity. The steric shape similarity is evaluated using the shape-Tanimoto (ST) [39,40,43–45], which is given as the following equation:
$$ST = \frac{V_{AB}}{V_{AA} + V_{BB} - V_{AB}} \tag{2}$$
where $V_{AA}$ and $V_{BB}$ are the self-overlap volumes of molecules A and B, respectively, and $V_{AB}$ is the overlap volume between A and B. The feature similarity considers the similarity in the 3-D orientation of protein-binding “features” of six different types (i.e., hydrogen bond donors and acceptors, cations, anions, hydrophobes, and rings), which are represented by “fictitious” feature atoms (also called “color” atoms). The feature similarity is quantified using the color-Tanimoto (CT) [44,45]:
$$CT = \frac{\sum_f V_{AB}^{f}}{\sum_f V_{AA}^{f} + \sum_f V_{BB}^{f} - \sum_f V_{AB}^{f}} \tag{3}$$
where the index $f$ indicates any of the six feature atom types, $V_{AA}^{f}$ and $V_{BB}^{f}$ are the self-overlap volumes of molecules A and B for feature atom type $f$, respectively, and $V_{AB}^{f}$ is the overlap volume between molecules A and B for feature atom type $f$. The steric shape similarity and feature similarity can be considered simultaneously using the Combo Tanimoto (ComboT), which is defined as an arithmetic sum of ST and CT:
$$ComboT = ST + CT \tag{4}$$
Because both the ST and CT scores range from 0 to 1, the ComboT score can have a value from 0 to 2 (without normalization).
Computation of the 3-D similarity metrics between two conformers involves finding their best overlap, which can be done in two different ways: (1) shape-optimization (ST-optimization), which finds the conformer superposition that maximizes the ST score, and (2) feature-optimization (CT-optimization), in which both the shape and feature are considered simultaneously to find the best superposition. As a result, in PubChem, 3-D molecular similarity can be evaluated using six different measures: ST, CT, and ComboT scores for each of the two superposition methods.
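To make the three 3-D scores concrete, the sketch below evaluates them from overlap volumes that are assumed to be already available (for example, from an overlay program). It does not perform the superposition itself. The function names and all volumes are illustrative.

```python
from typing import Dict


def shape_tanimoto(v_aa: float, v_bb: float, v_ab: float) -> float:
    """ST: overlap volume over the volume covered by either molecule."""
    return v_ab / (v_aa + v_bb - v_ab)


def color_tanimoto(v_aa: Dict[str, float],
                   v_bb: Dict[str, float],
                   v_ab: Dict[str, float]) -> float:
    """CT: same form as ST, with volumes summed over feature atom types."""
    sum_aa, sum_bb, sum_ab = sum(v_aa.values()), sum(v_bb.values()), sum(v_ab.values())
    return sum_ab / (sum_aa + sum_bb - sum_ab)


def combo_tanimoto(st: float, ct: float) -> float:
    """ComboT: arithmetic sum of ST and CT, ranging from 0 to 2."""
    return st + ct


# Made-up volumes for one superposition of two conformers.
st = shape_tanimoto(100.0, 100.0, 80.0)                 # 80 / 120
ct = color_tanimoto({"donor": 2.0, "acceptor": 3.0},
                    {"donor": 2.0, "acceptor": 1.0},
                    {"donor": 1.0, "acceptor": 1.0})    # 2 / 6
print(round(st, 3), round(ct, 3), round(combo_tanimoto(st, ct), 3))
```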
Evaluation of 3-D similarity between molecules requires 3-D structures of molecules. PubChem generates a conformer ensemble that contains up to 500 conformers per compound if the compound satisfies the following conditions [31,46,47]:
- It should not be too big or too flexible (with ≤ 50 non-hydrogen atoms and ≤ 15 rotatable bonds).
- It should have only a single covalent unit (i.e., not a salt or a mixture).
- It should consist of only supported elements (H, C, N, O, F, Si, P, S, Cl, Br, and I).
- It should contain only atom types recognized by the MMFF94s force field.
- It should have fewer than six undefined atom or bond stereo centers.
About 90% of compounds in PubChem satisfy all five conditions and have computationally generated conformer ensembles [31,46]. These conformer models are designed to predict “bioactive” conformers (i.e., protein-bound structures, often determined through X-ray crystallography). The procedure used for the conformer generation ensures that 90% of the conformer models have at least one “bioactive” conformer whose (non-hydrogen atom pair-wise) RMSD from the experimentally determined conformation is smaller than the upper-limit value ($RMSD_{pred}$) predicted using an empirically derived equation [31,46]:
$$RMSD_{pred} = 0.219 + 0.0099 \times N_{nha} + 0.040 \times N_{er} \tag{5}$$
$$N_{er} = N_{rb} + \frac{N_{nara}}{5} \tag{6}$$
where $N_{nha}$, $N_{er}$, $N_{rb}$, and $N_{nara}$ are the numbers of non-hydrogen atoms, effective rotors, rotatable bonds, and non-aromatic ring atoms in the molecule, respectively. $N_{er}$ takes into account molecular flexibility due to rotatable bonds and ring flexibility simultaneously [31,46,48]. While conformer models generated by PubChem contain up to 500 conformers per compound, most PubChem tools and services that exploit 3-D similarity use only up to ten diverse conformers per compound, where a diversity selection procedure is used to represent the conformer ensemble with a minimal number of conformers [31,46,47].
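The five conditions listed in this note can be expressed as a single check. The sketch below is an illustration only: `CompoundSummary` is a hypothetical container, and its fields are assumed to have been computed beforehand with a cheminformatics toolkit. It does not reproduce PubChem's actual pipeline.

```python
from dataclasses import dataclass
from typing import FrozenSet

SUPPORTED_ELEMENTS = frozenset({"H", "C", "N", "O", "F", "Si", "P", "S", "Cl", "Br", "I"})


@dataclass(frozen=True)
class CompoundSummary:
    """Hypothetical record of the properties the five conditions refer to."""
    heavy_atoms: int            # non-hydrogen atom count
    rotatable_bonds: int
    covalent_units: int         # 1 for a single molecule, more for salts or mixtures
    elements: FrozenSet[str]    # element symbols present in the structure
    mmff94s_typed: bool         # True if every atom has an MMFF94s atom type
    undefined_stereo: int       # undefined atom plus bond stereo centers


def eligible_for_conformer_model(c: CompoundSummary) -> bool:
    """Apply the five conditions described in the note, in order."""
    return (
        c.heavy_atoms <= 50
        and c.rotatable_bonds <= 15
        and c.covalent_units == 1
        and c.elements <= SUPPORTED_ELEMENTS   # subset test
        and c.mmff94s_typed
        and c.undefined_stereo < 6
    )


# Made-up examples: a small organic molecule and a two-component salt.
small = CompoundSummary(27, 5, 1, frozenset({"C", "H", "N", "O", "F"}), True, 0)
salt = CompoundSummary(27, 5, 2, frozenset({"C", "H", "N", "O", "Cl"}), True, 0)
print(eligible_for_conformer_model(small), eligible_for_conformer_model(salt))
```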
The “Similar Compounds” and “Similar Conformers” links under the “Related Compounds” section on the Compound Summary page of a given CID provide immediate access to “pre-computed” lists of compounds that are similar to that CID in terms of PubChem 2-D and 3-D similarities, respectively (therefore, also known as 2-D and 3-D neighbors, respectively).
Two compounds are defined as 2-D neighbors of each other if the Tanimoto coefficient between them is ≥ 0.9 (see Note 2 for evaluation of 2-D similarity in PubChem). If any conformer pair from two molecules gives an ST score of ≥ 0.8 and a CT score of ≥ 0.5 at their “ST-optimized” overlap, the two molecules are defined as 3-D neighbors of each other. Because 3-D similarity evaluation requires 3-D molecular structures, only compounds with 3-D conformer models are considered for 3-D neighbor computation. Currently, 3-D neighboring uses up to ten diverse conformers per compound. More detailed discussion about PubChem 2-D and 3-D neighboring is given elsewhere [29–31].
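The two neighboring criteria reduce to simple threshold tests once the similarity scores are known. A minimal sketch, assuming the scores have already been computed elsewhere (the function names are illustrative, and the (ST, CT) pairs are taken to come from ST-optimized overlays):

```python
from typing import Iterable, Tuple


def is_2d_neighbor(tanimoto: float) -> bool:
    """2-D neighbors: fingerprint Tanimoto of at least 0.9."""
    return tanimoto >= 0.9


def is_3d_neighbor(conformer_pair_scores: Iterable[Tuple[float, float]]) -> bool:
    """3-D neighbors: any conformer pair with ST >= 0.8 and CT >= 0.5.

    Each tuple is (ST, CT) for one conformer pair at its ST-optimized overlap.
    """
    return any(st >= 0.8 and ct >= 0.5 for st, ct in conformer_pair_scores)


# Made-up scores: the second conformer pair satisfies both thresholds.
print(is_2d_neighbor(0.92), is_3d_neighbor([(0.85, 0.40), (0.81, 0.55)]))
```

Note that both thresholds must be met by the same conformer pair. A high ST from one pair and a high CT from another do not make two molecules 3-D neighbors.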
To download a set of records in the BioAssay database, the Assay Download Service should be used, which is available at https://pubchem.ncbi.nlm.nih.gov/assay/assaydownload.cgi.
For the download of a very large amount of data, it is highly recommended to use the PubChem File Transfer Protocol (FTP) site (ftp://ftp.ncbi.nlm.nih.gov/pubchem/).
Resource Description Framework (RDF)-formatted PubChem data (also known as PubChemRDF [49]) are also available at the PubChem FTP site for users who want to exploit PubChem data with Semantic Web technologies on local computing resources.
Strictly speaking, SMILES [18–20] and InChI [21,22] are not file formats but line notations that represent chemical structures. However, they are often described as file formats in many applications, meaning that SMILES or InChI strings are stored in a text file.
It is recommended that the results of searches against PubChem databases be downloaded to a local machine, because PubChem data are updated daily. Importantly, the number of records in PubChem can change due to new data submission to PubChem. Data contributors can also revoke their existing substance and assay information submitted to PubChem. However, as an archive, PubChem does not remove the revoked substances, but makes them non-live, meaning that they are not searchable (although they continue to exist in the database). A compound record becomes non-live when it does not have an associated live substance record. While non-live PubChem records are not searchable, they can still be accessed via the Uniform Resource Locators (URLs) of their Summary or Record pages, which contain their identifiers (CIDs, SIDs, or AIDs).
The word “genesymbol” enclosed by brackets is one of many Entrez indices, which allow one to search the database for a particular type of information (see Section 2.1). In this case, the “genesymbol” Entrez index allows one to search for assays that targeted the gene represented by the gene symbol or the proteins encoded by that gene. In addition to gene symbols, protein target names and GenInfo Identifiers (GIs) can also be used to search the BioAssay database (see Table 1).
In PubChem, there is no limit on the maximum number of targets for an assay, and some assays have hundreds of targets, which often makes it difficult to retrieve only assays tested against a desired target. (For example, the targets of AID 1433 include PDGFRA, PDGFRB, VEGFR2, and 284 other protein targets.) The Entrez Index “[TargetCount]” allows one to limit the search to assays tested against a given number of targets. For example, the search term “1[TargetCount]” restricts the search to single-target assays.
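The same indexed query can be sent to the BioAssay database programmatically through the E-Utilities ESearch endpoint. The sketch below only composes the query string and the request URL without contacting the server. The helper names are made up, and `usehistory=y` is included on the assumption that the results will be reused in later history-based steps.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def single_target_assay_query(gene_symbol: str) -> str:
    """Entrez query for single-target assays against one gene symbol."""
    return f"{gene_symbol}[GeneSymbol] AND 1[TargetCount]"


def esearch_url(term: str, db: str = "pcassay") -> str:
    """Build an ESearch URL; brackets and spaces are percent-encoded."""
    params = {"db": db, "term": term, "usehistory": "y", "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


query = single_target_assay_query("VEGFR")
print(query)
print(esearch_url(query))
```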
The compounds tested in an assay archived in PubChem may be declared by its assay data provider as probe, active, inactive, or inconclusive. The depositor may choose not to provide this activity outcome information, because it is not required for assay data submission to PubChem. Therefore, compounds tested in an assay archived in PubChem may be classified into five groups: probe, active, inactive, inconclusive, and unspecified. They may be further reduced into four groups because probes are a very small subset of active compounds.
It should be emphasized that these activity outcomes are determined by individual assay data depositors, not by PubChem. Because there are no standard criteria for the activity outcome determination across all assays, each depositor adopts different criteria that satisfy their own needs. Thus, inactive compounds in one assay could have been declared to be active if different activity outcome criteria were employed. The example presented in this chapter takes this heterogeneity into account by re-defining active compounds as “any compounds that are declared to be active by a depositor or that have an activity concentration at/below 1 μM”. The active compounds based on this new definition correspond to set in Table 5, which is the union of sets and . Accordingly, the inactive compounds are also re-defined as any compounds that do not belong to (that is, ).
The “Find Related Data” drop-down menu allows one to retrieve records related to those presented on the DocSum page, through Entrez links (see Section 2.1). However, this tool is designed for quick retrieval of a small amount of data. If the data retrieval takes too long because too many records are returned, the results will be truncated. If a large number of records are expected to be returned, the FLink tool should be used in a manner similar to that described in Section 3.1.
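The re-defined activity criterion can be written as a small predicate. This is a sketch under stated assumptions: the function name and the outcome strings are illustrative, the concentration is assumed to be in μM, and a depositor-declared probe is counted as active because probes are described above as a subset of actives.

```python
from typing import Optional


def is_active(outcome: str, activity_conc_um: Optional[float]) -> bool:
    """Active if declared active by the depositor OR potent at/below 1 uM."""
    declared_active = outcome.strip().lower() in {"active", "probe"}
    potent = activity_conc_um is not None and activity_conc_um <= 1.0
    return declared_active or potent


# Made-up test results: (depositor outcome, activity concentration in uM).
examples = [("inactive", 0.5), ("inactive", 5.0), ("active", None), ("unspecified", None)]
print([is_active(outcome, conc) for outcome, conc in examples])
```

Under this definition, a compound that a depositor labeled inactive is still treated as active when its reported concentration is 1 μM or lower, which is how the heterogeneity between depositors is absorbed.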
The “Similar Compounds” (2-D neighbors) and “Similar Conformers” (3-D neighbors) are pre-computed with a set of pre-determined options (e.g., for the similarity threshold value or the number of conformers considered for 3-D neighboring). PubChem provides the Chemical Structure Search tool (https://pubchem.ncbi.nlm.nih.gov/search/search.cgi), which allows one to perform a flexible search with adjustable options. There is also another structure search tool called PubChem Search (https://pubchem.ncbi.nlm.nih.gov/search), which was released as a beta (test) version that exploits newer technology.
Acknowledgements
This work was supported by the Intramural Research Program of the National Library of Medicine, National Institutes of Health, U.S. Department of Health and Human Services. We would like to thank Douglas Joubert, NIH Library Editing Service, for reviewing the manuscript.
References
- 1. Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han L, He J, He S, Shoemaker BA, Wang J, Yu B, Zhang J, Bryant SH (2016) PubChem Substance and Compound databases. Nucleic Acids Res 44 (D1):D1202–D1213. doi: 10.1093/nar/gkv951
- 2. Wang YL, Suzek T, Zhang J, Wang JY, He SQ, Cheng TJ, Shoemaker BA, Gindulyte A, Bryant SH (2014) PubChem BioAssay: 2014 update. Nucleic Acids Res 42 (D1):D1075–D1082. doi: 10.1093/nar/gkt978
- 3. Kim S (2016) Getting the most out of PubChem for virtual screening. Expert Opin Drug Discov 11 (9):843–855. doi: 10.1080/17460441.2016.1216967
- 4. Johnson MA, Maggiora GM (eds) (1990) Concepts and Applications of Molecular Similarity. John Wiley & Sons, Inc., New York, NY
- 5. Pietras K, Sjoblom T, Rubin K, Heldin CH, Ostman A (2003) PDGF receptors as cancer drug targets. Cancer Cell 3 (5):439–443. doi: 10.1016/s1535-6108(03)00089-8
- 6. Board R, Jayson GC (2005) Platelet-derived growth factor receptor (PDGFR): A target for anticancer therapeutics. Drug Resist Update 8 (1–2):75–83. doi: 10.1016/j.drup.2005.03.004
- 7. Traxler P (2003) Tyrosine kinases as targets in cancer therapy - successes and failures. Expert Opin Ther Targets 7 (2):215–234. doi: 10.1517/14728222.7.2.215
- 8. Roskoski R (2007) Sunitinib: A VEGF and PDGF receptor protein kinase and angiogenesis inhibitor. Biochem Biophys Res Commun 356 (2):323–328. doi: 10.1016/j.bbrc.2007.02.156
- 9. Ellis LM, Hicklin DJ (2008) VEGF-targeted therapy: mechanisms of anti-tumour activity. Nat Rev Cancer 8 (8):579–591. doi: 10.1038/nrc2403
- 10. Takahashi S (2011) Vascular Endothelial Growth Factor (VEGF), VEGF Receptors and Their Inhibitors for Antiangiogenic Tumor Therapy. Biol Pharm Bull 34 (12):1785–1788
- 11. Kim S, Thiessen PA, Bolton EE, Bryant SH (2015) PUG-SOAP and PUG-REST: web services for programmatic access to chemical information in PubChem. Nucleic Acids Res 43 (W1):W605–W611. doi: 10.1093/nar/gkv396
- 12. Schuler GD, Epstein JA, Ohkawa H, Kans JA (1996) Entrez: Molecular biology database and retrieval system. Methods Enzymol 266:141–162. doi: 10.1016/S0076-6879(96)66012-1
- 13. McEntyre J (1998) Linking up with Entrez. Trends Genet 14 (1):39–40. doi: 10.1016/s0168-9525(97)01325-5
- 14. Help Entrez. (2005-) National Center for Biotechnology Information (US). https://www.ncbi.nlm.nih.gov/books/NBK3836/.
- 15. Agarwala R, Barrett T, Beck J, Benson DA, Bollin C, Bolton E, Bourexis D, Brister JR, Bryant SH, Canese K, Charowhas C, Clark K, DiCuccio M, Dondoshansky I, Federhen S, Feolo M, Funk K, Geer LY, Gorelenkov V, Hoeppner M, Holmes B, Johnson M, Khotomlianski V, Kimchi A, Kimelman M, Kitts P, Klimke W, Krasnov S, Kuznetsov A, Landrum MJ, Landsman D, Lee JM, Lipman DJ, Lu ZY, Madden TL, Madej T, Marchler-Bauer A, Karsch-Mizrachi I, Murphy T, Orris R, Ostell J, O’Sullivan C, Panchenko A, Phan L, Preuss D, Pruitt KD, Rodarmer K, Rubinstein W, Sayers EW, Schneider V, Schuler GD, Sherry ST, Sirotkin K, Siyan K, Slotta D, Soboleva A, Soussov V, Starchenko G, Tatusova TA, Todorov K, Trawick BW, Vakatov D, Wang YL, Ward M, Wilbur WJ, Yaschenko E, Zbicz K, Coordinators NR (2016) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 44 (D1):D7–D19. doi: 10.1093/nar/gkv1290
- 16. FLink: Frequency weighted links. (2010) National Center for Biotechnology Information, National Library of Medicine. https://www.ncbi.nlm.nih.gov/Structure/flink/flink.cgi.
- 17. Dalby A, Nourse JG, Hounshell WD, Gushurst AKI, Grier DL, Leland BA, Laufer J (1992) Description of several chemical-structure file formats used by computer-programs developed at Molecular Design Limited. J Chem Inf Comput Sci 32 (3):244–255. doi: 10.1021/ci00007a012
- 18. Weininger D (1988) SMILES, a chemical language and information-system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28 (1):31–36. doi: 10.1021/ci00057a005
- 19. Weininger D, Weininger A, Weininger JL (1989) SMILES. 2. Algorithm for generation of unique SMILES notation. J Chem Inf Comput Sci 29 (2):97–101. doi: 10.1021/ci00062a008
- 20. Weininger D (1990) SMILES. 3. DEPICT - graphical depiction of chemical structures. J Chem Inf Comput Sci 30 (3):237–243. doi: 10.1021/ci00067a005
- 21. Heller S, McNaught A, Stein S, Tchekhovskoi D, Pletnev I (2013) InChI - the worldwide chemical structure identifier standard. J Cheminform 5:7. doi: 10.1186/1758-2946-5-7
- 22. Heller S, McNaught A, Pletnev I, Stein S, Tchekhovskoi D (2015) InChI, the IUPAC International Chemical Identifier. J Cheminform 7:23. doi: 10.1186/s13321-015-0068-4
- 23. Bender A, Glen RC (2004) Molecular similarity: a key technique in molecular informatics. Org Biomol Chem 2 (22):3204–3218. doi: 10.1039/b409813g
- 24. Maldonado AG, Doucet JP, Petitjean M, Fan BT (2006) Molecular similarity and diversity in chemoinformatics: From theory to applications. Mol Divers 10 (1):39–79. doi: 10.1007/s11030-006-8697-1
- 25. Eckert H, Bajorath J (2007) Molecular similarity analysis in virtual screening: foundations, limitations and novel approaches. Drug Discov Today 12 (5–6):225–233. doi: 10.1016/j.drudis.2007.01.011
- 26. Willett P (2014) The Calculation of Molecular Structural Similarity: Principles and Practice. Mol Inf 33 (6–7):403–413. doi: 10.1002/minf.201400024
- 27. Koutsoukas A, Paricharak S, Galloway W, Spring DR, Ijzerman AP, Glen RC, Marcus D, Bender A (2014) How Diverse Are Diversity Assessment Methods? A Comparative Analysis and Benchmarking of Molecular Descriptor Space. J Chem Inf Model 54 (1):230–242. doi: 10.1021/ci400469u
- 28. Sheridan RP, Kearsley SK (2002) Why do we need so many chemical similarity search methods? Drug Discov Today 7 (17):903–911. doi: 10.1016/s1359-6446(02)02411-x
- 29. Kim S, Bolton EE, Bryant SH (2016) Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets. J Cheminform 8:62. doi: 10.1186/s13321-016-0163-1
- 30. Bolton EE, Kim S, Bryant SH (2011) PubChem3D: Similar conformers. J Cheminform 3:13. doi: 10.1186/1758-2946-3-13
- 31. Bolton EE, Chen J, Kim S, Han LY, He SQ, Shi WY, Simonyan V, Sun Y, Thiessen PA, Wang JY, Yu B, Zhang J, Bryant SH (2011) PubChem3D: a new resource for scientists. J Cheminform 3:32. doi: 10.1186/1758-2946-3-32
- 32. Kim S, Bolton EE, Bryant SH (2011) PubChem3D: Biologically relevant 3-D similarity. J Cheminform 3:26. doi: 10.1186/1758-2946-3-26
- 33. Kim S, Bolton EE, Bryant SH (2012) Effects of multiple conformers per compound upon 3-D similarity search and bioassay data analysis. J Cheminform 4:28. doi: 10.1186/1758-2946-4-28
- 34. Kim S, Han LY, Yu B, Hahnke VD, Bolton EE, Bryant SH (2015) PubChem structure-activity relationship (SAR) clusters. J Cheminform 7:33. doi: 10.1186/s13321-015-0070-x
- 35. PubChem substructure fingerprint description. ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.pdf.
- 36. Chen X, Reynolds CH (2002) Performance of similarity measures in 2D fragment-based similarity searching: comparison of structural descriptors and similarity coefficients. J Chem Inf Comput Sci 42 (6):1407–1414. doi: 10.1021/ci025531g
- 37. Holliday JD, Salim N, Whittle M, Willett P (2003) Analysis and display of the size dependence of chemical similarity coefficients. J Chem Inf Comput Sci 43 (3):819–828. doi: 10.1021/ci034001x
- 38. Holliday JD, Hu CY, Willett P (2002) Grouping of coefficients for the calculation of inter-molecular similarity and dissimilarity using 2D fragment bit-strings. Comb Chem High Throughput Screen 5 (2):155–166
- 39. Grant JA, Pickup BT (1995) A Gaussian description of molecular shape. J Phys Chem 99 (11):3503–3510
- 40. Grant JA, Gallardo MA, Pickup BT (1996) A fast method of molecular shape comparison: a simple application of a Gaussian description of molecular shape. J Comput Chem 17 (14):1653–1666
- 41. Grant JA, Pickup BT (1996) A Gaussian description of molecular shape (vol 99, pg 3503, 1995). J Phys Chem 100 (6):2456–2456
- 42. Grant JA, Pickup BT (1997) Gaussian shape methods. In: van Gunsteren WF, Weiner PK, Wilkinson AJ (eds) Computer Simulation of Biomolecular Systems. Kluwer Academic Publishers, Dordrecht, pp 150–176.
- 43. Rush TS, Grant JA, Mosyak L, Nicholls A (2005) A shape-based 3-D scaffold hopping method and its application to a bacterial protein-protein interaction. J Med Chem 48 (5):1489–1495. doi: 10.1021/jm040163o
- 44. ROCS - Rapid Overlay of Chemical Structures (2010). 3.1.0 edn. OpenEye Scientific Software, Inc., Santa Fe, NM
- 45. ShapeTK - C++ (2010). 1.8.0 edn. OpenEye Scientific Software, Inc., Santa Fe, NM
- 46. Bolton EE, Kim S, Bryant SH (2011) PubChem3D: Conformer generation. J Cheminform 3:4. doi: 10.1186/1758-2946-3-4
- 47. Kim S, Bolton EE, Bryant SH (2013) PubChem3D: conformer ensemble accuracy. J Cheminform 5:1. doi: 10.1186/1758-2946-5-1
- 48. Borodina YV, Bolton E, Fontaine F, Bryant SH (2007) Assessment of conformational ensemble sizes necessary for specific resolutions of coverage of conformational space. J Chem Inf Model 47 (4):1428–1437. doi: 10.1021/ci7000956
- 49. Fu G, Batchelor C, Dumontier M, Hastings J, Willighagen E, Bolton E (2015) PubChemRDF: towards the semantic annotation of PubChem compound and substance databases. J Cheminform 7:34. doi: 10.1186/s13321-015-0084-4
