Abstract
The US Food and Drug Administration (FDA) and the National Center for Advancing Translational Sciences (NCATS) have collaborated to publish rigorous scientific descriptions of substances relevant to regulated products. The FDA has adopted the global ISO 11238 data standard for the identification of substances in medicinal products and has populated a database to organize the agency's regulatory submissions and marketed products data. NCATS has worked with FDA to develop the Global Substance Registration System (GSRS) and produce a non-proprietary version of the database for public benefit. In 2019, more than half of all new drugs in clinical development were proteins, nucleic acid therapeutics, polymer products, structurally diverse natural products or cellular therapies. While multiple databases of small molecule chemical structures are available, this resource is unique in its application of regulatory standards for the identification of medicinal substances and its robust support for other substances in addition to small molecules. This public, manually curated dataset provides unique ingredient identifiers (UNIIs) and detailed descriptions for over 100 000 substances that are particularly relevant to medicine and translational research. The dataset can be accessed and queried at https://gsrs.ncats.nih.gov/app/substances.
INTRODUCTION
The globalization of the pharmaceutical industry has made global data standardization essential for promoting the safety, availability and quality of health products throughout the world (1). With increasing globalization, the supply chain spans many countries, raising issues about international standards and safety in regulated products and for their component substances. As fewer drugs are sourced entirely within one jurisdiction, cooperation between international regulatory bodies becomes critical. Interoperability and standardization of data based on international standards can remove the greatest barriers to such international coordination (1).
The ISO 11238 standard provides a stable structure and set of data elements for defining substances in a consistent, scientifically useful manner (2). The United States Food and Drug Administration (FDA) has adopted this standard to enhance the regulatory review of active and inactive substances in submissions and facilitate understanding of the relationships to other substances and products from a quality, safety and drug utilization perspective (3).
The complexity and enormous variety of health products currently marketed pose a significant challenge to systematic identification, yet such identification is vitally important for public safety. Effective regulation depends on the ability to answer complex queries that require leveraging data from multiple sources in many formats. What was needed, therefore, was a system capable of representing substance data with rigorous definitions and supporting multiple scientific domains, such as nuclear chemistry, herbal plant varieties, autologous genetically transformed cell therapies, and medical air (4).
The ISO 11238 standard identifies key requirements for a global registration system: (i) the collection of defining properties for substances to enable their unambiguous definition, (ii) the creation of unique substance identifiers to reliably identify and trace the use of medicinal products and the materials within medicinal products and (iii) the centralized generation of unique identifiers and deposition of substance facts to both facilitate sponsor interactions with multiple regulators and harmonization amongst regulating agencies. Inherent in the ISO 11238 standard is the acknowledgement that existing systematic standards are often too rigid to accommodate all of the substances found in commerce. Market forces constantly drive innovation, pushing the boundaries of science itself and the creation of entirely new classes of products in ways that are difficult to anticipate. The standard is therefore accommodating of new and unusual materials in ways that have traditionally challenged other standards.
The Global Substance Registration System (GSRS) addresses these needs to uniquely identify, register and store substance-related information, consistent with the ISO 11238 standard. GSRS provides a system for the definition and identification of substances within medicinal products or substances used for medicinal purposes, including dietary supplements, foods and cosmetics and their official names across different languages, jurisdictions and domains. The system also captures relationships between substances and references all captured data to a definitive source of information. GSRS references existing nomenclatures but coins terms when necessary for nomenclature consistency. All the software developed is intended to be freely distributable to academic, government and commercial entities. The public reference database of substances is provided at https://gsrs.ncats.nih.gov/app/substances.
MATERIALS AND METHODS
The information system is designed around the six types of substances referenced in the ISO 11238 standard: chemicals, nucleic acids, proteins, polymers, structurally diverse substances and mixtures. Further details of the substance data model and software architecture are provided in Supplementary Data. Each of these substance types, and all relevant data fields present in the ISO standard and its technical implementation guide (5), are included within the system's data model, including official names, common names, brand names, systematic names, company codes and other identifiers such as registry numbers. All names and identifiers are provided with references. References (including links to external sites) are used in the ISO 11238 data model to document evidence for specific aspects of a substance definition and associated data. The authors do not intend GSRS to be a comprehensive cross-referencing index of substance websites. Moreover, inclusion of a reference to an external site does not imply that the substance definition found at that external site is fully consistent with the ISO 11238 definition provided within GSRS. GSRS can also capture a large number of relationships between substances. The latest public data release was on 7 July 2020 (v2.5.1–20200707) and consists of 116 636 substance definitions.
The FDA supports health information technology initiatives by generating unique ingredient identifiers (UNIIs) for substances in drugs, biologics, foods, cosmetics, dietary supplements, tobacco products, and devices. The UNII is a non-proprietary, free-to-use, unique, unambiguous, non-semantic, alphanumeric identifier based on a substance's defining properties from the ISO 11238 data model. The UNII is permanently associated with a given substance definition; when corrections are made, the UNII remains the same. GSRS is the software that generates the UNIIs used in FDA electronic listing, as seen on the DailyMed website at https://dailymed.nlm.nih.gov/dailymed/. It is also used for other regulatory activities throughout product life cycles, encompassing clinical trial phases, product marketing and post-market surveillance. New UNII requests, data issues, or questions can be addressed by contacting FDA-SRS@fda.hhs.gov.
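Because the UNII is non-semantic, its text carries no information about the substance, so the most a program can do offline is check that a string has a plausible shape. The following is a minimal sketch of such a check. The pattern (ten characters drawn from digits and upper-case letters) is an assumption inferred from the UNIIs quoted in this article, not an official specification, and the name `looks_like_unii` is hypothetical.

```python
import re

# Assumption: ten characters from 0-9 and A-Z, inferred from the UNIIs
# quoted in this article rather than from an official format specification.
UNII_SHAPE = re.compile(r"[0-9A-Z]{10}")


def looks_like_unii(text: str) -> bool:
    """Return True if text has the shape of the UNIIs quoted in this article."""
    # Trim whitespace and ignore case before matching the whole string.
    return UNII_SHAPE.fullmatch(text.strip().upper()) is not None
```

A string that passes this check is not necessarily a registered UNII; confirming registration requires a lookup against the database itself.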
RESULTS
GSRS public substance dataset
Included in the 7 July 2020 public release are 116 636 substance definitions and accompanying data on nomenclature, properties and relationships between substances (Figure 1). GSRS includes detailed examples of an enormous range of drug substances encountered in drug discovery and development including chemicals, proteins, nucleic acids, polymers, mixtures and structurally diverse substances. The system captures many substance identifiers (names, codes and structural keys) to facilitate substance identification. However, none of these is sufficient on its own for regulatory use, both because redundant identifiers often exist for the same substance and because some identifiers are ambiguous and do not differentiate between related substances. Below, we explore some of the challenges of providing a unique and unambiguous identifier for substances.
Figure 1.
Overview of information provided in GSRS v2.5.1–20200707 dataset. (A) Total number of ingredient entries. (B) Number of ingredient records provided by substance class. (C) Commonly used public substance information sources referenced by the database. Only a partial list of information sources is provided. (D) For chemical substances in particular, a breakdown of type of stereochemical annotations is provided. The ‘Mixed’ type denotes where more complicated annotations are provided. ‘Unknown’ type denotes where drug substances have chiral specificity, for example demonstrate rotation of light, but absolute stereochemistry has never been assigned experimentally. Abbreviations provided in Supplementary Data.
Each substance class uses a different definitional data model, which reflects how the substances are often produced as well as the common types of substance heterogeneity that are encountered. For example, small molecule chemical definitions include a stereochemistry status field because many are marketed as chiral mixtures. Heterogeneity in protein samples typically arises from variations in glycosylation and other chemical modifications made after protein synthesis. Structurally diverse materials are inherently heterogeneous preparations from natural materials. Common sources of variability that can be defining for these substances include the part of an organism from which the material was prepared (leaves, roots, etc.) and even the time of harvest.
Trends in product materials
During clinical development, drug sponsors request nonproprietary names for active ingredients from the United States Adopted Name (USAN) and/or International Nonproprietary Name (INN) committees, often disclosing development candidates for the first time in the process of doing so. These public requests are typically made midway through clinical development, and the set of proposed names from a given year provides a useful snapshot of the types of products currently in development. As seen in Figure 2, medicines have historically been dominated by synthetic organic small molecules, but in recent times chemical substances represent a minority of all of the therapeutics in clinical development. Robust support for the registration of non-small molecules is therefore required to support regulatory needs.
Figure 2.
Trends in time for product materials: analysis of INN proposed lists categorized by substance class. In 2019, more than half of all new drugs in clinical development were substances other than small molecule chemicals.
Challenging substances
Antibody-drug-conjugates
Analyzing these substance trends by comparing high-level substance classes obscures some important additional therapeutic innovations, such as antibody-drug-conjugates, which must also be supported by GSRS. Brentuximab vedotin is a cancer drug that delivers the toxin monomethyl auristatin E to the cancer cell when the antibody is internalized after binding to CD-30, which is also known as tumor necrosis factor receptor superfamily member 8. This antibody-drug-conjugate, UNII: 7XL5ISS668, is registered as a protein with structural modifications. In addition to the full protein sequence for its four subunits, details of disulfide links, glycosylation and two structural modifications are provided. The first modification indicates the tendency of N-terminal glutamic acids to form the lactam pidolic acid, which is commonly seen in many proteins. The second indicates the partial conjugation of a toxin-linker moiety to available cysteines on the protein. Instead of specifying the reactants, the substance definition registers the replacement of protein cysteines with the product of the reaction, whose full details are provided in UNII: 6603L01WUR (Figure 3).
Figure 3.
Chemical structure of the vedotin conjugate UNII: 6603L01WUR employed in definition of brentuximab vedotin, UNII: 7XL5ISS668. This moiety replaces defined cysteine residues in the protein sequence. It is a cysteine derivative with a maleimide-caproic acid attachment group, cathepsin cleavable linker (valine-citrulline), and para-aminobenzylcarbamate spacer attached to the toxin monomethyl auristatin E. Atoms and bonds are depicted as stated in Annex B of the ISO 11238 standard.
Vaccines
One notable product class that is excluded from the INN list analysis in Figure 2, but is supported by the software, is vaccines. Live vaccines are registered as structurally diverse substances. One example is a recently developed Ebola virus vaccine candidate (6). This vaccine is registered as UNII: Y9VG7O3KTT. The vaccine substance is a live, attenuated, genetically-modified vesicular stomatitis Indiana virus (rVSV), engineered to express Zaire ebolavirus strain Kikwit-95 envelope glycoprotein (ZEBOV-GP). Multiple copies of glycoprotein are expressed and assembled into the viral envelope responsible for inducing protective immunity. The chimeric virus vaccine is attenuated by deletion of the principal virulence factor of VSV (the G protein), which also removes the primary target for anti-vector immunity. This is described within the substance definition by including the structural modifications provided in Table 1.
Table 1.
Structural modifications defining rVSVΔG-ZEBOV-GP (UNII: Y9VG7O3KTT), an Ebola vaccine
Modification Type | Extent | Modification Name | UNII |
---|---|---|---|
Gene expression vector (1) | | Vesicular stomatitis Indiana virus | KTI7RPW4I0 |
Gene deletion | Complete | Vesicular stomatitis Indiana virus vsivgp4 glycoprotein (g protein) precursor | E6TJ0Z0ZE8 |
Gene fragment replacement (2) | Complete | ebola virus/h. sapiens-tc/cod/1995/Kikwit-9510622 envelope glycoprotein gene (viral negative strand) | P7ZRG1LJ3A |
Vector expressed protein (3) | Complete | Zaire ebolavirus strain Kikwit-95 envelope glycoprotein | XH5V2SQ5FI |
(1) Parent organism: vesicular stomatitis Indiana virus (UNII: KTI7RPW410). (2) Reference: GenBank: KU182909.1 Ebola virus isolate Ebola virus/H. sapiens-tc/COD/1995/Kikwit-9510622, complete genome. (3) Virus particle incorporation (23) of ebolavirus glycoprotein (UNII: XH5V2SQ5FI) occurs via expression of the gene fragment replacement (UNII: P7ZRG1LJ3A) and further cellular processing including a number of complex idiosyncratic steps (24). The vector expressed protein registered reflects the final heteromultimeric protein product.
Other cases
The database also contains records for lesser defined substances using the concept class as well as some further specified substances which are defined in the ISO 11238 standard as group 1 specified substances. An example of a substance concept is fish, unspecified (UNII: 1PIO77PW2X). Such concept terms do not sufficiently define the ingredient used for regulatory purposes as producers must specify the type of fish actually used in a product, but the term may be useful for describing a class effect. For example, ‘fish, unspecified’ is used to refer to fish allergies. On the other hand, it can be equally important to further define aspects of a substance used in a product, using a group 1 specified substance definition. One recently published substance is air polymer type A from ExEm® Foam (UNII: WLT3PF2KX0) which is indicated for sonohysterosalpingography to assess fallopian tube patency in women with known or suspected infertility (7). This UNII refers to the foam ingredient created by a specific combination of water, air, glycerin, and hydroxyethyl cellulose (5500 MPA.S AT 2%). Other unique cases of substances can be accommodated within the existing substance model as in the case for the atropisomer BMS-986142 (UNII: PJX9GH268R) (8).
In addition to substance definitions, the database provides many relationships between substances that provide additional biological and manufacturing context. For example, the record for neratinib (UNII: JJH94R3PWB) contains 37 relationships with other substances reported in the product New Drug Application, including its salt forms, a variety of tyrosine kinases it is known to target, cytochrome P450s with which it interacts, and transporters.
Uniqueness and ambiguity of identifiers
Substance names
The literature usually refers to a given substance by its name, but names are not always unique and unambiguous. On average, each substance record includes 6 synonyms, including systematic, common, official and code names. Capsella bursa-pastoris L. (UNII: W0X9457M59), which is one of the most common weeds in the world (9) and has been the subject of clinical study (10), has an astounding 240 different names included in its record.
The dataset also contains over 1000 examples where the same name can refer to two different substances. For these cases of homographs, one needs additional information or context of use to distinguish which substance the name refers to. Four representative examples are given in Table 2. The first example, alpha-tocopherol, reflects ambivalence by naming authorities in distinguishing between the all R isomer of alpha-tocopherol purified from natural sources and the industrially-produced R,S mixture often used as a vitamin supplement in foodstuffs. The two substances exist separately and are both captured in the database along with their absolute stereochemistry, and the shared synonym is asserted for each by different references or sources. The second example reflects differences in naming conventions inside and outside the United States: under one convention the name must state the hydration state explicitly, whereas under the other the hydration state is omitted from the name of the predominant form currently marketed. In a similar vein, ‘scientific’ names are often appropriated as shorthand (third example) to refer to a specific part or useful component from a whole organism. Finally, we see the not infrequent occurrence of a word in common use having multiple, distinct meanings. Names, unfortunately, are inadequate for the purpose of uniquely identifying ingredients.
Table 2.
Representative examples of homographs from the public dataset
Homograph | UNII | Description | UNII | Description | Case |
---|---|---|---|---|---|
alpha-tocopherol | H4N855PNZ1 | Synthetic vitamin E | N9PR3490H9 | Natural extract vitamin E | Stereochemical ambiguity |
azithromycin | 5FD1131I7S | Azithromycin (trihydrate) | J2KLZ20U1M | Azithromycin (anhydrous) | Implicit versus explicit hydration |
lobelia | 7QFT17RLRG | Indian tobacco leaf | 9PP1T3TC5U | lobelia inflata L. plant | Whole versus part |
lime | C7X2M0VVNH | Lime (calcium oxide) | 8CZS546954 | Lime (citrus) | Language ambiguity |
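Detecting homographs such as those in Table 2 amounts to grouping name records by name and keeping the names attached to more than one UNII. The following is a minimal sketch using the eight name and UNII pairs from Table 2. The function name `find_homographs` and the case-insensitive comparison are assumptions made for illustration and do not describe how GSRS itself performs this check.

```python
from collections import defaultdict

# (name, UNII) pairs copied from Table 2.
NAME_RECORDS = [
    ("alpha-tocopherol", "H4N855PNZ1"),
    ("alpha-tocopherol", "N9PR3490H9"),
    ("azithromycin", "5FD1131I7S"),
    ("azithromycin", "J2KLZ20U1M"),
    ("lobelia", "7QFT17RLRG"),
    ("lobelia", "9PP1T3TC5U"),
    ("lime", "C7X2M0VVNH"),
    ("lime", "8CZS546954"),
]


def find_homographs(records):
    """Map each name attached to more than one UNII to its sorted UNIIs."""
    by_name = defaultdict(set)
    for name, unii in records:
        # Assumption: names are compared case-insensitively after trimming.
        by_name[name.strip().lower()].add(unii)
    return {name: sorted(uniis) for name, uniis in by_name.items() if len(uniis) > 1}
```

Calling `find_homographs(NAME_RECORDS)` returns all four names from Table 2, each with its two UNIIs; a name that appears with only one UNII is left out of the result.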
Structure-based identifiers
Structural identifiers and keys are also often not unique and unambiguous identifiers of therapeutic ingredients. Of the 73 122 chemicals included in the database, 5617 have two or more IUPAC International Chemical Identifier (InChI) keys (11) referring to the same substance and 1244 InChI keys are shared by two or more substances. For example, different tautomer forms of the same compound can produce different InChI key values. In addition, some problematic substances, such as a chiral substance of unknown absolute chirality, can appear to have the same InChI as the racemic mixture; however, such cases are really out of scope for the InChI approach. Chemical Abstracts Service (CAS) (12) and other registry numbers are also widely used to index and inventory chemicals. In the GSRS database release, 4296 substances have references to more than one registry number; carnitine chloride (UNII: F64264D63N) has ten. Conversely, 503 registry numbers point to multiple substances. This most often occurs when more specific substances refer to a more general concept registry number, such as 100403-19-8, which can refer to any of seven different ceramides, or the generic registry number 25322-68-3 for polyethylene glycol, which is linked to 52 different substances of specific chain length and polydispersity.
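The two kinds of counts reported above (substances with several identifiers, and identifiers shared by several substances) can be obtained from the same list of pairs by indexing it in both directions. The following is a minimal sketch of that bookkeeping. The function name `ambiguity_counts` and the toy data are hypothetical placeholders and are not taken from GSRS.

```python
from collections import defaultdict


def ambiguity_counts(pairs):
    """Count ambiguity in both directions for (substance, identifier) pairs.

    Returns a tuple: (number of substances with two or more identifiers,
    number of identifiers shared by two or more substances).
    """
    ids_per_substance = defaultdict(set)
    substances_per_id = defaultdict(set)
    for substance, identifier in pairs:
        ids_per_substance[substance].add(identifier)
        substances_per_id[identifier].add(substance)
    redundant = sum(1 for ids in ids_per_substance.values() if len(ids) > 1)
    shared = sum(1 for subs in substances_per_id.values() if len(subs) > 1)
    return redundant, shared


# Hypothetical placeholder data, not taken from GSRS.
TOY_PAIRS = [
    ("substance-1", "key-A"),
    ("substance-1", "key-B"),  # one substance with two keys
    ("substance-2", "key-C"),
    ("substance-3", "key-C"),  # one key shared by two substances
]
```

The same function applies unchanged whether the identifiers are InChI keys or registry numbers, since it relies only on the pairing and not on the identifier format.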
DISCUSSION
The goal of this resource is to benefit public health and translational research, to facilitate the transfer of regulatory information into the public domain, and to provide industry with the means both to obtain a global identifier and to deliver information related to substance identification.
GSRS supports the registration of new substances by regulators, providing easy access to existing substance information and a framework to validate information integrity and systematize regulators’ expert opinion on what defines a new substance. The system links information and identifiers from different domains and jurisdictions together into a single database. The system also incorporates other data elements of the Identification of Medicinal Products ISO standard along with biological, chemical and physical data relevant to drug safety, quality and development.
This database is unique in its semantic approach to defining product ingredients and its support for the enormous range of substances encountered in medical products. Regulating the sale of food and medicines and reviewing their health claims requires an integrated review infrastructure, where product ingredients are cross indexed across applications and information on the safety and efficacy of related substances is easily retrievable.
Historically, FDA has approached this challenge through the development of several different databases and products. The Drug Registration and Listing System (13) was one of the first systems developed to support a specific aspect of manufacturer listing with the agency. Subsequently, the ‘Orange Book’ (14) and Inactive Ingredient Guide (15) were published, all of which focused on organizing agency information by ingredient name. This was followed by substantial efforts to develop cheminformatics capabilities, beginning with the initial development of a substance registration system (16) and eventually expanding in scope to provide a framework proposal for the further development of the ISO 11238 standard and to meet the needs of the Structured Product Labeling (17) standards. The labeling standard requires GSRS-generated UNIIs as the primary identifier for ingredients in medical products and includes UNIIs in the product labels of all marketed products regulated by the agency. Adoption of GSRS by other agencies will help to improve international harmonization, pharmacovigilance efforts and understanding of global supply chains by enabling data exchange based on a common standard for product ingredients.
Inherent in the ISO 11238 standard is the acknowledgement that existing systematic standards for organizing substance information are incomplete and will continue to be so. Market forces constantly drive innovation, pushing the boundaries of science itself and the creation of entirely new classes of products. For example, the problem of chemical registration is one of the most mature areas of research and many systems exist, but still provide incomplete support for unusual stereochemistry (18–20), metal-organic structures (21), and metastable isotopes (22). The ISO 11238 standard addresses this problem both by the use of an accommodating data model and use of expert review to enforce consistent use of that data model. While it is desirable to determine uniqueness via automated computational methods, in practice a process built with expert review at its center is necessary to handle the scope and challenges of regulated products. Approaches and tools to address the challenge of discerning uniqueness and removing ambiguity in the systematic definition of substances can be incorporated into this software in the future.
The Global Substance Registration System provides a public, manually curated dataset of the ingredients in medicinal products and their scientific definitions for regulatory and translational research. GSRS is the first database to provide ingredient definitions using the global ISO 11238 data standard for the identification of substances in medicinal products. Especially important is its robust support for substances other than small molecules and its curation process. Use of this data and the UNII will help to improve international harmonization and pharmacovigilance efforts as well as support knowledge diffusion within the translational research community.
DATA AVAILABILITY
Software, public domain data and important documentation are available from: https://gsrs.ncats.nih.gov. Source code is available on GitHub at: https://github.com/ncats/gsrs-play. The software is provided under an Apache 2.0 license. The latest production release of the software is v2.5.1. The latest public data release was on 7 July 2020 (v2.5.1–20200707) and consists of 116 636 substance definitions. In accordance with FAIR data standards, the UNII is the globally unique and persistent identifier; it can be searched at https://gsrs.ncats.nih.gov/app/substances and https://fdasis.nlm.nih.gov/srs/srs.jsp and is also included within many other online repositories. Data objects are provided in JavaScript Object Notation (JSON), and individual substances, e.g. UNII: 5Y3NBK9IS7, can be retrieved through a request to, for example, https://gsrs.ncats.nih.gov/app/api/v1/substances(5Y3NBK9IS7). The process of extracting and transforming arbitrary exports can be resource intensive and such functionality is not currently accessible from the public site. Local installation of the software allows users to download arbitrary sets of selected records in a variety of formats, including full JSON, TSV, SDF, etc. The database is accessible but not optimized for phone and tablet screens owing to the complexity of the data model and certain features such as chemical structure search; users may prefer to request the desktop version of the site from their mobile browser.
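The retrieval request described above can be sketched in Python using only the standard library. The URL pattern is the one quoted in the text; the function names `substance_url` and `fetch_substance` and the timeout value are assumptions made for illustration. The structure of the returned JSON is not described in this article, so the sketch returns the parsed object without assuming any field names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the example request quoted in the text.
GSRS_API_BASE = "https://gsrs.ncats.nih.gov/app/api/v1"


def substance_url(unii: str) -> str:
    """Build the record URL in the form quoted in the text: .../substances(<UNII>)."""
    # Percent-encode the identifier in case it contains unexpected characters.
    return f"{GSRS_API_BASE}/substances({quote(unii.strip(), safe='')})"


def fetch_substance(unii: str, timeout: float = 30.0):
    """Download and parse the JSON record for one substance (needs network access)."""
    with urlopen(substance_url(unii), timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_substance("5Y3NBK9IS7")` would request the record used as the example in the text; the returned object should be inspected before relying on any particular key.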
ACKNOWLEDGEMENTS
The current project was developed in close collaboration with multiple regulatory authorities to ensure a robust substance registration product that supports the needs of national and regional authorities. In particular, we gratefully acknowledge the early support of Bob Allkin, Christopher Austin, Thomas Balzer, Marcel Hoefnagel, Panagiotis Telonis, Vada Perkins, Mary-Ann Slack and Philipp Weyermann toward this initiative and all of their feedback.
Contributor Information
Tyler Peryea, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD 20993, USA; Informatics, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD 20892, USA.
Noel Southall, Informatics, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD 20892, USA.
Mitch Miller, Informatics, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD 20892, USA.
Daniel Katzel, Informatics, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD 20892, USA.
Niko Anderson, Informatics, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD 20892, USA.
Jorge Neyra, Informatics, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD 20892, USA.
Sarah Stemann, Informatics, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD 20892, USA.
Ðắc-Trung Nguyễn, Informatics, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD 20892, USA.
Dammika Amugoda, Informatics, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD 20892, USA.
Archana Newatia, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD 20993, USA.
Ramez Ghazzaoui, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD 20993, USA.
Elaine Johanson, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD 20993, USA.
Herman Diederik, College ter Beoordeling van Geneesmiddelen, 3531 AH Utrecht, Netherlands.
Larry Callahan, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD 20993, USA.
Frank Switzer, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD 20993, USA.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
Intramural research program of the National Center for Advancing Translational Sciences, National Institutes of Health; Office of the Commissioner, US Food and Drug Administration. Funding for open access charge: National Institutes of Health.
Conflict of interest statement. None declared.
REFERENCES
- 1. Gronning N. Data management in a regulatory context. Front. Med. (Lausanne). 2017; 4:114.
- 2. International Standards Organization. ISO 11238: 2018. Health informatics - Identification of medicinal products - Data elements and structures for the unique identification and exchange of regulated information on substances. 2020; (7 October 2020, date last accessed) https://www.iso.org/standard/69697.html.
- 3. United States Food and Drug Administration. Substance Identification. 2020; (7 October 2020, date last accessed) https://www.fda.gov/industry/fda-resources-data-standards/substance-identification.
- 4. Edwards P., Therriault P.A., Katz I. Onsite production of medical air: is purity a problem. Multidiscip. Respir. Med. 2018; 13:12.
- 5. International Standards Organization. ISO/TS 19844: 2018. Implementation guidelines for ISO 11238 for data elements and structures for the unique identification and exchange of regulated information on substances. 2020; (7 October 2020, date last accessed) https://www.iso.org/standard/71965.html.
- 6. Monath T.P., Fast P.E., Modjarrad K., Clarke D.K., Martin B.K., Fusco J., Nichols R., Heppner D.G., Simon J.K., Dubey S. et al. rVSVDeltaG-ZEBOV-GP (also designated V920) recombinant vesicular stomatitis virus pseudotyped with Ebola Zaire Glycoprotein: Standardized template with key considerations for a risk/benefit assessment. Vaccine X. 2019; 1:100009.
- 7. US Food and Drug Administration. EXEM FOAM (air polymer-type A) intrauterine foam product label. Available from https://www.accessdata.fda.gov/drugsatfda_docs/label/2019/212279lbl.pdf.
- 8. Beutner G., Carrasquillo R., Geng P., Hsiao Y., Huang E.C., Janey J., Katipally K., Kolotuchin S., La Porte T., Lee A. et al. Adventures in Atropisomerism: Total Synthesis of a Complex Active Pharmaceutical Ingredient with Two Chirality Axes. Org. Lett. 2018; 20:3736–3740.
- 9. Cornille A., Salcedo A., Kryvokhyzha D., Glémin S., Holm K., Wright S.I., Lascoux M. Genomic signature of successful colonization of Eurasia by the allopolyploid shepherd's purse (Capsella bursa-pastoris). Mol. Ecol. 2016; 25:616–629.
- 10. Naafe M., Kariman N., Keshavarz Z., Khademi N., Mojab F., Mohammadbeigi A. Effect of hydroalcoholic extracts of capsella bursa-pastoris on heavy menstrual bleeding: a randomized clinical trial. J. Altern. Complement. Med. 2018; 24:694–700.
- 11. Heller S.R., McNaught A., Pletnev I., Stein S., Tchekhovskoi D. InChI, the IUPAC international chemical identifier. J. Cheminform. 2015; 7:23.
- 12. Dittmar P.G., Stobaugh R.E., Watson C.E. The chemical abstracts service chemical registry system. I. General Design. J. Chem. Inf. Comput. Sci. 1976; 16:111–121.
- 13. Slavin M. The Food and Drug Administration drug registration and listing system. Drug Inf. J. 1975; 9:239–240.
- 14. Knoben J.E., Scott G.R., Tonelli R.J. An overview of the FDA publication approved drug products with therapeutic equivalence evaluations. Am. J. Hosp. Pharm. 1990; 47:2696–2700.
- 15. Nema S., Washkuhn R.J., Brendel R.J. Excipients and their use in injectable products. PDA J. Pharm. Sci. Technol. 1997; 51:166–171.
- 16. United States Food and Drug Administration. Substance Registration System Standard Operating Procedure. 2007; (7 October 2020, date last accessed) https://www.fda.gov/media/75274/download.
- 17. Schadow G. HL7 Structured Product Labeling - electronic prescribing information for provider order entry decision support. AMIA Annu. Symp. Proc. 2005; 2005:1108.
- 18. Canfield P.J., Blake I.M., Cai Z.-L., Luck I.J., Krausz E., Kobayashi R., Reimers J.R., Crossley M.J. A new fundamental type of conformational isomerism. Nat. Chem. 2018; 10:615–624.
- 19. Laplante S.R., Fader L.D., Fandrick K.R., Fandrick D.R., Hucke O., Kemper R., Miller S.P.F., Edwards P.J. Assessing atropisomer axial chirality in drug discovery and development. J. Med. Chem. 2011; 54:7005–7022.
- 20. Chandrasekhar J., Dick R., Veldhuizen J.V., Koditek D., Lepist E.-I., McGrath M.E., Patel L., Phillips G., Sedillo K., Somoza J.R. et al. Atropisomerism by Design: Discovery of a selective and stable phosphoinositide 3-Kinase (PI3K) beta inhibitor. J. Med. Chem. 2018; 61:6858–6868.
- 21. Jurgens S., Kuhn F.E., Casini A. Cyclometalated complexes of platinum and gold with biological properties: state-of-the-art and future perspectives. Curr. Med. Chem. 2018; 25:437–461.
- 22. Kharissova O.V., Méndez-Rojas M.A., Kharisov B.I., Méndez U.O., Martínez P.E. Metal complexes containing natural and artificial radioactive elements and their applications. Molecules. 2014; 19:10755–10802.
- 23. Maruyama J., Miyamoto H., Kajihara M., Ogawa H., Maeda K., Sakoda Y., Yoshida R., Takada A. Characterization of the envelope glycoprotein of a novel filovirus, lloviu virus. J. Virol. 2014; 88:99–109.
- 24. Lee J.E., Saphire E.O. Ebolavirus glycoprotein structure and mechanism of entry. Future Virol. 2009; 4:621–635.
Data Availability Statement
Software, public domain data and important documentation are available from: https://gsrs.ncats.nih.gov. Source code is available on GitHub at: https://github.com/ncats/gsrs-play. The software is provided under an Apache 2.0 license. The latest production release of the software is v2.5.1. The latest public data release was on 7 July 2020 (v2.5.1–20200707) and consists of 116 636 substance definitions. In accordance with FAIR data standards, the UNII is the globally unique and persistent identifier; it can be searched at https://gsrs.ncats.nih.gov/app/substances and https://fdasis.nlm.nih.gov/srs/srs.jsp, and is also included within many other online repositories. Data objects are provided in JavaScript Object Notation (JSON), and a substance, e.g. UNII: 5Y3NBK9IS7, can be retrieved through a request to, for example, https://gsrs.ncats.nih.gov/app/api/v1/substances(5Y3NBK9IS7). The process of extracting and transforming arbitrary exports can be resource intensive, and such functionality is not currently accessible from the public site. Local installation of the software allows users to download arbitrary sets of selected records in a variety of formats, including full JSON, TSV and SDF. The database is accessible but not optimized for phone and tablet screens, owing to the complexity of the data model and certain features such as chemical structure search; users may prefer to request the desktop version of the site from their mobile browser.
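As an illustration of the single-record request described above, the following is a minimal Python sketch that builds the request URL for a UNII and decodes the JSON response. The URL pattern is taken from the example in this statement; the function names, the timeout value and the 10-character UNII format check are assumptions made for this sketch and are not part of the GSRS software.

```python
import json
import re
from urllib.request import urlopen

# Base path taken from the example request in the statement above.
API_BASE = "https://gsrs.ncats.nih.gov/app/api/v1"

# Assumption for this sketch: a UNII is 10 uppercase letters or digits.
UNII_PATTERN = re.compile(r"[A-Z0-9]{10}")


def substance_url(unii: str) -> str:
    """Build the record URL for one UNII, e.g. .../substances(5Y3NBK9IS7)."""
    unii = unii.strip().upper()
    if not UNII_PATTERN.fullmatch(unii):
        raise ValueError(f"not a plausible UNII: {unii!r}")
    return f"{API_BASE}/substances({unii})"


def fetch_substance(unii: str, timeout: float = 30.0) -> dict:
    """Request one substance record and decode the JSON body.

    Performs a network call; the structure of the returned object is
    whatever the public site serves and is not assumed here.
    """
    with urlopen(substance_url(unii), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_substance("5Y3NBK9IS7")` would request the example record cited above. Because bulk export is not offered by the public site, a loop over many UNIIs with this function is only a workaround for small sets; larger extractions are better served by a local installation.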