Abstract
This paper introduces a database derived from Structured Product Labels (SPLs). SPLs are legally mandated snapshots containing information on all drugs released to market in the United States. Since publication is not required for pre-trial findings, we hypothesize that SPLs may contain knowledge absent in the literature, and hence “novel.” SemMedDB is an existing database of computable knowledge derived from the literature. If SPL content could be similarly transformed, novel clinically relevant assertions in the SPLs could be identified through comparison with SemMedDB. After we derive a database (containing 4,297,481 assertions), we compare the extracted content with SemMedDB for recent FDA drug approvals. We find that novelty between the SPLs and the literature is nuanced, due to the redundancy of SPLs. Highlighting areas for improvement and future work, we conclude that SPLs contain a wealth of novel knowledge relevant to research and complementary to the literature.
Introduction
Translational researchers have long recognized the value of integrative approaches to complex problems. The large-scale interpretation of multi-modal, heterogeneous, distributed sources of observational data often requires robust background information to help interpret and ultimately transform these data into new knowledge.1–7 Researchers have exploited computable knowledge extracted from the literature in drug safety, drug repurposing, and oncology applications using Semantic MEDLINE, or SemMedDB.8–16 SemMedDB is a repository of structured knowledge extracted using a semantic interpreter of biomedical text. To build on the success of previous text mining projects, there is a pressing need to seek out new sources of knowledge. One source of relevant, but as yet unmined, pharmaceutical knowledge is the content embedded in the narrative text of drug product labeling.
Drug product labeling standards are written into federal law and administered by the FDA. Since 2006, the Code of Federal Regulations has required submissions be sent to the FDA in an electronic format known as Structured Product Labeling (SPL).17 The SPL format is intended to make labels readable to both computers and humans. To that end, SPLs use a general technology standard called eXtensible Markup Language (XML). SPL has also been certified as a Health Level Seven International (HL7) standard for interoperability of electronic health information.
SPLs exist for all prescription and over-the-counter drugs approved for marketing in the United States. Each SPL summarizes knowledge about a drug based on pre-market studies and post-marketing information, including: safety (e.g., black box warnings and reported adverse reactions), approved indications, clinical pharmacology, use in special populations, and drug-drug interactions. Because there is no legal requirement for pre-clinical or in vitro studies to be published, many of the knowledge claims summarized in SPLs might not be present in the published peer-reviewed biomedical literature. The purpose of this study is to describe knowledge claims present in SPLs, compare them with knowledge claims extracted from the literature, and determine the extent of novel knowledge in the SPLs. Another goal is to introduce and report on a newly created resource, called SemMedDB_SPL, that represents structured knowledge claims extracted from the SPLs of all prescription drugs currently marketed in the United States.
Background
Structured knowledge is knowledge in a computable format, meaning that the knowledge content is represented in a form that computer programs can read. One convenient, computable representation of knowledge is the semantic predication. Semantic predications, also known as “triples,” consist of two concepts that relate to each other through some predicate (i.e., verb) such as “CAUSES” or “TREATS.”18–20 For instance, “ibuprofen CAUSES gastrointestinal_hemorrhage” is one such semantic predication. Semantic predications have been referred to as the “atoms of thought.”21 In philosophy and cognitive science, these are referred to variously as propositions or assertions; in practice, the “proposition” refers to the normalized form (the triple), while the assertion is the source sentence in the literature from which the semantic predication was derived.
SemRep is a symbolic natural language processing tool developed by researchers at the National Library of Medicine for extracting, translating, and loading knowledge in the form of semantic predications.20 SemRep was used to build Semantic MEDLINE (SemMedDB). SemMedDB stores structured knowledge extracted from titles and abstracts of peer-reviewed biomedical literature stored in MEDLINE.9,13,21–25 A number of research studies have used this structured knowledge in applications including drug safety and drug repositioning.26
Another potential source of relevant knowledge is embedded in the narrative text of SPLs. Although several studies have used natural language processing to extract specific kinds of knowledge, such as indications and adverse drug reactions, from SPLs,27–35 no studies to date have applied SemRep to extract a broad range of semantic predications. Because many SPLs report knowledge from unpublished pre-market studies, we hypothesize that semantic predications present in SPLs will complement those extracted from the scientific literature. Our basic assumption is that the extent of “novel” structured knowledge may be measured by the number of semantic predications captured in the SPLs but absent from the predications extracted from the literature. In this study, we report on a pipeline for extracting semantic predications from SPLs. Semantic predications extracted by the pipeline are stored in a new resource that we call SemMedDB_SPL. We describe the new resource and report on its coverage for several newly approved drugs relative to the extracted knowledge in the latest version of SemMedDB.20,36
Methods
Extracting knowledge from SPLs. SPLs for prescription drugs were manually downloaded from the DailyMed website hosted by the National Library of Medicine. A custom parser that we developed for SPLs translated the XML format into a relational database.37 SPL sections are coded using LOINC and contain a mix of narrative text and tables tagged with HTML. To run SemRep on SPL content, we exported the text of selected sections by querying the relational database using LOINC codes, saving the query results into individual text files, and then parsing the text files to separate the narrative text from table content. Only the narrative content was processed with SemRep.38,39 We processed text from the following SPL sections: adverse reactions, boxed warning, clinical pharmacology, clinical studies, contraindications, description, dosage and administration, drug interactions, how supplied, inactive ingredients, indications and usage, overdosage, precautions, and use in specific populations. The process of extracting structured knowledge in the form of semantic predications is illustrated below (Figure 1).
Figure 1:
Workflow diagram of structured knowledge extraction process running the SemRep NLP system. The steps were as follows: 1.) download the SPL XML files from DailyMed, 2.) parse the XML and populate a relational (SQL) database, 3.) separate SPL narrative text from table content, 4.) run SemRep NLP to extract semantic predication knowledge content, and 5.) load resulting predications into a SQL database for further analysis.
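Steps 1–3 of the workflow above can be sketched as follows. The tag paths follow the HL7 SPL XML schema (namespace urn:hl7-org:v3), and the LOINC code shown corresponds to the adverse reactions section; the helper function and the sample document are illustrative sketches, not the parser used in this study.

```python
# Minimal sketch: pull table-free narrative text for a LOINC-coded section
# out of an SPL XML document, so only prose reaches the NLP stage.
import xml.etree.ElementTree as ET

NS = {"v3": "urn:hl7-org:v3"}

def narrative_for_loinc(spl_xml: str, loinc_code: str) -> str:
    """Return the table-free narrative text of sections matching a LOINC code."""
    root = ET.fromstring(spl_xml)
    chunks = []
    for section in root.iter("{urn:hl7-org:v3}section"):
        code = section.find("v3:code", NS)
        if code is None or code.get("code") != loinc_code:
            continue
        text_el = section.find("v3:text", NS)
        if text_el is None:
            continue
        # Drop embedded HTML-style tables (direct children) before flattening.
        for table in text_el.findall("v3:table", NS):
            text_el.remove(table)
        chunks.append(" ".join(t.strip() for t in text_el.itertext() if t.strip()))
    return "\n".join(chunks)

# Invented toy label; 34084-4 is the LOINC code for the adverse reactions section.
SAMPLE = """<document xmlns="urn:hl7-org:v3">
  <component><structuredBody><component>
    <section>
      <code code="34084-4" codeSystem="2.16.840.1.113883.6.1"/>
      <text>Serious adverse reactions were observed.
        <table><tr><td>dropped</td></tr></table>
      </text>
    </section>
  </component></structuredBody></component>
</document>"""

print(narrative_for_loinc(SAMPLE, "34084-4"))
# → Serious adverse reactions were observed.
```

In the actual pipeline the narrative text was first loaded into a relational database and exported to per-section text files; the sketch collapses those steps into a single in-memory pass.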
To extract meaningful assertions, we ran SemRep version 1.8 on the narrative text extracted from the SPLs. SemRep was run in batch mode with the anaphora flag activated (to help resolve assertions containing pronouns) using the UMLS 2018AB dictionary on a server running Ubuntu Linux v.16.04 (64 GB RAM, four-core Xeon processors). Finally, we extracted predications from the raw SemRep output, along with the source sentences and accompanying metadata. The final data were stored in a relational database (Postgres v11.5-1). We used a combination of Python, SQL, shell (bash), and R scripts to implement the experiments in the present study.
Figure 2 shows the two core content tables in the new database. To provide convenient access for querying the newly extracted SPL structured knowledge alongside SemMedDB, we also added the predications table (smdbpredications in Figure 2) from SemMedDB version 40R (with SemRep version 1.8 run without anaphora resolution, released July 10, 2019). The database also includes a SPL metadata table (structuredProductLabelMetadata) to make it simple to navigate between predications and the source files.
Figure 2:
Tables in the SemMedDB_SPL schema that contain computable knowledge in semantic predication format. The predications table from SemMedDB was also included to facilitate rapid comparison.
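A minimal version of such a schema can be sketched in DDL. The smdbpredications and structuredProductLabelMetadata table names come from Figure 2; every column name, and the sentences table layout, is an assumption for illustration, not the released schema.

```python
# Sketch of a simplified SemMedDB_SPL-style schema (column names assumed).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Predications extracted from SPL narrative text.
CREATE TABLE predications (
  predication_id INTEGER PRIMARY KEY,
  subject_cui TEXT, subject_name TEXT,
  predicate TEXT,
  object_cui TEXT, object_name TEXT,
  sentence_id INTEGER,   -- link to the source sentence
  spl_id TEXT            -- link to structuredProductLabelMetadata
);
-- Source sentences, so each predication can be traced back to its text.
CREATE TABLE sentences (
  sentence_id INTEGER PRIMARY KEY,
  spl_id TEXT, section_loinc TEXT, sentence TEXT
);
-- One row of metadata per downloaded label file.
CREATE TABLE structuredProductLabelMetadata (
  spl_id TEXT PRIMARY KEY, set_id TEXT, drug_name TEXT, version TEXT
);
""")
print([r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")])
# → ['predications', 'sentences', 'structuredProductLabelMetadata']
```

The sentence_id/spl_id links are what make the predication-to-source-text navigation described later in the paper a simple join.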
We then wrote queries to explore the features of the new resource and test if it held knowledge indicative of gaps in the peer-reviewed literature that might be of clinical interest. Our analysis proceeded as follows:
Exploratory analysis of structured knowledge in SPLs. We collected statistics (counts, frequencies) and explored distributional information by generating visual summaries. We generated heatmap visualizations of the distribution of predicate types by SPL section type, normalized by total predication count.40
Examination of novel content. To find novel structured knowledge, we ran SQL queries that select SPL-derived predications absent from the pre-existing, literature-derived predications in SemMedDB.
Investigate comparative coverage of newly released drugs. For new drugs that appear in the structured knowledge extracted from the SPLs, coverage of those drugs can serve as a metric for comparing literature-derived and SPL-derived structured knowledge. Where a drug has coverage in both the literature and the SPLs, SQL queries can identify what is potentially novel in the SPLs.
The first two analyses were general and applied no specific inclusion criteria. For the third analysis, we only examined drugs with indications approved by the FDA between January 2015 and August 2019. Drugs had to have at least one mention in the subject position of a predication in SemMedDB and at least one such mention in SemMedDB_SPL. This was to ensure that at least some structured knowledge extracted from the scientific literature existed in SemMedDB. The subject-position requirement also helped us focus on predications in which drugs are causal agents.
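The novelty filter in the second analysis can be sketched as a SQL anti-join: keep SPL-derived predications that have no exact (subject, predicate, object) match in the literature-derived table. The simplified table layout and the sample rows below are illustrative stand-ins for the actual schema (the flibanserin and cannabidiol CUIs come from Table 2 of this paper; the object CUIs are illustrative).

```python
# Sketch of the novelty query: SPL predications absent from SemMedDB.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE splpredications  (subject_cui TEXT, predicate TEXT, object_cui TEXT);
CREATE TABLE smdbpredications (subject_cui TEXT, predicate TEXT, object_cui TEXT);
INSERT INTO splpredications VALUES
  ('C0754280','CAUSES','C0039070'),   -- flibanserin CAUSES syncope
  ('C0006863','TREATS','C0014544');   -- cannabidiol TREATS epilepsy
INSERT INTO smdbpredications VALUES
  ('C0006863','TREATS','C0014544');   -- already in the literature
""")
cur.execute("""
SELECT DISTINCT s.subject_cui, s.predicate, s.object_cui
FROM splpredications s
WHERE NOT EXISTS (
  SELECT 1 FROM smdbpredications m
  WHERE m.subject_cui = s.subject_cui
    AND m.predicate   = s.predicate
    AND m.object_cui  = s.object_cui)
""")
print(cur.fetchall())  # → [('C0754280', 'CAUSES', 'C0039070')]
```

Only the SPL-only predication survives the anti-join; the predication already present in the literature-derived table is filtered out.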
Results
We extracted 4,297,481 semantic predications from the SPLs (373,061 SPL sections for 37,749 pharmaceutical therapies). The clinical pharmacology and precautions SPL sections yielded the most predications (Table 1). SemRep extracted predications with 62 predicate types (some of these are negated, so “TREATS” becomes “NEG_TREATS”). The five most common predicate types overall were “PROCESS_OF” (n = 986,243), “TREATS” (744,794), “ISA” (366,616), “ADMINISTERED_TO” (287,794), and “LOCATION_OF” (273,879). Counts for clinically relevant predicates include “CAUSES” (172,654), “AFFECTS” (115,798), “INHIBITS” (98,486), “PREDISPOSES” (54,270), “STIMULATES” (49,193), and “COMPLICATES” (7,668). Processing failed for 86 SPL sections. An examination of the logs revealed the cause of these failures: SemRep timed out while processing problematic strings, such as expansive lists of adverse events and chemical names containing extensive diacritical marks.
Table 1:
Statistics for the semantic predications extracted by SemRep from SPLs by SPL section type.
Section Name | Section count | # Predications | # Predications/Section | # Unique Predications |
---|---|---|---|---|
adverse_reactions | 29,259 | 519,884 | 17.77 | 21,595 |
boxed_warning | 10,490 | 104,463 | 9.96 | 4,137 |
clinical_pharmacology | 31,228 | 781,157 | 25.01 | 38,263 |
clinical_studies | 15,818 | 345,019 | 21.81 | 17,547 |
contraindications | 26,770 | 145,742 | 5.44 | 7,624 |
description | 27,638 | 91,846 | 3.22 | 7,811 |
dosage_and_administration | 26,598 | 275,063 | 10.34 | 13,886 |
drug_interactions | 21,951 | 350,929 | 15.99 | 16,125 |
how_supplied | 3,506 | 8,492 | 2.42 | 2,317 |
inactive_ingredient | 90 | 189 | 2.10 | 73 |
indications_and_usage | 29,612 | 298,604 | 10.08 | 15,111 |
overdosage | 24,766 | 145,883 | 5.89 | 4,716 |
precautions | 18,623 | 857,711 | 46.06 | 29,515 |
use_in_specific_populations | 12,960 | 372,489 | 28.74 | 20,050 |
To narrow down the range of predicate types to analyze, we selected a subset of predicate types that most frequently occur with a pharmaceutical substance subject semantic type (“phsu”) or disease/syndrome object semantic type (“dsyn”), finding these semantic types to be useful starting points for drug safety and drug repurposing use cases. The predicates we included in our analysis were “TREATS,” “PREVENTS,” “PREDISPOSES,” “CAUSES,” and “INTERACTS_WITH,” among others. The full list of predicates chosen is reflected in the labels along the y-axes of Figures 3 and 4.
Examination of novel content. We analyzed SemMedDB_SPL to determine the extent of novel content. The heatmap in Figure 4 illustrates the breakdown of novel content by SPL section. The precautions and clinical pharmacology SPL sections (followed by the drug_interactions section) were notable for having substantially more novel predications than other SPL sections. Of the 4,297,481 total predications, only 142,311 are unique; each predication is repeated an average of 30.20 times. By comparison, the current release of SemMedDB (version 40R) has 97,972,561 semantic predications (extracted from 29,115,337 abstracts), of which 19,836,608 are unique (each repeated an average of 4.86 times).36
Figure 4:
This heatmap illustrates the relative distribution of novel predications (that is, those not in SemMedDB) by predicate type (y-axis) in each SPL section type (x-axis) in SemMedDB_SPL. Red indicates lower predicate frequency, with lighter colors through white indicating higher frequency. The numbers in the legend indicate predicate counts on a natural log scale.
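The preparation behind the heatmaps (Figures 3 and 4) can be sketched as follows: count predications per (predicate, section) cell, normalize by the total predication count, and place the raw counts on a natural-log scale for the legend. The toy predication list is invented for illustration.

```python
# Sketch of the heatmap cell computation (toy data, not the study's counts).
import math
from collections import Counter

predications = [
    ("TREATS", "indications_and_usage"),
    ("TREATS", "indications_and_usage"),
    ("CAUSES", "boxed_warning"),
    ("CAUSES", "precautions"),
]

cells = Counter(predications)                      # raw count per cell
total = sum(cells.values())
normalized = {k: v / total for k, v in cells.items()}        # share of all predications
log_counts = {k: math.log(v) for k, v in cells.items()}      # natural-log legend scale

print(normalized[("TREATS", "indications_and_usage")])  # → 0.5
print(log_counts[("CAUSES", "boxed_warning")])          # → 0.0  (ln 1)
```

In the paper, the same two-dimensional count structure is rendered as a color matrix rather than printed.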
Investigate comparative coverage of newly released drugs. There were 183 drugs in the list of drugs approved by the FDA in the last five years. Of the 103 drugs from that list that were in SemMedDB_SPL, only 19 were also found in SemMedDB. Samples of novel predications for newly released drugs are enumerated in the listing in Figure 5, and a summary of the drug coverage is provided in Table 2.
Figure 5:
This listing outlines a subset of novel predications extracted from the SPLs for drugs released to market by the FDA (January 2015-August 2019) that also had coverage in SemMedDB. The subject (pharmaceutical drug) of each predication is underlined and followed by a colon, then by each novel predicate and object, respectively. Potential side-effects and drug-gene mechanisms of strong pharmacogenomic importance are bolded.
Table 2:
Coverage of drugs recently released by the FDA (January 2015 - August 2019) by # of predications per UMLS CUI. A search performed by querying the predications table on the subject_name string would yield slightly different results.
Drug name | # in SemMedDB (unique indexed articles) | # in SemMedDB_SPL (unique SPLs) | # Novel Predications (unique SPLs) |
---|---|---|---|
Cannabidiol (CUI: C0006863) | 2797 (1142) | 2 (1) | 1 (1) |
Ivabradine (CUI: C0257190) | 1800 (651) | 7 (2) | 6 (2) |
Daclizumab (CUI: C0663182) | 1477 (657) | 7 (2) | 4 (2) |
Deflazacort (CUI: C0057258) | 668 (256) | 9 (2) | 7 (2) |
Mepolizumab (CUI: C0969324) | 548 (185) | 14 (1) | 9 (1) |
Prucalopride (CUI: C0913506) | 460 (160) | 13 (1) | 10 (1) |
Stiripentol (CUI: C0075262) | 308 (112) | 1 (1) | 1 (1) |
Flibanserin (CUI: C0754280) | 245 (80) | 18 (3) | 15 (2) |
Safinamide (CUI: C1098261) | 166 (64) | 12 (1) | 8 (1) |
Secnidazole (CUI: C0074246) | 148 (65) | 7 (2) | 6 (2) |
Tafenoquine (CUI: C0903411) | 144 (61) | 22 (2) | 19 (2) |
Our query of SemMedDB_SPL for as yet undiscovered knowledge about the newly released drugs yielded many predications of potential interest, many of which are listed in Figure 5. We compared the coverage for older statins versus newer lipid-lowering agents by counting unique predications with the drug name in the subject position. We found that approximately half of the predications were novel for the older statins (simvastatin [55% or 60 of 108 predications] and atorvastatin [54.8% or 46 of 84]), but that all of the predications were novel for the newer agents (alirocumab [“Praluent”] and evolocumab [“Repatha”]), with 10 and 9 distinct mentions in the subject position, respectively.
Discussion
SPLs are essentially snapshots of all knowledge for all drugs that have approval for marketing in the United States. SPLs are continuously updated, as product labels are legally required to be kept up to date. The current release of SemMedDB_SPL shows the feasibility of automatically extracting predications from SPLs. We hypothesized that there was evidence in the SPLs that was not present in the literature. We found that, across sections, there are novel predications present in the SPLs but absent from the predications extracted from the indexed biomedical literature. The distribution of predication types by SPL section was uneven: the yield of extracted predications varied considerably by section type, with precautions, use_in_specific_populations, and clinical pharmacology being the most productive (Table 1), while other sections yielded substantially fewer predications.
We also found that many predications are of potential clinical relevance, as per Figure 5. The relational table structure enables linking from predications to the source texts in the sentences table. For example, the semantic predication “flibanserin CAUSES syncope” links to the following source sentence retrieved from the boxed warning section: “The concomitant use of ADDYI and moderate or strong CYP3A4 inhibitors increases flibanserin concentrations, which can cause severe hypotension and syncope [(see CONTRAINDICATIONS and WARNINGS)].”
Novelty and redundancy. We defined novelty as the presence of structured knowledge in the SPLs (SemMedDB_SPL) that is absent from the structured knowledge extracted from the literature (SemMedDB). This definition rests on the assumption that the SemRep NLP system maps concept mentions in SPL narrative text to the correct concept unique identifiers (CUIs) in the UMLS, and that, in a separate process, it identifies concepts in the indexed peer-reviewed biomedical literature with comparable accuracy.
As long as SemRep maps mentions to the correct UMLS CUIs encoding the strings in the subjects of semantic predications, we are able to compare the two knowledge sources directly. However, lexical variants that are not in the UMLS may have been missed.
Novelty between the literature and SPLs is nuanced: some sections may hold statements that are more precautionary than scientific.32 The frequent repetition of predications suggests that predications derived from SPLs are more redundant than those derived from the literature. There is a many-to-many relationship between SPLs and drugs, so content for each product can be repeated across many labels. Since we did not select a canonical label for each drug, much of the observed duplication is likely due to repeated narrative across SPLs; selecting a canonical label per drug would address this issue. The “structured” part of product labels may also be more aspirational than actual. For example, predications with a “CAUSES” predicate extracted from precautions or boxed warning sections may be uninformative, as causal statements in such sections may be issued by pharmaceutical companies as blanket coverage to indemnify themselves against lawsuits arising from potential side-effects.
We found that the distribution of predication types by SPL section type was uneven. The yield of predications extracted using SemRep varied substantially across section types, with precautions, use_in_specific_populations, and clinical pharmacology being the most productive, as per Table 1 and Figures 3 and 4 above.
Limitations and directions for future work. As a pilot project toward a more comprehensive biomedical database of structured causal knowledge, and given our crude measure of novelty, this work is subject to several limitations. First, we extracted the data using the plain text output instead of the more informative XML output. The XML output contains metadata concerning the confidence of each predication. Such information could help inform machine learning models that incorporate degree of belief. Second, we were initially puzzled that so many recently released drugs were missing from SemMedDB. We performed a PubMed search for several of the missing drugs and associated side-effects, and found publications mentioning drugs absent from SemMedDB. One explanation is that SemMedDB_SPL and SemMedDB are not strictly comparable: in constructing SemMedDB, the 2006AA version of the UMLS lexicon was applied without anaphora resolution, whereas the 2018AB version was applied to the SPLs with anaphora resolution. Forthcoming analyses should address this issue by using the same lexicons. We also suspect that much knowledge is missing because of the NLP itself: SemRep is noted for high precision but low recall.41 NLP thus both enables and limits this kind of text-based research. As methods for extracting knowledge from free text with high confidence improve, we can expect efforts in this domain to improve accordingly, particularly with efforts afoot to improve extraction recall for causal language.42 Finally, we analyzed only the latest SPLs available. One would expect that the longer a drug has been on the market, the less novelty its SPL holds relative to the literature. In future work, we hope to track the accretion of new knowledge longitudinally by analyzing archived versions of the SPLs. In this way, we could take snapshots of what was known about a drug at various points in time.
Potential applications. The schema is under revision to address longitudinal (archived) labeling information. Other potential applications of SemMedDB_SPL include literature-based discovery (LBD), causal feature selection for statistical and causal graphical modeling, and knowledge engineering. Recent developments in LBD support ingesting structured knowledge to help researchers generate hypotheses about potential therapies.12,13,24,43
Causal learning. Mathematical formalisms called graphical causal models have emerged that can learn causal structures from observational data.44–46 Observing that conditional dependence (and independence) results from causal relationships, causal graphical modeling methods work in the opposite direction, learning dependencies to infer causal relationships.44,47 These methods often perform better when domain knowledge is available.1,14,21 However, the domain expertise of live human experts cannot scale. SemMedDB_SPL could be exploited as a component in a translational pipeline, standing in for theoretical domain expertise to help interpret retrospective observational data.
Discovering contradictions. Though not the focus of the present paper, contradictions and scientific retractions are active areas of research.49–51 In particular, contradictory knowledge claims could be a productive source of leads for hypothesis generation. Researchers may ask, for example: do findings from pre-clinical trials contradict those in the peer-reviewed literature? More specifically, are there cases where the pre-trial findings indicate a “CAUSES” predicate while the literature reports a “PREVENTS” or “NEG_CAUSES” predicate?
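A query answering that question could be sketched as a join between the two predication tables: find (drug, condition) pairs asserted as “CAUSES” in the SPLs but as “PREVENTS” or “NEG_CAUSES” in the literature. The schema, the drug name, and the rows below are all hypothetical.

```python
# Sketch of a contradiction-discovery query (hypothetical schema and data).
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE splpredications  (subject_name TEXT, predicate TEXT, object_name TEXT);
CREATE TABLE smdbpredications (subject_name TEXT, predicate TEXT, object_name TEXT);
INSERT INTO splpredications  VALUES ('drug_x','CAUSES','headache');
INSERT INTO smdbpredications VALUES ('drug_x','PREVENTS','headache');
""")
cur.execute("""
SELECT s.subject_name, s.object_name, m.predicate AS literature_predicate
FROM splpredications s
JOIN smdbpredications m
  ON m.subject_name = s.subject_name AND m.object_name = s.object_name
WHERE s.predicate = 'CAUSES'
  AND m.predicate IN ('PREVENTS', 'NEG_CAUSES')
""")
print(cur.fetchall())  # → [('drug_x', 'headache', 'PREVENTS')]
```

Each returned row is a candidate contradiction pair that would then need manual review of the underlying source sentences.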
More comprehensive qualitative and quantitative analysis of the “novel” content remains to be done. For example, are the extracted predications correct? Are static predicates such as “ISA” or “PROCESS_OF” really relevant for drug safety or drug repurposing purposes? Are any of the novel relationships mentioned in the literature but missed by SemRep? Is any of the missing knowledge available in other structured knowledge resources?
Conclusion
The present work is an initial pilot study intended to create a new resource of structured knowledge for use and reuse by other researchers in the areas of drug repurposing, drug safety, and drug discovery. We have made the code we used publicly available so that other researchers can reproduce and build upon our efforts to surface novel, clinically relevant content contained in SPL text narratives.52 We intend for SemMedDB_SPL to contribute critical structured knowledge to pharmaceutical research pipelines and to complement existing structured knowledge resources such as SemMedDB. To that end, we have made the materials created for this paper publicly available at http://github.com/dbmi-pitt/SemMedDB_SPL.
Acknowledgments
This work is supported by University of Pittsburgh Department of Biomedical Informatics training grant T15LM007059. I thank Harry Hochheiser of the University of Pittsburgh DBMI and Halil Kilicoglu of the University of Illinois Urbana-Champaign for their thoughtful feedback on earlier drafts of this manuscript.
Figures & Table
Figure 3:
This heatmap illustrates the relative distribution of predicates (y-axis) in each SPL section type (x-axis). Red indicates higher predicate frequency, with lighter colors through white indicating lower frequency. The numbers in the legend indicate predicate counts on a natural log scale.
References
- 1.Hernan MA, Hernandez-Diaz S, Werler MM, Mitchell AA. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol. 2002 Jan 15;155(2):176–84. doi: 10.1093/aje/155.2.176. [DOI] [PubMed] [Google Scholar]
- 2.Boyce RD, Ryan PB, Norén GN, Schuemie MJ, Reich C, Duke J, et al. Bridging Islands of Information to Establish an Integrated Knowledge Base of Drugs and Health Outcomes of Interest. Drug Safety [Internet] 2014 Aug;37(8):557–67. doi: 10.1007/s40264-014-0189-0. [cited 2017 Jul 21]. Available from: http://link.springer.com/10.1007/s40264-014-0189-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Koutkias VG, Jaulent M-C. Computational approaches for pharmacovigilance signal detection: toward integrated and semantically-enriched frameworks. Drug Saf. 2015 Mar;38(3):219–32. doi: 10.1007/s40264-015-0278-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Natsiavas P, Malousi A, Bousquet C, Jaulent M-C, Koutkias V. Computational Advances in Drug Safety: Systematic and Mapping Review of Knowledge Engineering Based Approaches. Front Pharmacol. 2019;10:415. doi: 10.3389/fphar.2019.00415. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Koutkias VG, Lillo-Le Louët A, Jaulent M-C. Exploiting heterogeneous publicly available data sources for drug safety surveillance: computational framework and case studies. Expert Opin Drug Saf. 2016 Nov:4. doi: 10.1080/14740338.2017.1257604. [DOI] [PubMed] [Google Scholar]
- 6.Li Y. Combining Heterogeneous Databases to Detect Adverse Drug Reactions. 2015 [cited 2018 Jun 12]; Available from: https://academiccommons.columbia.edu/catalog/ac:189526. [Google Scholar]
- 7.Boyce R, Voss E, Evans L, Reich C, Duke J, Tatonetti N, et al. LAERTES: An open system architecture for linking pharmacovigilance evidence sources with clinical data. Journal of Biomedical Semantics. 2017 doi: 10.1186/s13326-017-0115-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Cohen T, Widdows D. Empirical Distributional Semantics: Methods and Biomedical Applications. Journal of biomedical informatics [Internet] 2009 Apr;42(2):390–405. doi: 10.1016/j.jbi.2009.02.002. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2750802/ [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Hristovski D, Friedman C, Rindflesch TC, Peterlin B. Exploiting semantic relations for literature-based discovery. AMIA Annu Symp Proc. 2006:349–53. [PMC free article] [PubMed] [Google Scholar]
- 10.Cameron D, Bodenreider O, Yalamanchili H, Danh T, Vallabhaneni S, Thirunarayan K, et al. A graph-based recovery and decomposition of Swanson’s hypothesis using semantic predications. J Biomed Inform. 2013 Apr;46(2):238–51. doi: 10.1016/j.jbi.2012.09.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Smalheiser NR. Rediscovering Don Swanson: the Past, Present and Future of Literature-Based Discovery. J Data Inf Sci. 2017 Dec;2(4):43–64. doi: 10.1515/jdis-2017-0019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Swanson DR. Fish oil, Raynaud’s syndrome, and undiscovered public knowledge. Perspect Biol Med. 1986 Autumn;30(1):7–18. doi: 10.1353/pbm.1986.0087. [DOI] [PubMed] [Google Scholar]
- 13.Cohen T, Widdows D. Embedding of semantic predications. J Biomed Inform. 2017 Apr;68:150–66. doi: 10.1016/j.jbi.2017.03.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Malec S. Using the Literature to Identify Confounders. UT SBMI Dissertations. 2018 Jan 1 (Open Access) [Internet]. Available from: https://digitalcommons.library.tmc.edu/uthshis_dissertations/41. [Google Scholar]
- 15.Malec SA, Gottlieb A, Bernstam E, Cohen T. Using the Literature to Construct Causal Models for Pharmacovigilance. 2018 [Google Scholar]
- 16.Fathiamini S, Johnson AM, Zeng J, Araya A, Holla V, Bailey AM, et al. Automated identification of molecular effects of drugs (AIMED). J Am Med Inform Assoc. 2016 Jul;23(4):758–65. doi: 10.1093/jamia/ocw030.
- 17.CFR - Code of Federal Regulations Title 21 [Internet] [cited 2019 Jul 29]. Available from: https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?fr=201.200.
- 18.Rindflesch TC, Fiszman M. The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text. J Biomed Inform. 2003 Dec;36(6):462–77. doi: 10.1016/j.jbi.2003.11.003.
- 19.Ahlers CB, Fiszman M, Demner-Fushman D, Lang F-M, Rindflesch TC. Extracting semantic predications from Medline citations for pharmacogenomics. Pac Symp Biocomput. 2007:209–20.
- 20.Kilicoglu H, Shin D, Fiszman M, Rosemblat G, Rindflesch TC. SemMedDB: a PubMed-scale repository of biomedical semantic predications. Bioinformatics. 2012 Dec 1;28(23):3158–60. doi: 10.1093/bioinformatics/bts591.
- 21.Malec SA, Wei P, Xu H, Bernstam EV, Myneni S, Cohen T. Literature-Based Discovery of Confounding in Observational Clinical Data. AMIA Annu Symp Proc. 2016;2016:1920–9.
- 22.Widdows D, Cohen T. Reasoning with Vectors: A Continuous Model for Fast Robust Inference. Log J IGPL. 2015 Oct;23(2):141–73. doi: 10.1093/jigpal/jzu028.
- 23.Cohen T, Schvaneveldt RW, Rindflesch TC. Predication-based semantic indexing: permutations as a means to encode predications in semantic space. AMIA Annu Symp Proc. 2009 Nov 14;2009:114–8.
- 24.Hristovski D, Rindflesch T, Peterlin B. Using literature-based discovery to identify novel therapeutic approaches. Cardiovasc Hematol Agents Med Chem. 2013 Mar;11(1):14–24. doi: 10.2174/1871525711311010005.
- 25.Cameron D, Kavuluru R, Rindflesch TC, Sheth AP, Thirunarayan K, Bodenreider O. Context-driven automatic subgraph creation for literature-based discovery. J Biomed Inform. 2015 Apr;54:141–57. doi: 10.1016/j.jbi.2015.01.014.
- 26.Rindflesch TC, Blake CL, Fiszman M, Kilicoglu H, Rosemblat G, Schneider J, et al. Informatics Support for Basic Research in Biomedicine. ILAR J [Internet] 2017 Jul 1 [cited 2019 Aug 5];58(1):80–9. doi: 10.1093/ilar/ilx004. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5886329/
- 27.Culbertson A, Fiszman M, Shin D, Rindflesch TC. Semantic Processing to Identify Adverse Drug Event Information from Black Box Warnings. AMIA Annu Symp Proc [Internet] 2014 Nov 14 [cited 2019 Jan 1];2014:442–8. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4419903/
- 28.Hassanzadeh O, Qian Z, Freimuth R, Boyce R. Extending the “Web of Drug Identity” with Knowledge Extracted from United States Product Labels; Proceedings of the 2013 AMIA Summit on Translational Bioinformatics; San Francisco, CA; 2013.
- 29.Khare R, Wei C-H, Lu Z. Automatic Extraction of Drug Indications from FDA Drug Labels. AMIA Annu Symp Proc [Internet] 2014 Nov 14 [cited 2019 Aug 12];2014:787–94. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4419914/
- 30.Bisgin H, Liu Z, Fang H, Xu X, Tong W. Mining FDA drug labels using an unsupervised learning technique - topic modeling. BMC Bioinformatics [Internet] 2011 [cited 2019 Aug 12];12(Suppl 10):S11. doi: 10.1186/1471-2105-12-S10-S11. Available from: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-S10-S11.
- 31.Rastegar-Mojarad M, Harrington B, Belknap SM. Automatic detection of drug interaction mismatches in package inserts. 2013 International Conference on Advances in Computing, Communications and Informatics (ICACCI) 2013:373–7.
- 32.Duke J, Friedlin J. A quantitative analysis of adverse events and “overwarning” in drug labeling. Arch Intern Med [Internet] 2011 May [cited 2012 Oct 25];171(10):941–54. doi: 10.1001/archinternmed.2011.182. Available from: http://dx.doi.org/10.1001/archinternmed.2011.182.
- 33.Fung KW, Jao CS, Demner-Fushman D. Extracting drug indication information from structured product labels using natural language processing. J Am Med Inform Assoc [Internet] 2013 May [cited 2019 Aug 12];20(3):482–8. doi: 10.1136/amiajnl-2012-001291. Available from: https://academic.oup.com/jamia/article-lookup/doi/10.1136/amiajnl-2012-001291.
- 34.Boyce RD, Horn JR, Hassanzadeh O, de Waard A, Schneider J, Luciano JS, et al. Dynamic enhancement of drug product labels to support drug safety, efficacy, and effectiveness. J Biomed Semantics [Internet] 2013 [cited 2019 Aug 12];4(1):5. doi: 10.1186/2041-1480-4-5. Available from: http://jbiomedsem.biomedcentral.com/articles/10.1186/2041-1480-4-5.
- 35.Li Q, Deleger L, Lingren T, Zhai H, Kaiser M, Stoutenborough L, et al. Mining FDA drug labels for medical conditions. BMC Med Inform Decis Mak [Internet] 2013 Dec [cited 2019 Aug 12];13(1):53. doi: 10.1186/1472-6947-13-53. Available from: http://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/1472-6947-13-53.
- 36.Semantic Knowledge Representation [Internet] [cited 2019 Jul 29]. Available from: https://skr3.nlm.nih.gov/SemMedDB/download/download.html.
- 37.bio2rdf/bio2rdf-scripts: Scripts that Bio2RDF users have created to generate RDF versions of scientific datasets [Internet] Bio2RDF Consortium; 2019 [cited 2019 Aug 7]. Available from: https://github.com/bio2rdf/bio2rdf-scripts.
- 38.Belleau F, Nolin M-A, Tourigny N, Rigault P, Morissette J. Bio2RDF: towards a mashup to build bioinformatics knowledge systems. J Biomed Inform. 2008 Oct;41(5):706–16. doi: 10.1016/j.jbi.2008.03.004.
- 39.DailyMed [Internet] [cited 2019 Feb 6]. Available from: https://dailymed.nlm.nih.gov/dailymed/
- 40.Kolde R. pheatmap: Pretty Heatmaps [Internet] 2019 [cited 2019 Aug 9]. Available from: https://CRAN.R-project.org/package=pheatmap.
- 41.Kilicoglu H, Fiszman M, Rosemblat G, Marimpietri S, Rindflesch T. Arguments of Nominals in Semantic Interpretation of Biomedical Text; Proceedings of the 2010 Workshop on Biomedical Natural Language Processing [Internet]; Uppsala, Sweden: Association for Computational Linguistics; 2010. pp. 46–54. Available from: http://www.aclweb.org/anthology/W10-1906.
- 42.Dunietz J, Levin L, Carbonell J. The BECauSE Corpus 2.0: Annotating Causality and Overlapping Relations; Proceedings of the 11th Linguistic Annotation Workshop [Internet]; Valencia, Spain: Association for Computational Linguistics; 2017. pp. 95–104. [cited 2019 Aug 14]. Available from: http://aclweb.org/anthology/W17-0812.
- 43.Smalheiser NR. Rediscovering Don Swanson: the Past, Present and Future of Literature-Based Discovery. J Data Inf Sci [Internet] 2017 Dec;2(4):43–64. doi: 10.1515/jdis-2017-0019. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5771422/
- 44.Pearl J. Causality: Models, Reasoning, and Inference [Internet]. 2nd ed. Cambridge: Cambridge University Press; 2009 [cited 2017 Jul 21]. Available from: http://ebooks.cambridge.org/ref/id/CBO9780511803161.
- 45.Scheines R, Spirtes P, Glymour C, Meek C, Richardson T. The TETRAD Project: Constraint Based Aids to Causal Model Specification. Multivariate Behav Res. 1998 Jan 1;33(1):65–117. doi: 10.1207/s15327906mbr3301_3.
- 46.Cooper GF, Yoo C. Causal Discovery from a Mixture of Experimental and Observational Data. arXiv:1301.6686 [cs] [Internet] 2013 Jan 23 [cited 2018 Jun 17]. Available from: http://arxiv.org/abs/1301.6686.
- 47.Korb KB, Nicholson AE. Bayesian Artificial Intelligence [Internet]. CRC Press (Chapman & Hall/CRC Computer Science & Data Analysis); 2010. Available from: https://books.google.com/books?id=LxXOBQAAQBAJ.
- 48.Celebi R, Yasar E, Uyar H, Gumus O, Dikenelli O, Dumontier M. Evaluation of Knowledge Graph Embedding Approaches for Drug-Drug Interaction Prediction using Linked Open Data [Internet]. 2018 Dec 3 [cited 2019 Aug 15]. doi: 10.1186/s12859-019-3284-5. Available from: https://cris.maastrichtuniversity.nl/portal/en/publications/evaluation-of-knowledge-graph-embedding-approaches-for-drugdrug-interaction-prediction-using-linked-open-data(1eb48126-6544-4719-b331-d8c13c7d2a83).html.
- 49.Ioannidis JPA. Contradicted and initially stronger effects in highly cited clinical research. JAMA. 2005 Jul 13;294(2):218–28. doi: 10.1001/jama.294.2.218.
- 50.Prasad V, Cifu A. Medical reversal: why we must raise the bar before adopting new technologies. Yale J Biol Med [Internet] 2011 Dec;84(4):471–8. Available from: https://www.ncbi.nlm.nih.gov/pubmed/22180684.
- 51.Alamri A. The Detection of Contradictory Claims in Biomedical Abstracts [Internet] [PhD thesis]. University of Sheffield; 2016 [cited 2019 Aug 15]. Available from: http://etheses.whiterose.ac.uk/15893/
- 52.dbmi-pitt/SemMedDB_SPL [Internet]. GitHub [cited 2019 Aug 12]. Available from: https://github.com/dbmi-pitt/SemMedDB_SPL.