Author manuscript; available in PMC: 2021 Jun 1.
Published in final edited form as: Drug Saf. 2020 Jun;43(6):567–582. doi: 10.1007/s40264-020-00915-6

Leveraging human genetics to identify safety signals prior to drug marketing approval and clinical use

Rebecca N Jerome 1, Meghan Morrison Joly 1, Nan Kennedy 1, Jana K Shirey-Rice 1, Dan M Roden 2,3, Gordon R Bernard 1, Kenneth J Holroyd 1,4, Joshua C Denny 3, Jill M Pulley 1
PMCID: PMC7398579  NIHMSID: NIHMS1569781  PMID: 32112228

Abstract

Introduction:

When a new drug or biologic product enters the market, its full spectrum of side effects is not yet fully understood, as use in the real world often uncovers nuances not suggested within the relatively narrow confines of preapproval preclinical and trial work.

Objective:

We describe a new, phenome-wide association study (PheWAS)- and evidence-based approach for detection of potential adverse drug effects.

Methods:

We leveraged our established platform that integrates human genetic data with associated phenotypes in electronic health records from 29,722 patients of European ancestry to identify gene-phenotype associations that may represent known safety issues. We examined PheWAS data and published literature for 16 genes, each of which encodes a protein targeted by at least one drug or biologic product.

Results:

Initial data demonstrate that our novel approach, SA-PheWAS (Safety Ascertainment using PheWAS), can replicate published safety information across multiple drug classes, with validating findings for 13 of 16 gene/drug class pairs.

Conclusions:

By connecting and integrating in vivo and in silico data, SA-PheWAS offers an opportunity to supplement current methods for predicting or confirming safety signals associated with therapeutic agents.

2. Introduction

The safety of any new drug or biologic product in development is paramount to the regulatory approval process. Safety problems can manifest at any stage and result in the failure to bring a potential drug to market. Notably, safety signals can arise in later stages of development when additional patient factors come into play, including co-morbid conditions, concomitant medications, or genetic differences.[1] A recent study reported that almost 1 in 5 therapies being tested in Phase III clinical trials fail due to safety concerns, including increased risk of death or other serious adverse events (SAEs) such as cancer, stroke, or sepsis.[2] Moreover, by the time a drug is tested in later clinical trials, the bulk of development funds have already been invested. Discovery of safety issues in later phases thus entails both greater risk to patients and more costly failures for the development effort.

Even when not resulting in a development failure, drug safety issues can lead to significant delays in the regulatory approval process and may disparately affect therapeutics for serious diseases. In one analysis, participant attrition rates due to safety concerns for all clinical studies between 2011 and 2012 were as high as 28%; rates were even higher in Phase III studies (35%), with the highest attrition rates in oncology and central nervous system (CNS) disorders.[3] Another analysis showed that 77% of New Drug Applications (NDAs) that required multiple rounds of Food and Drug Administration (FDA) review were due to safety concerns prompting the need for additional evaluation.[4]

Furthermore, when a new therapy is approved for marketing and enters clinical use, usually fewer than 10,000 individuals will have been exposed to the novel intervention in pre-approval trials. Rare serious adverse drug reactions may not have been observed, or may not yet have been recognized as associated with the intervention. Safety issues, even major concerns resulting in market withdrawal, can and do occur after a drug is used more widely in broader and more diverse patient populations than those in its pre-approval clinical trials, and for longer durations than in previous research.[5] According to a recent study, 32% of drugs approved by FDA between 2001 and 2010 were affected by post-market safety events, including safety communications, boxed warnings, or withdrawal.[6] The smaller subset of drugs eventually withdrawn due to serious safety concerns includes valdecoxib, an anti-inflammatory withdrawn for adverse cardiovascular effects; efalizumab, a monoclonal antibody withdrawn due to risk of fatal brain infections; and sitaxentan, a pulmonary arterial hypertension therapeutic withdrawn due to reports of severe hepatotoxicity.[6-8]

These withdrawals are a testament to the strong safety surveillance systems currently in place, including FDA’s Adverse Events Reporting System (FAERS) and its post-marketing study requirements.[9,10] These reporting systems, however, are subject to well-known biases that include under- or over-reporting and an inability to determine causality. To mitigate some of these limitations, new approaches for pharmacovigilance, a priority for many drug developers, are being formulated and implemented. Data from electronic health records (EHRs), for example, can be used to detect safety signals when specified events exceed expected thresholds.[7] The linkage of EHR data to administrative data, typically in the form of International Classification of Diseases (ICD) codes primarily collected for inpatient billing purposes[11], can also be a powerful tool for identifying, pulling and categorizing such patient data. Data culled from disease/product registries, social media forums, mobile apps, and wearable devices, can also be used to examine real-world drug safety results.[12,13] Behavioral data drawn from internet search logs,[14] extensive reviews of the biomedical literature,[15] and mapping of biological and chemical mechanisms from knowledge bases[16] can also contribute information to safety profiles. These methods have been used intentionally but sporadically, and each has its limitations.

To complement and enhance our collective ability to identify safety concerns, we have developed a proactive approach to detecting a drug or biologic’s safety signals, both beneficial and deleterious, in humans, with a focus on applicability to the development phase prior to actual clinical testing.[17] This novel drug safety platform employs phenome-wide association methods (PheWAS), analyzing de-identified human genetic data linked to electronic health records to identify associations between diseases and single nucleotide polymorphisms (SNPs) in drug target genes that recapitulate drug effects. While a genome-wide association study (GWAS) can identify novel genetic associations for many diseases, PheWAS can be considered a “reverse GWAS”, determining the range of clinical phenotypes associated with a given genotype.[18] After documenting that these SNPs capture the target disease, our Safety Ascertainment using PheWAS (SA-PheWAS) approach can use these variants to identify novel associated phenotypes or phenotype surrogates, such as abnormal laboratory values, that may serve as drug safety signals (Figure 1). SA-PheWAS is designed to detect potential safety signals early in the drug development process. Here, we describe the methodology and design underpinning our proposed drug safety ascertainment process and explore the utility of SA-PheWAS in replicating published safety information.

Fig. 1. Schematic overview of the SA-PheWAS method

3. Methods

3.1. Leveraging existing infrastructure for SA-PheWAS.

We undertook an application of existing infrastructure within Vanderbilt University Medical Center (VUMC) and employed current analytic approaches to explore whether the SA-PheWAS method could accurately detect known safety side effects of drugs marketed in the United States and elsewhere (Figure 1). We leveraged BioVU,[19] Vanderbilt’s large-scale DNA biobank that houses approximately 250,000 DNA samples extracted from patient blood samples collected during routine clinical testing that would otherwise be discarded. Sample collection began in 2007. These specimens are linked to corresponding, longitudinal clinical data from the Synthetic Derivative (SD), a database of over 2.2 million de-identified electronic health records (EHRs) built for research purposes and updated quarterly.[19] The SD includes the core elements of health-related phenotypes, e.g. diagnoses, procedures, medications, laboratory testing results, family and social history, vital signs, demographics and other details extracted from clinical notes.

We conducted PheWAS analyses using previously reported methods,[18,20,21] focusing on 29,722 participants of European ancestry genotyped using the Illumina Infinium Exomechip (Illumina, Inc, San Diego, CA), which contains ~250,000 variants across the protein coding region of the genome, and corresponding phenotype data extracted from de-identified EHRs.[21] We focused on the European ancestry dataset, as determined by EIGENSTRAT principal components analysis of genetic ancestry,[22] in these studies because we are currently underpowered in other ancestry groups. For the PheWAS analysis, all distinct ICD9 codes were captured from each patient’s EHR and translated into corresponding phenotype groupings (phecodes).[21] We defined a case as a record with two or more occurrences of an ICD9 code that mapped to one of the approximately 1,800 phecodes; previous work has shown that requiring two or more occurrences of a code in a record aids in excluding misdiagnoses and other coding irregularities.[23] Patients with records that did not contain any ICD9 codes belonging to the exclusion code grouping corresponding to that case were designated as controls; these exclusion groupings are aligned with the ICD9 hierarchical structure, which groups related conditions.[18,23]
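The case and control definitions above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name `assign_status`, the placeholder code labels, and the choice to treat a single occurrence of a case-defining code as neither case nor control are assumptions made for illustration only.

```python
from collections import Counter


def assign_status(patient_codes, icd9_to_phecode, phecode,
                  exclusion_phecodes, min_count=2):
    """Classify one patient for one phecode as 'case', 'control', or 'excluded'.

    patient_codes      : list of ICD9 code occurrences in the record (repeats kept)
    icd9_to_phecode    : dict mapping ICD9 code -> phecode (hypothetical mapping)
    exclusion_phecodes : set of related phecodes that rule out control status
    """
    # Count how often each phecode is supported by the patient's ICD9 codes.
    counts = Counter(
        icd9_to_phecode[code] for code in patient_codes if code in icd9_to_phecode
    )
    # Two or more occurrences are required to call a case.
    if counts[phecode] >= min_count:
        return "case"
    # Assumption: any code in the exclusion grouping, or a single occurrence
    # of the phecode itself, keeps the patient out of the control group.
    if any(counts[p] > 0 for p in exclusion_phecodes | {phecode}):
        return "excluded"
    return "control"


# Placeholder labels, not real ICD9 codes or phecodes.
mapping = {"icd_a": "phe_1", "icd_b": "phe_1", "icd_c": "phe_2"}
status = assign_status(["icd_a", "icd_b"], mapping, "phe_1", {"phe_2"})
```

In this toy call, two codes mapping to the same phecode are enough to label the record a case; a record containing only a related exclusion code would be set aside rather than counted as a control.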

3.2. Gene selection.

For this exploratory, proof-of-concept analysis, we selected 16 genes encoding proteins that function as the primary established target of known therapeutic agents in use today (Table 1), and noted the directionality of the therapeutic agent’s effect on the protein target (e.g. inhibitor, antagonist, analogue). Because our method leverages SNPs within a given gene and extrapolates to effects on the associated protein (e.g. inhibition, activation) as a proxy for drug effects, this approach is not suitable for therapeutic agents with multiple protein targets. We intentionally included targets reflecting therapeutic agents across a wide spectrum of diseases to gauge the utility of this method across varied use cases.

Table 1. Drug-gene class pairs analyzed using the SA-PheWAS approach.

True positives are defined as safety signals identified by SA-PheWAS that match with published safety information. False negatives are defined as 1) safety signals identified by SA-PheWAS that are discordant with published safety information or 2) published safety information for the listed drug class that were not identified using the SA-PheWAS method. True negatives are defined as any non-statistically significant signal that also lacked published safety information; however, these would be difficult to ascertain through SA-PheWAS.

Category | Gene | Drug class | Safety signal identified using SA-PheWAS
True positive | PCSK9 | PCSK9 inhibitors | Spina bifida
True positive | TNF | TNF inhibitors | Cellulitis and abscess of leg, except foot
True positive | PPARG | PPAR agonists | Morbid obesity
True positive | ESR1 | Selective estrogen receptor modulators (SERMs) | Subarachnoid hemorrhage/Hemorrhage NOS
True positive | ACE | ACE inhibitors | Congenital anomalies of the urinary system
True positive | PLA2G2A | sPLA2 inhibitors | Primary hypercoagulable state
True positive | GRIN2B | NMDA receptor antagonists | Symbolic dysfunction
True positive | GRIN2A | GluN2A antagonists | Paroxysmal tachycardia, unspecified/Pulmonary heart disease/Sleep disorders
True positive | HMGCR | Statins | Polyneuropathy in diabetes/Type 2 diabetes with neurological manifestations
True positive | SLC6A4 | Selective serotonin reuptake inhibitors (SSRIs) | Peptic ulcer (excluding esophagus)
True positive | PDCD1 | Anti-PDL1 antibodies | Pain in joint/Osteoarthritis/Degenerative skin conditions and other dermatoses
True positive | ATP4A | Proton pump inhibitors (PPIs) | Diverticulosis
True positive | SLC5A2 | SGLT2 inhibitors | Other disorders of the kidney and ureters
False negative | SLC5A2 | SGLT2 inhibitors | Stress fracture
False negative | CACNB2 | Calcium channel blockers | None identified
False negative | TBXA2R | Thromboxane A2 receptor (TPR) antagonists | None identified
False negative | PTGER2 | Prostaglandin analogues | None identified
True negative | - | - | -

3.3. SNP selection.

Single nucleotide polymorphisms (SNPs) in our candidate genes with minor allele frequency ≥0.001 and high quality (<5% call missingness rate) on the Illumina Exomechip were included in our analysis. SNPs for analysis in this study were chosen based on several criteria: 1) analysis of any functional impact of each SNP available in the published literature, or of the predicted functional impact of SNPs using prediction algorithms such as SIFT,[24] PolyPhen2,[25] and Combined Annotation-Dependent Depletion[26] when functional studies have not yet been published, as well as consultation of databases such as Ensembl[27] to assess whether each SNP corresponded with a known regulatory region; 2) analysis of predicted and known involvement of the SNP, gene, and protein in the biologic pathways linked to the phenotype; and 3) analysis of the peer-reviewed biomedical literature related to the SNP, gene, protein and known effects of modulating the protein via pharmacologic or genetic methods. For the purpose of this initial exploration, we focused on one SNP within each gene that met the validation criteria described below. Our intent with this proof-of-concept sample was to allow for preliminary evaluation of the utility of the approach; future work will assess SA-PheWAS applied across multiple variants within a gene and across a broader sampling of genes corresponding with proteins targeted by therapeutic agents.
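The two quantitative inclusion thresholds stated above (minor allele frequency ≥0.001 and call missingness <5%) can be expressed as a simple filter. The sketch below is illustrative only; the `SnpQc` record, the `passes_qc` function, and the missingness value in the example are hypothetical and do not come from the study data. The functional and literature-based criteria are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP quality metrics (hypothetical record layout)."""
    rsid: str
    minor_allele_freq: float   # MAF in the genotyped population
    missing_call_rate: float   # fraction of samples with no genotype call


def passes_qc(snp, min_maf=0.001, max_missing=0.05):
    """Apply the two thresholds: MAF >= 0.001 and missingness < 5%."""
    return snp.minor_allele_freq >= min_maf and snp.missing_call_rate < max_missing


# MAF for rs11591147 is taken from Table 2; the missingness value is made up.
example = SnpQc(rsid="rs11591147", minor_allele_freq=0.016, missing_call_rate=0.01)
keep = passes_qc(example)
```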

3.4. Validation of expected therapeutic indications.

We performed a literature and database review of SNPs within our PheWAS results to explore concordance between our PheWAS findings and known associations between variants and manifestations of health and disease. As part of this initial validation step, we used PheWAS to assess the association between each SNP and published phenotypes previously associated with those SNPs. For example, the involvement of PCSK9 in lipid homeostasis is well understood, and loss-of-function mutations in PCSK9, such as the R46L variant (Table 2), are associated with unusually low low-density lipoprotein (LDL) cholesterol and decreased risk for developing cardiovascular disease.[28] Therefore, we assessed the biologic effect of R46L in our PheWAS data, confirming an association with reduced risk of hyperlipemia (which mimics the known effect of the drug), as reflected by the odds ratio yielded by PheWAS. If PheWAS results for a gene-drug pair did not reflect the known indication for the therapeutic agent, we treated this lack of a validation signal as a stopping point and did not pursue that SNP or gene further.

Table 2. The SA-PheWAS approach identifies potential adverse effects (or safety signals) concordant with safety issues observed during real world use.

Gene | SNP and basic SNP effect | SNP minor allele frequency (MAF) | Drug class | Validates expected indication (OR, p value) | Safety signal detected (OR, p value) | Cases (n) | Controls (n) | Cases with minor allele (n) | Published safety concern | FDA warning corresponding to safety signal
SNP functional effects concordant with the drug class listed
PCSK9 rs11591147 R46L (validated as loss of function in functional studies)[56] 0.016 PCSK9 inhibitors Hypercholesterolemia (0.68, 0.00076) Spina bifida (5.9, 0.00027) 34 25,268 6 Studies in rats showed significantly reduced PCSK9 levels in the sera of neural tube defect (NTD) pregnancies. Similar reduced PCSK9 levels were found in NTD maternal serum, compared to controls.[57]

Alirocumab can cross the placental barrier during pregnancy.
However, no effects on embryo-fetal development were found in rats administered alirocumab at a dose 12-fold the maximum recommended human dose.[58]
There are no available data on the use of PCSK9 inhibitors in pregnant women to inform drug-associated risk.[58,59] A pregnancy exposure registry study of alirocumab is currently enrolling patients.[60]
TNF rs3093662 Intronic (validated to lower serum levels in vivo)[61] 0.071 TNF inhibitors Rheumatoid arthritis (0.73, 0.00107) Cellulitis and abscess of leg, except foot (1.37, 0.00062) 720 24,376 128 Studies have found potential risks of serious infection, including pneumonia, cellulitis, and opportunistic infections in RA patients. Other side effects reported include tuberculosis, malignancies, demyelinating disorders, lupus-like disease, psoriasis, and congestive heart failure.[62,63] Increased risk of serious infections cited in the adalimumab brochure[64] (including tuberculosis (TB), bacterial sepsis, invasive fungal infections (such as histoplasmosis), infections due to other opportunistic pathogens, pneumonia, septic arthritis, prosthetic and postsurgical infections, erysipelas, cellulitis, diverticulitis, and pyelonephritis).
PPARG rs1801282 P12A (predicted to increase activity based on our PheWAS data*; SIFT score and rating: 0, deleterious) 0.117 PPAR gamma agonists Type 2 diabetes (0.91, 0.01259) Morbid obesity (1.22, 0.00324) 960 21,083 245 Pioglitazone significantly increased both body weight and body fat in patients with pre-diabetes (ACT NOW trial).[33]

Pioglitazone significantly increased food intake and decreased energy expenditure in mice on a high-fat diet.[65]
FDA drug data indicate pioglitazone alone or combined with other hypoglycemic agents can cause dose-related weight gain from fluid retention and fat accumulation.[66]
ESR1 rs149308960 G160C (predicted to decrease estrogen activity*; SIFT score and rating: 0.021, deleterious) 0.002 Selective estrogen receptor modulators (SERMs) Irregular menstrual cycle/bleeding (4.6, 0.00005)

Delay in sexual development and puberty NEC (11.62, 0.00135)
Subarachnoid hemorrhage (4.36, 0.034) 103 24,396 2 *Tamoxifen has both agonist and antagonist properties, depending on the target organ. Tamoxifen has antagonist properties in the breast and an estrogen-like effect in the uterus, and therefore increases the risk of endometrial hyperplasia, tumors,[67] fibroids, and polyps. Tamoxifen is also associated with hot flashes, menstrual irregularities, and blood clots.[68] However, raloxifene, another SERM, does not share tamoxifen’s pro-estrogenic effects on the uterus and is associated with lower odds of endometrial cancer.[69] FDA reports that both tamoxifen[70] and raloxifene[71] can increase risk of stroke or a blood clot in lungs or veins.
Hemorrhage NOS (5.68, 0.01711) 85 24,370 2
ACE rs4343 Synonymous (predicted to decrease activity based on our PheWAS data*; SIFT score and rating: 1, tolerated; not known to reside within a regulatory region) 0.468 ACE inhibitors Atherosclerosis (1.16, 0.00038) Congenital anomalies of urinary system (0.84, 0.00930) 453 27,795 310 Side effects of lisinopril include: decrease in urine output or urine-concentrating ability, cloudy urine.[72] FDA warns of various urinary and renal side effects.[73]
PLA2G2A rs34568801 R143H (predicted to decrease activity based on our PheWAS data*; SIFT score 0.135, tolerated) 0.008 sPLA2 inhibitors Peripheral vascular disease (0.36, 0.00298) Primary hypercoagulable state (2.57, 0.01611) 161 22,340 7 Varespladib trial (VISTA-16) for prevention of recurrent cardiac events in ACS was halted at interim analysis when a significantly increased risk of MI was detected; investigators postulated that varespladib may have induced a prothrombotic state, potentially through ablation of both pro-atherogenic and antiatherogenic sPLA2 isoforms,[74] concordant with PheWAS results suggesting both positive and negative effects.

Human sPLA2 exerts anticoagulant activity in plasma[75] and inhibits prothrombinase activity independent of its lipolytic activity.[76]
GRIN2B rs1806191 Synonymous (predicted to decrease activity based on our PheWAS data*; SIFT score and rating: 0.564, tolerated; not known to reside within a regulatory region) 0.481 NMDA receptor antagonists Major depressive disorder (0.895, 0.02133) Symbolic dysfunction (2.48, 0.01032) 20 23,549 10 Symbolic dysfunction has been reported as a neuropsychiatric feature among individuals experiencing psychotic disorders.[77,78] Ketamine, an NMDA receptor antagonist, can induce several neuropsychiatric symptoms, including acute and severe impairments of working, episodic and semantic memory as well as psychotogenic and dissociative effects.[79]

Frequent users have reported flashbacks and impaired memory. In overdose, can cause slurred speech.[80]
GRIN2A rs78631453 T141M (predicted to decrease activity based on our PheWAS data*; SIFT score 0.275 tolerated) 0.001 GluN2a antagonists Aphasia (5.28, 0.00554) Paroxysmal tachycardia, unspecified (3.4, 0.00048) 1206 14,985 11 In rats, memantine causes sleep disturbances including increases in sleep latency and motor activity.[81]

Memantine decreases probable REM sleep behavior in patients with dementia with Lewy bodies and Parkinson’s disease dementia.[82] Cardiovascular risks associated with memantine have also been described in the literature.[83,84]
Adverse reactions reported in the memantine prescribing information include leukopenia (including neutropenia), hypertension, somnolence.[85]
Pulmonary heart disease (2.94, 0.00183) 1363 21,277 10
Sleep disorders (2.81, 0.0061) 1903 20,523 14
SNP functional effects opposite to the drug class listed
HMGCR rs3846663 A1912G (predicted to increase activity based on our PheWAS data*; SIFT score and rating 0.502, tolerated) 0.376 Statins Hypercholesterolemia (1.10, 0.00032) Polyneuropathy in diabetes (0.822, 0.00115) 648 21,105 75 Studies have demonstrated an association between statin use and increased risk of developing diabetes in both low risk[86] and in high risk patient populations[87]. Review of current medical literature, clinical trial data, and reports of adverse events has prompted the FDA to add warnings for increased risk of developing diabetes to the labels of statins. [88]
Type 2 diabetes with neurological manifestations (0.868, 0.00523) 932 21,205 114
Diabetic retinopathy (0.832, 0.00953)
SLC6A4 rs6355 G56A (validated to increase serotonin transport in functional studies)[89] 0.021 Selective serotonin reuptake inhibitors (SSRIs) Antisocial/borderline personality disorder (2.43, 0.02407) Peptic ulcer (excl. esophagus) (0.3, 0.00160) 531 28,452 7 SSRI usage is associated with uncomplicated peptic ulcers[90]. H. pylori infection potentiates the risk of upper gastrointestinal bleeding in patients using SSRIs.[91] Product brochure reports possible side effect of peptic ulcers for fluoxetine hydrochloride.[92]
PDCD1 rs143359677 A263T (predicted to increase signaling based on our PheWAS data*; SIFT score and rating: 0.177, tolerated) 0.001 anti-PDL1 antibodies Polycythemia vera (12.71, 0.00007) Pain in joint (0.1672, 0.00001) 8332 18,123 4 In a pooled analysis of patients with advanced melanoma, nivolumab treatment resulted in fatigue, diarrhea, and skin-related issues such as rashes and pruritus, with some reports of arthritis, GI, endocrine, or hepatic side effects.[93,94] FDA-reported nivolumab side effects include pain in muscles and joints, changes to skin color and other skin changes.[32]
Myeloproliferative disease (5.94, 0.000850), and

Pancreatic cancer (6.57, 0.01249)
Osteoarthrosis (0.0862, 0.00059) 4655 22,907 1
Degenerative skin conditions and other dermatoses (0.1318, 0.01246) 3090 23,254 1
ATP4A rs139075511 P240H (predicted to increase activity based on our PheWAS data*; SIFT score and rating: 0.038, deleterious) 0.012 Proton pump inhibitors (PPIs) Heartburn (2.58, 0.0156) Diverticulosis (0.57, 0.01331) 1464 20,989 21 Side effects reported include digestive problems such as abdominal pain, constipation, diarrhea, and nausea.[95,96]
SLC5A2 rs150546732 1433V (predicted to increase activity based on our PheWAS data*; SIFT score and rating: 0.044 deleterious) 0.005 SGLT2 inhibitors Type 2 diabetes with ophthalmic manifestations (2.37, 0.01903) Other disorders of the kidney and ureters (0.51, 0.01901) 2,321 18,642 13 Empagliflozin as an SGLT2 inhibitor may help prevent acute kidney injury**.[97,98] Empagliflozin side effects include increased urination, UTIs, urinary discomfort, acute kidney injury and impairment of renal function[99] [100] and bone fracture.[29]
SLC5A2 rs150546732 1433V (predicted to increase activity based on our PheWAS data*; SIFT score and rating 0.044, deleterious) 0.005 SGLT2 inhibitors Type 2 diabetes with ophthalmic manifestations (2.37, 0.01903) Stress fracture (5.5, 0.01991) 39 19,123 2 Empagliflozin side effects include increased bone fracture, thus the directionality of the odds ratio is incorrect for this PheWAS result.[29]
CACNB2 rs149253719 S214T (predicted to increase activity based on our PheWAS data*; SIFT score and rating: 0.11, tolerated) 0.001 Calcium channel blockers Angina pectoris (3.2, 0.00294) None identified Common side effects of calcium channel blockers include drowsiness, headache, upset stomach, flushing, and ankle swelling.[101] More serious reactions can include hypotension, bradycardia or tachycardia, and heart failure.[102]
TBXA2R rs200445019 T399A (in vitro validated gain of function in functional studies[103]) 0.003 Thromboxane A2 receptor (TPR) antagonists Chronic venous hypertension (31.89, 9.1 x10−6) None identified The most frequently reported adverse events associated with ifetroban administration in randomized clinical trials were headache (8.4%) and musculoskeletal pain (5%). Not yet approved.
PTGER2 rs139552094 C83G (predicted to decrease activity based on our PheWAS data*; SIFT score and rating: 0.001, deleterious) 0.002 Prostaglandin analogues Ulcer of esophagus (6.28, 0.01171)

Decubitus ulcer (2.75, 0.00603)
None identified Fever and chills, diarrhea, and vomiting, as well as uterine contractions, are short term effects mentioned in the literature. Fetal abnormalities where abortion is unsuccessful are also well-documented.[104] Package insert notes adverse GI effects including diarrhea, nausea, and abdominal pain.[105]

Key:

* No functional studies of this SNP have been identified to date; SNP effects are inferred based on our PheWAS data using existing indications for the drug/therapeutic agent.

** Conflicting data in the literature and in FDA warning related to prevention of AKI versus potential risk of AKI.

MAF: minor allele frequency in the BioVU Exomechip population; SIFT scores drawn from dbSNP.[106]


3.5. Validation of safety-related phenotypes.

For gene-drug class pairs that validated on therapeutic indication, we then inferred the directionality of the SNP’s effect based on its effect on the encoded protein (e.g. gain or loss of function) and the orientation of validating phenotype odds ratios as described above, in order to confirm known adverse effects of drugs within PheWAS results. We used an inclusive operational definition for potential safety signals identified by previous findings, conducting an evidence review for each gene-drug class pair and integrating multiple sources of truth including the package insert, primary literature, FDA documentation, and database resources. To further explore disease characteristics, in some cases we also performed chart reviews of SNP carriers to either confirm the diagnosis that was associated with the ICD-9 code used in the PheWAS or to further elucidate specific diagnoses, comorbidities, or other features.

3.6. Anchoring SNP directionality on known indications and adverse effects of the therapeutic agent.

We categorized gene-drug class pairs based on the relationship between the SNP’s effect and the drug class’s action. If the functional effect of the SNP was similar to the action of the drug class, the pair was considered to have concordant directionality. If the functional effect of the SNP was opposite to that of the drug class, the pair was considered to have opposite directionality (Figure 2). This anchoring enabled us to make further inferences about directionality of drug effects using the odds ratios reported for various phenotypes within each PheWAS dataset.

Fig. 2. Categorization of gene-drug class pairs based on their impact on encoded protein (for SNP) or protein target (for drug class)
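The directionality anchoring described in this section can be sketched as a small translation rule: when the SNP acts like the drug, the direction of a phenotype’s odds ratio carries over to the drug unchanged, and when the SNP acts opposite to the drug, the direction is flipped. The function name `predicted_drug_effect` and its labels are hypothetical; the odds ratios in the usage lines are taken from Table 2.

```python
def predicted_drug_effect(odds_ratio, directionality):
    """Translate a SNP-phenotype odds ratio into the expected drug effect.

    directionality: 'concordant' if the SNP mimics the drug class action,
                    'opposite' if the SNP acts against it.
    """
    if directionality not in ("concordant", "opposite"):
        raise ValueError("directionality must be 'concordant' or 'opposite'")
    if odds_ratio == 1.0:
        return "no effect"
    snp_increases_risk = odds_ratio > 1.0
    # A concordant SNP behaves like the drug; an opposite SNP behaves
    # like the absence (or reversal) of the drug, so the direction flips.
    if directionality == "concordant":
        drug_increases_risk = snp_increases_risk
    else:
        drug_increases_risk = not snp_increases_risk
    return "increased risk" if drug_increases_risk else "decreased risk"


# PCSK9 R46L (concordant with PCSK9 inhibitors), spina bifida OR 5.9
pcsk9_signal = predicted_drug_effect(5.9, "concordant")
# HMGCR variant (opposite to statins), polyneuropathy in diabetes OR 0.822
hmgcr_signal = predicted_drug_effect(0.822, "opposite")
```

Both usage lines yield "increased risk" for the drug class, matching how the corresponding rows of Table 2 are interpreted. This rule considers only the direction of the odds ratio, not its statistical significance.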

3.7. Statistical analysis.

The PheWAS association calculations were performed in PLINK using a logistic regression model, limited to individuals of European ancestry due to the sample size constraints in other ancestry groups as noted above; covariates included sex and age. This model was applied to calculate case and control genotype distributions, and associated allelic P-value and allelic odds ratio (OR).
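For readers unfamiliar with allelic statistics, the sketch below computes an unadjusted allelic odds ratio and a 1-degree-of-freedom chi-square p-value from a 2×2 table of allele counts. This is a deliberate simplification and not the analysis performed here, which used logistic regression in PLINK with sex and age as covariates; the function name `allelic_test` and the counts in the example are hypothetical.

```python
import math


def allelic_test(case_minor, case_major, control_minor, control_major):
    """Unadjusted allelic odds ratio and Pearson chi-square p-value (1 df).

    Inputs are allele counts (two alleles per genotyped individual).
    No covariate adjustment is performed, unlike a logistic regression model.
    """
    a, b, c, d = case_minor, case_major, control_minor, control_major
    if min(a, b, c, d) <= 0:
        raise ValueError("all four allele counts must be positive")
    odds_ratio = (a * d) / (b * c)
    n = a + b + c + d
    # Pearson chi-square statistic for a 2x2 table.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, p_value


# Made-up allele counts, for illustration only.
odds_ratio, p_value = allelic_test(20, 80, 10, 90)
```

An odds ratio above 1 indicates that the minor allele is more frequent among cases than controls; as described in the directionality section, whether that implies increased or decreased risk for the drug depends on whether the SNP is concordant with or opposite to the drug class.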

4. Results

We applied the SA-PheWAS method to 16 gene-drug class pairs, which are listed and further described in Tables 1 and 2 and below.

4.1. Validated safety effects correctly identified.

Application of our SA-PheWAS method identified safety signals that were concordant with published safety information (true positives) in 13 of 16 gene-drug class pairs queried. Safety signals identified using the SA-PheWAS method range from relatively benign side effects, e.g., skin rash (ATP4A/lansoprazole), to more severe side effects, e.g., risk of serious infection (as for TNF/adalimumab) or spina bifida (PCSK9/alirocumab). Notably, in many cases our results were supported not only by published safety information but also by published preclinical or clinical data (Table 2), illustrating the utility of triangulation with published literature for demonstrating plausibility.

4.2. Validated safety effects incorrectly identified.

In one case, SA-PheWAS identified a safety signal that was discordant with published safety information. While our method correctly identified an association between SGLT2 inhibition and weight gain (true positive), PheWAS also identified a signal in opposition to known risks of these agents. The odds ratio directionality in our data suggests a protective effect of SGLT2 inhibition on risk of stress fracture; however, this conflicts with the known increased risk of osteoporosis with use of SGLT2 inhibitors.[29] In other words, our data suggest no safety risk when in fact published data show increased risk; we therefore classify this case as a false negative.

4.3. No validated safety signal identified.

Additionally, in three cases, our method identified no safety signals concordant with those expected from existing drug safety literature on our agents of interest (false negatives).

4.4. Descriptive examples of SA-PheWAS applicability.

Tables 1 and 2 summarize our findings regarding preliminary utility of SA-PheWAS for identifying safety signals for drug and biologic therapies. Here, we discuss several examples to illustrate the application of this method to several genes and associated therapeutic classes, including PDCD1 (PD-1 inhibitors), PPARG (PPAR agonists), and PLA2G2A (sPLA2 inhibitors).

PDCD1 encodes the immunoglobulin superfamily member programmed cell death protein 1 (PD-1). PD-1 inhibitors, a type of checkpoint inhibitor, have gained attention for the treatment of several cancer types.[30] PheWAS confirmed an association between SNP Ala263Thr, which is found in the cytoplasmic domain of PDCD1, and a constellation of phenotypes related to cancer, including polycythemia vera (OR = 12.71, p = 0.00007), myeloproliferative disease (OR = 5.94, p = 0.000850), and pancreatic cancer (OR = 6.57, p = 0.01249) (Table 2). These data indicate increased risk for developing these cancer types and suggest an increase in activity of PD-1, based on concordance with the known indications for use of PD-1 inhibitors. PheWAS also yielded several phenotypes related to inflammatory conditions that align with previous data in the literature regarding possible safety signals associated with PD-1 inhibitors,[31,32] including pain in joint (OR = 0.16, p = 0.00001), osteoarthrosis (OR = 0.08, p = 0.00059), and degenerative skin conditions and other dermatoses (OR = 0.1318, p = 0.01246). Of note, pain in joint was the top PheWAS result for our variant of interest.

Peroxisome proliferator activated receptor (PPAR) gamma, encoded by gene PPARG, is targeted by agonists for therapeutic effects through reduction of hyperlipidemia and blood sugar in metabolic syndrome. Variant Pro12Ala in PPARG was shown to be associated with reduced risk of type 2 diabetes in our PheWAS data (OR 0.91, p=0.01259), suggesting the variant’s effects are similar to those of an agonist. Pro12Ala was also associated with increased risk of morbid obesity (OR 1.22, p=0.00324), consistent with known risks of pioglitazone in increasing fat accumulation and body weight in humans.[33]
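The reasoning in the PDCD1 and PPARG examples, in which a variant's odds ratio is translated into the expected direction of a drug's effect, can be sketched as follows. This is an illustrative reading of the logic described above, not the authors' code: the function name `implied_drug_effect` and the boolean flag `variant_mimics_drug` are hypothetical, and the flag stands in for the judgment (made from validation phenotypes and the literature) about whether the variant acts like the drug or in the opposite direction.

```python
def implied_drug_effect(variant_or: float, variant_mimics_drug: bool) -> str:
    """Translate a variant's odds ratio into an implied direction of drug effect.

    variant_or: odds ratio for the phenotype among variant carriers.
    variant_mimics_drug: True if the variant is judged to act like the drug
        (e.g. PPARG Pro12Ala and a PPAR agonist); False if it acts in the
        opposite direction (e.g. a variant suggesting increased PD-1 activity
        versus a PD-1 inhibitor). Hypothetical flag for this sketch.
    """
    if variant_or <= 0:
        raise ValueError("odds ratio must be positive")
    if variant_or == 1.0:
        return "no effect"
    variant_direction = "increased" if variant_or > 1.0 else "decreased"
    if variant_mimics_drug:
        return variant_direction
    # Variant opposes the drug, so the drug's implied effect is reversed.
    return "decreased" if variant_direction == "increased" else "increased"


# PPARG Pro12Ala and morbid obesity (OR 1.22); variant resembles an agonist.
ppar_obesity = implied_drug_effect(1.22, variant_mimics_drug=True)

# PDCD1 Ala263Thr and pain in joint (OR 0.16); variant opposes an inhibitor.
pd1_joint_pain = implied_drug_effect(0.16, variant_mimics_drug=False)
```

Both calls yield an implied increase in risk with the drug, which matches the concordance with published safety information described for these two examples. The output is a hypothesis about direction only and says nothing about effect size.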

PheWAS signals also showed concordance with evidence regarding conflicting effect signals in the literature. For example, a trial of varespladib (an sPLA2 inhibitor) for prevention of recurrent cardiac events in acute coronary syndrome was halted at interim analysis when a significantly increased risk of myocardial infarction was detected; investigators postulated that varespladib may have induced a prothrombotic state, potentially through ablation of both pro-atherogenic and antiatherogenic sPLA2 isoforms.[63] Consistent with these potential conflicting positive and negative effects of sPLA2 inhibition, PheWAS data for PLA2G2A variant Arg143His indicated reduced risk of peripheral vascular disease (OR 0.36, p=0.00298), but increased risk of primary hypercoagulable state (OR 2.57, p=0.01611).

In sum, these examples, along with additional results described in Table 2, demonstrate that our novel approach, SA-PheWAS (Safety Ascertainment using PheWAS), can replicate published safety information across multiple drug classes.

5. Discussion

These results demonstrate the preliminary utility of our SA-PheWAS platform for identification of safety signals using human genetic data. The use of big data resources, including genotyping data and large EHR datasets such as those integrated in our approach, promises a new future in development and application of techniques for exploring safety signals. Targeted, disease-agnostic analyses of these datasets can reveal how SNPs affect protein changes, and thus further illuminate the ways in which drugs targeting these proteins may affect physiology and produce either beneficial effects or adverse reactions.

We recently reported on a successful application of this novel method for assessment of potential safety signals, in which we discovered associations between spina bifida and other cranial, skeletal, and neurological adverse effects and PCSK9 inhibition.[34] These findings, concordant with other published observations,[35] allowed us to suggest new mechanisms to further assess the long-term safety of PCSK9 inhibitor drugs. Here, we build on these preliminary findings to further demonstrate the utility of the SA-PheWAS method for replicating known published safety information across a larger sample of drug classes. These findings could prove useful in multiple ways. First, in the setting of an extremely strong safety warning for a severe effect which is also concordant with other information sources, circumstances might warrant a major change to clinical development via modifications to medicinal chemistry. Second, our SA-PheWAS method may help to better define inclusion and exclusion criteria to inform which subpopulation of patients to enroll in a clinical trial to maximize the benefit-risk profile. Third, SA-PheWAS results may help direct specific areas for proactive monitoring and/or development of Risk Evaluation and Mitigation Strategies[36] or additional management approaches. Importantly, intentional monitoring of patients for these potential drug-related side effects would occur during both development and post-market studies, theoretically enabling the anticipation and prevention or mitigation of unintended consequences of a given drug while preserving its chance for therapeutic benefit.

The current ability to predict safety signals earlier in the drug development process is constrained by the limitations of existing preclinical models to accurately represent and predict human biology[37] as well as by the relatively small size of early human trials. Improving detection of safety issues at an earlier stage, and thus improving safety in clinical investigations and decreasing drug development costs, requires an in-depth understanding of therapeutic and biologic pathways, including pharmacological responses, drug pharmacokinetics, and exposure at the target site, as well as the identification of specific patient subpopulations most likely to demonstrate a viable benefit-risk profile, or those at particularly increased risk of untoward consequences.[1] A recent increase in the drug development success rate has been partially attributed to focusing drug development on drugs with a known mechanism based in human genetics[38] as well as the increased use of biomarkers for targeting drugs to those individuals most likely to benefit.[38,39] Similarly, we believe that the success rate can be further elevated by safety signal detection methodologies such as SA-PheWAS, which uses PheWAS analysis in conjunction with in-depth literature searches to identify significant associations between safety signals and drugs; we and others have explored the utility of PheWAS for this application in previous work.[34,40] Combining PheWAS analyses with assessment of biologic plausibility using the literature can help further inform design of hypotheses and practices related to improving drug safety.

Other members of the scientific community have also sought new ways to ascertain the safety of new drugs prior to clinical testing. Investigators have approached this problem from several angles that differ somewhat from our method, yet at the same time reinforce our essential process. For example, various in silico methods have been developed to identify safety signals. These include a predictive pharmaco-safety model developed by Cami et al that integrates a network of previously-identified drug-adverse event (ADE) relationships from a drug safety database with pharmacological information to predict potential adverse events not yet recognized.[41] Several more recent in silico, network-based proximity approaches to drug repurposing and detection of adverse effects have been reported. One such method integrates protein-protein interaction, drug-disease and drug-target association data to approximate the therapeutic effects of drugs, both for repurposing drugs and for identifying adverse effects.[42] Another similar interactome-based approach focuses on predicting efficacy and adverse interactions for drug combinations.[43] Additionally, recent machine-learning methods include an attempt to construct a drug-ADE network that integrates information regarding a drug’s target proteins to enhance the predictability of ADEs,[44] a deep learning framework that isolates molecular substructures related to ADRs,[45] and the development of an algorithm (DrugClust) that clusters drugs on common features, then uses Bayesian scores to predict drug side effects.[46] Wang et al mined data in DrugBank, the database containing molecular, mechanism, interaction, and target information, to discover both new drug indications and side effects simultaneously.[47] Other approaches to ADR prediction have harnessed a random-walk computational algorithm of protein-ADR associations[48] and a naive Bayesian model of gene-ADR associations.[40,49]

Other methods combine genotype and phenotype data, similar to SA-PheWAS. PathFX, for example, is an algorithm created to identify safety signals based on a constructed “drug pathway” which is annotated with the target genes and phenotypes/diseases associated with the drugs.[50] It differs from SA-PheWAS, however, in that the data it examines are not derived directly from patients. Another approach similar to SA-PheWAS, but which focused on predicting cardiovascular side effects of diabetes medications, used a targeted exome sequencing and genotyping approach followed by in silico testing to identify associations between gene variants that are drug targets for obesity or type 2 diabetes (T2D) and potential cardiovascular side effects.[51] Another approach, proposed by Walker et al, uses Mendelian randomization with potential genetic variants identified from expression quantitative trait loci (eQTL) catalogs in conjunction with evidence synthesis to predict unintended drug effects, both before and after drug approval.[52] It is clear from these efforts that there is a collective interest in developing methods to further ensure safety in drug development and eventual use in humans.

Through integration of genetic and real-world clinical data, the SA-PheWAS approach builds upon other currently described methods for evaluating the safety of new drugs. Unlike purely in silico approaches, SA-PheWAS employs both in silico and in vivo methods, relying on data from actual human diseases and markers of their pathophysiology as diagnosed and recorded in EHRs. Another differentiating benefit of SA-PheWAS is that it is disease agnostic—the PheWAS analysis identifies all diseases and conditions associated with variation in a specific gene product. This characteristic allows for not only the validation of known adverse events associated with approved drugs or biologics, but also identification of novel safety signals that have not been previously reported across a cohort of patients similar to those who may actually receive the drug.

5.1. Limitations.

The SA-PheWAS method has several limitations. Our primary focus of this application has been to identify drugs’ known adverse events within our PheWAS data, a far easier task as compared with identification of novel safety signals. Our method also rests on some assumptions; though SNPs do not fully recapitulate the effects of acute (or even chronic) drug exposure, we interpret the effects of carrying the SNP to be similar to lifetime use of the drug. In this regard, we view our data in two general categories: congenital effects, which could be indicative of in utero exposure, and diseases that are indicative of exposure across the full lifespan of the patient (e.g., diabetes). We also recognize that our current dataset may be affected by sample size issues related to very rare adverse effects, a type of safety signal which has prompted some FDA withdrawals in the past; however, our ability to identify signals within an organ or type of disease associated with rare effects, such as cardiotoxicity, infectious complications, or hepatic disease, may partially mitigate this issue by prompting further scrutiny within the affected system. False positives are a known component of the method, although corrections for multiplicity can be applied, and our inclusion of assessment of concordance with data in the primary literature is intended to further help eliminate extraneous signals. In addition, this exploratory report is intended as a first foray into this area, and additional work is needed to identify any spurious associations. Further confirmatory work including replication in larger datasets, directed safety studies and/or other assessments must be incorporated into understanding SA-PheWAS data for more accurate assessment of signals. Furthermore, therapies that have multiple targets are not amenable to this method in its current form.
We also acknowledge that observations from our European ancestry dataset may have limitations in generalizability; efforts are underway at our institution to increase the diversity of our genotyping data, which will enable future work applying this approach in additional populations. We also did not identify the full spectrum of safety issues in the results for a single SNP, likely due to a range of factors including variability in biologic effects of our target variant and power limitations related to prevalence of phenotypes associated with some adverse effects. For example, while a number of adverse effects related to immune dysfunction are also reported in association with PD-1 inhibitors,[53] our approach primarily detected inflammation. Further development of methods to combine information across multiple SNPs may help mitigate some of these issues. Finally, while all conditions and diseases across the phenome form the bases for safety scanning, these would have to be recorded in the medical record to be available for these methods. It is possible that some are not recorded in a way that directly translates into a clear safety signal. Indeed, the use of ICD codes to elicit phenotyping data entails its own limitations, including code ambiguity that may lead to coding errors.[54] In addition, it is important to recognize that some conditions, such as metastatic cancer, may be of such severity that usage of a therapy may nevertheless be indicated if the benefit exceeds even identified risks.
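The multiplicity corrections mentioned among these limitations can be illustrated with two standard procedures, Bonferroni and Benjamini-Hochberg. The sketch below is illustrative only: the source does not state which correction, if any, was applied or how many phenotypes were tested, so the p-values and the `alpha` level here are hypothetical and are not taken from the PheWAS results.

```python
from typing import List, Sequence


def bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Reject a hypothesis when its p-value is at most alpha / number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for five phenotype associations of one variant.
example_p = [0.001, 0.008, 0.025, 0.041, 0.6]
bonf_hits = bonferroni(example_p)
bh_hits = benjamini_hochberg(example_p)
```

In this toy example Bonferroni retains fewer associations than Benjamini-Hochberg, which reflects the usual trade-off between strict control of any false positive and control of the proportion of false discoveries. In a real phenome-wide scan the number of tests would be the number of phenotypes examined, not the number of associations reported.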

Our method infers directionality of SNP effects based on the phenotypes and odds ratios observed in PheWAS results; though we also incorporate existing evidence regarding variant and protein effects, it is important to acknowledge that this approach can aid in generating hypotheses for further evaluation, rather than inferring causal associations. It is also important to note that the SA-PheWAS method failed to identify safety signals in 3 of 16 gene-drug pairs analyzed, despite known safety issues existing for many of these drugs. These results suggest the potential for our method to generate false negatives. Possible explanations include off-target effects of the drug, which have been documented to contribute to safety issues.[55] These effects can include alterations in drug metabolism or cell cycle genes/proteins; however, many off-target effects remain unknown. Together, these data highlight the importance of incorporating our method into existing processes for identifying safety signals, rather than using it as the sole source of information.
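The counts reported here (validating findings for 13 of 16 gene/drug class pairs, with no validated signal in the remaining 3) correspond to a validation proportion of 13/16, or 81.25%. With only 16 pairs, that proportion carries considerable uncertainty. The sketch below shows one way a reader might quantify it using a Wilson score interval; this interval is not reported by the authors, and the function name `wilson_interval` and the 95% level are assumptions made for illustration.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.959964 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


validated_pairs = 13   # gene/drug class pairs with validating findings
total_pairs = 16       # gene/drug class pairs examined
validation_rate = validated_pairs / total_pairs
lower, upper = wilson_interval(validated_pairs, total_pairs)
```

The width of the resulting interval is a reminder that the false negative rate suggested by 3 of 16 pairs is a rough estimate, which is consistent with the call for replication in larger datasets.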

5.2. Future directions.

We envision the SA-PheWAS model being especially useful in the early stages of drug development. For example, safety signals identified by SA-PheWAS may contribute to go/no-go decisions earlier in the drug development process, saving time and money. Further, SA-PheWAS is poised to enable the detection of possible safety effects of new therapeutic agents before the first dose of an investigational new drug is ever given or a new preclinical program is launched; future work into this way of applying our technique will inform our ability to determine the optimal points in the development process for employing this safety signal detection approach. We also note that the use cases discussed herein typically leverage a single SNP’s effects and its strong validation signal to anchor directionality and effect. As our genotyping data expands in volume and breadth, and as our collective evidence base on SNP functional effects grows, we expect to be able to utilize these additional data to further triangulate signals and strengthen the application of this model to safety signal detection. Future confirmatory work in other datasets (e.g. UK Biobank) and in follow-on studies will also add to our understanding of ways in which this approach contributes value to the discovery of safety-related issues. Our method, when used in conjunction with existing methods including prospective experiments to test the signal hypotheses, may also allow for better understanding of safety issues identified throughout clinical trials and post-approval surveillance and ultimately improved drug prescribing information.

6. Conclusions

Drug safety is as critical as drug effectiveness. Estimation of risk-benefit is a crucial stage in early therapeutic development, and relies on a thorough understanding of potential safety issues. Novel and effective methods for predicting adverse side effects associated with new drugs are needed, not only to increase the efficiency and reduce costs in the drug development process, but also to protect public health. The past decade has witnessed a plethora of data extraction methods proposed and used to detect drug safety signals, supported by increasing availability of big data resources and associated analytical techniques. A method such as SA-PheWAS triangulates in silico methods with chart review and synthesis of literature-based evidence regarding already-known protein pathways to identify safety signals. Leveraging this approach, we can closely assess and monitor the safety and long-term tolerability of therapeutics as they move further along the development pipeline into human testing and expanded clinical use, protecting patients receiving these drugs in a more informed way.

Key Points.

  • We created a novel approach, SA-PheWAS (Safety Ascertainment using Phenome-Wide Association Studies [PheWAS]), for identifying safety signals associated with new therapeutic agents.

  • Phenome-wide association studies in 29,722 patients of European ancestry confirmed known adverse effects in a variety of drug therapeutic classes.

  • Using phenotypic effects of genetic variants may strengthen the identification of potential safety signals associated with drug therapies.

Acknowledgments

Funding

The project described was supported by CTSA award No. UL1 TR002243 from the National Center for Advancing Translational Sciences. Its contents are solely the responsibility of the authors and do not necessarily represent official views of the National Center for Advancing Translational Sciences or the National Institutes of Health.

Footnotes

Conflict of Interest

Vanderbilt University Medical Center (VUMC) has licensed PheWAS technology to Nashville Biosciences, a VUMC-owned entity. Dr. Denny receives a portion of those royalty payments. Rebecca Jerome, Meghan Joly, Nan Kennedy, Jana Shirey-Rice, Dan Roden, Gordon Bernard, Kenneth Holroyd, and Jill Pulley have no conflicts of interest that are directly relevant to the content of this study.

Compliance with Ethical Standards

This project was reviewed and received a non-human subjects research determination from the Vanderbilt University Institutional Review Board (IRB number 151121).

Data Sharing

The data used and analyzed during the current study are available from the corresponding author on reasonable request.

References
