Bayesian Pathway Analysis for Complex Interactions

James W Baurley; Anders Kjærsgaard; Michael E Zwick; Deirdre P Cronin-Fenton; Lindsay J Collin; Per Damkier; Stephen Hamilton-Dutoit; Timothy L Lash; Thomas P Ahern

doi:10.1093/aje/kwaa130

Am J Epidemiol. 2020 Jul 8;189(12):1610–1622. doi: 10.1093/aje/kwaa130


PMCID: PMC7822643 PMID: 32639515

Abstract

Modern epidemiologic studies permit investigation of the complex pathways that mediate effects of social, behavioral, and molecular factors on health outcomes. Conventional analytical approaches struggle with high-dimensional data, leading to high likelihoods of both false-positive and false-negative inferences. Herein, we describe a novel Bayesian pathway analysis approach, the algorithm for learning pathway structure (ALPS), which addresses key limitations in existing approaches to complex data analysis. ALPS uses prior information about pathways in concert with empirical data to identify and quantify complex interactions within networks of factors that mediate an association between an exposure and an outcome. We illustrate ALPS through application to a complex gene-drug interaction analysis in the Predictors of Breast Cancer Recurrence (ProBe CaRe) Study, a Danish cohort study of premenopausal breast cancer patients (2002–2011), for which conventional analyses severely limit the quality of inference.

Keywords: Bayesian analysis, breast neoplasms, pharmacogenetics, tamoxifen

Abbreviations

ABC: ATP-binding cassette (P-glycoprotein)
ALPS: algorithm for learning pathway structure
ATP: adenosine triphosphate
CYP: cytochrome P450
ER: estrogen receptor
LASSO: least absolute shrinkage and selection operator
MCMC: Markov chain Monte Carlo
SULT: sulfotransferase
UGT: uridine 5ʹ-diphosphoglucuronosyltransferase

Analysis of complex pathways

The enormous amount of information generated by modern data collection permits investigation of complex pathways that mediate effects of social, behavioral, and molecular exposures on health outcomes (1–3). For example, pharmacoepidemiology studies often evaluate the role of polymorphic metabolic enzymes in drug response so that dosing regimens can be personalized (4). Evidence from preclinical studies informs hypotheses about the structure, function, and health effects of these metabolic pathways, but such information is rarely represented within analyses. Conventional analytical approaches (e.g., user-specified multivariable regression models) struggle in the face of high-dimensional data, leading to considerable risk of both false-positive and false-negative inferences (5). Particularly troublesome are estimation of nonadditive effects among pathway components (interactions) and estimation of net effects of combined pathway elements. The number of candidate models for complex molecular data quickly becomes intractable. Pathway analysis techniques (e.g., structural equation modeling (6), gene set enrichment analysis (7), penalized regression models (8, 9), and Bayesian hierarchical models (10)) offer solutions to some of these challenges but do not incorporate prior information about the structures driving pathway interactions. Bayesian network methods improve on earlier techniques by explicitly incorporating prior knowledge about pathway structure into the model (11, 12).

Herein we describe a Bayesian pathway analysis method that not only allows specification of a prior for the pathway structure but also proposes and evaluates novel pathway structures using observed data. The method thus empirically derives pathway structures that are best supported by the data. The method is an upgraded version of the algorithm for learning pathway structure (ALPS) (12), with features to improve its utility for epidemiologists. Here we describe the ALPS method, provide an overview of its new features, and illustrate its application using simulated genetic pathway data.

Tamoxifen resistance in breast cancer

About three-quarters of breast tumors overexpress estrogen receptor (ER) (13). When estrogens bind to ER in breast cancer cells, the receptor complex initiates transcription of genes that promote survival and proliferation (14). The drug tamoxifen and its metabolites compete with estrogens for binding to ER, but instead neutralize the receptor (15). Five years of tamoxifen treatment approximately halves the risk of recurrence in patients with ER-positive breast tumors (16). Unfortunately, response to tamoxifen therapy varies between women with otherwise similar prognostic profiles, and some tamoxifen-treated women experience a recurrence of their breast cancer (17). This variation in response suggests that patient characteristics other than tumor ER expression can influence drug response (18, 19).

Tamoxifen metabolic pathway

Tamoxifen has relatively low affinity for ER in its administered form (20). Its pharmacological activity is potentiated by in vivo production of higher-affinity metabolites (20, 21). Pathways of tamoxifen metabolism and transport are summarized in Figure 1.

Figure 1. The tamoxifen metabolic pathway. Tamoxifen and its metabolites appear as black boxes, which include relative estrogen receptor (ER)-binding affinities (20, 21). Arrows denote transitions between compounds and interactions with ER (“ER Signal”). Boxes overlying the arrows show the polymorphic genes encoding the enzymes involved in metabolism or transport of tamoxifen at each transition. ABC, ATP-binding cassette (P-glycoprotein); ATP, adenosine triphosphate; CYP, cytochrome P450; NDM-Tam, N-desmethyltamoxifen; 4-OH-Tam, 4-hydroxytamoxifen; SULT, sulfotransferase; Tam-G, tamoxifen glucuronide; Tam-S, tamoxifen sulfate; UGT, uridine 5ʹ-diphosphoglucuronosyltransferase.

The metabolites 4-hydroxytamoxifen and 4-hydroxy-N-desmethyltamoxifen (endoxifen) bind ER with about 100-fold higher affinity than native tamoxifen (17). The phase I cytochrome P450 (CYP) enzymes (CYP1A1, CYP2B6, CYP2C9, CYP2C19, CYP2D6, and CYP3A4/5) catalyze the oxidation and demethylation reactions that form these metabolites (22). Sulfonated and glucuronidated metabolites of tamoxifen (tamoxifen sulfate and tamoxifen glucuronide, respectively) are formed by phase II reactions catalyzed by sulfotransferases (SULTs) (SULT1A1 and SULT1E1) and uridine 5ʹ-diphosphoglucuronosyltransferases (UGTs) (UGT2B15 and UGT2B7), respectively (23–25). Tamoxifen sulfate and tamoxifen glucuronide have little or no ER-binding affinity and are rapidly eliminated (26, 27). The transporter proteins ABCB1, ABCG2, and ABCC2 (adenosine triphosphate (ATP)-binding cassette (P-glycoprotein) (ABC) B1, G2, and C2, respectively) do not participate in tamoxifen metabolism (and so appear separately from the metabolizing enzymes in Figure 1); rather, they influence the degree to which tamoxifen metabolites interact with ER by mediating their transport across cell membranes (18).

Genetic variation in the tamoxifen pathway

The genes encoding tamoxifen metabolic enzymes and transporters harbor functional polymorphisms (Table 1), which may affect formation and clearance of active metabolites and access to tumor ER—and hence patient response to therapy. These relatively common germline genetic variants are promising candidates for predicting patient response before initiation of tamoxifen therapy.

Table 1.

Genotyped Variants in the Tamoxifen Metabolism and Transport Pathway

Function	Gene	Reference SNP ID No.	Alias	TaqMan ^a Kit No.	Minor Allele	Minor Allele Frequency
Transporters	ABCB1	rs10248420	abc1	C-30375194-10	G	0.35
		rs1045642	abc2	C-7586657-20	G	0.41
		rs1128503	abc3	C-7586662-10	A	0.43
		rs2032582	abc4	C-11711720C-30	A	0.33
	ABCC2	rs3740065	abc5	C-22271640-10	G	0.20
		rs717620	abc6	C-2814642-10	T	0.20
		rs8187710	abc7	C-22272567-30	A	0.08
	ABCG2	rs1564481	abc8	C-9510430-10	T	0.35
		rs2231164	abc9	C-15922479-10	C	0.43
		rs2622604	abc10	C-9510352-10	T	0.23
Phase I metabolism	CYP1A1	rs1048943	cyp1	C-25624888-50	C	0.08
	CYP2B6	rs3745274	cyp2	C-7817765-60	T	0.24
		rs8192709	cyp3	C-2818162-20	T	0.06
	CYP2C19	rs12248560	cyp4	C-469857-10	T	0.17
		rs4244285	cyp5	C-25986767-70	A	0.14
	CYP2C9	rs1799853	cyp6	C-25625805-10	T	0.17
		rs1057910	cyp7	C-27104892-10	C	0.10
	CYP2D6	rs1065852	cyp8	C-11484460-40	A	0.21
		rs16947	cyp9	C-27102425-10	A	0.34
		rs3892097	cyp10	C-27102431-D0	T	0.19
		rs28371706	cyp11	C-2222771-A0	A	0.06
		rs28371725	cyp12	C-34816116-20	T	0.11
	CYP3A4/	rs10273424	cyp13	C-29554473-10	A	0.11
	CYP3A5	rs776746	cyp14	C-26201809-30	T	0.38
Phase II metabolism	SULT1A1	rs1042157	sul1	Custom	C	0.30
		rs1801030	sul2	Custom	G	0.11
		rs9282861	sul3	Custom	G	0.23
	SULT1E1	rs3775775	sul4	C-27479907-10	G	0.14
		rs3775778	sul5	C-198529-20	T	0.27
	UGT2B7	rs7434332	ugt1	C-45181112-10	G	0.33
	UGT2B10	rs294769	ugt2	Custom	A	0.48
	UGT2B15	rs1902023	ugt3	C-27028164-10	C	0.47


Abbreviations: ABC, ATP-binding cassette (P-glycoprotein); ATP, adenosine triphosphate; CYP, cytochrome P450; ID, identification; SNP, single nucleotide polymorphism; SULT, sulfotransferase; UGT, uridine 5ʹ-diphosphoglucuronosyltransferase.

^a ThermoFisher Scientific, Waltham, Massachusetts.

Pathway analysis for complex observational data

There are at least 15 enzymes involved in tamoxifen metabolism and transport, which suggests the possibility of up to 32,768 combinations of full function and reduced function enzyme sets. Each enzyme may harbor more than 1 functional polymorphism, which further enhances the complexity. A human subjects study of sufficient sample size to allow an agnostic evaluation of all possible combinations is not feasible, so comprehensive analysis of the tamoxifen metabolic pathway requires a novel statistical approach. ALPS provides a solution for analyzing this pathway while avoiding the limitations of conventional analysis (12).

METHODS

Algorithm for learning pathway structure

ALPS was originally developed as described by Baurley et al. (12) and required programming proficiency in C++. ALPS requires 2 main inputs: 1) the observed data and 2) a map of the presumed structure and relationships between nodes comprising the pathway under study, referred to as the “prior pathway.” In our example, the tamoxifen metabolic pathway provides the prior, but social, behavioral, or other biological knowledge could easily inform prior pathways for other topics. ALPS uses the prior pathway to guide exploration of new structures using the observed data and standard Markov chain Monte Carlo (MCMC) methods (28). Figure 2 illustrates the elements and parameters of an ALPS analysis in the context of the tamoxifen pharmacogenetic analysis described below. The following sequence summarizes the steps in 1 iteration of ALPS:

Select a random spot in the prior pathway to begin the MCMC search process.
Propose a change to the current pathway structure such that a risk factor is added, deleted, or swapped.
Compute the marginal likelihood using the newly proposed structure and the observed data.
Accept or reject the new structure with Metropolis-Hastings probability.
If the new structure is rejected, restore the previous structure and parameter values. Otherwise, proceed with the new structure and its associated parameter values.
Repeat for a specified number of iterations.
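The search loop above can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the ALPS implementation: the pathway structure is reduced to a flat set of risk factors rather than a tree, and `toy_log_score` is a hypothetical stand-in for the log marginal likelihood plus log prior that ALPS would compute from the data and the prior pathway. The function names `propose` and `run_chain`, the variable pool, and all numeric values are assumptions made for the example.

```python
import math
import random
from collections import Counter

def propose(current, pool, rng):
    """Propose an add, delete, or swap move.

    Returns (new_state, log_hastings_ratio), or None if the chosen move
    is not possible from the current state.
    """
    inside = sorted(current)
    outside = sorted(set(pool) - current)
    k, n = len(inside), len(pool)
    move = rng.choice(["add", "delete", "swap"])
    if move == "add" and outside:
        new = current | {rng.choice(outside)}
        # forward choice: 1/(n-k); reverse (delete from new): 1/(k+1)
        return new, math.log(n - k) - math.log(k + 1)
    if move == "delete" and inside:
        new = current - {rng.choice(inside)}
        # forward choice: 1/k; reverse (add to new): 1/(n-k+1)
        return new, math.log(k) - math.log(n - k + 1)
    if move == "swap" and inside and outside:
        new = (current - {rng.choice(inside)}) | {rng.choice(outside)}
        return new, 0.0  # swap proposals are symmetric; dimension unchanged
    return None

def run_chain(log_score, pool, n_iter, seed=0):
    """Metropolis-Hastings search over sets of risk factors."""
    rng = random.Random(seed)
    current = frozenset([rng.choice(sorted(pool))])  # random starting point
    visits = Counter()
    for _ in range(n_iter):
        proposal = propose(current, pool, rng)
        if proposal is not None:
            new, log_hastings = proposal
            log_accept = log_score(new) - log_score(current) + log_hastings
            # 1 - random() lies in (0, 1], so the logarithm is always defined
            if math.log(1.0 - rng.random()) < log_accept:
                current = new  # accept; otherwise the previous state is kept
        visits[current] += 1
    return visits

# Hypothetical target: rewards the 4 factors given simulated effects later on.
TRUE_SET = frozenset({"cyp4", "cyp7", "abc9", "sul3"})
POOL = ["abc8", "abc9", "abc10", "cyp4", "cyp5", "cyp7", "sul1", "sul3"]

def toy_log_score(state):
    return 5.0 * len(state & TRUE_SET) - 5.0 * len(state - TRUE_SET)

visits = run_chain(toy_log_score, POOL, n_iter=5000, seed=1)
best_state, best_count = visits.most_common(1)[0]
```

The visit counts play the role of the sampled posterior over structures: the share of iterations spent in a state estimates its posterior probability under the toy score. A swap leaves the number of factors unchanged, so its proposal is symmetric and needs no Hastings correction, whereas add and delete moves do.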

We made the following upgrades to ALPS for improved performance and ease of use by the epidemiologic community:

ALPS was rewritten for the R programming language (R Foundation for Statistical Computing, Vienna, Austria) with improved reporting, visualization, and accessibility.
The algorithm now allows “swap” moves, in addition to “add” and “delete” moves, during the MCMC process. In a “swap” move, 1 risk factor in the current tree can be replaced by a risk factor not currently in the tree. This allows ALPS to propose moves without an increase or decrease in dimensionality, improving the efficiency of the algorithm.
The algorithm now allows users to encode relationships between multiple features of a hypothesized pathway, and allows grouping of these features according to shared characteristics (e.g., multiple polymorphic enzymes can be grouped together by gene family, by the type of reaction they catalyze, or in any other manner).
The method now supports modeling with Cox proportional hazards (including tied event-time data), logistic, and linear regression models, thus covering the most common regression models for epidemiologic analysis.

Prior pathway specification

Existing biological, social, and behavioral knowledge is encoded for ALPS through specification of the prior pathway. Table 2 illustrates coding of the prior pathway for the tamoxifen pharmacogenetics example. The process begins with coding all of the individual features that comprise the pathway of interest. Each feature is assigned a unique coordinate and is defined by an input, an output, and any modifiers that connect the input and output. In the first row of Table 2, the direct path from the input “tamoxifen” to the output “ER signal”—seen at the top of Figure 1—is assigned the coordinate of (1, 1). Only 1 gene, ABCG2, acts upon this edge in the pathway; since no other step lies along the path, it has no modifiers. All of the genetic variants in the ABCG2 gene (abc8, abc9, and abc10, as listed in Table 1) are then assigned to the (1, 1) coordinate in the prior specification. Users may also specify “many-to-many” relationships. In the context of the tamoxifen pharmacogenetics example, this means that individual genes (and their corresponding single nucleotide polymorphisms) can be assigned to more than 1 feature in the pathway. For example, the ABCG2 gene does not act only on the “tamoxifen–ER signal” edge of the pathway; it also acts on edges between “N-desmethyltamoxifen” and “endoxifen,” “endoxifen” and “ER signal,” and “N-desmethyltamoxifen” and “ER signal.” The prior will map all permuted pairs of the ABCG2 single nucleotide polymorphisms to the 4 pathway features to which they contribute. The simple codification of the pathway in Table 2 expands into a prior pathway with 1,700 entries when all gene variants are assigned to their pathway feature(s) according to Figure 1. The complete prior pathway from our tamoxifen example is available in our GitHub repository (29).

Table 2.

Prior Pathway Specifications for the Algorithm for Learning Pathway Structure (ALPS) (Partial Example Only)^a

Prior Pathway Structure	Input	Output	Modifier
(1, 1)	Tamoxifen	ER signal
(2, 2)	Tamoxifen	4-OH-Tam
(3, 3)	Tamoxifen	NDM-Tam
(4, 4)	4-OH-Tam	ER signal
(2, 4)	Tamoxifen	ER signal	4-OH-Tam
(5, 5)	4-OH-Tam	Tam-S
(2, 5)	Tamoxifen	Tam-S	4-OH-Tam
(6, 6)	4-OH-Tam	Endoxifen
(2, 6)	Tamoxifen	Endoxifen	4-OH-Tam
(7, 7)	NDM-Tam	Endoxifen
(3, 7)	Tamoxifen	Endoxifen	NDM-Tam
(8, 8)	NDM-Tam	ER signal
(3, 8)	Tamoxifen	ER signal	NDM-Tam
(9, 9)	Endoxifen	Tam-S
(6, 9)	4-OH-Tam	Tam-S	Endoxifen
(7, 9)	NDM-Tam	Tam-S	Endoxifen
(10, 10)	Endoxifen	Tam-G
(6, 10)	4-OH-Tam	Tam-G	Endoxifen
(7, 10)	NDM-Tam	Tam-G	Endoxifen
(11, 11)	Endoxifen	ER signal
(6, 11)	4-OH-Tam	ER signal	Endoxifen
(7, 11)	NDM-Tam	ER signal	Endoxifen


Abbreviations: ER, estrogen receptor; NDM-Tam, N-desmethyltamoxifen; 4-OH-Tam, 4-hydroxytamoxifen; Tam-G, tamoxifen glucuronide; Tam-S, tamoxifen sulfate.

^a Simulated data from the Predictors of Breast Cancer Recurrence (ProBe CaRe) Study cohort, Denmark, 2002–2011.

At each iteration, ALPS compares features of the currently proposed tree to the prior pathway. This is achieved by counting the shared pairwise combinations of risk factors in the current tree with those in the prior specification. The similarity between the 2 structures is then used in the prior computation (12). This gives more weight to proposed trees with greater similarity to existing pathway information.
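As a rough illustration of the counting step, the following Python sketch tallies the pairwise combinations of risk factors that a proposed tree shares with a prior specification. It is not the ALPS code: the function name `shared_pairs`, the representation of the prior as a set of unordered pairs, and the example pairs themselves are assumptions chosen for illustration, and the way ALPS converts this count into a prior probability (12) is not shown.

```python
from itertools import combinations

def shared_pairs(tree_factors, prior_pairs):
    """Count unordered factor pairs present in both the tree and the prior."""
    factors = sorted(set(tree_factors))
    tree_pairs = {frozenset(pair) for pair in combinations(factors, 2)}
    return len(tree_pairs & prior_pairs)

# Hypothetical prior: three variant pairs assumed to share a pathway feature.
prior_pairs = {
    frozenset(pair)
    for pair in [("cyp4", "cyp7"), ("abc9", "sul3"), ("abc8", "abc9")]
}

similarity_a = shared_pairs(["cyp4", "cyp7", "abc9"], prior_pairs)
similarity_b = shared_pairs(["abc8", "abc9", "sul3"], prior_pairs)
```

Under this toy prior, the second tree shares more pairs with the prior than the first and would therefore receive more prior weight.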

Cox proportional hazards regression

We extended ALPS to handle time-to-event data using Cox’s proportional hazards model. Assume that the hazard function for individual $i$ is related to the selected pathway structure as follows:

$$h_i(t) = h_0(t)\exp(\beta Z_i), \tag{1}$$

where $h_0(t)$ is the baseline hazard function, $\beta$ is the regression coefficient for the selected pathway structure, and $Z_i$ is a derived variable that represents the combined influence of the gene variants and their interactions in that structure (we illustrate derivation of $Z$ below). The model can be extended with terms for additional variables. The Cox model is fitted using methods of partial likelihoods. Briefly, the log-likelihood (LL) contribution for each individual $i$ who had the event (e.g., breast cancer recurrence) at time $t_i$ is

$$LL_i = \beta Z_i - \log \sum_{j \in R(t_i)} \exp(\beta Z_j), \tag{2}$$

where $R(t_i)$ is the risk set at time $t_i$, including the participants who had the event at $t_i$ and all participants who were “in view” and cancer-free at $t_i$. Note that this is the same likelihood as that used for conditional logistic regression. That is, each risk set $R$ represents a matched case-control set (subject $i$ is the case). Similarly, for a stratified analysis, the partial log-likelihood is the sum of the partial log-likelihood functions for the individual strata. Tied event times are handled with the Breslow method (30).
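The partial log-likelihood described above can be written compactly in Python. This is a minimal sketch for a single stratum with one derived covariate per person, not the ALPS source code; the function name and argument names are assumptions. With the Breslow approach, every event at a tied time uses the same full risk set, which is what the comparison `t_j >= t_i` produces.

```python
import math

def breslow_partial_loglik(times, events, z, beta):
    """Cox partial log-likelihood with Breslow handling of tied event times.

    times  : follow-up time for each person
    events : 1 if the person had the event, 0 if censored
    z      : derived pathway variable for each person
    beta   : regression coefficient
    """
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored persons contribute only through risk sets
        # Risk set: everyone still under observation at time t_i
        risk_sum = sum(
            math.exp(beta * z[j]) for j, t_j in enumerate(times) if t_j >= t_i
        )
        total += beta * z[i] - math.log(risk_sum)
    return total

# With beta = 0 each event contributes -log(size of its risk set).
ll_no_ties = breslow_partial_loglik([1, 2, 3], [1, 1, 0], [0, 1, 2], beta=0.0)
ll_ties = breslow_partial_loglik([1, 1, 2], [1, 1, 0], [0, 1, 2], beta=0.0)
```

Setting the coefficient to zero gives a quick check: the first example returns the negative logarithm of 3 × 2, and the tied example returns the negative logarithm of 3 × 3.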

Summarizing ALPS

To draw inferences from an ALPS analysis, we first summarize the posterior distribution of pathway structures (trees) sampled by the MCMC method. This includes computation of the posterior probabilities of each tree structure and the features comprising those structures. Bayes factors for any feature (or combination of features) can be computed by taking the ratio of posterior and prior odds.
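The Bayes factor calculation is simple enough to show directly. In this sketch the posterior probability is taken as the share of MCMC iterations in which a feature appeared, and the prior probability would come from a run without the observed data; the function names and the numbers are hypothetical.

```python
def odds(probability):
    """Convert a probability (strictly between 0 and 1) to odds."""
    return probability / (1.0 - probability)

def inclusion_probability(count, n_iter):
    """Share of sampled structures in which a feature appeared."""
    return count / n_iter

def bayes_factor(posterior_prob, prior_prob):
    """Ratio of posterior odds to prior odds."""
    return odds(posterior_prob) / odds(prior_prob)

# Hypothetical feature seen in 9,000 of 10,000 iterations, prior probability 0.5.
posterior = inclusion_probability(9000, 10000)
bf = bayes_factor(posterior, prior_prob=0.5)
```

Here the posterior odds are 9 and the prior odds are 1, so the Bayes factor is 9. A feature never sampled, or sampled in every iteration, has undefined or infinite odds and needs separate handling.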

We next illustrate implementation and interpretation of ALPS using a simulated example from this complex pharmacogenetic pathway.

SIMULATED TAMOXIFEN PHARMACOGENETIC DATA SET

Premenopausal breast cancer cohort

The Predictors of Breast Cancer Recurrence (ProBe CaRe) Study is a Danish population-based prospective cohort study of premenopausal breast cancer patients, enumerated from the Danish Breast Cancer Group registry (31). The cohort includes 5,959 women who were premenopausal at the time of diagnosis with Union for International Cancer Control stage I–III primary breast cancer between 2002 and 2011. Two strata were defined within the cohort: 1) women whose primary tumors expressed ER and who were treated with adjuvant tamoxifen therapy (ER-positive/tamoxifen-positive; n = 4,600) and 2) women whose primary tumors did not express ER and who were not treated with tamoxifen (ER-negative/tamoxifen-negative; n = 1,359). Participants’ archived breast tumors were retrieved and used to assay genetic variants in key tamoxifen pathway enzymes.

Tissue processing and pathway variant selection

We selected high-priority variants for genotyping. Variants were deemed high-priority if they had 1) documented functional consequences for the transcribed protein and 2) relatively common minor allele frequencies (≥5%). This resulted in selection of 32 functional polymorphisms in 15 genes (31). Table 1 summarizes key features of the genotyped variants.

Genotyping and data cleaning

Breast tumor sections were shipped to the Emory Integrated Genomics Core at Emory University (Atlanta, Georgia), where DNA extraction and genotyping were carried out with commercially available and custom-developed TaqMan kits (ThermoFisher Scientific, Waltham, Massachusetts). Genotypes were initially classified using the auto-call feature of TaqMan Genotyper software (version 1.3). Custom-call genotypes were defined by manual adjustment of genotype regions on TaqMan allelic discrimination plots by consensus between 5 investigators, while they were blinded to recurrence status and other clinical information. Custom-call data were exported from TaqMan Genotyper for further cleaning.

We classified custom-call genotypes as the number of minor alleles detected (i.e., additive coding, with a value of 0, 1, or 2). Because ALPS requires complete data, we filled in missing observations via multiple imputation. We carried out 50 imputations using observed genotypes of genes from the same family as that of the missing value, aggregated the 50 imputed copies into a single data set, and rounded aggregated minor allele counts to the nearest whole number (restricted to values of 0, 1, and 2). Multiple imputation was carried out with the “mice” package for R (32).
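The aggregation and rounding step might look like the following Python sketch. It assumes that the imputed copies were aggregated by averaging the imputed minor allele counts for each missing value, which the text does not state explicitly; the function name is also hypothetical. The imputation itself was done with the “mice” package in R and is not reproduced here.

```python
import math

def aggregate_imputations(imputed_counts):
    """Collapse several imputed minor allele counts into one value of 0, 1, or 2.

    Assumes aggregation by the mean, rounded half up, then restricted to
    the valid range for additive genotype coding.
    """
    mean = sum(imputed_counts) / len(imputed_counts)
    rounded = math.floor(mean + 0.5)  # avoids Python's round-half-to-even rule
    return max(0, min(2, rounded))

example_a = aggregate_imputations([0, 1, 1, 2])  # mean 1.00
example_b = aggregate_imputations([2, 2, 2, 1])  # mean 1.75
example_c = aggregate_imputations([0, 0, 1, 0])  # mean 0.25
```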

To ensure anonymity of cohort members, we scrambled the observed genotype data in 2 ways. We first randomly reclassified the number of minor alleles for each variant with approximately 1% probability and then swapped the identities of several variants in the data set. These measures preclude reidentification of cohort members using genetic data from any other source, thereby allowing public posting of the data for use as an exemplar for the ALPS method.

Simulation of genetic pathway associations

Starting with the scrambled genotype data, we simulated effects for a subset of the variants in the tamoxifen pathway. Our goal was to produce complex association patterns that would probably be missed by conventional analyses. We specified roles for the cyp4, cyp7, abc9, and sul3 variants (Table 1) based on the pathway depicted in Figure 1 and partially codified in Table 2. We set up interactions between cyp4 and cyp7, between abc9 and sul3, and between the 2 gene pairs (between (cyp4/cyp7) and (abc9/sul3)) by specifying parameters for each of the interacted terms (6 terms in all). We used these parameters to calculate the predicted value for the last pathway node, Z. We then specified a pathway regression coefficient and a baseline hazard, which were combined with Z to yield the scale parameter of a Weibull distribution from which we sampled event times. Censoring times were sampled from a second Weibull distribution with a separate hazard set as the scale parameter. Person-time was taken as the minimum of the event and censoring times, and simulated recurrence status followed suit. The R script that defines parameter values and generates the simulated pathway data is available in our GitHub repository (29).
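The sampling of event and censoring times can be illustrated with a short Python sketch. It is not the authors' R script (29) and may parameterize the Weibull distribution differently: here the scale is assumed to be the inverse of the hazard, so a larger hazard gives shorter times, and a shape of 1 (an exponential distribution) is assumed. The function name, the parameter names (`beta`, `h0`, `hc`, `shape`), and all numeric values are hypothetical.

```python
import math
import random

def simulate_subject(z, beta, h0, hc, shape, rng):
    """Return (person_time, status) for one simulated cohort member.

    z     : value of the final pathway node for this person
    beta  : pathway regression coefficient
    h0    : baseline hazard for the event
    hc    : hazard for censoring
    shape : Weibull shape parameter (1.0 gives exponential times)
    """
    # Assumption: scale is the inverse of the hazard h0 * exp(beta * z)
    event_scale = 1.0 / (h0 * math.exp(beta * z))
    event_time = rng.weibullvariate(event_scale, shape)
    censor_time = rng.weibullvariate(1.0 / hc, shape)
    person_time = min(event_time, censor_time)
    status = 1 if event_time <= censor_time else 0
    return person_time, status

rng = random.Random(2020)
low_risk = [simulate_subject(0.0, 0.5, 0.01, 0.01, 1.0, rng) for _ in range(2000)]
high_risk = [simulate_subject(2.0, 0.5, 0.01, 0.01, 1.0, rng) for _ in range(2000)]

frac_low = sum(status for _, status in low_risk) / len(low_risk)
frac_high = sum(status for _, status in high_risk) / len(high_risk)
```

With these values, persons with the larger pathway value have a higher event hazard and so are more often observed to have the event before censoring.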

Data disclaimer

It is important to realize that the scrambling steps and the subsequent simulation of pathway data render the genotype data set scientifically inert. That is, the scrambled genetic data merely served as a convenient scaffold upon which we built interesting data for use in demonstrating the ALPS method. Therefore, no result from the example data set has any evidential value on the topic of genetic modification of tamoxifen therapy effectiveness. Results from ALPS analysis of the real-world tamoxifen metabolic pathway data are reported in a separate publication (33).

Comparative analysis of simulated tamoxifen pathway data

To provide a comparison of ALPS with more familiar methods, we analyzed the simulated tamoxifen data with conventional Cox regression, penalized regression (least absolute shrinkage and selection operator (LASSO), ridge, and elastic net), and Bayesian hierarchical Cox models. For the conventional analysis, we fitted a single proportional hazards model to estimate mutually adjusted simulated marginal associations between tamoxifen pathway gene variants and breast cancer recurrence. This model was fitted with the “coxph” function of the “survival” package for R, and ties were handled using the Breslow method (30). Penalized regression models were fitted with the “glmnet” R package, using 10-fold cross-validation and with α parameters of 0 (ridge), 0.5 (elastic net), and 1 (LASSO) (34). The Bayesian hierarchical Cox regression model was fitted with the “BhGLM” R package, placing a common prior distribution on all parameters (10). All models treated genotype variables as linear terms; therefore, parameter estimates correspond to the change in the log hazard of breast cancer recurrence for each 1-unit increase in the number of minor alleles carried.

ALPS analysis of simulated tamoxifen pathway data

ALPS analysis of the simulated tamoxifen cohort required the following elements, available for download from GitHub:

The scrambled cohort data.
The complete prior pathway.
The ALPS algorithm software.
The ALPS configuration R script, in which the user specifies properties of the run.
The ALPS summarizing script.

Within the ALPS configuration script, we specified the following required parameters:

iter: the number of MCMC iterations to perform;
curtree: the initial tree upon which to start the MCMC search;
prefix: a descriptive text string to be used for ALPS output files; and
lik: the form of the likelihood function, which takes the following values:
- “coxph-noties” for Cox regression without tied event times;
- “coxph-ties” for Cox regression with tied event times (Breslow approximation); and
- “logistic” for unconditional logistic regression for both cohort and case-control designs.

We conducted 10,000 MCMC iterations, starting at a random initial location. Because our data were based on times-to-event, some of which were tied, we selected the “coxph-ties” form of the likelihood. We also ran ALPS without the cohort data to obtain prior probabilities for calculating Bayes factors for the discovered features.

Upon completion of the MCMC run, we used the summarizing script to carry out postprocessing of the ALPS output. Postprocessing entails summarizing basic characteristics of the ALPS run, such as the number of iterations performed, the distribution of values for the hyperparameter related to the prior pathway, the proportion of newly proposed tree structures that were accepted, and the distribution of the number of nodes across all proposed trees. The summarizing script calculates the posterior probabilities and Bayes factors for accepted tree structures and for the individual genetic factors. Finally, it outputs a portable document format (PDF) document with graphical depictions of the derived tree structures with Bayes factors that meet or exceed a user-defined threshold, which are used to understand and quantify interactions between genetic factors and their combined impact on the outcome.

Summarizing also permits estimation of wholesale effects of genetic factors when grouped by common attributes. For example, the gene variants in the tamoxifen pathway could be grouped together by function (phase I metabolic enzyme, phase II metabolic enzyme, or transport protein), by gene family (CYP450s, ABC proteins, SULTs, and UGTs), or by which tamoxifen metabolites they affect (tamoxifen, 4-hydroxytamoxifen, N-desmethyltamoxifen, or endoxifen). These groupings can be specified in a spreadsheet, with variants placed in rows and groupings defined in columns, with 0/1 indicators placed in each cell to encode whether each variant belongs to a given group. The summarizing script reads in the spreadsheet and calculates posterior probabilities and Bayes factors for the specified groups. For our example, we grouped variants by the tamoxifen metabolite(s) they affect. ALPS analysis therefore informed us as to which portions of the simulated tamoxifen pathway had the largest impact on the recurrence hazard when disrupted by genetic variation and which network of gene variants contributed most to that disruption.
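One plausible way to turn the 0/1 grouping indicators into a group-level posterior probability is sketched below in Python. The rule used here, that a sampled tree counts toward a group if it contains at least one variant belonging to that group, is an assumption and may differ from what the summarizing script does. The function name, the group assignments, and the sampled trees are all hypothetical.

```python
def group_posterior(sampled_trees, membership, group):
    """Share of sampled trees containing at least one variant from a group.

    sampled_trees : list of trees, each a list of variant aliases
    membership    : variant -> {group name: 0 or 1}, as in the spreadsheet
    group         : name of the grouping column to summarize
    """
    members = {
        variant
        for variant, groups in membership.items()
        if groups.get(group, 0) == 1
    }
    hits = sum(1 for tree in sampled_trees if members & set(tree))
    return hits / len(sampled_trees)

# Hypothetical spreadsheet rows (variants) and columns (metabolite groups).
membership = {
    "cyp4": {"tamoxifen": 1, "endoxifen": 0},
    "abc9": {"tamoxifen": 1, "endoxifen": 1},
    "sul3": {"tamoxifen": 0, "endoxifen": 1},
}
sampled_trees = [["cyp4"], ["cyp4", "abc9"], ["sul3"], ["cyp4"]]

p_tamoxifen = group_posterior(sampled_trees, membership, "tamoxifen")
p_endoxifen = group_posterior(sampled_trees, membership, "endoxifen")
```

Applying the same function to trees sampled without the observed data would give the corresponding prior probabilities, from which group-level Bayes factors follow as the ratio of posterior to prior odds.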

RESULTS

The scrambled, simulated data set contained 5,087 records and 35 variables (a sequential record identification number, 32 gene variants, person-time at risk for breast cancer recurrence, and an indicator of recurrence status). There were 2,945 simulated recurrences, and median follow-up time was 118 weeks.

Comparative analyses of simulated data

Figure 3 shows the mutually adjusted simulated marginal associations between tamoxifen pathway variants and breast cancer recurrence from the conventional Cox regression model. The aliases in the figure correspond to the gene variants in Table 1. Most of the 32 associations are null-centered or near-null, but some are consistent with nonnull effects on recurrence. Table 3 shows these associations alongside those from the LASSO, ridge, elastic net, and Bayesian hierarchical models. All of these methods yielded similar association estimates for the variants with simulated “true” effects (i.e., cyp4, cyp7, abc9, and sul3), but the penalized and Bayesian hierarchical models were more parsimonious, having deemphasized most of the variants with no simulated effect.

Figure 3. Simulated marginal associations between tamoxifen pathway variants and breast cancer recurrence in the Predictors of Breast Cancer Recurrence (ProBe CaRe) Study cohort, Denmark, 2002–2011. Hazard ratios (HRs) correspond to any unit increase in the minor allele count. All associations were mutually adjusted for all genetic factors. Bars, 95% confidence intervals (CIs). abc, ATP-binding cassette (P-glycoprotein); ATP, adenosine triphosphate; cyp, cytochrome P450; sul, sulfotransferase; ugt, uridine 5ʹ-diphosphoglucuronosyltransferase.

Table 3.

Coefficients (Log Hazards) From Conventional Cox Regression, Penalized Regression (LASSO, Ridge, and Elastic Net), and Bayesian Hierarchical Models Applied to the Simulated Tamoxifen Pathway Data^a

Alias	Gene	Reference SNP ID No.	Conventional Cox Regression	Penalized Regression: LASSO	Penalized Regression: Ridge	Penalized Regression: Elastic Net	BhGLM
abc1	ABCB1	rs10248420	0.012		−0.006		0.000
abc2	ABCB1	rs1045642	0.069		0.055	0.005	0.039
abc3	ABCB1	rs1128503	0.046		0.029		0.017
abc4	ABCB1	rs2032582	0.0003		0.003		0.000
abc5	ABCC2	rs3740065	−0.048		−0.033		−0.013
abc6	ABCC2	rs717620	−0.013		−0.011		0.000
abc7	ABCC2	rs8187710	0.032		0.019		0.000
abc8	ABCG2	rs1564481	0.032		0.009		0.015
abc9^b	ABCG2	rs2231164	0.589	0.556	0.532	0.550	0.578
abc10	ABCG2	rs2622604	−0.123	−0.000	−0.100	−0.019	−0.095
cyp1	CYP1A1	rs1048943	0.026		0.028		0.000
cyp2	CYP2B6	rs3745274	0.089	0.044	0.075	0.050	0.078
cyp3	CYP2B6	rs8192709	0.062		0.041		0.027
cyp4^b	CYP2C19	rs12248560	0.190	0.099	0.155	0.106	0.163
cyp5	CYP2C19	rs4244285	0.074		0.056		0.048
cyp6	CYP2C9	rs1799853	−0.013		−0.011		0.000
cyp7^b	CYP2C9	rs1057910	0.312	0.217	0.269	0.223	0.284
cyp8	CYP2D6	rs1065852	0.013		0.007		0.000
cyp9	CYP2D6	rs16947	−0.010		−0.009		0.000
cyp10	CYP2D6	rs3892097	−0.029		−0.023		−0.003
cyp11	CYP2D6	rs28371706	0.119		0.100		0.077
cyp12	CYP2D6	rs28371725	−0.063		−0.059	−0.006	−0.047
cyp13	CYP3A4/5	rs10273424	0.027		0.017		0.007
cyp14	CYP3A4/5	rs776746	0.001		0.004		0.000
sul1	SULT1A1	rs1042157	−0.040		−0.026		−0.003
sul2	SULT1A1	rs1801030	0.035		0.040		0.000
sul3^b	SULT1A1	rs9282861	0.373	0.338	0.364	0.342	0.365
sul4	SULT1E1	rs3775775	0.026		0.021		0.015
sul5	SULT1E1	rs3775778	−0.017		−0.014		−0.007
ugt1	UGT2B7	rs7434332	0.016		0.015		0.002
ugt2	UGT2B10	rs294769	−0.007		−0.008		0.000
ugt3	UGT2B15	rs1902023	−0.016		−0.014		−0.005


Abbreviations: ABC, ATP-binding cassette (P-glycoprotein); ATP, adenosine triphosphate; BhGLM, Bayesian hierarchical generalized linear model; CYP, cytochrome P450; ID, identification; LASSO, least absolute shrinkage and selection operator; SNP, single nucleotide polymorphism; SULT, sulfotransferase; UGT, uridine 5ʹ-diphosphoglucuronosyltransferase.

^a Simulated data from the Predictors of Breast Cancer Recurrence (ProBe CaRe) Study cohort, Denmark, 2002–2011. Missing values for LASSO and elastic net indicate ignored parameters.

^b Variant with a nonnull simulated effect.

A logical next step is to consider interactions between gene variants highlighted by the models in Table 3. For example, one might want to see whether the effect of polymorphisms in phase I enzymes (the CYPs) depends on the presence of polymorphisms in phase II enzymes (the UGT and SULT genes). Suppose an investigator focused on the 8 variants selected by the elastic net model (Table 3); they would have to, at a minimum, explore all of the $\binom{8}{2} = 28$ possible pairwise interactions with these data, requiring 28 unique models. Transporter variants (i.e., ABC genes) add another layer of complexity, requiring specification of 3-way interactions.
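The size of this model space is easy to verify by enumeration. The short Python sketch below lists the 8 variants with nonmissing elastic net coefficients in Table 3 and counts the pairwise and 3-way combinations; the variable names are chosen for illustration.

```python
from itertools import combinations
from math import comb

# Variants retained by the elastic net model (Table 3)
selected = ["abc2", "abc9", "abc10", "cyp2", "cyp4", "cyp7", "cyp12", "sul3"]

pairs = list(combinations(selected, 2))    # candidate pairwise interactions
triples = list(combinations(selected, 3))  # candidate 3-way interactions

n_pairs = len(pairs)      # equals comb(8, 2)
n_triples = len(triples)  # equals comb(8, 3)
```

Moving from pairwise to 3-way interactions doubles the number of candidate terms from 28 to 56, before any consideration of which combinations are biologically plausible.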

In addition to the complexity of modeling all of these possible interactions, arriving at a holistic interpretation of the associations within the context of the presumed pathway would be a daunting task. And while the presumed structure and function of the pathway elements would probably guide the investigator in their specification and assessment of these models, that knowledge and its attendant uncertainties would not be formally represented.

ALPS analysis of simulated data

The run time for 10,000 ALPS iterations was 3 hours on a 64-bit computer system with a 3.40-GHz Intel i7 processor (Intel Corporation, Santa Clara, California) and 16 GB of random access memory. Proposed structural changes were accepted in approximately 1% of the iterations. Table 4 shows the marginal posterior probabilities and Bayes factors for individual genetic factors that appeared in any accepted tree. These results show that most gene variants in the simulated tamoxifen pathway were not associated with recurrence. The 4 variants with simulated recurrence effects (abc9, cyp4, cyp7, and sul3) had strong-to-decisive Bayes factors (range, 43–7,443).

Table 4.

ALPS Genetic Factor Marginal Posteriors for the Simulated Pathway Data^a

Alias	Gene	Reference SNP ID No.	Count ^b	Posterior Probability	Bayes Factor
abc1	ABCB1	rs10248420	23	0.0023	0
abc2	ABCB1	rs1045642	4	0.0004	0
abc8	ABCG2	rs1564481	14	0.0014	0
abc9	ABCG2	rs2231164	9,997	0.9997	7,443
cyp2	CYP2B6	rs3745274	2	0.0002	0
cyp4	CYP2C19	rs12248560	9,505	0.9505	43
cyp7	CYP2C9	rs1057910	9,843	0.9843	140
cyp8	CYP2D6	rs1065852	38	0.0038	0
cyp11	CYP2D6	rs28371706	6	0.0006	0
cyp14	CYP3A4/5	rs776746	3	0.0003	0
sul1	SULT1A1	rs1042157	774	0.0774	0
sul3	SULT1A1	rs9282861	9,979	0.9979	1,061
ugt2	UGT2B10	rs294769	17	0.0017	0


Abbreviations: ABC, ATP-binding cassette (P-glycoprotein); ALPS, algorithm for learning pathway structure; ATP, adenosine triphosphate; CYP, cytochrome P450; ID, identification; SNP, single nucleotide polymorphism; SULT, sulfotransferase; UGT, uridine 5ʹ-diphosphoglucuronosyltransferase.

^a Simulated data from the Predictors of Breast Cancer Recurrence (ProBe CaRe) Study cohort, Denmark, 2002–2011.

^b Number of times a particular gene variant appeared in a tree structure visited during the Markov chain Monte Carlo process.

Table 5 shows the tree structures highlighted by ALPS, the number of times they were visited by the MCMC process, their posterior probabilities, and Bayes factors. The dominant structure, with a very strong Bayes factor of 130, was ((abc9, (cyp7, cyp4)), sul3), shown as a cladogram in Figure 4. The leftmost vertex is Z, the net effect of the interactions between cyp4, cyp7, sul3, and abc9. This effect can be calculated for different combinations of genotypes using the parameter values in the circles on each edge.

Table 5.

ALPS Tree Posteriors and Bayes Factors for the Simulated Pathway Data^a

Tree Structure	Count ^b	Posterior Probability	Bayes Factor
((abc1, abc8), (abc9, sul3))	4	0.0004	0
((abc1, abc8), abc9)	2	0.0002	0
((abc1, abc9), abc2)	1	0.0001	0
((abc9, ((cyp7, cyp8), cyp4)), sul3)	20	0.0020	0
((abc9, ((cyp7, sul1), cyp4)), sul3)	215	0.0215	0
((abc9, ((cyp7, ugt2), cyp4)), sul3)	17	0.0017	0
((abc9, (cyp7, (cyp4, cyp8))), sul3)	18	0.0018	0
((abc9, (cyp7, (cyp4, sul1))), sul3)	551	0.0551	1
((abc9, (cyp7, cyp4)), (sul3, abc8))	5	0.0005	0
((abc9, (cyp7, cyp4)), (sul3, cyp11))	6	0.0006	0
((abc9, (cyp7, cyp4)), (sul3, cyp14))	1	0.0001	0
((abc9, (cyp7, cyp4)), (sul3, cyp2))	2	0.0002	0
((abc9, (cyp7, cyp4)), (sul3, sul1))	7	0.0007	0
((abc9, (cyp7, cyp4)), sul3)	8,663	0.8663	130
((abc9, cyp7), sul3)	338	0.0338	1
(abc1, (abc9, sul1))	1	0.0001	0
(abc1, abc2)	1	0.0001	0
(abc1, abc9)	14	0.0014	0
(abc8, (abc9, sul3))	3	0.0003	0
(abc9, sul3)	129	0.0129	0
(cyp14, abc2)	2	0.0002	0


Abbreviations: abc, ATP-binding cassette (P-glycoprotein); ALPS, algorithm for learning pathway structure; ATP, adenosine triphosphate; cyp, cytochrome P450; sul, sulfotransferase; ugt, uridine 5ʹ-diphosphoglucuronosyltransferase.

^a Simulated data from the Predictors of Breast Cancer Recurrence (ProBe CaRe) Study cohort, Denmark, 2002–2011.

^b Number of times the tree structure was visited during the Markov chain Monte Carlo process.
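The tree visit counts in Table 5 and the per-variant counts in Table 4 are linked: a variant's marginal count is the total number of visits to trees that contain it. The following sketch, which is not part of the ALPS software, reproduces that bookkeeping from the Table 5 counts. The parsing rule (an alias is lowercase letters followed by digits) is an assumption made for this illustration.

```python
import re
from collections import Counter

# Visit counts copied from Table 5.
tree_counts = {
    "((abc1, abc8), (abc9, sul3))": 4,
    "((abc1, abc8), abc9)": 2,
    "((abc1, abc9), abc2)": 1,
    "((abc9, ((cyp7, cyp8), cyp4)), sul3)": 20,
    "((abc9, ((cyp7, sul1), cyp4)), sul3)": 215,
    "((abc9, ((cyp7, ugt2), cyp4)), sul3)": 17,
    "((abc9, (cyp7, (cyp4, cyp8))), sul3)": 18,
    "((abc9, (cyp7, (cyp4, sul1))), sul3)": 551,
    "((abc9, (cyp7, cyp4)), (sul3, abc8))": 5,
    "((abc9, (cyp7, cyp4)), (sul3, cyp11))": 6,
    "((abc9, (cyp7, cyp4)), (sul3, cyp14))": 1,
    "((abc9, (cyp7, cyp4)), (sul3, cyp2))": 2,
    "((abc9, (cyp7, cyp4)), (sul3, sul1))": 7,
    "((abc9, (cyp7, cyp4)), sul3)": 8663,
    "((abc9, cyp7), sul3)": 338,
    "(abc1, (abc9, sul1))": 1,
    "(abc1, abc2)": 1,
    "(abc1, abc9)": 14,
    "(abc8, (abc9, sul3))": 3,
    "(abc9, sul3)": 129,
    "(cyp14, abc2)": 2,
}

total = sum(tree_counts.values())  # number of MCMC iterations

marginal = Counter()
for tree, count in tree_counts.items():
    # Assumed alias pattern: lowercase letters followed by digits (e.g., "cyp14").
    # A set ensures each variant is counted once per tree.
    for alias in set(re.findall(r"[a-z]+\d+", tree)):
        marginal[alias] += count

for alias in ("abc9", "cyp4", "cyp7", "sul3"):
    print(alias, marginal[alias], marginal[alias] / total)
```

Run on the Table 5 counts, the totals for abc9, cyp4, cyp7, and sul3 match the counts reported in Table 4.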

Figure 4.

Top tree structure from a 10,000-iteration algorithm for learning pathway structure (ALPS) run on simulated data from the Predictors of Breast Cancer Recurrence (ProBe CaRe) Study cohort, Denmark, 2002–2011. Interactions between genes are quantified by the parameters shown in the circles. ALPS output produces annotated tree structures with the posterior, the Bayes factor, and the overall pathway effect, β. abc, ATP-binding cassette (P-glycoprotein); ATP, adenosine triphosphate; cyp, cytochrome P450; sul, sulfotransferase.

The log hazard of recurrence, given an observed set of genotypes on the 4 genes in the tree, can then be calculated as $\beta Z$, using the $\beta$ parameter at the bottom of the cladogram. Taking $\exp(\beta Z)$ will give the hazard ratio comparing the recurrence hazard for the specified set of genotypes with the recurrence hazard when all genes are homozygous wild-type.
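A minimal sketch of this final step follows, assuming the log hazard ratio is the product of the overall pathway effect β and the net effect Z. The numeric values are hypothetical and chosen only for illustration. In practice both would be read from the annotated ALPS tree, and the calculation of Z from the edge parameters is not reproduced here.

```python
import math


def hazard_ratio(beta: float, z: float) -> float:
    """Hazard ratio implied by a log hazard ratio of beta * z (assumed form)."""
    return math.exp(beta * z)


# Hypothetical values, not taken from Figure 4.
beta = 0.5  # assumed overall pathway effect
z = 1.2     # assumed net effect Z for one combination of genotypes

print(round(hazard_ratio(beta, z), 3))
```

Under this form, a net effect of Z = 0 gives a hazard ratio of exactly 1.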

Table 6 summarizes the ALPS results according to the tamoxifen metabolites whose formation, transport, and elimination are influenced by gene variants in the tamoxifen pathway. There is strong evidence (Bayes factor = 1,565) that variants involved in the formation of tamoxifen sulfate affect recurrence in tamoxifen-treated women. Less pronounced (but still noteworthy) are effects of variants involved in formation of endoxifen, 4-hydroxytamoxifen, and N-desmethyltamoxifen, with Bayes factors of 219, 30, and 33, respectively. The Bayes factor for genes involved in tamoxifen glucuronidation (tamoxifen glucuronide) is 0.01, which indicates that the data are more consistent with no effect of these variants.

Table 6.

Bayes Factors for the Importance of Tamoxifen Pathway Variants in the Simulated Data, According to the Metabolites They Affect^a

Metabolite	Prior Probability	Posterior Probability	Bayes Factor
Tam-S	0.2417	0.998	1,565.54
Tam-G	0.1342	0.002	0.01
Endoxifen	0.8202	0.999	219.00
4-OH-Tam	0.6825	0.985	30.55
NDM-Tam	0.6592	0.985	33.95


Abbreviations: NDM-Tam, N-desmethyltamoxifen; 4-OH-Tam, 4-hydroxytamoxifen; Tam-G, tamoxifen glucuronide; Tam-S, tamoxifen sulfate.

^a Simulated data from the Predictors of Breast Cancer Recurrence (ProBe CaRe) Study cohort, Denmark, 2002–2011.
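The Bayes factors in Table 6 are consistent with the usual definition of posterior odds divided by prior odds. The sketch below recomputes them from the tabulated prior and posterior probabilities. The function name is ours, not part of the ALPS software, and because the tabulated posteriors are rounded, the recomputed values may differ slightly in the last decimal place.

```python
def bayes_factor(prior: float, posterior: float) -> float:
    """Posterior odds divided by prior odds."""
    posterior_odds = posterior / (1.0 - posterior)
    prior_odds = prior / (1.0 - prior)
    return posterior_odds / prior_odds


# (prior probability, posterior probability) copied from Table 6.
table6 = {
    "Tam-S": (0.2417, 0.998),
    "Tam-G": (0.1342, 0.002),
    "Endoxifen": (0.8202, 0.999),
    "4-OH-Tam": (0.6825, 0.985),
    "NDM-Tam": (0.6592, 0.985),
}

for metabolite, (prior, posterior) in table6.items():
    print(f"{metabolite}: {bayes_factor(prior, posterior):.2f}")
```

A Bayes factor below 1, as for Tam-G, means the data lowered the odds relative to the prior.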

Software

We assembled a Jupyter Notebook (Project Jupyter (https://jupyter.org/)) to accompany this paper. The notebook includes all of the ALPS software elements, the simulated tamoxifen pathway data, and an interactive markdown document explaining the simulation process and implementation of the analyses reported herein. The notebook can be accessed in our GitHub repository (29). Additional support can be obtained from the corresponding author.

DISCUSSION

We observed strong marginal recurrence associations for 4 of the gene variants in the simulated tamoxifen pathway (abc9, cyp4, cyp7, and sul3 (Table 4)). ALPS revealed a number of plausible structures for complex interactions between these 4 variants (Table 5), the most likely of which was the ((abc9, (cyp7, cyp4)), sul3) arrangement (depicted in Figure 4). This structure and its associated parameters can be used to estimate recurrence hazards for any desired combination of genotypes for the contributing genes. The most profound simulated pathway effect was for variants that influence tamoxifen sulfation.

In conclusion, ALPS is an easily implemented Bayesian pathway analysis method that addresses key challenges facing modern epidemiologic studies. Its ability to derive complex interactions within networks of mediating factors, while incorporating prior information about the structure and function of the network, is expected to improve the validity and accuracy of inferences from such studies. In our example, ALPS correctly identified the gene variants with simulated effects on breast cancer recurrence in tamoxifen users, and it highlighted a complex interaction structure between these variants that is unlikely to have been discovered with conventional analytical approaches. ALPS can be adapted to any scenario in which some exogenous exposure affects some outcome through a measurable network of factors (12). One can easily imagine applications to research within other scientific domains, including studies of social and behavioral determinants of health. The method is therefore a response to calls for development of analytical strategies with which to study complex interactions that incorporate substantial prior knowledge (35).

ACKNOWLEDGMENTS

Author affiliations: BioRealm LLC, Walnut, California (James W. Baurley); Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus, Denmark (Anders Kjærsgaard, Deirdre P. Cronin-Fenton); Department of Human Genetics, School of Medicine, Emory University, Atlanta, Georgia (Michael E. Zwick); Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia (Lindsay J. Collin, Timothy L. Lash); Winship Cancer Institute, Emory University, Atlanta, Georgia (Timothy L. Lash); Department of Clinical Biochemistry and Pharmacology, Odense University Hospital, Odense, Denmark (Per Damkier); Department of Clinical Research, University of Southern Denmark, Odense, Denmark (Per Damkier); Department of Pathology, Aarhus University Hospital, Aarhus, Denmark (Stephen Hamilton-Dutoit); and Department of Surgery, Larner College of Medicine, University of Vermont, Burlington, Vermont (Thomas Ahern).

The results reported herein correspond to the specific aims of grant R01CA166825 from the National Cancer Institute (NCI), US National Institutes of Health (NIH), awarded to T.L.L. This project was also supported by funding from the NCI (grant R01 CA118708) and the US National Library of Medicine (grant R01LM013049) awarded to T.L.L.; the National Institute on Alcohol Abuse and Alcoholism, NIH (grant R44 AA027675) awarded to J.B.; the Danish Cancer Society (grant DP06117) awarded to S.H.-D.; the Lundbeck Foundation (grant R167-2013-15861) awarded to D.C.-F.; the Danish Medical Research Council (grant DOK 1158869) awarded to T.L.L.; the Karen Elise Jensen Foundation awarded to H.T.S.; and the Program for Clinical Research Infrastructure (established by the Lundbeck and Novo Nordisk Foundations) awarded to H.T.S. This research was additionally supported in part by the Emory Integrated Genomics Core Shared Resource (Winship Cancer Institute, Emory University) and by the NIH/NCI under award 2P30CA138292-04. T.P.A. was supported in part by funding from the National Institute of General Medical Sciences, NIH (grant P20 GM103644).

The content of this article is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or other funding supporters.

Conflict of interest: none declared.

REFERENCES

1. Thomas DC, Baurley JW, Brown EE, et al. Approaches to complex pathways in molecular epidemiology: summary of a special conference of the American Association for Cancer Research. Cancer Res. 2008;68(24):10028–10030.
2. Borgatti SP, Mehra A, Brass DJ, et al. Network analysis in the social sciences. Science. 2009;323(5916):892–895.
3. Khoury MJ, Ioannidis JPA. Big data meets public health. Science. 2014;346(6213):1054–1055.
4. Weinshilboum RM, Wang L. Pharmacogenomics: precision medicine and drug response. Mayo Clin Proc. 2017;92(11):1711–1722.
5. Khoury MJ, Millikan R, Gwinn M. Genetic and molecular epidemiology. In: Rothman KJ, Greenland S, Lash TL, eds. Modern Epidemiology. 3rd ed. Philadelphia, PA: Lippincott Williams & Wilkins; 2008:564–579.
6. Goldberger AS. Structural equation methods in the social sciences. Econometrica. 1972;40(6):979–1001.
7. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–15550.
8. Tibshirani R. Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B. 1996;58(1):267–288.
9. Tibshirani R. The LASSO method for variable selection in the Cox model. Stat Med. 1997;16(4):385–395.
10. Yi N, Tang Z, Zhang X, et al. BhGLM: Bayesian hierarchical GLMs and survival models, with applications to genomics and epidemiology. Bioinformatics. 2019;35(8):1419–1421.
11. Isci S, Ozturk C, Jones J, et al. Pathway analysis of high-throughput biological data within a Bayesian network framework. Bioinformatics. 2011;27(12):1667–1674.
12. Baurley JW, Conti DV, Gauderman WJ, et al. Discovery of complex pathways from observational data. Stat Med. 2010;29(19):1998–2011.
13. Howlader N, Altekruse SF, Li CI, et al. US incidence of breast cancer subtypes defined by joint hormone receptor and HER2 status. J Natl Cancer Inst. 2014;106(5):dju055.
14. Yaşar P, Ayaz G, User SD, et al. Molecular mechanism of estrogen-estrogen receptor signaling. Reprod Med Biol. 2016;16(1):4–20.
15. Lippman ME, Bolan G. Oestrogen-responsive human breast cancer in long term tissue culture. Nature. 1975;256(5518):592–593.
16. Early Breast Cancer Trialists’ Collaborative Group (EBCTCG). Effects of chemotherapy and hormonal therapy for early breast cancer on recurrence and 15-year survival: an overview of the randomised trials. Lancet. 2005;365(9472):1687–1717.
17. Lash TL, Lien EA, Sørensen HT, et al. Genotype-guided tamoxifen therapy: time to pause for reflection? Lancet Oncol. 2009;10(8):825–833.
18. Cronin-Fenton DP, Damkier P, Lash TL. Metabolism and transport of tamoxifen in relation to its effectiveness: new perspectives on an ongoing controversy. Future Oncol. 2014;10(1):107–122.
19. Haque MM, Desai KV. Pathways to endocrine therapy resistance in breast cancer. Front Endocrinol (Lausanne). 2019;10:Article 573.
20. Lim YC, Desta Z, Flockhart DA, et al. Endoxifen (4-hydroxy-N-desmethyl-tamoxifen) has anti-estrogenic effects in breast cancer cells with potency similar to 4-hydroxy-tamoxifen. Cancer Chemother Pharmacol. 2005;55(5):471–478.
21. Goetz MP, Rae JM, Suman VJ, et al. Pharmacogenetics of tamoxifen biotransformation is associated with clinical outcomes of efficacy and hot flashes. J Clin Oncol. 2005;23(36):9312–9318.
22. Desta Z, Ward BA, Soukhova NV, et al. Comprehensive evaluation of tamoxifen sequential biotransformation by the human cytochrome P450 system in vitro: prominent roles for CYP3A and CYP2D6. J Pharmacol Exp Ther. 2004;310(3):1062–1075.
23. Blevins-Primeau A, Sun D, Chen G, et al. Functional significance of UDP-glucuronosyltransferase variants in the metabolism of active tamoxifen metabolites. Cancer Res. 2009;69(5):1892–1900.
24. Lazarus P, Blevins-Primeau AS, Zheng Y, et al. Potential role of UGT pharmacogenetics in cancer treatment and prevention: focus on tamoxifen. Ann N Y Acad Sci. 2009;1155:99–111.
25. Gjerde J, Hauglid M, Breilid H, et al. Effects of CYP2D6 and SULT1A1 genotypes including SULT1A1 gene copy number on tamoxifen metabolism. Ann Oncol. 2008;19(1):56–61.
26. Zheng Y, Sun D, Sharma A, et al. Elimination of antiestrogenic effects of active tamoxifen metabolites by glucuronidation. Drug Metab Dispos. 2007;35(10):1942–1948.
27. Falany JL, Pilloff DE, Leyh TS, et al. Sulfation of raloxifene and 4-hydroxytamoxifen by human cytosolic sulfotransferases. Drug Metab Dispos. 2006;34(3):361–368.
28. Hamra G, MacLehose R, Richardson D. Markov chain Monte Carlo: an introduction for epidemiologists. Int J Epidemiol. 2013;42(2):627–634.
29. Ahern TP. ALPS-Bayesian-Pathway-Analysis. https://github.com/tpahern/ALPS-Bayesian-Pathway-Analysis. Published March 21, 2019. Accessed October 16, 2019.
30. Hosmer DW, Lemeshow S. Applied Survival Analysis: Regression Modeling of Time-to-Event Data. 1st ed. New York, NY: John Wiley & Sons, Inc.; 1999.
31. Collin LJ, Cronin-Fenton DP, Ahern TP, et al. Cohort profile: the Predictors of Breast Cancer Recurrence (ProBe CaRE) premenopausal breast cancer cohort study in Denmark. BMJ Open. 2018;8(7):e021805.
32. Buuren S, Groothuis-Oudshoorn K. mice: Multivariate imputation by chained equations in R. J Stat Softw. 2011;45(3):1–67.
33. Ahern TP, Collin LJ, Baurley JW, et al. Metabolic pathway analysis and effectiveness of tamoxifen in Danish breast cancer patients. Cancer Epidemiol Biomarkers Prev. 2020;29(3):582–590.
34. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33(1):1–24.
35. Keyes KM, Galea S. The limits of risk factors revisited: is it time for a causal architecture approach? Epidemiology. 2017;28(1):1–5.

