Abstract
In interdisciplinary biomedical, epidemiologic, and population research, it is increasingly necessary to consider pathogenesis and inherent heterogeneity of any given health condition and outcome. As the unique disease principle implies, no single biomarker can perfectly define disease subtypes. The complex nature of molecular pathology and biology necessitates biostatistical methodologies to simultaneously analyze multiple biomarkers and subtypes. To analyze and test for heterogeneity hypotheses across subtypes defined by multiple categorical and/or ordinal markers, we developed a meta-regression method that can utilize existing statistical software for mixed-model analysis. This method can be used to assess whether the exposure-subtype associations are different across subtypes defined by 1 marker while controlling for other markers and to evaluate whether the difference in exposure-subtype association across subtypes defined by 1 marker depends on any other markers. To illustrate this method in molecular pathological epidemiology research, we examined the associations between smoking status and colorectal cancer subtypes defined by 3 correlated tumor molecular characteristics (CpG island methylator phenotype, microsatellite instability, and the B-Raf protooncogene, serine/threonine kinase (BRAF), mutation) in the Nurses' Health Study (1980–2010) and the Health Professionals Follow-up Study (1986–2010). This method can be widely useful as molecular diagnostics and genomic technologies become routine in clinical medicine and public health.
Keywords: causal inference, genomics, heterogeneity test, molecular diagnosis, omics, transdisciplinary epidemiology
Based on the underlying premise that individuals with the same disease name have similar etiologies and disease evolution, epidemiologic research typically aims to investigate the relationship between exposure and disease. With the advancement of biomedical sciences, it is increasingly evident that many human disease processes comprise a range of heterogeneous molecular pathological processes, modified by the exposome (1). Molecular classification can be utilized in epidemiology because individuals with similar molecular pathological processes likely share similar etiologies (2). Pathogenic heterogeneity has been considered in various neoplasms such as endometrial (3), colorectal (3–20), and lung (21–24) cancers, as well as nonneoplastic diseases such as stroke (25), cardiovascular disease (26), autism (27), infectious disease (28), autoimmune disease (29), glaucoma (30), and obesity (31).
New statistical methodologies to address disease heterogeneity are useful in not only molecular pathological epidemiology (MPE) (32) with bona fide molecular subclassification but also epidemiologic research that takes other features of disease heterogeneity (e.g., lethality, disease severity) into consideration. There are statistical methods for evaluating whether the association of an exposure with disease varies by subtypes that are defined by categorical (33–36) or ordinal (33–35) subclassifiers (M.W., unpublished manuscript, 2015); the published methods by Chatterjee (33), Chatterjee et al. (34), and Rosner et al. (35) apply to cohort studies, and the method by Begg et al. (36) focuses on case-control studies. For simplicity, we use the term “categorical variable” (or the adjective “categorical”) when referring to “nonordinal categorical variable” throughout this paper.
Given the complexity of molecular pathology and pathogenesis indicated by the unique disease principle (1), no single biomarker can perfectly subclassify any disease entity. Notably, molecular disease markers are often correlated (37). For example, in colorectal cancer, there is a strong association between high-level microsatellite instability (MSI) and high-level CpG island methylator phenotype (CIMP) and between high-level CIMP and the B-Raf protooncogene, serine/threonine kinase (BRAF), mutation (38).
Cigarette smoking has been associated with the risk of high-level MSI colorectal cancer (16–18, 20, 39–42), high-level CIMP colorectal cancer (17, 20, 42, 43), and BRAF-mutated colorectal cancer (17, 19, 20, 42). Given the correlations between these molecular markers, the association of smoking with a subtype defined by 1 marker may solely (or in part) reflect the association with a subtype defined by another marker. Thus, it remains unclear which molecular marker subtypes are primarily differentially associated with smoking, and how a marker can confound the association between smoking and subtypes defined by other markers. Although the published methods (33–35) are useful to analyze the exposure-subtype associations according to multiple subtyping markers in cohort studies using existing statistical software, analysis using those methods can become computationally infeasible in large data sets. In this article, we present an intuitive and computationally efficient biostatistical method for the analysis of disease and etiological heterogeneity when there are multiple disease subtyping markers (categorical and/or ordinal), which are possibly, but not necessarily, correlated.
METHODS
Cohort and nested case-control studies
In cohort studies where age at disease onset is available, a commonly used statistical model for evaluating subtype-specific exposure-disease associations is the cause-specific hazards model (44, 45):
$$\lambda_j(t) = \lambda_{0j}(t)\exp\{\beta_{1j}X_i(t) + \beta_{2j}W_i(t)\} \tag{1}$$
where λj(t) is the incidence rate at age t for subtype j, λ0j(t) is the baseline incidence rate for subtype j, Xi(t) is a possibly time-varying column vector of exposure variables for the ith individual, Wi(t) is a possibly time-varying column vector of potential confounders, and β1j and β2j are row vector-valued log relative risks (RRs) for the corresponding covariates for subtype j. Model 1 can be estimated in cohort studies and incidence density–sampled case-control studies (46). Assume that J subtypes result from cross-classification of multiple categorical and/or ordinal markers. We create binary indicators for categorical markers; thus, hereafter, we treat the marker variables as either binary or ordinal. Let spj denote the level of the pth marker variable corresponding to the jth subtype; it is 1 or 0 if the pth marker variable is binary, and it is the ordinal or median score of the marker level corresponding to the jth subtype if the pth marker is an ordinal marker, p = 1, …, P, j = 1, …, J.
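The marker coding described above can be sketched in a few lines. This is only an illustration for binary markers; the marker names, the 0/1 coding, and the function name `cross_classify` are assumptions made for the example, and the subtype order it produces is not the numbering used elsewhere in this paper.

```python
from itertools import product

# Hypothetical marker names; 1 = high/mutant level, 0 = reference level.
MARKERS = ("CIMP", "MSI", "BRAF")

def cross_classify(markers):
    """Return one dict of marker levels s_pj for each cross-classified subtype j."""
    return [dict(zip(markers, levels))
            for levels in product((0, 1), repeat=len(markers))]

subtypes = cross_classify(MARKERS)
for j, levels in enumerate(subtypes, start=1):
    print(j, levels)
```

With P binary markers this enumerates J = 2^P candidate subtypes; an ordinal marker would contribute its ordinal or median scores instead of 0/1.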
One-stage method
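The method developed by Rosner et al. (35), Chatterjee (33), and Chatterjee et al. (34) can be usefully applied in cohort studies to investigate multiple markers. In that method, β1j in model 1 is modeled by using the marker variables, for example, by $\beta_{1j} = \gamma_0 + \sum_{p=1}^{P}\gamma_p s_{pj}$, where some interaction terms of marker variables can be added. Model 1 then becomes
$$\lambda_j(t) = \lambda_{0j}(t)\exp\Big\{\Big(\gamma_0 + \sum_{p=1}^{P}\gamma_p s_{pj}\Big)X_i(t) + \beta_{2j}W_i(t)\Big\} \tag{2}$$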
To distinguish this method from the proposed 2-stage method below, we name it “1-stage method.” The parameters of interest, γ0 and each γp, which have the same dimension as β1j, characterize how the levels of multiple markers are associated with differential exposure associations. We can obtain the maximum partial likelihood estimate (33, 34) of γ = {γ0, γp, p = 1, …, P} using existing statistical software for the Cox model analysis, such as PROC PHREG in SAS (SAS Institute, Inc., Cary, North Carolina), through the data duplication method (47), which is based on the following transformation of model 2:
$$\lambda_j(t) = \lambda_{0j}(t)\exp\Big\{\gamma_0 X_i(t) + \sum_{p=1}^{P}\gamma_p\,[s_{pj}X_i(t)] + \sum_{l=1}^{J}\beta_{2l}W_{li}(t)\Big\}$$
where Wli(t) = Wi(t) for l = j, and Wli(t) = 0 for l ≠ j. In this data duplication method, model 2 can be fit by using stratified Cox regression (stratified by subtype) on an augmented data set, in which each block of person-time is augmented for each subtype, and the variables spjXi and Wji are created for p = 1, …, P, j = 1, …, J. Rosner et al. (35) also proposed an adjusted RR for the exposure-disease association for a disease subtype defined by 1 or more marker(s) while adjusting for other markers. The data duplication method may become computationally infeasible when the augmented data set becomes too large; this can easily happen when the original data set is sizable and the number of subtypes cross-classified from the multiple markers is large. For example, in our colorectal cancer example, there are 3,099,586 rows in our original data set. With P = 3 and J = 8, in the augmented data set there will be about 3,099,586 × 8 = 24,796,688 rows, P × J = 24 new variables created for each exposure variable, and J = 8 variables created for each confounding variable. If more markers are being considered, the large augmented data set can easily make the Cox model analysis computationally infeasible.
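The growth of the augmented data set can be checked with a short calculation. The sketch below simply restates the arithmetic in the preceding paragraph; the function name and its arguments are hypothetical.

```python
def augmented_size(n_rows, n_markers, n_subtypes):
    """Size of the duplicated data set used by the 1-stage method.

    Returns (rows, new variables per exposure variable,
             new variables per confounding variable).
    """
    rows = n_rows * n_subtypes                # each block of person-time is repeated per subtype
    per_exposure = n_markers * n_subtypes     # P x J marker-by-exposure variables
    per_confounder = n_subtypes               # J subtype-specific copies of each confounder
    return rows, per_exposure, per_confounder

rows, per_exposure, per_confounder = augmented_size(3_099_586, n_markers=3, n_subtypes=8)
print(rows, per_exposure, per_confounder)
```

Because the row count scales with J, and J itself grows multiplicatively with each added marker, the augmented data set grows quickly as markers are added.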
Two-stage method
When subtypes are defined by multiple categorical and/or ordinal markers, we propose a meta-regression method that is intuitive, does not need augmentation of the data set, and can be easily implemented by using existing statistical software for the mixed-model analysis. We first assume that the exposure variable Xi(t) in model 1 is scalar. This includes the situations in which the exposure is continuous or binary, and the trend analysis for categorical exposure in which a new continuous variable, median level in each exposure category, is included in model 1. The meta-regression method includes 2 stages of analysis. The first stage is to conduct the subtype-specific analysis for each cross-classified subtype from the multiple markers. For the cohort and nested case-control study, this analysis can be based on model 1. Typically, a standard competing risks framework can be used, where it is assumed that only 1 disease subtype can be observed in each individual. The occurrence of a disease subtype that is different from the subtype for which the exposure association is studied is censored at the date of diagnosis. The model for the second-stage analysis is
$$\hat\beta_{1j} = \gamma_0 + \sum_{p=1}^{P}\gamma_p s_{pj} + e_j \tag{3}$$
where $\hat\beta_{1j}$, the estimated log(RR) representing the exposure association with the jth subtype, is obtained in the first-stage analysis, and ej's are within-study sampling errors; that is, $e_j \sim N(0, \widehat{\mathrm{Var}}(\hat\beta_{1j}))$. Because, in the competing risk framework, the relative risks for distinct tumor subtypes are asymptotically uncorrelated (45), this meta-regression for J subtypes is the same as the standard meta-regression for J independent studies. Interactions of spj can be included as covariates in model 3 if appropriate. We can use the Wald test to test the hypothesis H0 : γp = 0 for each p. This null hypothesis implies that the exposure-subtype association does not change over the level of the pth marker variable while controlling for the other marker variables. For a categorical marker, we can also test whether γp = 0 for all p's corresponding to the binary marker variables created for this categorical marker; the null hypothesis implies that the categorical marker does not contribute to the possible etiological heterogeneity. Note that the difference between this 2-stage method with a fixed-effects meta-regression model and the 1-stage method is essentially only in the estimation method, not the model.
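A fixed-effects second stage of this kind amounts to inverse-variance weighted least squares of the first-stage log(RR) estimates on the marker levels, followed by Wald tests. The following is a minimal sketch under that reading, not the authors' SAS macro; the function names and all numbers in the example are hypothetical, and the first-stage estimates are assumed independent, as in the cohort setting.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fixed_effects_meta_regression(beta_hat, var_hat, markers):
    """Weighted least squares of subtype log(RR)s on marker levels.

    beta_hat[j], var_hat[j]: first-stage estimate and its variance for subtype j.
    markers[j]: marker levels (s_1j, ..., s_Pj) for subtype j.
    Returns (gamma_hat, standard errors, two-sided Wald P values);
    index 0 is the intercept gamma_0.
    """
    rows = [[1.0] + [float(s) for s in levels] for levels in markers]
    w = [1.0 / v for v in var_hat]            # inverse-variance weights
    n, q = len(rows), len(rows[0])
    xtwx = [[sum(w[j] * rows[j][a] * rows[j][b] for j in range(n))
             for b in range(q)] for a in range(q)]
    xtwy = [sum(w[j] * rows[j][a] * beta_hat[j] for j in range(n)) for a in range(q)]
    gamma = solve(xtwx, xtwy)
    # Var(gamma_a) is the a-th diagonal element of (X'WX)^{-1}.
    se = [math.sqrt(solve(xtwx, [float(i == a) for i in range(q)])[a]) for a in range(q)]
    p = [math.erfc(abs(g / s) / math.sqrt(2.0)) for g, s in zip(gamma, se)]
    return gamma, se, p

# Hypothetical example: 4 subtypes from 2 binary markers, with log(RR)s that
# follow gamma_0 = 0.10, gamma_1 = 0.20, gamma_2 = -0.10 exactly.
markers = [(0, 0), (1, 0), (0, 1), (1, 1)]
beta_hat = [0.10, 0.30, 0.00, 0.20]
var_hat = [0.01, 0.04, 0.04, 0.02]
gamma, se, p_values = fixed_effects_meta_regression(beta_hat, var_hat, markers)
print(gamma, se, p_values)
```

Exponentiating an element of `gamma` other than the intercept gives a ratio of RRs between the two levels of that marker, holding the other markers fixed.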
We can also add subtype-specific random effects in model 3 to account for heterogeneity between subtypes that cannot be explained by the variables in model 3. Below is a random-effects meta-regression model (48),
$$\hat\beta_{1j} = \gamma_0 + \sum_{p=1}^{P}\gamma_p s_{pj} + b_j + e_j \tag{4}$$
where $b_j \sim N(0, \tau^2)$ are subtype-specific random effects accounting for heterogeneity between the subtypes that cannot be explained by the variables spj and ej, and ej, defined in model 3, is assumed independent of bj. This random-effects 2-stage method uses a different model from the fixed-effects 2-stage and 1-stage methods. It has the advantage over both the fixed-effects 2-stage method and the 1-stage method in that it can incorporate additional heterogeneity between subtypes that cannot be explained by the given marker variables. If $\tau^2 = 0$, where model 4 agrees with model 3, the random-effects meta-regression method is typically less efficient than the fixed-effects method, and because the 1-stage method is a maximum partial likelihood method, it should be the most efficient among the 3 methods. In the random-effects model, the test of $H_0: \tau^2 = 0$ assesses the significance of the random-effects term. Note that when the number of subtypes is small, this test may be underpowered and the estimate of $\tau^2$ may be imprecise. When the test rejects $H_0: \tau^2 = 0$ or when we believe there is heterogeneity in addition to that explained by the marker variables, we may use the random-effects model in the 2-stage method.
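For readers who prefer matrix notation, the estimator behind the random-effects model can be written compactly. The display below is the standard generalized least squares form for a given between-subtype variance, written here as τ² (a notational assumption); S denotes the J × (P + 1) matrix whose jth row is (1, s_1j, …, s_Pj), and σ_j² denotes the first-stage variance of the jth estimate. In practice τ² is replaced by an estimate, for example from restricted maximum likelihood.

```latex
\hat{\gamma} = \left(S^{\top} V^{-1} S\right)^{-1} S^{\top} V^{-1} \hat{\beta}_{1},
\qquad
\widehat{\mathrm{Var}}(\hat{\gamma}) = \left(S^{\top} V^{-1} S\right)^{-1},
\qquad
V = \mathrm{diag}\left(\sigma_1^2 + \tau^2, \ldots, \sigma_J^2 + \tau^2\right)
```

Setting τ² = 0 recovers the fixed-effects weights 1/σ_j², which is why the two models coincide when there is no residual heterogeneity.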
Unmatched case-control study
In the unmatched case-control design, the first-stage model of the 2-stage method can be the nominal polytomous logistic regression
$$\log\frac{\Pr(Y = j \mid X, W)}{\Pr(Y = 0 \mid X, W)} = \beta_{0j} + \beta_{1j}X + \beta_{2j}W, \quad j = 1, \ldots, J,$$
where Y = j represents subtype j cases, Y = 0 represents controls, and β1j represents the subtype-specific log odds ratio, assumed to be scalar. The scenarios where the exposure is a vector will be considered in a later section. If the disease is rare, exp(β1j) approximates RR. In this design, the subtype-specific association estimates, $\hat\beta_{1j}$, are typically correlated. The second-stage model of the 2-stage method is the meta-regression model 3 or 4 with an additional condition: $\mathrm{Cov}(e_{j_1}, e_{j_2}) = \widehat{\mathrm{Cov}}(\hat\beta_{1j_1}, \hat\beta_{1j_2})$ for $j_1 \neq j_2$. The R function rma.mv() can be used to estimate γp, p = 1, …, P, in models 3 and 4 and the variance of $\hat\gamma_p$ (49). We can then use the Wald test to test the hypothesis H0 : γp = 0 for each p, or we can test whether γp = 0 for all p's corresponding to the binary marker variables created for a categorical marker.
Interaction between markers
The adjusted RR proposed by Rosner et al. (35) can also be estimated in models 3 and 4. For example, if there are 2 binary markers, cross-classification of which defines 4 subtypes, and the second-stage model of the fixed-effects meta-regression method is $\hat\beta_{1j} = \gamma_0 + \gamma_1 s_{1j} + \gamma_2 s_{2j} + e_j$, then γp represents the difference in exposure-disease subtype associations between the 2 subtypes defined by the pth marker while the level of the other marker is the same, p = 1, 2. The meta-regression method can also be used to evaluate whether the difference in exposure-disease subtype association across the subtypes defined by 1 marker depends on the level of another marker by including appropriate interaction terms for these markers in the meta-regression model. For example, in the second-stage fixed-effects model $\hat\beta_{1j} = \gamma_0 + \gamma_1 s_{1j} + \gamma_2 s_{2j} + \gamma_3 s_{1j}s_{2j} + e_j$, rejection of the null hypothesis H0 : γ3 = 0 implies that the difference in exposure-disease subtype associations across the subtypes defined by the first marker depends on the level of the second marker. The discussion above, which is for the fixed-effects 2-stage method, can be easily extended to the random-effects method.
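With 2 binary markers and 4 subtypes, a second-stage model containing both marker main effects and their product term is saturated, so its coefficients are simple contrasts of the 4 first-stage log(RR)s. The sketch below illustrates this under the assumption of independent first-stage estimates (the cohort setting); the function names and all numbers are hypothetical.

```python
import math

def saturated_gammas(b00, b10, b01, b11):
    """Coefficients of the saturated 2-marker model.

    b_xy is the first-stage log(RR) for the subtype with marker 1 at level x
    and marker 2 at level y.
    """
    g0 = b00                       # reference subtype
    g1 = b10 - b00                 # marker 1 difference when marker 2 = 0
    g2 = b01 - b00                 # marker 2 difference when marker 1 = 0
    g3 = b11 - b10 - b01 + b00     # difference of differences (interaction)
    return g0, g1, g2, g3

def interaction_wald(betas, variances):
    """Wald z and two-sided P value for gamma_3, assuming independent estimates."""
    g3 = saturated_gammas(*betas)[3]
    se = math.sqrt(sum(variances))  # variance of a +/-1 contrast of independent terms
    z = g3 / se
    return g3, se, math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical first-stage results, ordered (b00, b10, b01, b11).
betas = (0.10, 0.30, 0.05, 0.60)
variances = (0.01, 0.04, 0.04, 0.02)
g3, se3, p3 = interaction_wald(betas, variances)
print(g3, se3, p3)
```

A small P value for γ3 would indicate that the marker 1 contrast differs between the two levels of marker 2; with more markers or subtypes the model is no longer saturated and a weighted regression fit is needed instead.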
Categorical exposures and multiple exposures
Let β1j = (β1j1, …, β1jK), K > 1, represent the subtype-specific exposure-disease association corresponding to binary indicators created for a categorical exposure with K + 1 levels, or multiple exposures, 1 or more of which could be categorical exposures, for which binary indicators are created. The first-stage analysis of the 2-stage method, which is the subtype-specific analysis for each cross-classified subtype, is the same as in the cases when β1j is scalar. At the second stage, 1 strategy is to conduct the meta-regression analysis for each element of β1j separately. For the kth element of β1j, the random-effects meta-regression model $\hat\beta_{1jk} = \gamma_{0k} + \sum_{p=1}^{P}\gamma_{pk}s_{pj} + b_{jk} + e_{jk}$, or the fixed-effects meta-regression model, which does not include the random-effects term bjk, may be used to characterize the relationship between β1jk and levels of the multiple markers. For any given k, in cohort and nested case-control studies, ejk's, j = 1, …, J, are independent, and in unmatched case-control studies, $\mathrm{Cov}(e_{j_1k}, e_{j_2k}) = \widehat{\mathrm{Cov}}(\hat\beta_{1j_1k}, \hat\beta_{1j_2k})$ for $j_1 \neq j_2$.
Alternatively, the second-stage model can be a random-effects multivariate meta-regression model (50, 51)
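$$\hat\beta_{1j} = \gamma_0 + \sum_{p=1}^{P}\gamma_p s_{pj} + b_j + e_j \tag{5}$$
where the error term ej = (ej1, …, ejK) follows a K-dimensional normal distribution with mean 0, $\mathrm{Cov}(e_{jk_1}, e_{jk_2}) = \widehat{\mathrm{Cov}}(\hat\beta_{1jk_1}, \hat\beta_{1jk_2})$ for k1 ≠ k2, and $\mathrm{Var}(e_{jk}) = \widehat{\mathrm{Var}}(\hat\beta_{1jk})$. In cohort and nested case-control studies, $\mathrm{Cov}(e_{j_1k_1}, e_{j_2k_2}) = 0$, and for unmatched case-control studies, $\mathrm{Cov}(e_{j_1k_1}, e_{j_2k_2}) = \widehat{\mathrm{Cov}}(\hat\beta_{1j_1k_1}, \hat\beta_{1j_2k_2})$, for j1 ≠ j2, k1, k2 = 1, …, K. The random-effects term bj follows a K-dimensional normal distribution with mean 0, independent of ej. The fixed-effects multivariate meta-regression model is model 5 with bj excluded. As pointed out previously (50, 51), the estimator of γpk using the multivariate random-effects meta-regression method is more efficient than that from the univariate random-effects meta-regression method presented above. Presumably the same conclusion can be made for the fixed-effects models. The R function rma.mv() can be used to estimate γpk in the random-effects and fixed-effects multivariate meta-regression models.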
EXAMPLE
To illustrate the proposed meta-regression method for multiple markers, we examine the associations between smoking status (never, former, current) and 8 possible colorectal cancer subtypes defined by 3 binary markers, CIMP (high vs. low/negative), MSI (high vs. microsatellite stable (MSS)), and BRAF (mutant vs. wild type). Smoking status is coded as 0 for never, 1 for former, and 2 for current, and the trend association is examined. The analysis includes 88,620 women in the Nurses’ Health Study (NHS), followed from 1980 to 2010, and 46,251 men in the Health Professionals Follow-up Study (HPFS), followed from 1986 to 2010, with 3,099,586 person-years of follow-up. In each cohort, 1 subtype with fewer than 5 cases (low-level/negative CIMP, high-level MSI, mutated BRAF) was excluded, leading to a total of 1,118 colorectal cancer cases (654 women in NHS and 464 men in HPFS) in the remaining 7 subtypes.
In the first stage of the 2-stage meta-regression approach, a subtype-specific multivariate Cox model analysis, stratified by age (months) and calendar year of the questionnaire cycle, as well as adjusted for potential confounders, was performed for each cohort. Table 1 contains subtype definitions, subtype-specific case numbers, and the estimated smoking status-colorectal cancer subtype associations in the NHS and HPFS. In the second-stage analysis, we modeled the subtype and cohort-specific log(RR) using the 3 markers considered (MSI, CIMP, and BRAF) and cohort (NHS vs. HPFS) and compared the results with those from the 1-stage method (33–35); in the 1-stage method, we conducted the Cox model analysis for each cohort using the data duplication method and then combined the estimates from NHS and HPFS by the fixed-effects meta-analysis approach. Table 2 shows inferences for the function exp() of the coefficients of the marker variables in the model for log(RR) that represent the ratios of RRs between marker levels. For example, based on the meta-regression method, the estimated ratio of the RR for the association of smoking with high-level CIMP colorectal cancer over the RR for low-level/negative CIMP colorectal cancer, while the MSI and BRAF levels stay the same, was 1.23 (95% confidence interval: 0.84, 1.82). As shown in Table 2, the results from these 2 methods were consistent. The results from this analysis suggest that we do not have sufficient statistical evidence to conclude that the smoking-colorectal cancer subtype associations are different across subtypes defined by any 1 of the biomarkers (MSI, CIMP, and BRAF) while controlling for the other 2 biomarkers.
Table 1.
Subtype | CIMP | MSI | BRAF | No. of Cases | RR | 95% CI^b | P Value^b |
---|---|---|---|---|---|---|---|
1 | L/N | MSS | Wild type | 832 | 1.12 | 1.01, 1.25 | 0.039 |
2 | L/N | MSS | Mutant | 47 | 0.86 | 0.54, 1.37 | 0.53 |
3 | L/N | High | Wild type | 42 | 1.35 | 0.80, 2.25 | 0.26 |
4 | High | MSS | Wild type | 34 | 1.28 | 0.71, 2.32 | 0.41 |
5 | High | MSS | Mutant | 31 | 1.00 | 0.57, 1.78 | 0.99 |
6 | High | High | Wild type | 43 | 1.93 | 1.18, 3.14 | 0.008 |
7 | High | High | Mutant | 95 | 1.45 | 1.05, 2.00 | 0.026 |
Abbreviations: BRAF, B-Raf protooncogene, serine/threonine kinase; CI, confidence interval; CIMP, CpG island methylator phenotype; L/N, low/negative; MSI, microsatellite instability; MSS, microsatellite stable; RR, relative risk.
a The analysis includes only subtypes with ≥5 cases. The subtype-specific analyses were controlled for body mass index expressed as weight (kg)/height (m)^2 (<25, 25–29.9, ≥30), family history of colorectal cancer (yes/no), physical activity in metabolic equivalent tasks (quintiles), red meat intake (quintiles of servings/day), alcohol consumption (0, quartiles of g/day), total caloric intake (quintiles of calories/day), regular aspirin use (2 or more tablets/week or at least 2 times/week or less) and stratified by age (months) and calendar year. Postmenopausal hormone use (never/ever) was also adjusted for in the Nurses’ Health Study.
b The cohort-specific estimates were combined by using a fixed-effects meta-analysis method.
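Published RRs and confidence intervals such as those in Table 1 can be converted back to log(RR)s and approximate standard errors, which is the form a second-stage meta-regression takes as input. The sketch below assumes Wald-type 95% intervals that are symmetric on the log scale; because the printed values are rounded, the recovered standard errors and P values are only approximations of the original analysis. The function names are hypothetical.

```python
import math

# (subtype, RR, lower 95% limit, upper 95% limit), as printed in Table 1.
TABLE1 = [
    (1, 1.12, 1.01, 1.25),
    (2, 0.86, 0.54, 1.37),
    (3, 1.35, 0.80, 2.25),
    (4, 1.28, 0.71, 2.32),
    (5, 1.00, 0.57, 1.78),
    (6, 1.93, 1.18, 3.14),
    (7, 1.45, 1.05, 2.00),
]

def log_rr_and_se(rr, lower, upper, z_crit=1.96):
    """Return log(RR) and the SE implied by a symmetric log-scale interval."""
    return math.log(rr), (math.log(upper) - math.log(lower)) / (2.0 * z_crit)

def two_sided_p(z):
    """Two-sided P value from a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

recovered = {}
for subtype, rr, lower, upper in TABLE1:
    beta, se = log_rr_and_se(rr, lower, upper)
    recovered[subtype] = (beta, se, two_sided_p(beta / se))
    print(subtype, round(beta, 3), round(se, 3))
```

The squared standard errors give the within-subtype variances used as inverse weights in the second stage.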
Table 2.
Marker | Two-Stage RRR | Two-Stage 95% CI | Two-Stage P Value | One-Stage RRR | One-Stage 95% CI | One-Stage P Value |
---|---|---|---|---|---|---|
CIMP | 1.23 | 0.84, 1.82 | 0.29 | 1.28 | 0.87, 1.88 | 0.21 |
MSI | 1.34 | 0.93, 1.91 | 0.11 | 1.31 | 0.92, 1.87 | 0.13 |
BRAF | 0.78 | 0.55, 1.09 | 0.14 | 0.78 | 0.56, 1.10 | 0.16 |
Abbreviations: BRAF, B-Raf protooncogene, serine/threonine kinase; CI, confidence interval; CIMP, CpG island methylator phenotype; MSI, microsatellite instability; RRR, ratio of relative risks.
In a second analysis for illustrating the proposed meta-regression method, the first-stage analysis was the same as before, but in the second stage, we started from a model with all 3 markers, 2-way interactions of the markers, and cohort, and then used stepwise model selection with a cutoff P = 0.05 for entering or removing the variables. This analysis was for selecting covariates in the meta-regression model that are important for characterizing the subtype-specific exposure-disease association. Only MSI was in the final model (ratio of RR for high-level MSI vs. MSS = 1.38, 95% confidence interval: 1.07, 1.79; P value = 0.015).
DISCUSSION
When subtypes are defined by multiple categorical and/or ordinal markers, we propose a meta-regression method that is intuitive, does not need augmentation of the data set, and can be easily implemented using existing statistical software such as SAS procedures for the mixed-model analysis. This meta-regression method can be used to test for etiological heterogeneity across multiple disease subtypes classified by multiple markers, to assess whether the exposure-disease subtype associations are different across subtypes defined by 1 marker while controlling for other markers, and to evaluate whether the difference in exposure-disease subtype association across subtypes defined by 1 marker depends on any of the other markers.
Addressing etiological heterogeneity by MPE research has relevance to disease prevention. As an example, we herein discuss smoking, colonoscopy, and colorectal cancer risk. Colonoscopy has been associated with lower colorectal cancer risk for up to 10 years after the procedure in individuals with average risk for developing colorectal cancer (52); however, it remains to be determined whether colonoscopy every 10 years is also effective for colorectal cancer prevention in high-risk individuals. A recent MPE study suggests that the preventive effect of colonoscopy may be weaker for high-level MSI colorectal cancer than for non–high-level MSI colorectal cancer (52). MPE studies (16–18, 20, 39–42) have also shown that smokers are susceptible to developing high-level MSI colorectal cancer. Taken together, these findings imply that colonoscopy may be less effective for colorectal cancer prevention in smokers than in nonsmokers. Hence, MPE research can help us move toward more personalized disease prevention strategies.
In addition to heterogeneity between tumors across individuals, accumulating evidence has indicated heterogeneity within 1 tumor in 1 individual. An integrative concept (“the unique tumor principle”) on intra- and intertumor heterogeneity along with epidemiologic exposures has been discussed in detail (53). Though our current paper primarily addresses intertumor (or interindividual) heterogeneity, it is of interest to develop new statistical methodologies to address both intra- and intertumor heterogeneities in the future.
With advances in biomedical technologies, molecular pathology tests are increasingly common in clinical practice, as well as in epidemiologic studies (54–56). The MPE approach is useful for not only assessment of risk of developing disease but also evaluation of predictive biomarkers for intervention in a disease population (57). In the future, routine clinical molecular pathology data may be integrated into population-based disease registries and databases, and large-scale MPE studies can be routine research practice (58). Thus, our methodology will be widely useful.
We developed a user-friendly SAS macro %stepmetareg implementing this meta-regression method. It includes a stepwise selection procedure to select covariates considered in the meta-regression model that are important for characterizing the subtype-specific exposure-disease association, represented by $\hat\beta_{1j}$. The SAS macro can be obtained at the website http://www.hsph.harvard.edu/donna-spiegelman/software/.
This meta-regression method will be most useful in situations where the number of subtypes is relatively low; otherwise, the number of cases for each unique tumor subtype defined by cross-classification of the multiple markers may be too small to obtain stable estimates of each β1j. The minimum number of cases required for each tumor subtype for obtaining stable estimates of each β1j depends on the number of covariates in the first-stage model. A rule of thumb for the minimum events per covariate is 5–10. An advantage of the proposed 2-stage method for cohort studies is that β1j, j = 1, …, J, can be estimated separately, without using the data duplication method, which becomes computationally infeasible when the augmented data set becomes too large. In addition, the random-effects model has the advantage that it can incorporate additional heterogeneity between subtypes that cannot be explained by the given marker variables.
Disease subtype data are often missing in some proportion of cases. Chatterjee et al. (34) developed an estimating function method based on model 2 that can be used to handle missing subtype data under a missing-at-random assumption. That method can be used directly to handle missing subtype data for estimating β1j in the first stage of the 2-stage models. Statistical methods for handling missing marker data, which are covariates data now, in the second-stage model of the 2-stage method may be developed through extension of existing methods for missing covariates data problems in the mixed-model analysis; this is a topic of future research. Alternatively, we may use the conventional method of creating missing indicators for missing markers data, as well as the method of imputing the missing marker data based on regression models that link the marker data and covariates that contain information about the marker data. When these methods are used, the 2-stage method with a random-effect meta-regression model could have the advantage of partially taking into account additional variability due to using missing indicators or using imputed marker data through the random-effect term; future research is needed for this topic.
In conclusion, in consideration of pathogenesis and etiological heterogeneity of disease, we developed a meta-regression method to study etiological heterogeneity across disease subtypes defined by multiple biomarkers. This method is useful in the emerging interdisciplinary field of molecular pathological epidemiology (32, 59). There is an increasing need to integrate molecular pathology and epidemiology to better understand disease etiologies and causalities (59–62). Our meta-regression method can be widely useful, as use of molecular pathology and genomic technologies is increasingly common in clinical medicine and public health.
ACKNOWLEDGMENTS
Author affiliations: Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts (Molin Wang, Shuji Ogino); Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts (Molin Wang); Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts (Molin Wang); Biostatistics Division, Center for Research Administration and Support, National Cancer Center, Tokyo, Japan (Aya Kuchiba); Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts (Shuji Ogino); and Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts (Shuji Ogino).
This work was supported by US National Institutes of Health grants (R01 CA151993 and R35 CA197735 to S.O., P01 CA87969 to S. E. Hankinson, UM1 CA186107 to M. J. Stampfer, P01 CA55075 and UM1 CA167552 to W. C. Willett, and P50 CA127003 to C. S. Fuchs); grants from The Paula and Russell Agrusa Fund for Colorectal Cancer Research (to C. S. Fuchs); and the Friends of the Dana-Farber Cancer Institute (to S.O.).
We deeply thank hospitals and pathology departments throughout the United States for generously providing us with tissue specimens. We also would like to thank the following state cancer registries for their help: Alabama, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Idaho, Illinois, Indiana, Iowa, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Nebraska, New Hampshire, New Jersey, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, Tennessee, Texas, Virginia, Washington, and Wyoming.
Conflict of interest: none declared.
REFERENCES
- 1.Ogino S, Lochhead P, Chan AT, et al. Molecular pathological epidemiology of epigenetics: emerging integrative science to analyze environment, host, and disease. Mod Pathol. 2013;264:465–484. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Ogino S, Chan AT, Fuchs CS, et al. Molecular pathological epidemiology of colorectal neoplasia: an emerging transdisciplinary and interdisciplinary field. Gut. 2011;603:397–411. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Chen H, Taylor NP, Sotamaa KM, et al. Evidence for heritable predisposition to epigenetic silencing of MLH1. Int J Cancer. 2007;1208:1684–1688. [DOI] [PubMed] [Google Scholar]
- 4.Allan JM, Shorto J, Adlard J, et al. MLH1 -93G>A promoter polymorphism and risk of mismatch repair deficient colorectal cancer. Int J Cancer. 2008;12310:2456–2459. [DOI] [PubMed] [Google Scholar]
- 5.Campbell PT, Curtin K, Ulrich CM, et al. Mismatch repair polymorphisms and risk of colon cancer, tumour microsatellite instability and interactions with lifestyle factors. Gut. 2009;585:661–667. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Raptis S, Mrkonjic M, Green RC, et al. MLH1 -93G>A promoter polymorphism and the risk of microsatellite-unstable colorectal cancer. J Natl Cancer Inst. 2007;996:463–474. [DOI] [PubMed] [Google Scholar]
- 7.Samowitz WS, Curtin K, Wolff RK, et al. The MLH1 -93 G>A promoter polymorphism and genetic and epigenetic alterations in colon cancer. Genes Chromosomes Cancer. 2008;4710:835–844. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Ogino S, Hazra A, Tranah GJ, et al. MGMT germline polymorphism is associated with somatic MGMT promoter methylation and gene silencing in colorectal cancer. Carcinogenesis. 2007;289:1985–1990. [DOI] [PubMed] [Google Scholar]
- 9.Hawkins NJ, Lee JH, Wong JJ, et al. MGMT methylation is associated primarily with the germline C>T SNP (rs16906252) in colorectal cancer and normal colonic mucosa. Mod Pathol. 2009;2212:1588–1599. [DOI] [PubMed] [Google Scholar]
- 10.Slattery ML, Curtin K, Anderson K, et al. Associations between cigarette smoking, lifestyle factors, and microsatellite instability in colon tumors. J Natl Cancer Inst. 2000;9222:1831–1836. [DOI] [PubMed] [Google Scholar]
- 11.Satia JA, Keku T, Galanko JA, et al. Diet, lifestyle, and genomic instability in the North Carolina Colon Cancer Study. Cancer Epidemiol Biomarkers Prev. 2005;142:429–436. [DOI] [PubMed] [Google Scholar]
- 12. Slattery ML, Curtin K, Sweeney C, et al. Diet and lifestyle factor associations with CpG island methylator phenotype and BRAF mutations in colon cancer. Int J Cancer. 2007;120(3):656–663.
- 13. Campbell PT, Jacobs ET, Ulrich CM, et al. Case-control study of overweight, obesity, and colorectal cancer risk, overall and by tumor microsatellite instability status. J Natl Cancer Inst. 2010;102(6):391–400.
- 14. Kuchiba A, Morikawa T, Yamauchi M, et al. Body mass index and risk of colorectal cancer according to fatty acid synthase expression in the Nurses’ Health Study. J Natl Cancer Inst. 2012;104(5):415–420.
- 15. Wu AH, Shibata D, Yu MC, et al. Dietary heterocyclic amines and microsatellite instability in colon adenocarcinomas. Carcinogenesis. 2001;22(10):1681–1684.
- 16. Chia VM, Newcomb PA, Bigler J, et al. Risk of microsatellite-unstable colorectal cancer is associated jointly with smoking and nonsteroidal anti-inflammatory drug use. Cancer Res. 2006;66(13):6877–6883.
- 17. Samowitz WS, Albertsen H, Sweeney C, et al. Association of smoking, CpG island methylator phenotype, and V600E BRAF mutations in colon cancer. J Natl Cancer Inst. 2006;98(23):1731–1738.
- 18. Poynter JN, Haile RW, Siegmund KD, et al. Associations between smoking, alcohol consumption, and colorectal cancer, overall and by tumor microsatellite instability status. Cancer Epidemiol Biomarkers Prev. 2009;18(10):2745–2750.
- 19. Rozek LS, Herron CM, Greenson JK, et al. Smoking, gender, and ethnicity predict somatic BRAF mutations in colorectal cancer. Cancer Epidemiol Biomarkers Prev. 2010;19(3):838–843.
- 20. Limsui D, Vierkant RA, Tillmans LS, et al. Cigarette smoking and colorectal cancer risk by molecularly defined subtypes. J Natl Cancer Inst. 2010;102(14):1012–1022.
- 21. Leng S, Bernauer AM, Hong C, et al. The A/G allele of rs16906252 predicts for MGMT methylation and is selectively silenced in premalignant lesions from smokers and in lung adenocarcinomas. Clin Cancer Res. 2011;17(7):2014–2023.
- 22. Ahrendt SA, Decker PA, Alawi EA, et al. Cigarette smoking is strongly associated with mutation of the K-ras gene in patients with primary adenocarcinoma of the lung. Cancer. 2001;92(6):1525–1530.
- 23. Riely GJ, Kris MG, Rosenbaum D, et al. Frequency and distinctive spectrum of KRAS mutations in never smokers with lung adenocarcinoma. Clin Cancer Res. 2008;14(18):5731–5734.
- 24. Riely GJ, Marks J, Pao W. KRAS mutations in non-small cell lung cancer. Proc Am Thorac Soc. 2009;6(2):201–205.
- 25. Julin B, Bergkvist C, Wolk A, et al. Cadmium in diet and risk of cardiovascular disease in women. Epidemiology. 2013;24(6):880–885.
- 26. Jeong I, Rhie J, Kim I, et al. Working hours and cardiovascular disease in Korean workers: a case-control study. J Occup Health. 2014;55(5):385–391.
- 27. Doshi-Velez F, Ge Y, Kohane I. Comorbidity clusters in autism spectrum disorders: an electronic health record time-series analysis. Pediatrics. 2014;133(1):e54–e63.
- 28. Dandri M, Locarnini S. New insight in the pathobiology of hepatitis B virus infection. Gut. 2012;61(suppl 1):i6–i17.
- 29. Perez OD. Appreciating the heterogeneity in autoimmune disease: multiparameter assessment of intracellular signaling mechanisms. Ann N Y Acad Sci. 2005;1062:155–164.
- 30. Takamoto M, Kaburaki T, Mabuchi A, et al. Common variants on chromosome 9p21 are associated with normal tension glaucoma. PLoS One. 2012;7(7):e40107.
- 31. Field AE, Camargo CA, Jr, Ogino S. The merits of subtyping obesity: one size does not fit all. JAMA. 2013;310(20):2147–2148.
- 32. Ogino S, Stampfer M. Lifestyle factors and microsatellite instability in colorectal cancer: the evolving field of molecular pathological epidemiology. J Natl Cancer Inst. 2010;102(6):365–367.
- 33. Chatterjee N. A two-stage regression model for epidemiological studies with multivariate disease classification data. J Am Stat Assoc. 2004;99(465):127–138.
- 34. Chatterjee N, Sinha S, Diver WR, et al. Analysis of cohort studies with multivariate and partially observed disease classification data. Biometrika. 2010;97(3):683–698.
- 35. Rosner B, Glynn RJ, Tamimi RM, et al. Breast cancer risk prediction with heterogeneous risk profiles according to breast cancer tumor markers. Am J Epidemiol. 2013;178(2):296–308.
- 36. Begg CB, Zabor EC, Bernstein JL, et al. A conceptual and methodological framework for investigating etiologic heterogeneity. Stat Med. 2013;32(29):5039–5052.
- 37. Ogino S, Goel A. Molecular classification and correlates in colorectal cancer. J Mol Diagn. 2008;10(1):13–27.
- 38. Nosho K, Irahara N, Shima K, et al. Comprehensive biostatistical analysis of CpG island methylator phenotype in colorectal cancer using a large population-based sample. PLoS One. 2008;3(11):e3698.
- 39. Lindor NM, Yang P, Evans I, et al. Alpha-1-antitrypsin deficiency and smoking as risk factors for mismatch repair deficient colorectal cancer: a study from the colon cancer family registry. Mol Genet Metab. 2010;99(2):157–159.
- 40. Phipps AI, Baron J, Newcomb PA. Prediagnostic smoking history, alcohol consumption, and colorectal cancer survival: the Seattle Colon Cancer Family Registry. Cancer. 2011;117(21):4948–4957.
- 41. Eaton AM, Sandler R, Carethers JM, et al. 5,10-Methylenetetrahydrofolate reductase 677 and 1298 polymorphisms, folate intake, and microsatellite instability in colon cancer. Cancer Epidemiol Biomarkers Prev. 2005;14(8):2023–2029.
- 42. Nishihara R, Morikawa T, Kuchiba A, et al. A prospective study of duration of smoking cessation and colorectal cancer risk by epigenetics-related tumor classification. Am J Epidemiol. 2013;178(1):84–100.
- 43. Curtin K, Samowitz WS, Wolff RK, et al. Somatic alterations, metabolizing genes and smoking in rectal cancer. Int J Cancer. 2009;125(1):158–164.
- 44. Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. New York, NY: Wiley; 1980.
- 45. Prentice RL, Kalbfleisch JD, Peterson AV, Jr, et al. The analysis of failure times in the presence of competing risks. Biometrics. 1978;34(4):541–554.
- 46. Prentice RL. On the design of synthetic case-control studies. Biometrics. 1986;42(2):301–310.
- 47. Lunn M, McNeil D. Applying Cox regression to competing risks. Biometrics. 1995;51(2):524–532.
- 48. Stram DO. Meta-analysis of published data using a linear mixed-effects model. Biometrics. 1996;52(2):536–544.
- 49. Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Softw. 2010;36(3):1–48.
- 50. Ritz J, Demidenko E, Spiegelman D. Multivariate meta-analysis for data consortia, individual patient meta-analysis, and pooling projects. J Stat Plan Inference. 2008;138(7):1919–1933.
- 51. van Houwelingen HC, Arends LR, Stijnen T. Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat Med. 2002;21(4):589–624.
- 52. Nishihara R, Wu K, Lochhead P, et al. Long-term colorectal-cancer incidence and mortality after lower endoscopy. N Engl J Med. 2013;369(12):1095–1105.
- 53. Ogino S, Fuchs CS, Giovannucci E. How many molecular subtypes? Implications of the unique tumor principle in personalized medicine. Expert Rev Mol Diagn. 2012;12(6):621–628.
- 54. Colussi D, Brandi G, Bazzoli F, et al. Molecular pathways involved in colorectal cancer: implications for disease behavior and prevention. Int J Mol Sci. 2013;14(8):16365–16385.
- 55. Phipps AI, Limburg PJ, Baron JA, et al. Association between molecular subtypes of colorectal cancer and patient survival. Gastroenterology. 2015;148(1):77–87.e2.
- 56. Caiazza F, Ryan EJ, Doherty G, et al. Estrogen receptors and their implications in colorectal carcinogenesis. Front Oncol. 2015;5:19.
- 57. Liao X, Lochhead P, Nishihara R, et al. Aspirin use, tumor PIK3CA mutation, and colorectal-cancer survival. N Engl J Med. 2012;367(17):1596–1606.
- 58. Ogino S, Lochhead P, Giovannucci E, et al. Discovery of colorectal cancer PIK3CA mutation as potential predictive biomarker: power and promise of molecular pathological epidemiology. Oncogene. 2014;33(23):2949–2955.
- 59. Ogino S, King EE, Beck AH, et al. Interdisciplinary education to integrate pathology and epidemiology: towards molecular and population-level health science. Am J Epidemiol. 2012;176(8):659–667.
- 60. Campbell PT, Deka A, Briggs P, et al. Establishment of the Cancer Prevention Study II Nutrition Cohort Colorectal Tissue Repository. Cancer Epidemiol Biomarkers Prev. 2014;23(12):2694–2702.
- 61. Wild CP, Bucher JR, de Jong BW, et al. Translational cancer research: balancing prevention and treatment to combat cancer globally. J Natl Cancer Inst. 2014;107(1):353.
- 62. Kuller LH, Bracken MB, Ogino S, et al. The role of epidemiology in the era of molecular epidemiology and genomics: summary of the 2013 AJE-sponsored Society of Epidemiologic Research Symposium. Am J Epidemiol. 2013;178(9):1350–1354.