American Journal of Epidemiology. 2013 May 5;178(4):652–660. doi: 10.1093/aje/kwt009

Biological and Statistical Approaches for Modeling Exposure to Specific Trihalomethanes and Bladder Cancer Risk

Lucas A Salas, Kenneth P Cantor, Adonina Tardon, Consol Serra, Alfredo Carrato, Reina Garcia-Closas, Nathaniel Rothman, Núria Malats, Debra Silverman, Manolis Kogevinas, Cristina M Villanueva *
PMCID: PMC3736753  PMID: 23648803

Abstract

Lifetime exposure to trihalomethanes (THM) has been associated with increased risk of bladder cancer. We explored methods of analyzing bladder cancer risk associated with 4 THM (chloroform, bromodichloromethane, dibromochloromethane, and bromoform) as surrogates for disinfection by-product (DBP) mixtures in a case-control study in Spain (1998–2001). Lifetime average concentrations of THM in the households of 686 incident bladder cancer cases and 750 matched hospital-based controls were calculated. Several exposure metrics were modeled through conditional logistic regression, including the following analyses: total THM (μg/L), cytotoxicity-weighted sum of total THM (pmol/L), 4 THM in separate models, 4 THM in 1 model, chloroform and the sum of brominated THM in 1 model, and a principal-components analysis. THM composition, concentrations, and correlations varied between areas. The model for total THM was stable and showed increasing dose-response trends. Models for separate THM provided unstable estimates and inconsistent dose-response relationships. Risk estimation for specific THM is hampered by the varying composition of the mixture, correlation between species, and imprecision of historical estimates. Total THM (μg/L) provided a proxy measure of DBPs that yielded the strongest dose-response relationship with bladder cancer risk. A variety of metrics and statistical approaches should be used to evaluate this association in other settings.

Keywords: chloroform, complex mixtures, logistic models, principal-components analysis, Spain, trihalomethanes, urinary bladder neoplasms


Drinking-water disinfection systems use reactive chemicals, such as chlorine, to inactivate microbiological threats to human health. However, disinfectants react with organic matter to produce a variety of undesired chemicals, known as disinfection by-products (DBPs) (1). Approximately 600–700 DBPs have been identified, and more compounds are being identified (2). The most widely used disinfectant is chlorine. Trihalomethanes (THM) constitute the most prevalent class of DBPs, representing 10%–20% of the mixture in chlorinated water (3). Total trihalomethanes (TTHM) are defined as the sum (in µg/L) of 4 constituents: chloroform, bromodichloromethane (BDCM), dibromochloromethane (DBCM), and bromoform.

TTHM have been used as surrogates for the halogenated DBP mixture in epidemiologic studies. Several cohort and case-control studies have shown that long-term exposure (over 40 years) to the sum of the 4 THM increases cancer risks, especially risk of bladder neoplasms (4–10). Experimental in vivo and in vitro studies have tested the cytotoxicity, mutagenicity, and carcinogenicity of individual compounds (11–14). Enzyme-dependent mutagenicity has been found in vitro for brominated THM, such as bromoform and DBCM (15, 16). Liver and kidney carcinogenicity has occurred in rodents exposed to BDCM and other DBPs via drinking water (17). The International Agency for Research on Cancer has classified chloroform and BDCM as possible human carcinogens (18–22). Other DBPs have also been identified as carcinogens and mutagens, for example, halogenated hydroxyfuranones (e.g., 3-chloro-4-(dichloromethyl)-5-hydroxy-2(5H)-furanone, also called mutagen X or MX), nitrogenated DBPs, and iodinated DBPs (23). Long-term exposure assessment of these substances is difficult because they are unregulated, they appear in very low concentrations, routine measurements are unavailable, and estimates of historical exposures cannot be established (24).

As a result, estimation of the risks associated with DBP exposure is a challenge for environmental epidemiologists. Much of the epidemiologic evidence relies on exposure estimates based on TTHM or on a single compound, such as chloroform, to represent the mixture. The first approach assumes that all the constituents of the mixture are equivalent, ignoring the experimental evidence of their differential toxicity. On the other hand, component-based analyses assume that the mixture's constituents are different and may be analyzed individually (25). However, this approach ignores correlations between THM and how they are related to unknown constituents of the DBP mixture. Adjustment for multiple components that are weakly correlated could address this problem.

We explored different biology-based approaches and used statistical modeling to evaluate the bladder cancer risk associated with 4 THM (chloroform, BDCM, DBCM, and bromoform) in a case-control study.

MATERIALS AND METHODS

Study design and participants

We used data from the Spanish Bladder Cancer Study, a multicenter, hospital-based case-control study conducted in Spain between June 1998 and June 2001. Cases and controls were recruited from 6 geographical areas: Alicante, Asturias, Barcelona, Tenerife, Manresa, and Sabadell (Vallès/Bages). Cases were patients aged 20–80 years with a histologically confirmed diagnosis of primary bladder cancer who were living in one of the geographic catchment areas of the participating hospitals. Cases were identified through the hospital urological services at diagnosis. Complete case ascertainment was guaranteed through regular evaluations of local cancer registries and hospital discharge and pathology records. Controls were patients admitted to the participating hospitals with diagnoses unrelated to the main risk factors for bladder cancer, such as tobacco use. Control diagnoses included: circulatory, dermatological, and ophthalmological disorders (4%, 2%, and 1%, respectively), fractures (23%), hernias (37%), hydrocele (12%), other abdominal surgery (11%), other orthopedic problems (7%), and other diseases (3%). Controls were individually matched to cases by sex, age group (5-year strata), and geographical area of residence.

The study protocol was approved by the institutional review boards of the participating institutions. All participants gave informed consent beforehand. A total of 1,457 eligible cases and 1,465 eligible controls were identified. Among them, 84% of cases (n = 1,219) and 87% of controls (n = 1,271) participated.

Personal interview

Trained interviewers administered a comprehensive computer-assisted personal questionnaire to participants during their hospitalization. Collected information included sociodemographic characteristics, smoking habits, family history of cancer, and medical, occupational, and residential histories from birth (for all residences of at least 1 year). Residential histories provided information on water exposures relevant to the present analysis. In addition, a food frequency questionnaire was self-administered. When a subject refused to answer the questionnaire, a reduced interview on critical items was administered (21% of cases, 19% of controls).

Water utility data

Current and historical information about water source, treatment, and quality was obtained from public water supplies in the study areas. Structured questionnaires were sent to approximately 200 local authorities and 150 water companies to ascertain: 1) the proportions of groundwater and surface water sources over the years back to 1920; 2) the year in which chlorination started at each utility; and 3) annual average concentration (in µg/L) of chloroform, BDCM, DBCM, and bromoform, when available. The amount of data collected differed among areas. Data on water-source history and year in which chlorination started were available for 123 municipalities, accounting for 78.5% of person-years from lifetime residential histories. In addition, 48 of these municipalities also had data on chloroform, BDCM, DBCM, and bromoform levels (58% of person-years). To augment the database of water utility measurements, we measured chloroform, BDCM, DBCM, and bromoform levels in 113 tap water samples from the study's geographical areas between September and December 1999.

Estimation of historical levels

We used data on water-source history (proportions of groundwater and surface water over the years), the year in which chlorination was initiated, and available chloroform, BDCM, DBCM, and bromoform levels to estimate annual average levels in the past. We assumed that levels remained unchanged by municipality when water source had not changed. Available measurements were averaged and imputed back to the year 1920, as long as water source remained unchanged. If the water source had changed, the proportion of surface water was used as a weight (26). THM levels before chlorination started were assumed to be zero. The year in which chlorination started varied widely among study municipalities, from 1933 in Barcelona to the 1990s in many small municipalities in Asturias. For those municipalities with a water-source history but missing data on THM levels, levels were imputed from neighboring municipalities with the same water source. Estimation of past levels in Barcelona was conducted at the postal-code level, since the city is supplied by 2 rivers (Llobregat and Ter) with dissimilar raw water characteristics. Details on the exposure assessment are available elsewhere (26, 27).

Lifetime individual exposure indices

For all residences where participants had lived for at least 1 year from birth to the time of interview, the following information was requested: year in which the participant started living in that location, year stopped, full street address, city, province, region, and country. The address was used to ascertain postal code in Barcelona. Individual and municipal databases were merged by year and municipality of residence to obtain annual average levels of chloroform, BDCM, DBCM, and bromoform for each study subject. Different exposure windows were explored, and the period from age 15 years to the time of interview was selected because it maximized the information available, since exposure data were scarce before that age. A time-weighted average level of exposure at all residences where the participant had lived during this exposure window was calculated for all subjects.
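The time-weighted average described above can be illustrated with a minimal sketch. It assumes each residence has a single constant annual level and that residences do not overlap in time; the function name, argument names, and the tuple layout are hypothetical and are not taken from the study's own code.

```python
def time_weighted_average(residences, birth_year, interview_year, start_age=15):
    """Time-weighted average THM level over the exposure window.

    residences: list of (year_start, year_stop, level) tuples, where level is
    the assumed constant annual average (ug/L) or None when no estimate exists.
    Returns (average, coverage), where coverage is the fraction of the window
    with an available level. Assumes residences do not overlap in time.
    """
    window_start = birth_year + start_age
    window_years = interview_year - window_start
    if window_years <= 0:
        return None, 0.0
    weighted_sum = 0.0
    covered_years = 0
    for year_start, year_stop, level in residences:
        # Clip each residence period to the exposure window.
        lo = max(year_start, window_start)
        hi = min(year_stop, interview_year)
        if hi <= lo or level is None:
            continue
        weighted_sum += level * (hi - lo)
        covered_years += hi - lo
    if covered_years == 0:
        return None, 0.0
    return weighted_sum / covered_years, covered_years / window_years


# Hypothetical subject: born 1935, interviewed 2000, two residences.
average, coverage = time_weighted_average(
    [(1930, 1960, 10.0), (1960, 2000, 30.0)], birth_year=1935, interview_year=2000
)
print(average, coverage)
```

The coverage fraction returned here is the kind of quantity that an inclusion rule based on the proportion of the exposure window with modeled THM data would rely on.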

Statistical methods

The normality of the interindividual THM levels was examined, and Spearman rank correlation coefficients were calculated overall and by area. In alternative analyses, residuals from a linear regression of the THM components using area of residence as an independent variable were calculated. Spearman correlations between these residuals (partial Spearman correlations) were calculated in order to obtain overall correlations adjusted for area.
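The area-adjusted correlation can be sketched as follows. Regressing a THM level on area of residence as a categorical variable is equivalent to subtracting the area-specific mean, and a Spearman coefficient is a Pearson coefficient computed on ranks. The function names and toy data are hypothetical, and this is an illustration of the idea rather than the authors' Stata code.

```python
from collections import defaultdict
from math import sqrt


def area_residuals(values, areas):
    """Residuals from a regression on area indicators (value minus area mean)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, a in zip(values, areas):
        sums[a] += v
        counts[a] += 1
    return [v - sums[a] / counts[a] for v, a in zip(values, areas)]


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def area_adjusted_spearman(x, y, areas):
    """Spearman correlation between the area residuals of x and y."""
    return spearman(area_residuals(x, areas), area_residuals(y, areas))


# Toy data: two areas with very different levels but an inverse
# relationship between the two compounds within each area.
areas = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 11, 12, 13]
y = [3, 2, 1, 13, 12, 11]
print(spearman(x, y), area_adjusted_spearman(x, y, areas))
```

In this toy example the unadjusted coefficient is positive because it is driven by between-area differences, whereas the area-adjusted coefficient is negative, which is why overall and area-adjusted correlations can disagree.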

Fixed-effects conditional logistic regression (CLR) was used to estimate bladder cancer risk. CLR is the standard technique used to analyze binary matched data, since the resulting coefficients are derived from within-matching-strata comparisons (28). In order to explore nonlinearity of the effects of exposure, we fitted new CLR models in which, instead of using a linear term for exposure, we used spline functions. Splines use piecewise polynomials to model the shape of the association, and their high flexibility allows capturing almost any kind of shape. We used cubic splines with knots at the 10th, 50th, and 90th percentiles, according to Harrell's recommendations in 2012 (29). Exposure coefficients from spline models do not have a direct interpretation. Instead, plots of the resulting associations were used to interpret the results. The models using a linear term and the model using splines were compared via likelihood ratio tests. If the spline model did not provide a statistically significantly better fit, this was taken as support for linearity of the effect. Model fit was examined using Akaike's Information Criterion. The lowest Akaike Information Criterion value determined the best-fitting model.
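The model comparison can be sketched with a few lines of arithmetic. The sketch assumes restricted cubic splines with 3 knots, which contribute 2 terms (one linear, one nonlinear), so that the spline model has 1 more parameter than the linear model and the likelihood ratio statistic is referred to a chi-square distribution with 1 degree of freedom. The log-likelihood values and parameter counts below are hypothetical.

```python
from math import erfc, sqrt


def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def lr_test_df1(ll_linear, ll_spline):
    """Likelihood ratio test of nested models differing by 1 parameter.

    Returns (statistic, p_value). For 1 degree of freedom the chi-square
    upper tail equals erfc(sqrt(statistic / 2)).
    """
    statistic = max(2 * (ll_spline - ll_linear), 0.0)
    return statistic, erfc(sqrt(statistic / 2))


# Hypothetical conditional log-likelihoods for the two nested models.
ll_linear, ll_spline = -100.0, -99.2
statistic, p_value = lr_test_df1(ll_linear, ll_spline)
print(round(statistic, 2), round(p_value, 3))
print(aic(ll_linear, 4), aic(ll_spline, 5))
```

With these hypothetical numbers the spline model does not fit significantly better and has the higher Akaike value, which under the rule stated above would be read as support for linearity.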

Different models were fitted using the following exposure variables: 1) TTHM in μg/L, as the sum of the 4 constituents; 2) weighted sum of the 4 THM, obtained by multiplying the concentrations with a weight derived from a mammalian cell cytotoxicity assay (0.4116 × chloroform + 0.3443 × BDCM + 0.7388 × DBCM + bromoform) (23); 3) TTHM concentration on a molar basis as the sum of the 4 constituents in pmol/L; 4) the 4 THM constituents in separate models with single compounds; 5) the 4 THM constituents in 1 model; 6) total brominated THM (BDCM, DBCM, bromoform) and chloroform in 1 model; and 7) principal-components logistic regression. To observe the exposure response, we grouped exposure variables using quartiles as boundaries in the CLR. In models with splines, we kept exposure variables continuous (30).
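Some of these exposure metrics can be computed directly from the 4 constituent levels. The sketch below uses the cytotoxicity weights quoted above and the quartile boundaries reported in Table 4; the dictionary keys and function names are hypothetical, and the example input uses the all-area medians from Table 2 purely as illustrative values (the sum of medians is not the median of the sum).

```python
# Cytotoxicity weights as given in the text (bromoform has weight 1).
CYTOTOXICITY_WEIGHTS = {
    "chloroform": 0.4116,
    "bdcm": 0.3443,
    "dbcm": 0.7388,
    "bromoform": 1.0,
}
BROMINATED = ("bdcm", "dbcm", "bromoform")


def total_thm(levels):
    """TTHM: unweighted sum of the 4 constituents (ug/L)."""
    return sum(levels[name] for name in CYTOTOXICITY_WEIGHTS)


def weighted_thm(levels):
    """Cytotoxicity-weighted sum of the 4 constituents."""
    return sum(w * levels[name] for name, w in CYTOTOXICITY_WEIGHTS.items())


def brominated_thm(levels):
    """Total brominated THM: BDCM + DBCM + bromoform (ug/L)."""
    return sum(levels[name] for name in BROMINATED)


def quartile_group(value, cutpoints):
    """Quartile index 1-4 given 3 ascending lower boundaries (Q2, Q3, Q4)."""
    return 1 + sum(value >= c for c in cutpoints)


levels = {"chloroform": 15.0, "bdcm": 4.7, "dbcm": 0.9, "bromoform": 1.5}
print(total_thm(levels), weighted_thm(levels), brominated_thm(levels))
print(quartile_group(total_thm(levels), (9.4, 27.4, 49.8)))
```

Because chloroform carries a weight below 1, areas where chloroform dominates the mixture are pulled toward lower weighted values than areas where bromoform dominates, even when their TTHM sums are similar.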

The principal-components logistic regression was preceded by principal-components analysis (PCA) of the 4 THM. The selected components included were those explaining more than 10% of variance. The component scores were predicted following this procedure: 1) average residential levels of the 4 THM were mean-centered and standardized (converted to z scores); 2) 4 z scores were weighted by score coefficients (correlation coefficients of the corresponding eigenvector); and 3) finally, weighted scores were summed. The procedure was repeated for all selected component scores using the corresponding eigenvector. The scores obtained were entered as independent variables in the CLR and spline models as quartiles and as continuous variables, respectively.
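The three scoring steps can be sketched as below. The sketch takes the score coefficients as given, since deriving them requires an eigendecomposition that is not shown; the coefficients and data are hypothetical, and standardization with the sample standard deviation (n − 1 denominator) is an assumption.

```python
from statistics import mean, stdev


def z_scores(values):
    """Step 1: mean-center and scale by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def component_scores(data, coefficients):
    """Steps 2 and 3: weight each z score by its coefficient, then sum.

    data: dict mapping constituent name -> list of average residential levels.
    coefficients: dict mapping constituent name -> score coefficient for one
    component (hypothetical values here). Returns one score per subject.
    """
    z = {name: z_scores(column) for name, column in data.items()}
    n_subjects = len(next(iter(data.values())))
    return [
        sum(coefficients[name] * z[name][i] for name in coefficients)
        for i in range(n_subjects)
    ]


# Hypothetical levels for 3 subjects and equal illustrative coefficients.
data = {
    "chloroform": [1.0, 2.0, 3.0],
    "bdcm": [2.0, 4.0, 6.0],
    "dbcm": [0.5, 1.0, 1.5],
    "bromoform": [3.0, 6.0, 9.0],
}
coefficients = {"chloroform": 0.5, "bdcm": 0.5, "dbcm": 0.5, "bromoform": 0.5}
print(component_scores(data, coefficients))
```

Repeating the call with the coefficients of a second component would give the second score. Note that every constituent with a nonzero coefficient contributes to every score, so a component labeled by its dominant constituents is not free of the others.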

The CLR used the matching groups as fixed effects. All models adjusted for smoking status (never, former, or current cigarette smoker), employment in a high-risk occupation (occupations linked to the production of aromatic amines, rubber manufacture, exposure to dyes and printing in the textile industry, paint, aluminum, tanning and curing of hides, and the driving of motor vehicles (31)), and quartiles of fruit and vegetable consumption. Missing data in the categorical covariates were coded in a separate category and included in the analyses. We calculated 95% confidence intervals for the CLR estimations.

To determine the accuracy of risk estimates, we bootstrapped the confidence intervals of the models using 50 iterations (32). Unfitted sample matched sets were resampled, and confidence intervals of CLR models were adjusted using the bootstrap standard error correction (33). Variation above 10% in the standard error between the original CLR and the bootstrapped estimates was considered to represent instability. Statistical analyses were performed using Stata statistical software, release 12 (StataCorp LP, College Station, Texas) and the POSTRCSPLINE module developed by Maarten Buis (34).
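The stability check can be sketched as a resampling of whole matched sets followed by a comparison of standard errors. Fitting a conditional logistic model is outside the scope of this sketch, so the estimator is passed in as a function; the toy estimator, the data, and all names are hypothetical stand-ins, and the exact form of the standard error correction used in the study is not reproduced here.

```python
import random
from statistics import mean, stdev


def bootstrap_se(matched_sets, estimator, n_iter=50, seed=0):
    """Bootstrap standard error, resampling matched sets with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_iter):
        resample = [rng.choice(matched_sets) for _ in matched_sets]
        estimates.append(estimator(resample))
    return stdev(estimates)


def se_variation(model_se, boot_se):
    """Relative difference between the bootstrap and model standard errors."""
    return abs(boot_se - model_se) / model_se


def is_unstable(model_se, boot_se, threshold=0.10):
    """Flag instability when the variation exceeds the 10% threshold."""
    return se_variation(model_se, boot_se) > threshold


# Toy stand-in for a fitted coefficient: mean case-minus-control difference.
def toy_estimator(sets):
    return mean(case - control for case, control in sets)


toy_sets = [(30.0, 20.0), (12.0, 15.0), (50.0, 41.0), (8.0, 9.5), (27.0, 18.0)]
print(bootstrap_se(toy_sets, toy_estimator))
print(is_unstable(model_se=0.25, boot_se=0.265))
```

Resampling whole matched sets rather than individual subjects keeps cases and their controls together, which is what a conditional analysis requires.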

RESULTS

Among all cases and controls (n = 2,490), only persons with a reliable or high-quality interview as reported by the interviewer (n = 2,213; 88.9%) and those with more than 70% of modeled THM data in the exposure window (n = 1,448; 58.2%) were included in the analyses. Original individual matching was broken because of exclusions in the final data set, and subjects were grouped according to matching strata in 83 pooled k1j:k2j groups. Ten groups (12 observations) were unmatched and were excluded, leading to 686 cases and 750 controls suitable for analysis. We compared data on case-control status, sex, age, high-risk occupation, and smoking status between the excluded and included groups. Statistically significant differences were found for age (the mean age of excluded subjects was 2.8 years higher than that of those included; P = 0.005) and smoking status (1.7% more current smokers were excluded from the analyses; P = 0.012). The median age at interview was 66 years, and 87.4% of participants were men (Table 1). Excess risks were found for former and current smokers. Subjects who reported higher fruit and vegetable intake were at lower risk of bladder cancer (Table 1).

Table 1.

Characteristics of Participants in a Case-Control Study of Bladder Cancer, Spanish Bladder Cancer Study, 1998–2001

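| Characteristic | Cases (n = 686), No. | Cases, % | Controls (n = 750), No. | Controls, % | Odds Ratio^a | 95% Confidence Interval |
| --- | --- | --- | --- | --- | --- | --- |
| Sex | | | | | | |
| Male | 603 | 87.9 | 652 | 86.9 | | |
| Female | 83 | 12.1 | 98 | 13.1 | | |
| Mean age, years | 64.57 (10.2)^b | | 63.87 (10.0) | | | |
| Geographical area | | | | | | |
| Alicante | 58 | 8.5 | 66 | 8.8 | | |
| Asturias | 295 | 43.0 | 321 | 42.8 | | |
| Barcelona | 118 | 17.2 | 137 | 18.3 | | |
| Manresa | 26 | 3.8 | 26 | 3.5 | | |
| Sabadell | 62 | 9.0 | 55 | 7.3 | | |
| Tenerife | 127 | 18.5 | 145 | 19.3 | | |
| High-risk profession | | | | | | |
| No | 558 | 81.3 | 639 | 85.2 | 1.00 | Reference |
| Yes | 128 | 18.7 | 111 | 14.8 | 1.29 | 0.96, 1.74 |
| Smoking status | | | | | | |
| Never smoker | 128 | 18.7 | 272 | 36.3 | 1.00 | Reference |
| Former smoker | 276 | 40.2 | 315 | 42.0 | 2.64 | 1.89, 3.69 |
| Current smoker | 282 | 41.1 | 163 | 21.7 | 6.01 | 4.21, 8.58 |
| P for trend | | | | | <0.001 | |
| Quartile of fruit and vegetable consumption, g/day^c | | | | | | |
| 0–421.8 | 166 | 29.5 | 137 | 24.9 | 1.00 | Reference |
| >421.8–671.0 | 148 | 26.3 | 135 | 24.5 | 0.88 | 0.63, 1.23 |
| >671.0–1,000.6 | 142 | 25.2 | 138 | 25.0 | 0.77 | 0.55, 1.08 |
| >1,000.6 | 107 | 19.0 | 141 | 25.6 | 0.59 | 0.41, 0.83 |
| P for trend | | | | | 0.023 | |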

a Odds ratios from conditional logistic regression models stratified by matched set. Cases and controls were matched on sex, age (in 5-year groups), and geographical area.

b Numbers in parentheses, standard deviation.

c Numbers do not total 686 and 750 because of missing data (included in a separate category).

Concentrations of TTHM and specific THM in the residences of study subjects varied among study areas (Table 2). Median levels of chloroform were elevated in Manresa (37.1 µg/L; interquartile range (IQR), 16.8–44.6), Barcelona (20.6 µg/L; IQR, 18.4–25.1), and Asturias (16.1 µg/L; IQR, 9.4–22.3), while the median bromoform level was elevated in Alicante (17.8 µg/L; IQR, 10.0–20.9), Sabadell (9.3 µg/L; IQR, 7.2–10.6), and Tenerife (2.5 µg/L; IQR, 1.9–3.1).

Table 2.

Lifetime Average Level of Trihalomethanes in the Residences of Study Subjects, Spanish Bladder Cancer Study, Spain, 1998–2001

| Area | No. | % | TTHM Median | TTHM IQR^a | Chloroform Median | Chloroform IQR | BDCM Median | BDCM IQR | DBCM Median | DBCM IQR | Bromoform Median | Bromoform IQR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| All areas | 1,436 | 100.0 | 27.2 | 9.4, 49.8 | 15.0 | 3.4, 21.2 | 4.7 | 1.5, 16.8 | 0.9 | 0.5, 8.2 | 1.5 | 0.5, 3.9 |
| Alicante | 124 | 8.6 | 72.1 | 64.8, 85 | 14.1 | 12.3, 16.1 | 23.2 | 20.2, 27.2 | 20.0 | 17.2, 23.3 | 17.8 | 10.0, 20.9 |
| Asturias | 616 | 42.9 | 20.8 | 12.6, 28.3 | 16.1 | 9.4, 22.3 | 3.7 | 2.3, 4.9 | 0.6 | 0.4, 0.7 | 0.4 | 0.3, 0.8 |
| Barcelona | 255 | 17.8 | 59.4 | 48.4, 71.8 | 20.6 | 18.4, 25.1 | 22.1 | 18.4, 25.1 | 9.6 | 7.6, 12.2 | 1.8 | 0.9, 11.4 |
| Manresa | 52 | 3.6 | 49.3 | 35.1, 57.7 | 37.1 | 16.8, 44.6 | 8.9 | 6.3, 9.4 | 1.6 | 1.1, 1.7 | 1.9 | 1.6, 2.0 |
| Sabadell | 117 | 8.2 | 44.2 | 38.8, 52.2 | 14.2 | 12.4, 18.6 | 11.2 | 9.8, 15.6 | 8.1 | 7.2, 9.6 | 9.3 | 7.2, 10.6 |
| Tenerife | 272 | 18.9 | 4.0 | 3.1, 5.8 | 0.6 | 0.2, 1.0 | 0.5 | 0.4, 0.7 | 0.7 | 0.5, 1.0 | 2.5 | 1.9, 3.1 |

Abbreviations: BDCM, bromodichloromethane; DBCM, dibromochloromethane; IQR, interquartile range; TTHM, total trihalomethanes.

a 25th–75th percentiles.

Spearman rank correlation coefficients for correlations between individual THM and combinations thereof showed high variability, from negative correlations (−0.20) to high positive correlations (0.99) (Table 3). These correlations differed by area (see Web Table 1, available at http://aje.oxfordjournals.org/). For example, the TTHM-chloroform correlation ranged from 0.30 (P < 0.05) in Barcelona to 0.99 (P < 0.05) in Asturias. Partial Spearman correlations of residuals adjusted for area led to the following coefficients: 0.69 (chloroform-BDCM), 0.19 (chloroform-DBCM), −0.10 (chloroform-bromoform), 0.58 (BDCM-DBCM), 0.35 (BDCM-bromoform), and 0.70 (DBCM-bromoform).

Table 3.

Spearman Rank Correlation Coefficients for Correlations Between Different Indices of Trihalomethane Exposure (All Areas), Spanish Bladder Cancer Study, Spain, 1998–2001

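| Exposure Index | TTHM | TTHM (Weighted)^a | Chloroform | BDCM | DBCM | Bromoform | Total Br-THM^b | Cl-THM Score^c |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TTHM (weighted) | 0.99* | 1 | | | | | | |
| Chloroform | 0.75* | 0.70* | 1 | | | | | |
| BDCM | 0.99* | 0.98* | 0.73* | 1 | | | | |
| DBCM | 0.76* | 0.80* | 0.33* | 0.77* | 1 | | | |
| Bromoform | 0.35* | 0.42* | −0.20* | 0.33* | 0.75* | 1 | | |
| Total Br-THM | 0.94* | 0.96* | 0.60* | 0.93* | 0.90* | 0.58* | 1 | |
| Cl-THM score | 0.50* | 0.44* | 0.90* | 0.49* | 0.03 | −0.50* | 0.33* | 1 |
| Br-THM score^d | 0.99* | 0.99* | 0.70* | 0.99* | 0.80* | 0.41* | 0.93* | 0.44^e* |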

Abbreviations: BDCM, bromodichloromethane; Br-THM, brominated trihalomethanes; Cl-THM, chlorinated trihalomethanes; DBCM, dibromochloromethane; TTHM, total trihalomethanes.

* P < 0.05.

a TTHM weighted by cytotoxicity: 0.4116 × chloroform + 0.3443 × BDCM + 0.7388 × DBCM + bromoform.

b BDCM + DBCM + bromoform.

c Cl-THM score from the principal-components analysis.

d Br-THM score from the principal-components analysis.

e Pearson correlation coefficient; shows independence between the Br-THM and Cl-THM scores from the principal-components analysis.

Results from the multivariate CLR models are shown in Table 4. The model using TTHM (in µg/L) as a surrogate measure of the mixture showed a monotonically increased risk of bladder cancer, with statistically significant associations in groups above the median (Table 4). Confidence intervals were highly stable after bootstrapping (standard error variation of 6%). The model using cytotoxicity-weighted TTHM (in µg/L) showed a similar pattern that was attenuated, but statistical significance at the P < 0.05 level was lacking (Table 4). The use of molar concentrations (pmol/L) produced odds ratios similar to those obtained using cytotoxicity weights (Table 4). The cytotoxicity-weighted model and TTHM in molar concentration were highly unstable after bootstrapping, with standard error variations between 20% and 31%. Models evaluating risks for specific compounds and the model including the 4 THM led to inconsistent and highly unstable dose-response relationships (results not shown), because of multicollinearity (variance inflation factors: chloroform, 2.18; BDCM, 13.37; DBCM, 14.80; and bromoform, 3.80). Grouping all brominated compounds and adjusting for chloroform in the same model solved multicollinearity (variance inflation factor = 1.49), and all point estimates showed a higher risk for chloroform (Table 4).
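The variance inflation factors reported above can be translated into the share of each constituent's variance explained by the other predictors, using the standard definition VIF = 1 / (1 − R²). The sketch below only performs this arithmetic on the reported values; the function names are hypothetical.

```python
def vif_from_r_squared(r_squared):
    """Variance inflation factor for a predictor whose regression on the
    other predictors has coefficient of determination r_squared."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    return 1 / (1 - r_squared)


def r_squared_from_vif(vif):
    """Invert the definition: share of variance explained by other predictors."""
    if vif < 1:
        raise ValueError("a variance inflation factor cannot be below 1")
    return 1 - 1 / vif


# Variance inflation factors as reported in the text.
reported = {"chloroform": 2.18, "BDCM": 13.37, "DBCM": 14.80, "bromoform": 3.80}
for name, vif in reported.items():
    print(name, round(r_squared_from_vif(vif), 3))
print("grouped model", round(r_squared_from_vif(1.49), 3))
```

Read this way, the values for BDCM and DBCM imply that more than 90% of the variance of each is explained by the other 3 constituents, whereas the value of 1.49 in the grouped model corresponds to roughly one third.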

Table 4.

Bladder Cancer Risk Associated With Different Indices of Lifetime Trihalomethane Exposure, Spanish Bladder Cancer Study, Spain, 1998–2001

| Model | No. of Cases | No. of Controls | Odds Ratio^a | 95% Confidence Interval |
| --- | --- | --- | --- | --- |
| TTHM, µg/L | | | | |
| Q1 (<9.4) | 159 | 200 | 1.00 | Reference |
| Q2 (9.4–<27.4) | 166 | 193 | 1.22 | 0.8, 1.88 |
| Q3 (27.4–<49.8) | 194 | 166 | 2.06 | 1.2, 3.55 |
| Q4 (≥49.8) | 167 | 191 | 2.09 | 1.1, 3.98 |
| P for trend | | | 0.008 | |
| Cytotoxicity-weighted TTHM, µg/L | | | | |
| Q1 (<5) | 165 | 194 | 1.00 | Reference |
| Q2 (5–<11.4) | 165 | 196 | 1.03 | 0.73, 1.44 |
| Q3 (11.4–<24.8) | 192 | 165 | 1.65 | 0.89, 3.06 |
| Q4 (≥24.8) | 164 | 195 | 1.56 | 0.7, 3.45 |
| P for trend | | | 0.083 | |
| Molar concentration of TTHM, pmol/L | | | | |
| Q1 (<0.06) | 162 | 198 | 1.00 | Reference |
| Q2 (0.06–<0.21) | 167 | 191 | 1.17 | 0.73, 1.89 |
| Q3 (0.21–<0.34) | 191 | 168 | 1.74 | 0.92, 3.29 |
| Q4 (≥0.34) | 166 | 193 | 1.52 | 0.67, 3.45 |
| P for trend | | | 0.075 | |
| Chloroform and brominated THM, mutually adjusted | | | | |
| Chloroform, µg/L | | | | |
| Q1 (<3.4) | 160 | 199 | 1.00 | Reference |
| Q2 (3.4–<15) | 176 | 183 | 1.43 | 0.83, 2.48 |
| Q3 (15–<21.2) | 170 | 189 | 1.34 | 0.65, 2.75 |
| Q4 (≥21.2) | 180 | 179 | 1.76 | 0.91, 3.39 |
| P for trend | | | 0.119 | |
| Brominated THM, µg/L | | | | |
| Q1 (<3.8) | 158 | 201 | 1.00 | Reference |
| Q2 (3.8–<6.2) | 180 | 180 | 1.14 | 0.75, 1.75 |
| Q3 (6.2–<29.1) | 181 | 177 | 1.04 | 0.68, 1.57 |
| Q4 (≥29.1) | 167 | 192 | 1.05 | 0.55, 2 |
| P for trend | | | 0.773 | |
| Scores from principal-components analysis, mutually adjusted | | | | |
| Chlorinated THM score | | | | |
| Q1 (<−1.02) | 160 | 199 | 1.00 | Reference |
| Q2 (−1.02 to <−0.09) | 176 | 183 | 0.77 | 0.53, 1.13 |
| Q3 (−0.09 to <0.78) | 170 | 189 | 0.90 | 0.54, 1.48 |
| Q4 (≥0.78) | 180 | 179 | 0.83 | 0.49, 1.39 |
| P for trend | | | 0.835 | |
| Brominated THM score | | | | |
| Q1 (<−1.20) | 168 | 191 | 1.00 | Reference |
| Q2 (−1.20 to <−0.72) | 167 | 192 | 1.08 | 0.66, 1.79 |
| Q3 (−0.72 to <0.77) | 185 | 177 | 1.67 | 0.71, 3.96 |
| Q4 (≥0.77) | 166 | 190 | 1.74 | 0.69, 4.39 |
| P for trend | | | 0.190 | |

Abbreviations: Q, quartile; THM, trihalomethanes; TTHM, total trihalomethanes.

a Odds ratios from conditional logistic regression models stratified by matched set. Cases and controls were matched on sex, age (in 5-year groups), and geographical area. Odds ratios were adjusted for smoking status (never, former, or current smoker), ever having worked in a profession with high risk for bladder cancer (yes, no), and quartile of fruit and vegetable consumption (g/day).

The PCA reduced the 4 THM into 2 components explaining 94% of the variance (component 1: 68.1%; component 2: 25.5%). We refer to the first component as “brominated THM PCA score” because of a higher correlation with the 3 brominated THM (r > 0.52) than for chloroform (r = 0.24). We call the second component score “chlorinated THM PCA score” given its high correlation with chloroform (r = 0.90), a lower correlation with BDCM (r = 0.14), and negative correlation with the other constituents. The Pearson correlation coefficient for correlation between the 2 components was zero. The partial PCA of residuals adjusted for area showed similar results. Odds ratios for bladder cancer according to PCA scores showed a monotonic increase with the brominated THM scores and a flat association for the chlorinated THM scores, with wide confidence intervals (Table 4).

The exposure-response curves obtained from models with splines showed nonlinear associations in most cases (Web Figures 1 and 2). However, our tests indicated that the fit of these models was not statistically better than the fit of models using linear terms (i.e., fitting a straight line).

DISCUSSION

We found substantial variability in THM concentration and composition between areas. Varying correlations between individual THM species were found, and these correlations differed by area. The estimation of bladder cancer risk with separate THM species was not feasible because of multicollinearity, yielding unstable results. PCA converged into 2 data-driven and independent components: One correlated with brominated THM constituents, while the other mainly correlated with chloroform. Bladder cancer risk showed increasing dose-response relationships in models based on TTHM. Bladder cancer risk for specific THM constituents differed between models, and no consistent pattern was observed.

Given the differences in THM levels by geographical area, the use of area-stratified analyses was warranted, but statistical power was insufficient. Matched analysis overcame the effects of potential nuisance parameters not related to the outcome (matching variables, including area) in the linear models, avoiding the bias from pooled unconditional analyses (28). A potential impact on reported risks resulting from analyzing a subset of the original data set due to exclusions is unlikely, since excluded and included subjects differed only in terms of age and smoking. Smoking was unrelated to the exposure, and the potential effect of age was minimized through the matched analysis.

Models with the single THM and models with TTHM showed different methodological limitations. The estimation of associations for separate species was not feasible because of multicollinearity, yielding invalid models with unstable results. Estimates showed wide confidence intervals, inconsistent estimator signs, and negative β estimations in the multicollinear components (35). Imprecise exposure information has probably contributed to a lack of precision in estimating exposure to each of the THM species. In our data, bladder cancer risk showed increasing dose-response relationships in models based on TTHM, and bootstrapped estimations were stable. Neither the use of cytotoxicity weights nor the use of molar concentrations modified the general trend of the results, although both models showed wider confidence intervals than TTHM. To our knowledge, these 2 approaches have not been attempted in previous studies.

Separation of chlorinated and brominated compounds may offer a biological and statistical solution to separate compound estimations. This separation has been applied to account for differential toxicity in other studies (36). It is biologically plausible because of these compounds’ probable differential mechanisms of action (22, 23, 37, 38), and statistically it is a solution to multicollinearity. We explored 2 different options to separate chlorinated and brominated compounds, and these gave different results. Our first approach was to separate chloroform from the sum of the 3 brominated compounds. The second approach was data-driven, using a PCA. The PCA led to 1 score primarily representing brominated THM constituents and another that was almost exclusively associated with chloroform. In the first approach, a stronger bladder cancer risk was found for chloroform compared with brominated compounds, while the opposite occurred in the second. Exposure-response curves using the sum of brominated compounds showed a flat association, while the PCA approach showed a steeper slope for brominated scores. Although we expected similar results from both approaches, the results diverged because the data were treated differently. In the first approach, raw and area-adjusted data showed results for chloroform and brominated compounds totally separated. In the second, PCA components used predicted z scores, which do not separate the 4 compounds completely. The first component, “brominated THM PCA score,” actually included chloroform as part of the calculation. The second component was downscaled because of negative correlation for the brominated compounds (39–41). In addition, the PCA approach is sensitive to normality issues and outliers. The 4 THM were not normally distributed, with long right tails and several outliers affecting PCA calculations. None of the results from these models were statistically significant.

The correlation between DBPs has been previously evaluated in different settings (25, 37, 42). Other studies with multiple areas have shown divergent correlations between compounds and poor correlation of bromoform levels with the other compounds (43). Brominated exposure metrics have been used more often in studies of fetal and gestational outcomes (36). PCA regression has been used only to predict formation of THM under specific conditions for water purveyors, but to our knowledge principal-components logistic regression has not been used in formal epidemiologic analyses of DBP (12, 44, 45). In studies of other environmental exposures, such as polychlorinated biphenyls, data-driven scores including PCA and Newton-Raphson search techniques have been used to weight the relative contributions of individual compounds (46).

We followed some of the strategies proposed by Samet (47) to analyze complex mixtures with different underlying toxicological assumptions. Analysis of complex mixtures is a challenge in environmental epidemiology and has been underexplored in the field of health risks related to DBPs (38). A major challenge is to improve the precision of historical DBP exposures, which vary in space and time, increasing misclassification of exposure (26, 48). In addition, it is uncertain how THM levels are correlated with other carcinogens present in the mixture (nitrogenated DBPs, mutagen X (MX), iodinated THM, etc.). All of these exposure assessment issues limit our ability to estimate risk more precisely. Other statistical approaches offer limited help in disentangling associations of cancer risk with individual compounds in the current setting. Multivariate techniques such as factor analysis, cluster analysis, and discriminant analyses are alternatives for use in larger data sets with multiple surrogates, as seen in the PCB literature (46). Artificial neural networks and semi-Bayesian approaches are promising alternatives for dealing with highly correlated compounds that deserve to be further explored in the future (36, 49).

Overcoming these limitations requires more than statistical tools. DBPs appear in variable, complex, and diluted mixtures with an important unidentified fraction (50). Hence, improved exposure assessment is necessary, based on better surrogates or on extensive data about other DBPs that can refine the estimates. Furthermore, the retrospective assessment of lifetime exposures is prone to important biases. Information biases, including recall bias, hamper precise estimation of risks. In our study, we used reliable, high-quality interviews in an effort to minimize these biases. Finally, the lack of information on other compounds in the mixture hindered an evaluation of how much of the mixture was due to THM or other DBPs.

In summary, the estimation of risks for specific THM is hampered by the varying composition of the mixture, correlation between species, and imprecision of historical estimates. In the absence of better information, TTHM were a better proxy of DBP exposure than separate THM in our data. However, TTHM may introduce bias because the composition of the mixture varies in time and space. Toxicity adjustment using biology-based weights for the components of the mixture requires extensive experimental data, which are not available for THM. In addition, where 1 component (usually chloroform) predominates, as it does in many areas, the toxicity-weighted metric gives results similar to those of TTHM. The use of other methods depends heavily on the distribution and correlation of specific constituents in each area. Given that results may differ considerably depending on the methods used, we suggest that investigators analyzing water DBPs evaluate and present results from more than 1 model. The relationship of TTHM to the most toxic elements of the mixture may vary from region to region, and therefore among studies. We thus recommend that researchers in other studies explore a variety of models to select the best way to analyze their data, stating clearly the potential limitations and how the challenging statistical issues involved in exploring this question are handled.
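The point that a toxicity-weighted sum tracks TTHM when chloroform predominates can be illustrated with a small Python sketch. The molecular weights are standard values, but everything else is an assumption made for illustration: the toxicity weights are hypothetical placeholders (not the cytotoxicity values used in the study), the water samples are invented, and the weighting is assumed to multiply each compound's molar concentration.

```python
# Molecular weights (g/mol) of the 4 THM.
MOLECULAR_WEIGHT = {
    "chloroform": 119.38,
    "bromodichloromethane": 163.83,
    "dibromochloromethane": 208.28,
    "bromoform": 252.73,
}

# HYPOTHETICAL relative toxicity weights; placeholders, not measured values.
TOXICITY_WEIGHT = {
    "chloroform": 1.0,
    "bromodichloromethane": 2.0,
    "dibromochloromethane": 3.0,
    "bromoform": 4.0,
}


def total_thm(sample):
    """Total THM: plain sum of the mass concentrations (ug/L)."""
    return sum(sample.get(name, 0.0) for name in MOLECULAR_WEIGHT)


def weighted_sum(sample):
    """Toxicity-weighted sum of molar concentrations (weighted umol/L)."""
    return sum(
        TOXICITY_WEIGHT[name] * sample.get(name, 0.0) / MOLECULAR_WEIGHT[name]
        for name in MOLECULAR_WEIGHT
    )


def make_sample(chloroform, bdcm, dbcm, bromoform):
    """Build a sample dictionary from 4 concentrations in ug/L."""
    return {
        "chloroform": chloroform,
        "bromodichloromethane": bdcm,
        "dibromochloromethane": dbcm,
        "bromoform": bromoform,
    }


# Invented samples (ug/L): one chloroform-dominated area, one mixed area.
chloroform_area = [make_sample(*c) for c in
                   [(20, 2, 0.5, 0.1), (40, 4, 1, 0.2), (60, 5, 1.5, 0.2),
                    (80, 8, 2, 0.3), (100, 9, 2.5, 0.4)]]
mixed_area = [make_sample(*c) for c in
              [(5, 10, 20, 60), (30, 10, 5, 2), (10, 15, 25, 20),
               (50, 12, 6, 1), (8, 8, 15, 45)]]


def ratio_spread(samples):
    """Largest over smallest ratio of weighted sum to TTHM across samples."""
    ratios = [weighted_sum(s) / total_thm(s) for s in samples]
    return max(ratios) / min(ratios)


print("Chloroform-dominated area, ratio spread:", round(ratio_spread(chloroform_area), 3))
print("Mixed-composition area, ratio spread:", round(ratio_spread(mixed_area), 3))
```

A ratio spread close to 1 means the weighted metric is nearly a rescaled TTHM and would rank subjects the same way; a larger spread means the two metrics can order exposures differently, which is when the choice of metric matters.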

Supplementary Material

Web Material

ACKNOWLEDGMENTS

Author affiliations: Centre for Research in Environmental Epidemiology, Barcelona, Spain (Lucas A. Salas, Manolis Kogevinas, Cristina M. Villanueva); Hospital del Mar Medical Research Institute, Barcelona, Spain (Lucas A. Salas, Manolis Kogevinas, Cristina M. Villanueva); Department of Experimental and Health Sciences, Pompeu Fabra University, Barcelona, Spain (Lucas A. Salas); Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland (Kenneth P. Cantor, Nathaniel Rothman, Debra Silverman); Department of Preventive Medicine and Public Health, Faculty of Medicine, University of Oviedo, Oviedo, Spain (Adonina Tardon); CIBER Epidemiología y Salud Pública, Barcelona, Spain (Adonina Tardon, Manolis Kogevinas, Cristina M. Villanueva); Centre of Research in Occupational Health, Universitat Pompeu Fabra, Barcelona, Spain (Consol Serra); Consorci Hospitalari del Parc Taulí, Sabadell, Spain (Consol Serra); Medical Oncology Department, Ramon y Cajal University Hospital, Madrid, Spain (Alfredo Carrato); Research Unit, Canarias University Hospital, La Laguna, Spain (Reina García-Closas); Spanish National Cancer Research Centre, Madrid, Spain (Núria Malats); and National School of Public Health, Athens, Greece (Manolis Kogevinas).

This study was supported in part by the Intramural Research Program of the US National Institutes of Health, National Cancer Institute, Division of Cancer Epidemiology and Genetics (contract NCI NO2-CP-11015). The project also received funding from the Spanish Health Ministry (grants FIS/Spain 00/0745 and ISIII-GO3/174) and the European Union (grant BMH4-98-3243). The current analyses were supported by an Erasmus Columbus Master Scholarship (grant 2009-5123/001-001-ECW to L. A. Salas) and a Colciencias PhD Scholarship (grant 529/2011 to L. A. Salas).

We thank Dr. Francisco X. Real for his contribution to the study design and Dr. Xavier Basagaña for statistical advice on the data analysis.

Conflict of interest: none reported.

REFERENCES

  • 1. Nieuwenhuijsen MJ, Grellier J, Smith R, et al. The epidemiology and possible mechanisms of disinfection by-products in drinking water. Philos Transact A Math Phys Eng Sci. 2009;367(1904):4043–4076. doi: 10.1098/rsta.2009.0116.
  • 2. Krasner SW, Weinberg HS, Richardson SD, et al. Occurrence of a new generation of disinfection byproducts. Environ Sci Technol. 2006;40(23):7175–7185. doi: 10.1021/es060353j.
  • 3. World Health Organization. Disinfectants and Disinfection By-Products. Geneva, Switzerland: World Health Organization; 2007. Water, Sanitation, and Health Programme, http://www.who.int/water_sanitation_health/dwq/S04.pdf. (Accessed October 31, 2012)
  • 4. King WD, Marrett LD. Case-control study of bladder cancer and chlorination by-products in treated water (Ontario, Canada). Cancer Causes Control. 1996;7(6):596–604. doi: 10.1007/BF00051702.
  • 5. King WD, Marrett LD, Woolcott CG. Case-control study of colon and rectal cancers and chlorination by-products in treated water. Cancer Epidemiol Biomarkers Prev. 2000;9(8):813–818.
  • 6. Villanueva CM, Cantor KP, Grimalt JO, et al. Bladder cancer and exposure to water disinfection by-products through ingestion, bathing, showering, and swimming in pools. Am J Epidemiol. 2007;165(2):148–156. doi: 10.1093/aje/kwj364.
  • 7. Cantor KP, Lynch C, Hildesheim M, et al. Drinking water source and chlorination byproducts in Iowa: I. Risk of bladder cancer. Epidemiology. 1998;9(1):21–28.
  • 8. Villanueva CM, Cantor KP, Cordier S, et al. Disinfection byproducts and bladder cancer: a pooled analysis. Epidemiology. 2004;15(3):357–367. doi: 10.1097/01.ede.0000121380.02594.fc.
  • 9. Vinceti M, Fantuzzi G, Monici L, et al. A retrospective cohort study of trihalomethane exposure through drinking water and cancer mortality in northern Italy. Sci Total Environ. 2004;330(1-3):47–53. doi: 10.1016/j.scitotenv.2004.02.025.
  • 10. Hildesheim M, Cantor K, Lynch C, et al. Drinking water source and chlorination byproducts. II. Risk of colon and rectal cancers. Epidemiology. 1998;9(1):29–35.
  • 11. DeAngelo AB, Geter DR, Rosenberg DW, et al. The induction of aberrant crypt foci (ACF) in the colons of rats by trihalomethanes administered in the drinking water. Cancer Lett. 2002;187(1-2):25–31. doi: 10.1016/s0304-3835(02)00356-7.
  • 12. Platikanov S, Tauler R, Rodrigues P, et al. Factorial analysis of the trihalomethane formation in the reaction of colloidal, hydrophobic, and transphilic fractions of DOM with free chlorine. Environ Sci Pollut Res Int. 2010;17(8):1389–1400. doi: 10.1007/s11356-010-0320-4.
  • 13. DeMarini DM. Mutation spectra of complex mixtures. Mutat Res. 1998;411(1):11–18. doi: 10.1016/s1383-5742(98)00009-x.
  • 14. DeMarini DM, Lynge E. Identification of Research Needs to Resolve the Carcinogenicity of High-Priority IARC Carcinogens. Lyon, France: International Agency for Research on Cancer; 2009. Chloroform; pp. 159–166. (IARC Technical Publication no. 42)
  • 15. Plewa MJ, Wagner ED, Muellner MG, et al. Comparative mammalian cell toxicity of N-DBPs and C-DBPs. In: Karanfil T, Krasner SW, Westerhoff P, editors. Occurrence, Formation, Health Effects and Control of Disinfection By-Products in Drinking Water. Washington, DC: American Chemical Society; 2008. pp. 36–50.
  • 16. Kargalioglu Y, McMillan BJ, Minear RA, et al. Analysis of the cytotoxicity and mutagenicity of drinking water disinfection by-products in Salmonella typhimurium. Teratog Carcinog Mutagen. 2002;22(2):113–128. doi: 10.1002/tcm.10010.
  • 17. Coffin JC, Ge R, Yang S, et al. Effect of trihalomethanes on cell proliferation and DNA methylation in female B6C3F1 mouse liver. Toxicol Sci. 2000;58(2):243–252. doi: 10.1093/toxsci/58.2.243.
  • 18. International Agency for Research on Cancer. Re-evaluation of Some Organic Chemicals, Hydrazine and Hydrogen Peroxide. Lyon, France: International Agency for Research on Cancer; 1999. Bromodichloromethane; pp. 1295–1304. (IARC Monographs on the Evaluation of Carcinogenic Risks to Humans, vol 71)
  • 19. International Agency for Research on Cancer. Re-evaluation of Some Organic Chemicals, Hydrazine and Hydrogen Peroxide. Lyon, France: International Agency for Research on Cancer; 1999. Bromoform; pp. 1309–1316. (IARC Monographs on the Evaluation of Carcinogenic Risks to Humans, vol 71)
  • 20. International Agency for Research on Cancer. Re-evaluation of Some Organic Chemicals, Hydrazine and Hydrogen Peroxide. Lyon, France: International Agency for Research on Cancer; 1999. Chlorodibromomethane; pp. 1331–1338. (IARC Monographs on the Evaluation of Carcinogenic Risks to Humans, vol 71)
  • 21. International Agency for Research on Cancer. Some Chemicals That Cause Tumours of the Kidney or Urinary Bladder in Rodents and Some Other Substances. Lyon, France: International Agency for Research on Cancer; 1999. Chloroform; pp. 131–182. (IARC Monographs on the Evaluation of Carcinogenic Risks to Humans, vol 73)
  • 22. International Agency for Research on Cancer. Some Drinking-water Disinfectants and Contaminants, Including Arsenic. Lyon, France: International Agency for Research on Cancer; 2004. (IARC Monographs on the Evaluation of Carcinogenic Risks to Humans, vol 84)
  • 23. Plewa MJ, Wagner ED. Mammalian Cell Cytotoxicity and Genotoxicity of Disinfection By-Products. Denver, CO: Water Research Foundation; 2009.
  • 24. Krasner SW, Wright JM. The effect of boiling water on disinfection by-product exposure. Water Res. 2005;39(5):855–864. doi: 10.1016/j.watres.2004.12.006.
  • 25. Bull RJ, Rice G, Teuschler L, et al. Chemical measures of similarity among disinfection by-product mixtures. J Toxicol Environ Health A. 2009;72(7):482–493. doi: 10.1080/15287390802608973.
  • 26. Villanueva CM, Cantor KP, Grimalt JO, et al. Assessment of lifetime exposure to trihalomethanes through different routes. Occup Environ Med. 2006;63(4):273–277. doi: 10.1136/oem.2005.023069.
  • 27. Villanueva CM, Kogevinas M, Grimalt JO. Haloacetic acids and trihalomethanes in finished drinking waters from heterogeneous sources. Water Res. 2003;37(4):953–958. doi: 10.1016/s0043-1354(02)00411-6.
  • 28. Breslow NE. Generalized linear models: checking assumptions and strengthening conclusions. Stat App. 1996;8(1):23–41.
  • 29. Harrell FE Jr. Regression Modeling Strategies. Nashville, TN: Vanderbilt University; 2012. Department of Biostatistics, School of Medicine, http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/rms.pdf. (Accessed November 22, 2012)
  • 30. Figueiras A, Cadarso-Suárez C. Application of nonparametric models for calculating odds ratios and their confidence intervals for continuous exposures. Am J Epidemiol. 2001;154(3):264–275. doi: 10.1093/aje/154.3.264.
  • 31. Lopez-Abente G, Aragones N, Ramis R, et al. Municipal distribution of bladder cancer mortality in Spain: possible role of mining and industry. BMC Public Health. 2006;6(1):17. doi: 10.1186/1471-2458-6-17.
  • 32. Feder PI, Ma ZJ, Bull RJ, et al. Evaluating sufficient similarity for drinking-water disinfection by-product (DBP) mixtures with bootstrap hypothesis test procedures. J Toxicol Environ Health A. 2009;72(7):494–504. doi: 10.1080/15287390802608981.
  • 33. Hamilton LC. Stata, Release 12 (Base Reference Manual). College Station, TX: Stata Press; 2011. bootstrap—bootstrap sampling and estimation; pp. 193–214.
  • 34. Buis ML. POSTRCSPLINE: Stata Module Containing Post-Estimation Commands for Models Using a Restricted Cubic Spline. Boston, MA: Department of Economics, Boston College; 2008. http://EconPapers.repec.org/RePEc:boc:bocode:s456928. (Accessed November 30, 2012)
  • 35. Chen X, Ender PB, Mitchell M, et al. Stata Web Books: Regression with Stata. Los Angeles, CA: Institute for Digital Research and Education, University of California, Los Angeles; 2003. Chapter 2—Regression diagnostics. http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm. (Accessed March 20, 2012)
  • 36. Hoffman CS, Mendola P, Savitz DA, et al. Drinking water disinfection by-product exposure and fetal growth. Epidemiology. 2008;19(5):729–737. doi: 10.1097/EDE.0b013e3181812bd4.
  • 37. Bull RJ, Rice G, Teuschler LK. Determinants of whether or not mixtures of disinfection by-products are similar. J Toxicol Environ Health A. 2009;72(7):437–460. doi: 10.1080/15287390802608916.
  • 38. Richardson SD, Plewa MJ, Wagner ED, et al. Occurrence, genotoxicity, and carcinogenicity of regulated and emerging disinfection by-products in drinking water: a review and roadmap for research. Mutat Res. 2007;636(1-3):178–242. doi: 10.1016/j.mrrev.2007.09.001.
  • 39. Feder PI, Ma ZJ, Bull RJ, et al. Evaluating sufficient similarity for disinfection by-product (DBP) mixtures: multivariate statistical procedures. J Toxicol Environ Health A. 2009;72(7):468–481. doi: 10.1080/15287390802608965.
  • 40. Marx BD, Smith EP. Principal component estimation for generalized linear regression. Biometrika. 1990;77(1):23–31.
  • 41. Escabias M, Aguilera AM, Valderrama MJ. Principal component estimation of functional logistic regression: discussion of two different approaches. J Nonparametr Stat. 2004;16(3):365–384.
  • 42. Singer PC, Chang SD. Correlations between trihalomethanes and total organic halides formed during water treatment. J Am Water Works Assoc. 1989;81(8):61–65.
  • 43. Keegan T, Whitaker H, Nieuwenhuijsen MJ, et al. Use of routinely collected data on trihalomethane in drinking water for epidemiological purposes. Occup Environ Med. 2001;58(7):447–452. doi: 10.1136/oem.58.7.447.
  • 44. Marhaba TF, Borgaonkar AD, Punburananon K. Principal component regression model applied to dimensionally reduced spectral fluorescent signature for the determination of organic character and THM formation potential of source water. J Hazard Mater. 2009;169(1-3):998–1004. doi: 10.1016/j.jhazmat.2009.04.047.
  • 45. Platikanov S, Puig X, Martín J, et al. Chemometric modeling and prediction of trihalomethane formation in Barcelona's water works plant. Water Res. 2007;41(15):3394–3406. doi: 10.1016/j.watres.2007.04.015.
  • 46. Yorita-Christensen KL, White P. A methodological approach to assessing the health impact of environmental chemical mixtures: PCBs and hypertension in the National Health and Nutrition Examination Survey. Int J Environ Res Public Health. 2011;8(11):4220–4237. doi: 10.3390/ijerph8114220.
  • 47. Samet JM. What can we expect from epidemiologic studies of chemical mixtures? Toxicology. 1995;105(2-3):307–314. doi: 10.1016/0300-483x(95)03227-7.
  • 48. Rodriguez MJ, Sérodes J-B, Levallois P. Behavior of trihalomethanes and haloacetic acids in a drinking water distribution system. Water Res. 2004;38(20):4367–4382. doi: 10.1016/j.watres.2004.08.018.
  • 49. Ye B, Wang W, Yang L, et al. Formation and modeling of disinfection by-products in drinking water of six cities in China. J Environ Monit. 2011;13(5):1271–1275. doi: 10.1039/c0em00795a.
  • 50. Feron VJ, Cassee FR, Groten JP, et al. International issues on human health effects of exposure to chemical mixtures. Environ Health Perspect. 2002;110(suppl 6):893–899. doi: 10.1289/ehp.02110s6893.

Articles from American Journal of Epidemiology are provided here courtesy of Oxford University Press
