Summary
In this paper we propose a Bayesian hierarchical model for the identification of differentially expressed genes in Daphnia magna organisms exposed to chemical compounds, specifically munition pollutants in water. The model we propose constitutes one of the very first attempts at rigorous modeling of the biological effects of water purification. We have data acquired from a purification system that comprises four consecutive purification stages, which we refer to as “ponds”, of progressively more contaminated water. We model the expected expression of a gene in a pond as the mean expression of the same gene in the previous pond plus a gene-pond specific difference. We incorporate a variable selection mechanism for the identification of differential expression, with a prior distribution on the probability of a change that accounts for the available information on the concentration of chemical compounds present in the water. We carry out posterior inference via MCMC stochastic search techniques. In the application, we reduce the complexity of the data by grouping genes according to their functional characteristics, based on the KEGG pathway database; this also increases the biological interpretability of the results. Our model successfully identifies a number of pathways that show differential expression between consecutive purification stages. We also find that changes in the transcriptional response are more strongly associated with the presence of certain compounds, with the remaining compounds contributing to a lesser extent. We discuss the sensitivity of these results to the model parameters that measure the influence of the prior information on the posterior inference.
Keywords: Bayesian inference, Daphnia magna, Environmental toxicology, Probit prior, Transcriptomics, Variable selection
1. Introduction
Water quality is a very important issue for modern society. The number and diversity of chemicals discharged into the environment are increasing with the size and diversity of the population, posing a largely unknown hazard to ecosystems and to human health. In 2000, the Water Framework Directive of the European Parliament and of the Council committed European countries to achieving good surface water quality by 2015. Evaluating the effects of waste waters on aquatic organisms has therefore become an important challenge.
In the last decade, new water purification systems have been introduced and studied. Their aim is to improve the biological and chemical quality of the water discharged from waste water treatment plants before it is released into the freshwater ecosystem. One example is the Waterharmonica Improving Purification Effectiveness (WIPE) project (Kampf and Claassen, 2004; Kampf et al., 2005), in the Netherlands, which consists of artificially constructed wetland environments. The advantages of such bioremediation systems lie in their low costs and in their additional ecological value. Indeed, Sebire et al. (2011) give evidence that purification systems greatly reduce the risk to the environment.
In the past, whole-organism responses, such as mortality and reproduction, have been used to evaluate the biological effects of industrial pollutants (De Schamphelaere et al., 2004; Jemec et al., 2007). However, such responses are only the endpoints of variations at the molecular level and, as such, provide limited information. For this reason, more recent studies have focused on evaluating changes in gene expression in various organisms caused by the presence of chemicals in the water. Daphnia magna, a cladoceran freshwater flea, has been widely used for testing water toxicity (Soetaert et al., 2006; Jo and Jung, 2008). This organism plays a key role in the aquatic food chain, is highly sensitive to chemicals, is easy to culture in the laboratory, and is a widespread species.
Daphnia magna has been studied intensively using functional genomics techniques, which allow thousands of cellular molecular components to be measured in a single experiment. Jo and Jung (2008) studied the effect of exposure to rubber waste water on gene expression in Daphnia magna, while Scanlan et al. (2013) investigated the effects of exposure to silver nanowires. Also, Antczak et al. (2013) used molecular toxicity identification evaluation to predict chemical exposure. In this paper, we investigate the effect of chemicals, in particular munition pollutants, on the gene expression of Daphnia magna using the data introduced by Garcia-Reyero et al. (2012) and available at the NCBI GEO site (http://www.ncbi.nlm.nih.gov/geo/) with accession number GSE13169. The bioremediation system we use comprises 4 consecutive purification stages (which we refer to as “ponds”) of progressively more contaminated water. We look at exposure to mixtures of six chemical constituents and order the ponds from the purest water to the pond with the highest concentration of chemicals. Even though relatively simple, the model we present in this paper constitutes one of the very first attempts at rigorous modeling of the biological effects of water purification.
The major interest of a water purification experiment is in the identification of genes that are differentially expressed between consecutive ponds. For this, we propose a hierarchical Bayesian model of the expected change in expression. The model further incorporates a variable selection prior that accounts for the available information on the concentration of chemical compounds present in the water. This allows us to estimate the relative influence of single chemicals on the probability of a change in expression. To reduce the complexity of the gene expression profiling data, we group genes by their functional characteristics (defined by the biological pathway database KEGG) and then express the transcriptional activity of each pathway by means of its principal components. Our model successfully identifies a number of pathways that show differential expression between consecutive ponds. These mainly represent membrane, signaling, translation and transcription pathways. We also find that changes in the transcriptional response are more strongly associated with the presence of TNT and 2,4-DNT, with the remaining compounds contributing to a lesser extent. We discuss the sensitivity of these results to the model parameters that measure the influence of the prior information on the posterior inference. We also compare the performance of our method with that of the significance analysis of microarrays (SAM) method. Our model identifies additional important pathways not identified by SAM, in addition to providing estimates of the effects of specific chemicals on the observed transcriptional response.
The model we propose in this paper constitutes one of the very first attempts at rigorous modeling of the biological effects of water purification. The approach we take is general and can be applied in a variety of experimental settings. Unlike most common approaches, which look at the effect of single compounds (Falciani et al., 2008; Garcia-Reyero et al., 2012; Gust et al., 2013), our method considers mixtures of multiple compounds and helps identify not only the associated molecular responses but also which compound is dominant in the mixture. This approach can be used to understand not only how water purification systems work, but also how the dissipation of chemicals into the environment is affecting different areas of the ecosystem, possibly even biodiversity.
The rest of the paper is organized as follows: In Section 2 we introduce the hierarchical model of the expected change in expression between consecutive ponds. We also describe the variable selection prior and the MCMC algorithm for posterior inference. Section 3 gives details on the Daphnia magna experimental study and the results from the data analysis. Section 4 contains some final remarks.
2. Methods
In this paper, we propose a Bayesian hierarchical model to investigate the effect of chemical compounds, in particular munition pollutants, on the gene expression of Daphnia magna. The bioremediation system we use comprises 4 consecutive purification stages (which we refer to as “ponds”) of progressively more contaminated water. We model the expected change in the expression of a gene between subsequent ponds. We further incorporate a variable selection mechanism for the identification of differential expression, with a prior distribution on the probability of a change that accounts for the available information on the concentration of chemical compounds present in the water.
2.1 Hierarchical Model
Let Yigt denote the expression measurement for gene g (g = 1, …, G) in pond t (t = 0, …, T), from sample i (i = 1, …, Nt). Note that we allow each pond to have a different sample size. Let t = 0 indicate the blank pond, i.e., the purest one, and let us assume that the ponds are ordered from the cleanest to the most polluted water. We assume that the gene expression measurements Yigt are normally distributed,
$$Y_{igt} \sim \mathrm{N}(\mu_{gt}, \sigma_g^2), \tag{1}$$
with μgt a gene-pond specific mean and σg² a gene-specific variance. The gene-pond specific mean μgt is then modeled as the mean of the same gene in the previous pond plus a gene-pond specific difference in mean expression, αgt, as
$$\mu_{gt} = \mu_{g,t-1} + \alpha_{gt}, \tag{2}$$
for t = 1, …, T. Without loss of generality, we assume the mean of the blank pond, μg0, to be zero and, for each gene in each pond, we center the expression data with respect to the mean of the same gene in the blank pond, i.e. we use Yigt − Ȳg0 for g = 1, …, G and t = 0, …, T.
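As a minimal sketch of the mean structure in (1) and (2), the Python snippet below simulates one gene across ponds t = 0, …, T and then centres the data on the blank pond as just described. The function names, the noise level and the α values are hypothetical illustration choices; only the pond sizes Nt = (5, 3, 4, 3, 3) are taken from Section 3.2.

```python
import random
import statistics


def simulate_gene(alpha, sigma, n_per_pond, seed=0):
    """Simulate Y_igt ~ N(mu_gt, sigma^2) for one gene g, with mu_g0 = 0 and
    mu_gt = mu_g,t-1 + alpha_gt for t = 1..T (alpha[t - 1] holds alpha_gt)."""
    if len(n_per_pond) != len(alpha) + 1:
        raise ValueError("need one sample size per pond, t = 0..T")
    rng = random.Random(seed)
    mu = [0.0]                      # blank pond mean, mu_g0 = 0
    for a in alpha:                 # accumulate the pond-to-pond differences
        mu.append(mu[-1] + a)
    return [[rng.gauss(mu[t], sigma) for _ in range(n)]
            for t, n in enumerate(n_per_pond)]


def center_on_blank(y):
    """Subtract the blank-pond sample mean (pond t = 0) from every pond."""
    ybar0 = statistics.fmean(y[0])
    return [[v - ybar0 for v in pond] for pond in y]


# Hypothetical differences: no change in ponds 1 and 3, an upward shift in
# pond 2 and a downward shift in pond 4 (T = 4).
y = simulate_gene(alpha=[0.0, 1.0, 0.0, -0.5], sigma=0.2, n_per_pond=[5, 3, 4, 3, 3])
yc = center_on_blank(y)
print([round(statistics.fmean(p), 2) for p in yc])
```

Only the αgt that differ from zero move the mean away from the previous pond, which is exactly what the variable selection prior below is designed to detect.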
2.2 Variable Selection Prior
We incorporate in the model a variable selection prior that accounts for the available information on the concentration of chemical compounds present in the water.
Let A be the (G × T) matrix with elements αgt. For each gene we wish to find whether its mean expression in a specific pond changes with respect to the previous one. This is equivalent to inferring which αgt in model (2) are non-zero with high confidence. To address this goal, we introduce a (G × T) matrix Ω of binary indicators, where ωgt = 1 indicates that the corresponding αgt is different from zero. Otherwise, ωgt = 0 indicates that gene g in pond t has not changed its mean expression with respect to the previous pond, i.e., αgt = 0. Conditional on this latent matrix Ω, we assume that the elements of the matrix A are stochastically independent and have the following mixture prior distribution,
(3) |
with δ0 a Dirac spike and cα > 0 a hyperparameter to be set. We complete the prior model by assuming a conjugate inverse-gamma prior on the gene-specific variances σg², with hyperparameters δ and d. In the variable selection literature, conjugate choices are often made for computational convenience, as they allow some of the model parameters to be integrated out. Mixture priors of type (3) are known as spike-and-slab priors in the Bayesian variable selection literature and have been used extensively in univariate and multivariate linear regression settings (George and McCulloch, 1997; Brown et al., 1998; Sha et al., 2004).
Model (3) requires a prior on ωgt. A simple choice in variable selection is to assume independent Bernoulli priors, i.e. ωgt ~ Bern(p), where p can be either a fixed hyperparameter or a random variable itself. Recently, some authors have suggested prior models that incorporate external information about the predictors, for example via Markov random field or logistic priors that capture correlation among the variables (Li and Zhang, 2010; Stingo et al., 2010). Here we use a probit-like prior that allows us to incorporate the information we have available on the concentrations of the chemical compounds present in the water. Probit-like priors have been recently proposed in the literature on Bayesian variable selection as a convenient way to incorporate external information to guide the selection of the predictors (Quintana and Conti, 2013; Cassese et al., 2014).
Let D be the (Q × T) matrix whose elements are the normalized absolute values of the difference in concentration of the individual chemical compounds with respect to the previous pond, that is,
(4) |
where cqt is the concentration of chemical q in pond t, and where cq0 = 0, for every chemical. Given the matrix D, we model the prior probability of a change in the mean expression of a gene as a function of D,
$$\pi(\omega_{gt} = 1 \mid \theta) = \Phi\Big(\eta + \sum_{q=1}^{Q} \theta_q D_{qt}\Big), \tag{5}$$
with Φ the c.d.f. of a standard Normal distribution and η a hyperparameter to be chosen; note that the prior probability is allowed to differ across ponds. The parameter θq in (5) captures the effect of a change in the concentration of the q-th chemical. Since we expect a change in concentration to result in a change in the expression of some genes, and thus expect π(ωgt = 1|θ) to increase rather than decrease, we constrain θq > 0 by assuming θq ~ Ga(aq, bq). The choice of normalized absolute differences in (4) allows us to estimate θ more reliably, even in situations where one chemical may dominate the others. Furthermore, it is worth noting that, in estimating θ, the number of ponds T plays the role of the sample size. The hyperparameter η in (5) regulates the prior probability of a change in gene expression when ignoring (or in the absence of) any information on the concentrations of the chemical compounds. More specifically, if θ = 0, or if no chemical changes its concentration (i.e. Dqt = 0 for every q = 1, …, Q), equation (5) reduces to Φ(η).
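The following Python sketch evaluates the prior probability of a change for one pond, assuming (5) takes the linear probit form Φ(η + Σq θq Dqt) described above. The θ and D values are hypothetical; the sketch only illustrates that the probability reduces to Φ(η) when no concentration changes and grows with Dqt when θq > 0.

```python
import math


def std_normal_cdf(x):
    """Phi(x), the c.d.f. of a standard Normal, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prior_change_prob(eta, theta, d_col):
    """Prior probability pi(omega_gt = 1 | theta), shared by all genes in pond t,
    assuming (5) has the form Phi(eta + sum_q theta_q * D_qt).
    d_col holds the column D[:, t], one entry per chemical q."""
    if len(theta) != len(d_col):
        raise ValueError("theta and D[:, t] must have one entry per chemical")
    if any(th <= 0 for th in theta):
        raise ValueError("theta_q must be positive, as under the Ga(a_q, b_q) prior")
    return std_normal_cdf(eta + sum(th * d for th, d in zip(theta, d_col)))


eta = -3.09
theta = [1.0, 0.5, 0.1, 0.1, 0.1, 0.3]          # hypothetical values, Q = 6
no_change = [0.0] * 6                           # no chemical changes concentration
some_change = [0.4, 0.6, 0.0, 0.0, 0.0, 0.0]    # hypothetical column of D
print(prior_change_prob(eta, theta, no_change))    # equals Phi(eta)
print(prior_change_prob(eta, theta, some_change))  # larger, since theta_q > 0
```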
2.3 Posterior inference
Our full joint model can be summarized as
(6) |
Our primary interest lies in estimating the presence or absence of a change in the mean expression of a gene between two adjacent ponds, i.e. in estimating the matrix Ω. Since the posterior distribution is not available in closed form, we design a Markov chain Monte Carlo (MCMC) algorithm based on stochastic search variable selection (SSVS) procedures (Savitsky et al., 2011). To simplify the sampling algorithm, we integrate out the variances σg² and work with the marginalised likelihood,
(7) |
where
Note that f(Ygt|A, Ω) is the p.d.f. of a non-standard, non-central t-distribution and that (Nt + ωgtcα) reduces to Nt when ωgt = 0.
A generic iteration of the MCMC algorithm consists of the following updates:
1. Update (A, Ω): We perform a between-model step by updating these two parameters jointly. First, a new value Ω′ is proposed by either an add/delete (A/D) step, with probability ρ, or a swap (S) step, with probability 1 − ρ. If an A/D step is chosen, we select one element at random and change its value. If an S step is chosen, we independently select at random one element equal to 1 and one equal to 0 and swap their values. When ω′gt = 0, we set the corresponding α′gt = 0; otherwise, if ω′gt = 1, we propose a new value α′gt by sampling it from a Normal distribution. The mean of this proposal is calculated with a random walk procedure, as the mean of the B previous iterations during the burn-in phase, and is fixed at the last computed value afterwards, while the variance is fixed throughout the MCMC (Roberts and Rosenthal, 2009). The proposed values Ω′ and A′ are then accepted with probability
$$\min\left\{1,\ \frac{p(A', \Omega' \mid Y, \theta)\, q(A \mid A')}{p(A, \Omega \mid Y, \theta)\, q(A' \mid A)}\right\},$$
where q(· | ·) denotes the Normal proposal density for the elements of A. The proposal distribution of ωgt drops out of this ratio since all moves are symmetric.
   Update A: This within-model step is performed via a Gibbs sampler to improve mixing. It consists of updating each αgt with ωgt = 1 by sampling from its full conditional distribution, a non-standard, non-central t-distribution with degrees of freedom ν set to δ, location parameter μt and a dispersion parameter.
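2. Update θ: We propose a new value θ′q for each θq by sampling from a Gamma distribution. As in step 1, the parameters of the Gamma proposal are chosen with a random walk procedure during the burn-in and are fixed at the last computed values afterwards. In particular, the mean of the Gamma proposal is set to the mean of θq over the previous B iterations, while its variance is fixed to .1. The proposed value is then accepted with probability
$$\min\left\{1,\ \frac{p(\Omega \mid \theta')\, p(\theta'_q)\, q(\theta_q)}{p(\Omega \mid \theta)\, p(\theta_q)\, q(\theta'_q)}\right\},$$
where θ′ is θ with θq replaced by θ′q, p(θq) is the Ga(aq, bq) prior density and q(·) is the Gamma proposal density.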
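The two proposal mechanisms in the steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' C++ code: the function names are hypothetical, returning the current Ω unchanged when a swap is impossible (for example at the all-zero starting matrix) is our own choice that keeps the move symmetric, and the Gamma proposal is parameterized by its mean and variance through shape = mean²/variance and scale = variance/mean.

```python
import random


def propose_omega(omega, rho, rng):
    """Propose a new binary matrix Omega (list of rows) by an add/delete move,
    with probability rho, or a swap move, with probability 1 - rho.
    Both moves are symmetric, so they cancel in the acceptance ratio."""
    new = [row[:] for row in omega]
    cells = [(g, t) for g, row in enumerate(omega) for t in range(len(row))]
    if rng.random() < rho:
        g, t = rng.choice(cells)          # add/delete: flip one random element
        new[g][t] = 1 - new[g][t]
        return new
    ones = [c for c in cells if omega[c[0]][c[1]] == 1]
    zeros = [c for c in cells if omega[c[0]][c[1]] == 0]
    if not ones or not zeros:
        return new                        # swap impossible: propose no change
    g1, t1 = rng.choice(ones)             # swap: one 1 becomes 0 ...
    g0, t0 = rng.choice(zeros)            # ... and one 0 becomes 1
    new[g1][t1], new[g0][t0] = 0, 1
    return new


def gamma_proposal(mean, var, rng):
    """Draw from a Gamma distribution specified by its mean and variance:
    shape = mean^2 / var and scale = var / mean."""
    if mean <= 0 or var <= 0:
        raise ValueError("mean and variance must be positive")
    return rng.gammavariate(mean * mean / var, var / mean)


rng = random.Random(1)
omega = [[0, 0, 0, 0] for _ in range(3)]    # start from the all-zero matrix
omega = propose_omega(omega, rho=1.0, rng=rng)
theta_q_new = gamma_proposal(mean=1.0, var=0.1, rng=rng)
```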
Given the MCMC output, we perform posterior inference on Ω by calculating the marginal posterior probability of inclusion (PPI) of each element, which we estimate as the proportion of post-burn-in iterations in which that element was set to 1. Point estimates of each θq are computed as the mean of the sampled values after burn-in.
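A minimal Python sketch of these posterior summaries, assuming the sampled Ω matrices are stored as lists of rows (a hypothetical storage format): the PPI of ωgt is the fraction of post-burn-in draws in which it equals 1, the point estimate of θq is the post-burn-in average, and thresholding the PPIs gives a selection such as the .99 cut-off or the PPI > .5 rule used in Section 3.3.

```python
def posterior_inclusion_probs(omega_draws, burn_in):
    """Marginal PPI of each omega_gt: the fraction of post-burn-in draws
    (binary G x T matrices stored as lists of rows) in which it equals 1."""
    kept = omega_draws[burn_in:]
    if not kept:
        raise ValueError("no draws left after burn-in")
    n_g, n_t = len(kept[0]), len(kept[0][0])
    return [[sum(draw[g][t] for draw in kept) / len(kept) for t in range(n_t)]
            for g in range(n_g)]


def posterior_mean(draws, burn_in):
    """Point estimate of a scalar parameter such as theta_q."""
    kept = draws[burn_in:]
    if not kept:
        raise ValueError("no draws left after burn-in")
    return sum(kept) / len(kept)


def selected(ppi, threshold):
    """(g, t) pairs whose PPI exceeds the threshold (e.g. 0.5 or 0.99)."""
    return [(g, t) for g, row in enumerate(ppi)
            for t, p in enumerate(row) if p > threshold]
```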
3. Case Study
We are interested in investigating the effects of munition pollutants on the gene expression of Daphnia magna. We consider a purification system with four stages of contaminated water. Exposures are to four chemical mixtures considered by Garcia-Reyero et al. (2012). Below we describe the experiment in some detail and then present the results of our analysis.
3.1 The Daphnia magna experiment
We have data available from an experiment that looks at exposure to four mixtures of six munitions constituents, each mixture characterising the chemical concentrations of a pond. The six contaminants under study are 1,3,5-trinitroperhydro-1,3,5-triazine (RDX), 2,4,6-trinitrotoluene (TNT), 2,4- and 2,6-dinitrotoluene (2,4-DNT and 2,6-DNT), 1,3,5-trinitrobenzene (TNB) and 1,3-dinitrobenzene (DNB). Table 1 reports their concentrations. We order the four ponds from the purest water to the pond with the highest concentration of chemicals.
Table 1. Concentrations of the six munition constituents in the blank pond and in the four mixtures.

| Chemical | Blank | Mix 1 | Mix 2 | Mix 3 | Mix 4 |
|---|---|---|---|---|---|
| TNT | 0 | .498 | .999 | 1.269 | 1.012 |
| 2,4-DNT | 0 | 0 | 1.242 | .044 | 1.245 |
| 2,6-DNT | 0 | 0 | 0 | 0 | 1.13 |
| DNB | 0 | 0 | 0 | .157 | 1.107 |
| TNB | 0 | 0 | 0 | .159 | .67 |
| RDX | 0 | 0 | 0 | .093 | .334 |
Daphnia magna exposures were conducted on 6–8 old daphnids in 1 L glass beakers with a 750 mL exposure volume. After 24 h of exposure, RNA was isolated using RNeasy kits (Qiagen, Valencia, CA, USA). RNA quality was assessed with an Agilent 2100 Bioanalyzer (Agilent, Palo Alto, CA, USA), and RNA was quantified using a Nanodrop ND-1000 spectrophotometer (Nanodrop Technologies, Wilmington, DE, USA). Microarray data were normalized using the normalize.quantiles function of the R package preprocessCore. More details can be found in the supplementary material of Garcia-Reyero et al. (2012).
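For readers unfamiliar with quantile normalization, the sketch below shows the basic idea in plain Python: each array is sorted, the sorted values are averaged rank by rank across arrays, and every value is replaced by the average at its rank, so that all arrays end up with the same empirical distribution. It is a simplified illustration, not a re-implementation of preprocessCore; in particular, ties are broken by position and missing values are not handled.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (each a list of equal length):
    every value is replaced by the across-array mean of the values with its rank."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the i-th smallest value across arrays.
    reference = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])   # ties broken by position
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```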
To reduce the complexity of the gene expression profiling data, we grouped genes by their functional characteristics, as defined by the biological pathway database KEGG (Kanehisa and Goto, 2000), and then expressed the transcriptional activity of each pathway by means of its principal components. Methods that employ pathway-based scores of gene expression data have become popular in genomics as an effective way to reduce the dimensionality of the data; see, for example, Su et al. (2009) and Drier et al. (2013). Here we used the same annotation and data reduction as published in Antczak et al. (2013). Specifically, 92 KEGG pathways were identified and mapped to the microarray chip design. Then, for each pathway, we applied principal component analysis (PCA) to the gene expression data in each pond, using the prcomp function available in R, and selected the components that together explained at least 75% of the observed variance. This choice substantially reduces the complexity of the gene expression profiling data (from 1379 genes to 630 pathway components) while still retaining a large percentage of the observed variance.
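The per-pathway reduction can be sketched with NumPy as below, assuming the expression of one pathway's genes is given as a samples-by-genes array. As with the defaults of prcomp, genes are centred but not scaled, and the smallest number of leading components whose cumulative share of variance reaches 75% is kept. The function name and return format are hypothetical.

```python
import numpy as np


def pathway_scores(expr, min_var=0.75):
    """PCA summary of one pathway.

    expr: (n_samples, n_genes) array with the expression of the pathway's genes.
    Returns the scores on the smallest number of leading principal components
    whose cumulative share of variance is at least min_var, and that share.
    Genes are centred but not scaled, as in the default of R's prcomp."""
    x = np.asarray(expr, dtype=float)
    x = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    var = s ** 2                                  # proportional to PC variances
    share = np.cumsum(var) / var.sum()
    k = min(int(np.searchsorted(share, min_var)) + 1, len(s))
    return x @ vt[:k].T, float(share[k - 1])


rng = np.random.default_rng(0)
t = rng.normal(size=20)
# Three genes driven by one latent signal: a single component should suffice.
expr = np.column_stack([t, 2 * t, -t + 0.01 * rng.normal(size=20)])
scores, share = pathway_scores(expr)
print(scores.shape, round(share, 4))
```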
3.2 Parameter settings
In our model formulation, the expression data are captured via the matrix Y in (1), with G = 92 × 2, T = 4 and Nt = (5, 3, 4, 3, 3). In each pond, we centered the data with respect to the purest one, i.e., the blank pond, as described in Section 2.1. In addition, given the concentrations in Table 1, we computed each element of the matrix D as in (4).
The results we report below were obtained by starting the MCMC chain from a matrix Ω with all its elements set to zero and by sampling the initial values of the parameters θq from their prior distributions. As for the hyperparameter settings, we specified the shrinkage parameter cα of prior (3) in the range of variability of the data, so as to control the ratio of prior to posterior precision (Sha et al., 2004). Specifically, we set cα = .1. Furthermore, we specified a vague prior on σg² by setting δ = 3 and choosing d such that the expected value of the variance parameter represents a fraction of the observed variance (5% for the results in this paper). In our model, the parameter η in (5) reflects the prior belief of a change in mean expression in the absence of any information on the concentration of the chemical compounds, i.e. θ = 0. We performed a sensitivity analysis on the choice of η. More specifically, we investigated η = −3.72, −3.09 and −2.32, which correspond to prior probabilities of 0.01%, 0.1% and 1%, respectively. Finally, we specified vague priors on θq by setting aq = bq = 1.
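The correspondence between η and the baseline prior probability Φ(η) can be checked with a short Python sketch; the bisection-based inverse is a hypothetical helper used for illustration, not part of the original analysis.

```python
import math


def std_normal_cdf(x):
    """Phi(x), the standard Normal c.d.f., via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def eta_for_prior_prob(p, lo=-10.0, hi=10.0, tol=1e-10):
    """Find eta with Phi(eta) = p by bisection (Phi is increasing)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if std_normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for eta in (-3.72, -3.09, -2.32):
    print(f"eta = {eta:5.2f}  ->  Phi(eta) = {100 * std_normal_cdf(eta):.3f}%")
```

The three settings are the values of Φ⁻¹ at 0.0001, 0.001 and 0.01, rounded to two decimals.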
We ran the MCMC chains for 2,000,000 iterations, with a burn-in of 1,000,000 and random walk proposals centered on values calculated over the previous B = 50 iterations. Our C++ code performed 1,000 MCMC iterations in about 6 seconds on a dual-core Intel® Xeon® processor (2.2 GHz) with 16 GB of memory. We assessed convergence by visually inspecting the MCMC sample traces. Additionally, we applied the diagnostic test of Geweke (1992) for the equality of the means of the first 10% and the last 50% of the chain, and we used the Heidelberger and Welch (1981) test of stationarity to determine a suitable burn-in.
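The sketch below illustrates the idea behind the Geweke (1992) comparison of the first 10% and the last 50% of a chain. Geweke's diagnostic estimates the variance of each segment mean from the spectral density at frequency zero; here we swap in a simpler batch-means estimate, so the resulting z-scores are only approximate. In R, the coda package provides geweke.diag and heidel.diag for the diagnostics actually cited.

```python
import math
import random
import statistics


def var_of_mean_batch(x, n_batches=20):
    """Batch-means estimate of Var(mean(x)) for an autocorrelated chain
    (a simpler substitute for Geweke's spectral-density estimate)."""
    size = len(x) // n_batches
    if size < 2:
        raise ValueError("chain segment too short for the number of batches")
    means = [statistics.fmean(x[i * size:(i + 1) * size]) for i in range(n_batches)]
    return statistics.variance(means) / n_batches


def geweke_z(chain, first=0.1, last=0.5, n_batches=20):
    """Approximate z-score comparing the mean of the first `first` fraction of
    the chain with the mean of the last `last` fraction."""
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[n - int(last * n):]
    se = math.sqrt(var_of_mean_batch(a, n_batches) + var_of_mean_batch(b, n_batches))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


rng = random.Random(0)
stationary = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
print(geweke_z(stationary))   # typically within about +/-2 for a stationary chain
```

A chain whose mean drifts between the early and late segments produces a large absolute z-score, signalling that more burn-in is needed.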
3.3 Results
We report results obtained with η = −3.09, i.e. a prior probability of 0.1%, and later comment on the sensitivity to this choice. Figure 1 shows plots of the PPIs of the elements ωgt of Ω. Each subfigure refers to the comparison of a given pond (t = 1, …, 4) with the previous one, with the x-axis showing the set of pathway components. Changes in expression across consecutive ponds can be detected by looking at the components with large PPI values. We note that, since exposure to any given chemical compound can have a variety of effects on the molecular state of an organism, a large number of pathway components are generally expected to be significantly perturbed, as our results show (Williams et al., 2009; Hernández et al., 2013; Antczak et al., 2013). Many more significant changes are observed in the third and fourth subfigures, as indicated by the large PPI values, since the chemical concentrations vary more in the last two ponds (see the concentrations reported in Table 1).
A threshold of .99 on the PPIs identified 92, 156, 152 and 142 pathway components in the four consecutive pond transitions, moving from the least to the most polluted. We analyzed the selected pathways by grouping them according to the top two levels of the KEGG pathway hierarchy (general terms and potential additional terms). The detailed breakdown is shown in the Supplementary Material. This analysis revealed that the transcriptional response linked to the transition between the blank and the first pond involves a large number of genetic information processing pathways, specifically in protein folding, sorting and degradation (18.1%), followed by translation (9%) and transcription (4.5%). In addition, a number of metabolic (22%), transport (20%) and signal transduction (11.7%) pathways were identified. The transition between ponds 1 and 2 follows a similar trend but with a greater focus on signal transduction (20%) and transcription (9%) pathways. In the transition between ponds 2 and 3 we found an increased prevalence of metabolism pathways (31%), which mainly include carbohydrate (7.5%), lipid (5%) and amino acid (7%) metabolism. In addition, the share of signaling molecule pathways increased (5.8%), while transport and catabolism and nucleotide metabolism pathways were constant compared to the previous transition (20% and 1%, respectively). Lastly, the transition between ponds 3 and 4 showed a decreased prevalence of genetic information processing pathways (18%) and a similar profile for the others, with a high share of metabolic (30%), signal transduction (17%) and transport (20%) pathways.
Our results contain a number of interesting findings. First, all 4 pond transitions showed the same level of endocrine system related pathways (1.5%), including the first transition, where only TNT is present. This suggests an effect of TNT on the endocrine system. Similar links have already been observed in other species (Haerry et al., 1997; Kraut, 2011; Torre et al., 2008) but not yet in Daphnia magna. Another interesting result is that the number of identified membrane-related pathways increases with the number of compounds within a pond, suggesting that mixtures of compounds have an increased effect on membrane components, and particularly on signal transduction pathways.
As a point of comparison with other methods, we looked at the results obtained using the significance analysis of microarrays (SAM) d-statistic of Tusher et al. (2001). SAM is a distribution-free, permutation-based technique that measures the strength of the relationship between gene expression and a response variable. Figure 2 summarises the results: each sub-plot shows a scatter plot of the PPIs obtained with our method (x-axis) against the d-statistics obtained with SAM (y-axis), for the first pathway components only. Horizontal solid and dashed lines correspond to 1% and 5% FDR thresholds on the d-statistic, respectively. The figure shows that the two methods perform broadly similarly, with a large concordance in the selection. For example, for the median probability model selected by our method, that is the set of pathway components with PPIs greater than .5 (Barbieri and Berger, 2004), we find an overlap with the components selected by SAM of 93.75% and 95.29%, using FDR thresholds of 1% and 5%, respectively. However, there are also important differences between the selections made by the two methods. For example, the glycosphingolipid biosynthesis pathways were among those identified by our method but not by SAM. Glycosphingolipids are important membrane building blocks, and genes involved in their biosynthesis may play an important role in the composition of the membrane. As shown above, a number of membrane pathways are highly affected by exposure to the compounds we considered, and this effect could be facilitated by additional perturbation of the glycosphingolipid biosynthesis pathway.
In addition to the inference on Ω, which allows us to detect differential expression, our model also returns the posterior distribution of the θq elements. These parameters measure the relative influence of the individual chemical compounds on the posterior inference. Figure 3 shows the kernel density estimates of all six parameters. The results clearly suggest that TNT has the strongest influence, followed by 2,4-DNT and RDX, while TNB, 2,6-DNT and DNB have very little to negligible influence.
As for the selected pathways, our results are in line with what is known about the individual chemical compounds. Indeed, several studies have shown that TNT can cause oxidative stress (Cenas et al., 2001; Nemeikaite-Ceniene et al., 2004). Also, RDX affects the nervous system and can cause seizures in vertebrates and invertebrates (Gust et al., 2009; Garcia-Reyero et al., 2011), while 2,4-DNT affects lipid metabolism in the liver and oxygen transport (Wintz et al., 2006). Additional validation came from a one-class SAM analysis, with an FDR threshold of 20%, that we performed on gene expression data from exposures to single compounds, which are also available from Garcia-Reyero et al. (2012). This analysis showed that TNT had the highest normalised count, followed by 2,4-DNT (results not shown).
Let us now comment on the sensitivity of the results to the choice of the parameter η in (5). This parameter represents our prior belief of a change in expression in the absence of any information on the concentration of the chemical compounds, and it therefore affects the weight assigned to the data. Some sensitivity to this parameter is therefore to be expected. In particular, the parameters θ1, …, θ6 weight the prior information derived from changes in the chemical concentrations, and a larger η already raises the baseline probability Φ(η) of a change; we therefore expect higher values of η to result in posterior distributions of the θq concentrated on smaller values. Indeed, as an example, Figure 4 shows the effect of different choices of η on the kernel density estimate of θ1 (the parameter associated with TNT): the posterior distribution tends to concentrate on lower values as η increases. We observed the same behavior for the other θq parameters (results not shown).
In spite of the evident sensitivity of the individual θq estimates to the choice of η, we found that the overall effect of the estimates on the posterior inference did not depend substantially on the chosen η value. For example, in the second pond (the one where we observed the largest increases in posterior probability), we observed a 67.29% increase in the prior probability p(ωgt = 1) for η = −3.72, 66.56% for η = −3.09 and 64.70% for η = −2.32. Of course, these probability values should be interpreted with caution, as inference on the θq parameters also depends on the sample size and the number of parameters to estimate, as well as on the concordance between the data and the prior information.
4. Discussion
In this paper we have presented a simple approach to rigorous modeling of the biological effects of water purification. For this, we have proposed a hierarchical Bayesian model of the expected change in gene expression between consecutive purification stages. We have also incorporated a variable selection prior that accounts for available information on the concentration of chemical compounds present in the water. Our modeling approach is general and can be applied to a variety of experimental settings used to estimate the biological effects of water purification. Here we have presented an application to the identification of differentially expressed genes in Daphnia magna organisms exposed to munition pollutants in water.
To reduce the complexity of the gene expression profiling data, we have grouped genes by KEGG pathways and then expressed the transcriptional activity of each pathway by means of its principal components. When applied to gene expression data from Daphnia magna organisms exposed to munition pollutants, our model has successfully identified a number of pathways that show differential expression between consecutive purification stages. These mainly represent membrane, signaling, translation and transcription pathways. In addition, our model allows for the estimation of the relative influence of single chemicals on the probability of a change in expression. We have found, in particular, that changes in the transcriptional response are more strongly associated with the presence of TNT and 2,4-DNT, with the remaining compounds contributing to a lesser extent. We have discussed the sensitivity of these results to the model parameters that measure the influence of the prior information on the posterior inference.
In the application, we have also looked at the results of the SAM method, although it should be pointed out that this comparison only addressed one feature of the method we have developed, namely the ability to identify genes differentially expressed across a series of samples. Our method, in addition, allows one to estimate the effect of individual chemicals on the observed transcriptional response. This feature of our model could be particularly important in the growing application of adverse outcome pathways (AOPs). At the moment, AOPs are derived at the level of single exposures, where one looks at the effect that a single compound with a set of characteristics may have on the organism. Our approach would allow one to estimate the effects of multiple compounds and to prioritize pathways according to their interactions with the compounds. It can be used to understand not only how water purification systems work, but also how the dissipation of chemicals into the environment is affecting different areas of the ecosystem, possibly even biodiversity.
The modeling approach we have considered in this paper is general and can be applied, for example, to data collected from artificially constructed wetland environments, such as those of the WIPE project (Kampf and Claassen, 2004; Kampf et al., 2005). Such datasets typically involve a large number of chemical compounds. It could also be of interest to include interactions between compounds, as well as chemical features that can be computed to provide a more informed assessment of how strongly each chemical affects the pathways. With a large number of possible covariates, a natural methodological extension would be to employ selection priors also at the second stage of the model, to identify the chemicals that induce changes in the transcriptional response.
Supplementary Material
Supplementary Tables referenced in Section 3.3 are available with this paper at the Biometrics website on Wiley Online Library.
References
- Antczak P, Jo H, Woo S, Scanlan L, Poynton H, Loguinov A, Chan S, Falciani F, Vulpe C. Molecular toxicity identification evaluation (mTIE) approach predicts chemical exposure in Daphnia magna. Environ Sci Technol. 2013;47:11747–11756. doi: 10.1021/es402819c.
- Barbieri M, Berger J. Optimal predictive model selection. The Annals of Statistics. 2004;32(3):870–897.
- Brown P, Vannucci M, Fearn T. Multivariate Bayesian variable selection and prediction. Journal of the Royal Statistical Society, Series B. 1998;60:627–641.
- Cassese A, Guindani M, Vannucci M. A Bayesian integrative model for genetical genomics with spatially informed variable selection. Cancer Informatics. 2014;13(S2):29–37. doi: 10.4137/CIN.S13784.
- Cenas N, Nemeikaite-Ceniene A, Sergediene E, Nivinskas H, Anusevicius Z, Sarlauskas J. Quantitative structure-activity relationships in enzymatic single-electron reduction of nitroaromatic explosives: implications for their cytotoxicity. Biochim Biophys Acta. 2001;1528:31–38. doi: 10.1016/s0304-4165(01)00169-6.
- De Schamphelaere K, Canli M, Van Lierde V, Forrez I, Vanhaecke F, Janssen C. Reproductive toxicity of dietary zinc to Daphnia magna. Aquat Toxicol. 2004;70:233–244. doi: 10.1016/j.aquatox.2004.09.008.
- Drier Y, Sheffer M, Domany E. Pathway-based personalized analysis of cancer. Proceedings of the National Academy of Sciences. 2013;110(16):6388–6393. doi: 10.1073/pnas.1219651110.
- Falciani F, Diab A, Sabine V, Williams T, Ortega F, George S, Chipman J. Hepatic transcriptomic profiles of European flounder (Platichthys flesus) from field sites and computational approaches to predict site from stress gene responses following exposure to model toxicants. Aquatic Toxicology. 2008;90(2):92–101. doi: 10.1016/j.aquatox.2008.07.020.
- Garcia-Reyero N, Habib T, Pirooznia M, Gust K, Gong P, Warner C, Wilbanks M, Perkins E. Conserved toxic responses across divergent phylogenetic lineages: a meta-analysis of the neurotoxic effects of RDX among multiple species using toxicogenomics. Ecotoxicology. 2011;20:580–594. doi: 10.1007/s10646-011-0623-3.
- Garcia-Reyero N, Escalon B, Loh P, Laird J, Kennedy A, Berger B, Perkins E. Assessment of chemical mixtures and groundwater effects on Daphnia magna transcriptomics. Environ Sci Technol. 2012;46:42–50. doi: 10.1021/es201245b.
- George E, McCulloch R. Approaches for Bayesian variable selection. Statistica Sinica. 1997;7:339–373.
- Geweke J. Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. In: Bayesian Statistics 4. Oxford University Press; 1992. pp. 169–193.
- Gust K, Pirooznia M, Quinn MJ, Johnson M, Escalon L, Indest K, Guan X, Clarke J, Deng Y, Gong P, Perkins E. Neurotoxicogenomic investigations to assess mechanisms of action of the munitions constituents RDX and 2,6-DNT in Northern bobwhite (Colinus virginianus). Toxicol Sci. 2009;110:168–180. doi: 10.1093/toxsci/kfp091.
- Gust M, Fortier M, Garric J, Fournier M, Gagné F. Effects of short-term exposure to environmentally relevant concentrations of different pharmaceutical mixtures on the immune response of the pond snail Lymnaea stagnalis. Science of The Total Environment. 2013;445:210–218. doi: 10.1016/j.scitotenv.2012.12.057.
- Haerry TE, Heslip TR, Marsh JL, O’Connor MB. Defects in glucuronate biosynthesis disrupt Wingless signaling in Drosophila. Development. 1997;124:3055–3064. doi: 10.1242/dev.124.16.3055.
- Heidelberger P, Welch PD. A spectral method for confidence interval generation and run length control in simulations. Commun ACM. 1981;24(4):233–245.
- Hernández AF, Parrón T, Tsatsakis AM, Requena M, Alarcón R, López-Guarnido O. Toxic effects of pesticide mixtures at a molecular level: Their relevance to human health. Toxicology. 2013;307:136–145. doi: 10.1016/j.tox.2012.06.009.
- Jemec A, Drobne D, Tisler T, Trebse P, Ros M, Sepčić K. The applicability of acetylcholinesterase and glutathione S-transferase in Daphnia magna toxicity test. Comp Biochem Physiol C Toxicol Pharmacol. 2007;144:303–309. doi: 10.1016/j.cbpc.2006.10.002.
- Jo H, Jung J. Quantification of differentially expressed genes in Daphnia magna exposed to rubber wastewater. Chemosphere. 2008;73:261–266. doi: 10.1016/j.chemosphere.2008.06.038.
- Kampf R, Claassen T. The use of treated wastewater for nature: The Waterharmonica, a sustainable solution as an alternative for separate drainage and treatment. In: IWA Leading-Edge Technology (LET2004), WW5. 2004.
- Kampf R, Claassen THL, Dokkum VHP, Foekema EM, Graansma J. Increasing the natural values of treated wastewater. In: Bohemen EHD, editor. Ecological Engineering: Bridging between ecology and civil engineering. Delft: Aeneas Technical Publishers; 2005. pp. 90–97.
- Kanehisa M, Goto S. Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30. doi: 10.1093/nar/28.1.27.
- Kraut R. Role of sphingolipids in Drosophila development and disease. Journal of Neurochemistry. 2011;116:764–778. doi: 10.1111/j.1471-4159.2010.07022.x.
- Li F, Zhang N. Bayesian variable selection in structured high-dimensional covariate space with application in genomics. Journal of the American Statistical Association. 2010;105:1202–1214.
- Nemeikaite-Ceniene A, Sarlauskas J, Miseviciene L, Anusevicius Z, Maroziene A, Cenas N. Enzymatic redox reactions of the explosive 4,6-dinitrobenzofuroxan (DNBF): implications for its toxic action. Acta Biochim Pol. 2004;51:1081–1086.
- Quintana MA, Conti DV. Integrative variable selection via Bayesian model uncertainty. Statistics in Medicine. 2013;32(28):4938–4953. doi: 10.1002/sim.5888.
- Roberts G, Rosenthal J. Examples of adaptive MCMC. Journal of Computational and Graphical Statistics. 2009;18:349–367.
- Savitsky T, Vannucci M, Sha N. Variable selection for nonparametric Gaussian process priors: Models and computational strategies. Statistical Science. 2011;26:130–149. doi: 10.1214/11-STS354.
- Scanlan L, Reed R, Loguinov A, Antczak P, Tagmount A, Aloni S, Nowinski D, Luong P, Tran C, Karunaratne N, Pham D, Lin X, Falciani F, Higgins C, Ranville J, Vulpe C, Gilbert B. Silver nanowire exposure results in internalization and toxicity to Daphnia magna. ACS Nano. 2013;7:10681–10694. doi: 10.1021/nn4034103.
- Sebire M, Katsiadaki I, Taylor NG, Maack G, Tyler CR. Short-term exposure to a treated sewage effluent alters reproductive behaviour in the three-spined stickleback (Gasterosteus aculeatus). Aquat Toxicol. 2011;105:78–88. doi: 10.1016/j.aquatox.2011.05.014.
- Sha N, Vannucci M, Tadesse M, Brown P, Dragoni I, Davies N, Roberts T, Contestabile A, Salmon N, Buckley C, Falciani F. Bayesian variable selection in multinomial probit models to identify molecular signatures of disease stage. Biometrics. 2004;60(3):812–819. doi: 10.1111/j.0006-341X.2004.00233.x.
- Soetaert A, Moens L, Van der Ven K, Van Leemput K, Naudts B, Blust R, De Coen W. Molecular impact of propiconazole on Daphnia magna using a reproduction-related cDNA array. Comp Biochem Physiol C Toxicol Pharmacol. 2006;142:66–76. doi: 10.1016/j.cbpc.2005.10.009.
- Stingo F, Chen Y, Vannucci M, Barrier M, Mirkes P. A Bayesian graphical modelling approach to microRNA regulatory network inference. Annals of Applied Statistics. 2010;4(4):2024–2048. doi: 10.1214/10-AOAS360.
- Su J, Yoon B-J, Dougherty ER. Accurate and reliable cancer classification based on probabilistic inference of pathway activity. PLoS ONE. 2009;4(12):e8161. doi: 10.1371/journal.pone.0008161.
- Torre CD, Corsi I, Focardi S, Arukwe A. Effects of 2,4,6-trinitrotoluene (TNT) on neurosteroidogenesis in the European eel (Anguilla anguilla; Linnaeus 1758). Chemistry and Ecology. 2008;24(sup1):1–7.
- Tusher V, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A. 2001;98(9):5116–5121. doi: 10.1073/pnas.091062498.
- Williams TD, Wu H, Santos EM, Ball J, Katsiadaki I, Brown MM, Baker P, Ortega F, Falciani F, Craft JA, Tyler CR, Chipman JK, Viant MR. Hepatic transcriptomic and metabolomic responses in the stickleback (Gasterosteus aculeatus) exposed to environmentally relevant concentrations of dibenzanthracene. Environmental Science and Technology. 2009;43(16):6341–6348. doi: 10.1021/es9008689.
- Wintz H, Yoo L, Loguinov A, Wu Y, Steevens J, Holland R, Beger R, Perkins E, Hughes O, Vulpe C. Gene expression profiles in fathead minnow exposed to 2,4-DNT: correlation with toxicity in mammals. Toxicol Sci. 2006;94:71–82. doi: 10.1093/toxsci/kfl080.