Genomics Data. 2014 Jun 11;2:117–122. doi: 10.1016/j.gdata.2014.05.013

Gene expression analysis of livers from female B6C3F1 mice exposed to carcinogenic and non-carcinogenic doses of furan, with or without bromodeoxyuridine (BrdU) treatment

Anna Francina Webster, Andrew Williams, Leslie Recio, Carole L. Yauk
PMCID: PMC4536026  PMID: 26484082

Abstract

Standard methods for identifying chemical carcinogens are time-consuming and resource-intensive, and researchers are actively investigating how new technologies can identify chemical carcinogens more rapidly and cost-effectively. Here we performed a toxicogenomic case study of the liver carcinogen furan. Full study and mode-of-action details were previously published in Toxicology and Applied Pharmacology. Female B6C3F1 mice were sub-chronically treated with two non-carcinogenic (1 and 2 mg/kg bw) and two carcinogenic (4 and 8 mg/kg bw) doses of furan for 21 days. Half of the mice in each dose group were also treated with 0.02% bromodeoxyuridine (BrdU) for five days prior to sacrifice [13]. Agilent gene expression microarrays were used to measure changes in liver gene and long non-coding RNA expression (published in Toxicological Sciences). Here we describe the experimental and quality control details for the microarray data. We also provide the R code used to analyze the raw data files, to produce fold changes and false discovery rate (FDR) adjusted p-values for each gene, and to perform hierarchical clustering across datasets.

Keywords: Furan, Microarray, Liver cancer, Dose–response, Statistical analysis in R


Specifications
Organism: B6C3F1 mice
Sex: Female
Array type: Agilent SurePrint G3 Mouse GE 8x60K Microarray
Data format (in GEO): raw data, TXT files; normalized data, TXT files
Experimental factors: Furan-exposed vs. unexposed control
Experimental features: Female B6C3F1 mice were sub-chronically exposed for 21 days to control (0 mg/kg bw), non-carcinogenic (1, 2 mg/kg bw), or carcinogenic (4, 8 mg/kg bw) doses of furan. Half of the mice in each group were also given 0.02% BrdU for five days prior to sacrifice (days 16–21). All non-BrdU mice, as well as the 0, 1, and 8 mg/kg bw furan + BrdU mice, were used for gene expression analysis. Necropsy occurred four hours after the final furan dose. RNA was extracted from livers, and changes in gene expression were analyzed using Agilent microarrays.
Consent: All procedures were conducted in compliance with the Animal Welfare Act Regulations (9 CFR 1–4). Mice were handled and treated according to the guidelines provided in the National Institutes of Health (NIH) Guide for the Care and Use of Laboratory Animals (ILAR, 1996; http://dels.nas.edu/ilar/).
Sample source location: Female specific-pathogen-free B6C3F1 mice (5–6 weeks old) were purchased from Charles River Breeding Laboratories (Portage, ME). Experiments were conducted at ILS, P.O. Box 13501, Research Triangle Park, NC 27709, USA.

Direct link to deposited data

The complete dataset is available through the Gene Expression Omnibus.

Non-BrdU dataset: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48644

BrdU dataset: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE54078

Experimental design, materials and methods

Study design

Female specific-pathogen-free B6C3F1 mice, 5–6 weeks old, were purchased from Charles River Breeding Laboratories (Portage, ME) and allowed to acclimatize for at least seven days before the start of the study. Feed (NIH-07; Zeigler Brothers, Inc., Gardners, PA) and tap water were available ad libitum until necropsy. Mice were housed five per cage in polycarbonate cages in a specific-pathogen-free (SPF) facility accredited by the Association for Assessment and Accreditation of Laboratory Animal Care (AAALAC).

Female mice were dosed with furan (CAS no. 110-00-9; > 99% pure; Sigma-Aldrich Chemical Co., Milwaukee, WI) in Mazola corn oil at 0, 1, 2, 4, or 8 mg/kg bw per day by oral gavage for three weeks (n = 10 per dose group). At necropsy, n = 5 mice remained in each non-BrdU dose group. Because some mice were lost early in the study (before BrdU treatment) to mis-dosing or esophageal puncture, the 0, 1, 2, 4, and 8 mg/kg bw + BrdU groups had n = 4, 5, 3, 4, and 5 mice, respectively. Mice in the + BrdU groups received 0.02% BrdU (Sigma Chemical Co., St. Louis, MO, USA) in their drinking water for five days prior to sacrifice. Four hours after their final dose, mice were anesthetized by carbon dioxide inhalation and, after blood collection, euthanized by exsanguination via cutting of the caudal vena cava. Necropsies were staggered so that one animal from each group was killed in turn until all mice had been sacrificed; this took 100 min, beginning at 1 pm. The left, median, right posterior, and right anterior lobes of the liver were cut into 0.25–0.5 cm³ pieces that were either snap-frozen in liquid nitrogen and stored at or below −70 °C, or formalin-fixed and paraffin-embedded.

Microarray: sample labeling and hybridization

RNA was extracted from ~100 mg of frozen liver tissue using the RNeasy Midi RNA Extraction kit (Qiagen, Mississauga, ON, Canada). Tissue was homogenized with an Omni tissue homogenizer fitted with a disposable 7 mm Omni generator probe (Omni #34750, Omni International, Marietta, GA). RNA was quantified using a NanoDrop spectrophotometer (Thermo Fisher Scientific Inc., Wilmington, DE, USA), checked for quality using an Agilent 2100 Bioanalyzer (Agilent Technologies Inc., Mississauga, ON, Canada), and stored at −80 °C. Sample RNA integrity numbers (RINs) ranged from 8.9 to 10.

Sample RNA (200 ng) and a mouse universal reference RNA (Stratagene by Agilent Technologies Inc.) were used to synthesize, amplify, and label cRNA with the Low Input Quick Amp Labeling Kit (Agilent Technologies Inc.). Labeled cRNA was purified using the RNeasy Mini Kit (Qiagen), and amplification and labeling efficiency were quantified using a NanoDrop spectrophotometer. Hybridization mixes were prepared using the Hi-RPM Gene Expression Hybridization Kit (Agilent Technologies Inc.). Cy3-labeled reference cRNA (300 ng) and Cy5-labeled sample cRNA (300 ng) were hybridized to SurePrint G3 Mouse GE 8 × 60K microarrays (Agilent Technologies Inc.) at 65 °C for 17 h at 10 rpm.

Samples were arranged on arrays according to a randomized block design (RBD), which controls for sources of variability unrelated to the experimental question. Each microarray slide used in this experiment carries eight arrays, so at most eight samples can be hybridized per slide. Because the experiment had more than eight samples, two blocking factors could affect the results: (1) the slide on which a sample was hybridized, and (2) the array to which the sample was assigned (i.e., its position on the slide, one of eight). All samples (control, treated, ± BrdU) were randomized across five slides. Slides were washed with Gene Expression Wash Buffers 1 and 2 (Agilent Technologies Inc.) according to the manufacturer's specifications and scanned using an Agilent G2505B scanner at 5 μm resolution. Data were pre-processed using Agilent Feature Extraction Software, version 11, and all arrays met Agilent's minimum QA/QC standards. All kits were used according to the manufacturers' protocols.
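
The slide and array-position blocking described above can be illustrated with a short Python sketch. It is not the authors' layout procedure: the function name randomized_block_layout, the round-robin dealing of each dose group across slides (so every slide receives a similar mix of groups), and the fixed seed are assumptions. The group sizes in the usage example are simply the final analysis sizes reported in this article, used for illustration.

```python
import random
from collections import defaultdict


def randomized_block_layout(samples, n_slides, arrays_per_slide=8, seed=1):
    """Assign (sample_id, group) pairs to (slide, array position) pairs.

    Hypothetical helper: samples from each group are shuffled and dealt
    round-robin across slides so that every slide (block) receives a similar
    mix of groups; array positions are then randomized within each slide.
    """
    if len(samples) > n_slides * arrays_per_slide:
        raise ValueError("not enough array positions for all samples")
    rng = random.Random(seed)

    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)

    slides = [[] for _ in range(n_slides)]
    slide = 0
    for group in sorted(by_group):
        members = by_group[group][:]
        rng.shuffle(members)
        for sample_id in members:
            # Skip slides that are already full (total capacity was checked above).
            while len(slides[slide]) >= arrays_per_slide:
                slide = (slide + 1) % n_slides
            slides[slide].append(sample_id)
            slide = (slide + 1) % n_slides

    layout = {}
    for slide_number, members in enumerate(slides, start=1):
        # Random, non-repeating array positions 1..arrays_per_slide on this slide.
        positions = rng.sample(range(1, arrays_per_slide + 1), len(members))
        for sample_id, position in zip(members, positions):
            layout[sample_id] = (slide_number, position)
    return layout


# Illustration only: final group sizes reported in this article.
group_sizes = {"0": 5, "1": 4, "2": 5, "4": 4, "8": 5,
               "0+BrdU": 4, "1+BrdU": 5, "8+BrdU": 5}
samples = [(f"{g}_{i}", g) for g, n in group_sizes.items() for i in range(1, n + 1)]
layout = randomized_block_layout(samples, n_slides=5)
```

Dealing groups across slides, rather than shuffling all samples at once, keeps any slide effect from lining up with a single dose group.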

Normalization of microarray data

Boxplots and plots of log ratio versus average signal intensity (MA plots) were constructed and inspected. MA plots are used to identify systematic variation within arrays, including dye bias and poor hybridization efficiency. ‘M’ is the log red/green intensity ratio, M = log₂(R/G), a measure of differential expression, and ‘A’ is the average of the two log intensities, A = ½ log₂(R·G), a measure of overall fluorescence intensity. Because most probes are not expected to be differentially expressed in response to treatment, most points should be centered on M = 0. Arrays on which the bulk of the points deviate from M = 0 require LOWESS (LOcally WEighted Scatterplot Smoothing) normalization [10] (Fig. 1). Non-background-subtracted median signal intensities were normalized in R [7] using the transform.madata function in the MAANOVA library [9]. Values for probes with technical replicates were then averaged.
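
To make the M and A definitions and the idea behind LOWESS normalization concrete, here is a minimal pure-Python sketch. It is not MAANOVA's transform.madata: it performs a single locally weighted linear fit of M on A with tricube weights (no robustness iterations), and the span frac = 0.3 and all function names are assumptions chosen for illustration. The normalized log ratio is M minus the fitted trend.

```python
import math


def ma_values(red, green):
    """M = log2(R/G) (differential expression), A = 0.5 * log2(R*G) (intensity)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def lowess_fit(x, y, frac=0.3):
    """Single-pass LOWESS: weighted linear fit around each x with tricube weights."""
    n = len(x)
    k = max(2, math.ceil(frac * n))  # number of nearest neighbours in each window
    fitted = []
    for xi in x:
        # Bandwidth = distance to the k-th nearest point (xi itself counts).
        h = sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12
        w = [max(0.0, 1 - (abs(xj - xi) / h) ** 3) ** 3 for xj in x]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def lowess_normalize(red, green, frac=0.3):
    """Return LOWESS-normalized M values and the corresponding A values."""
    m, a = ma_values(red, green)
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)], a


# Synthetic two-color data with an intensity-dependent dye bias (illustration only).
green = [2 ** (6 + 0.25 * i) for i in range(40)]
red = [g * 2 ** (0.1 * math.log2(g)) for g in green]
m_raw, _ = ma_values(red, green)
m_norm, a = lowess_normalize(red, green)
```

In the synthetic data the raw M values drift upward with intensity, and after subtracting the local trend they collapse onto M = 0, which is the behavior panel (B) of Fig. 1 is meant to show.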

Fig. 1.

Representative MA plot (A) before and (B) after LOWESS normalization. Each dot represents a probe on the microarray. The red line is the line of best fit through all data points.

Quality assessment and control of microarray data

Background for each array was assessed using the 3xSLv1 negative control probe, which forms a hairpin and does not hybridize well with labeled samples. There are 182 technical replicates of this probe distributed across the microarray. The trimmed mean (trim = 0.05) and standard deviation of these replicates were used to measure background fluorescence and background variability for each array. Trimmed statistics reduce the influence of outliers by removing a fixed percentage of the largest and smallest values before the statistic is calculated. These estimates are shown in Fig. 2. Arrays with unusually high background fluorescence were repeated.
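
The trimming rule can be illustrated with a short Python sketch. It assumes that trimming follows R's mean(x, trim = 0.05), which drops floor(n × trim) values from each end of the sorted data (for the 182 replicates of 3xSLv1, nine from each end). Computing the standard deviation on the same trimmed values is an assumption, since the text does not say whether the SD was trimmed, and the function name trimmed_background is hypothetical.

```python
import math
import statistics


def trimmed_background(values, trim=0.05):
    """Trimmed mean and standard deviation of negative-control probe intensities.

    Drops floor(n * trim) values from each end of the sorted data, the same
    rule R's mean(x, trim = ...) uses; the SD is computed on the kept values
    (an assumption).
    """
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    xs = sorted(values)
    k = math.floor(len(xs) * trim)
    kept = xs[k:len(xs) - k]
    return statistics.mean(kept), statistics.stdev(kept)


# Illustration: one saturated spot barely affects the trimmed estimate.
signal = list(range(1, 21)) + [1000]
mean_bg, sd_bg = trimmed_background(signal)
```

With 21 values, one value is dropped from each end, so the saturated spot at 1000 is excluded and the trimmed mean is 11 rather than the untrimmed mean of about 57.6.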

Fig. 2.

Background estimates for each array: mean fluorescent signal of the 3xSLv1 negative control probe (n = 182 probes) in the reference (Cy3, green) and sample (Cy5, red) channels. Error bars = standard deviation; FLU = fluorescence units.

To identify arrays with poor data quality, hierarchical clustering with average linkage (which defines the distance between two clusters as the average distance over all pairs of objects drawn from the two clusters) was performed in R using the hclust function in the stats library, applied to the normalized log ratios (M values) for all probes (Fig. 3). Because most probes are not expected to be differentially expressed in response to treatment, samples are expected to be broadly similar, so arrays that cluster apart from all others likely reflect technical problems. To obtain a fair inter-array comparison, the log ratios were first adjusted for slide effects (slide-to-slide variation in signal intensity that arises from experimental conditions rather than from treatment) using a linear model, and the estimated slide effect was subtracted from the log ratios. One minus the Pearson correlation was used as the measure of dissimilarity. Cutting the original dendrogram at a height of 0.3 yielded six clusters; the five samples that clustered separately (at the top of the dendrogram) were considered outliers and were repeated.
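
A minimal pure-Python sketch of this quality-control clustering follows. It assumes a one-way model in which slide is the only term, so the estimated slide effect for each probe is the slide mean minus the overall mean. It then computes one minus the Pearson correlation between arrays and merges clusters by average linkage until the next merge would lie above the cut height (0.3 in the text). All function names and the toy data are hypothetical; the authors used R's hclust, and their supplementary R code remains the reference.

```python
import math
from collections import defaultdict


def remove_slide_effect(profiles, slides):
    """Subtract, probe by probe, the estimated slide effect (slide mean - grand mean)."""
    adjusted = [list(p) for p in profiles]
    for j in range(len(profiles[0])):
        grand = sum(p[j] for p in profiles) / len(profiles)
        by_slide = defaultdict(list)
        for p, s in zip(profiles, slides):
            by_slide[s].append(p[j])
        effect = {s: sum(v) / len(v) - grand for s, v in by_slide.items()}
        for i, s in enumerate(slides):
            adjusted[i][j] -= effect[s]
    return adjusted


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile: correlation undefined")
    return sxy / math.sqrt(sxx * syy)


def average_linkage_clusters(profiles, cut_height=0.3):
    """Cluster arrays with distance 1 - Pearson r and average linkage; cut at cut_height."""
    n = len(profiles)
    d = [[1 - pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [d[i][j] for i in clusters[a] for j in clusters[b]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        height, a, b = best
        if height > cut_height:  # the next merge lies above the cut
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)


# Toy log ratios for six arrays on two slides; slide 2 carries a +0.5 offset.
profiles = [[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 6, 5], [2, 1, 3, 4, 5, 6],
            [6, 5, 4, 3, 2, 1], [6, 5, 4, 3, 1, 2], [5, 6, 4, 3, 2, 1]]
slides = [1, 2, 1, 2, 1, 2]
profiles = [[v + (0.5 if s == 2 else 0.0) for v in p] for p, s in zip(profiles, slides)]
adjusted = remove_slide_effect(profiles, slides)
clusters = average_linkage_clusters(adjusted, cut_height=0.3)
```

The toy arrays fall into two groups of three with opposite expression patterns. After slide adjustment each slide has the same per-probe mean, and cutting at 0.3 separates the two groups, mirroring how outlying arrays separate from the rest in Fig. 3.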

Fig. 3.

Hierarchical clustering of arrays using all probes, based on normalized signal intensity ratios. The red line and bracket delineate outlier arrays, which were repeated (repeated arrays are indicated by arrows). Colored boxes represent dose groups: orange, lime, green, aqua-green, and light blue are the 0, 1, 2, 4, and 8 mg/kg bw groups, and royal blue, purple, and pink are the 0, 1, and 8 mg/kg bw + BrdU groups.

After outlying arrays (arrays with high background, systematic dye variation, or poor hybridization efficiency) were repeated and/or removed, the final sample sizes used for gene expression analysis were n = 5, 4, 5, 4, and 5 for the non-BrdU 0, 1, 2, 4, and 8 mg/kg bw furan dose groups, respectively, and n = 4, 5, and 5 for the 0, 1, and 8 mg/kg bw furan + BrdU groups, respectively. Gene expression analysis was not conducted for the 2 and 4 mg/kg bw furan + BrdU animals.

Differential gene expression analysis

Differential gene expression was determined in R using the MicroArray ANalysis Of VAriance (MAANOVA) library [9], with slide (block) and treatment effects included in the model. A typical ANOVA generates an F statistic, the ratio of between-group to within-group variance. MAANOVA instead uses a modified F statistic, the Fs statistic [3], to test gene-specific treatment effects. Following the James–Stein shrinkage concept [6], the Fs statistic improves the estimate of error variance by borrowing information across all genes: each gene's within-group variance estimate is shrunk toward a value estimated from global gene expression, rather than relying solely on the few data points available for that individual gene. The associated p-values were estimated by permutation (30,000 permutations with residual shuffling). Permutation-based p-values do not rely on distributional assumptions that are difficult to check at the small sample sizes typical of gene expression studies (n ≈ 3–6 per group). The p-values were adjusted for multiple comparisons using the Benjamini–Hochberg false discovery rate (FDR) approach [1], which controls the expected proportion of false positives among the genes declared significant at 0.05, rather than applying α = 0.05 to each test individually (which, across tens of thousands of probes, would yield many false positives). Fold changes for individual genes were estimated from the least-squares means of each pairwise comparison. Genes with an FDR-adjusted p ≤ 0.05 and an absolute fold change ≥ 1.5 (in either direction) were deemed differentially expressed.
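
The final filtering step can be sketched in Python. The Benjamini–Hochberg adjustment below follows the standard step-up procedure (the same calculation as R's p.adjust(p, method = "BH")), and the fold-change threshold is applied on the log2 scale, so |FC| ≥ 1.5 becomes |log2 FC| ≥ log2(1.5) ≈ 0.585. In the study, the raw p-values came from MAANOVA's permutation test and the fold changes from least-squares means; the function names and input values here are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values in input order (same calculation as R's p.adjust(method="BH"))."""
    n = len(pvalues)
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # of p * n / rank so that adjusted values stay monotone and capped at 1.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for step, i in enumerate(order):
        rank = n - step  # rank 1 = smallest p-value
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(genes, log2_fc, pvalues, fdr=0.05, min_fold=1.5):
    """Genes with BH-adjusted p <= fdr and absolute fold change >= min_fold."""
    adjusted = benjamini_hochberg(pvalues)
    threshold = math.log2(min_fold)
    return [g for g, lfc, q in zip(genes, log2_fc, adjusted)
            if q <= fdr and abs(lfc) >= threshold]


# Hypothetical inputs for four genes.
genes = ["geneA", "geneB", "geneC", "geneD"]
log2_fc = [1.0, 0.2, -0.8, -0.5]
pvalues = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(pvalues)
hits = differentially_expressed(genes, log2_fc, pvalues)  # ["geneA", "geneC"]
```

All four hypothetical genes pass the FDR cutoff, but only geneA and geneC also exceed the 1.5-fold threshold, which shows why both criteria are applied together.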

Gene expression meta-analysis in R

Data were obtained from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) and the European Bioinformatics Institute's ArrayExpress (http://www.ebi.ac.uk/arrayexpress/). All data generated on the one-color Affymetrix platform (E-MEXP-82; GSE13149; GSE18858; GSE20427; GSE26538) were read into R using the ReadAffy() function in the affy library [4], then background-corrected and normalized using Robust Multi-array Average (RMA) [12]. Because RMA includes background correction, data generated on all other platforms were also background-subtracted for consistency. Signal intensities generated on the Agilent two-color platform (GSE48644; BaP) were normalized using the LOWESS approach. For two-color experiments on other, non-Agilent platforms (GSE35934; GSE4874), signal intensities (with a dye adjustment) were normalized using cyclic LOWESS [2]. Probes sharing a gene symbol were then collapsed to a single value per gene by taking the median normalized signal intensity. The normalized datasets were then merged, yielding 3190 gene symbols common to all datasets. Hierarchical clustering was performed in R using the hclust function with average linkage and one minus the Pearson correlation as the distance metric. The resulting figure can be viewed in [5].
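
The probe-to-gene collapsing and merging steps can be sketched as follows, assuming each dataset is represented as a mapping from probe ID to normalized intensity for one sample, together with a probe-to-symbol annotation table. The function names, probe IDs, and values are hypothetical, and dropping probes without a gene symbol is an assumption about how unannotated probes were handled.

```python
import statistics
from collections import defaultdict


def collapse_to_genes(probe_values, probe_to_symbol):
    """Collapse probes sharing a gene symbol to their median normalized intensity."""
    by_symbol = defaultdict(list)
    for probe, value in probe_values.items():
        symbol = probe_to_symbol.get(probe)
        if symbol:  # unannotated probes are dropped (assumption)
            by_symbol[symbol].append(value)
    return {symbol: statistics.median(vals) for symbol, vals in by_symbol.items()}


def merge_common_genes(datasets):
    """Keep only gene symbols present in every dataset: {symbol: {dataset: value}}."""
    common = set.intersection(*(set(genes) for genes in datasets.values()))
    return {symbol: {name: genes[symbol] for name, genes in datasets.items()}
            for symbol in sorted(common)}


# Hypothetical two-dataset example.
set1 = collapse_to_genes({"p1": 7.0, "p2": 9.0, "p3": 8.0, "p4": 5.0},
                         {"p1": "Cyp2e1", "p2": "Cyp2e1", "p3": "Cyp2e1", "p4": "Gapdh"})
set2 = collapse_to_genes({"q1": 6.5, "q2": 4.0, "q3": 3.0},
                         {"q1": "Cyp2e1", "q2": "Actb", "q3": None})
merged = merge_common_genes({"set1": set1, "set2": set2})
```

Only symbols present in every dataset survive the merge, which is why combining several platforms shrinks the gene list (to 3190 symbols in the analysis above).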

Discussion

Here we have described the steps taken to analyze changes in gene expression in the livers of furan-exposed female B6C3F1 mice using Agilent microarrays and R (see Fig. 4 for a summary). We have shown that the data are of high quality and have provided the tools required to reproduce the analysis. The biological significance of the study was reported previously: differential gene expression [5], differential long non-coding RNA expression [8], and the effect of BrdU on gene expression [11]. We anticipate that toxicogenomics studies will become increasingly important in chemical carcinogen assessment in the coming years, and believe that the pace of this change will depend heavily on ensuring public availability of these powerful datasets.

Fig. 4.

Summary of steps taken to generate, normalize and analyze two-color Agilent gene expression microarray data.

Appendix A

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.gdata.2014.05.013.

Contributor Information

Anna Francina Webster, Email: Francina.Jackson@hc-sc.gc.ca.

Andrew Williams, Email: Andrew.Williams@hc-sc.gc.ca.

Leslie Recio, Email: lrecio@ils-inc.com.

Carole L. Yauk, Email: Carole.Yauk@hc-sc.gc.ca.

Appendix A. Supplementary data

Supplementary material 1

R code for dendrogram (.txt file).

mmc1.txt (2KB, txt)
Supplementary material 2

R code for data pre-processing (.txt file).

mmc2.txt (10.8KB, txt)
Supplementary material 3

R code for quality assessment and control (.txt file).

mmc3.txt (5.5KB, txt)
Supplementary material 4

R code for differential gene expression analysis (.txt file).

mmc4.txt (4.8KB, txt)
Supplementary material 5

R code for dendrogram from Jackson et al. (2014) (.txt file).

mmc5.txt (3.1KB, txt)
Supplementary material 6

R code for processing GEO dataset GSE13149 for use in the dendrogram in Jackson et al. (2014).

mmc6.txt (2.5KB, txt)
Supplementary material 7

R code for processing GEO dataset GSE18858 for use in the dendrogram in Jackson et al. (2014).

mmc7.txt (10.4KB, txt)
Supplementary material 8

R code for processing GEO dataset GSE20427 for use in the dendrogram in Jackson et al. (2014).

mmc8.txt (2.6KB, txt)
Supplementary material 9

R code for processing GEO dataset GSE26538 for use in the dendrogram in Jackson et al. (2014).

mmc9.txt (2.4KB, txt)
Supplementary material 10

R code for processing GEO dataset GSE35934 for use in the dendrogram in Jackson et al. (2014).

mmc10.txt (3KB, txt)

References

  • 1. Benjamini Y., Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol. 1995;57(1):289–300.
  • 2. Bolstad B.M., Irizarry R.A., Åstrand M., Speed T.P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19(2):185–193. doi: 10.1093/bioinformatics/19.2.185.
  • 3. Cui X., Hwang J.T., Qiu J., Blades N.J., Churchill G.A. Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics. 2005;6(1):59–75. doi: 10.1093/biostatistics/kxh018.
  • 4. Gautier L., Cope L., Bolstad B.M., Irizarry R.A. affy--analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 2004;20(3):307–315. doi: 10.1093/bioinformatics/btg405.
  • 5. Jackson A.F., Williams A., Recio L., Waters M.D., Lambert I.B., Yauk C.L. Case study on the utility of hepatic global gene expression profiling in the risk assessment of the carcinogen furan. Toxicol. Appl. Pharmacol. 2014.
  • 6. James W., Stein C. Estimation with quadratic loss. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1: Contributions to the Theory of Statistics. Berkeley, California, USA; 1961. pp. 361–379.
  • 7. R Development Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing; Vienna, Austria: 2012.
  • 8. Recio L., Phillips S.L., Maynor T., Waters M., Jackson A.F., Yauk C.L. Differential expression of long non-coding RNAs in the livers of female B6C3F1 mice exposed to the carcinogen furan. Toxicol. Sci. 2013. doi: 10.1093/toxsci/kft153.
  • 9. Wu H., Kerr M.K., Cui X., Churchill G.A. MAANOVA: a software package for the analysis of spotted cDNA microarray experiments. 2003.
  • 10. Yang Y.H., Dudoit S., Luu P., Lin D.M., Peng V., Ngai J., Speed T.P. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 2002;30(4):e15. doi: 10.1093/nar/30.4.e15.
  • 11. Webster A.F., Williams A., Recio L., Yauk C.L. Bromodeoxyuridine (BrdU) treatment to measure hepatocellular proliferation does not mask furan-induced gene expression changes in mouse liver. Toxicology. 2014. doi: 10.1016/j.tox.2014.06.002.
  • 12. Irizarry R.A., Hobbs B., Collin F., Beazer-Barclay Y.D., Antonellis K.J., Scherf U., Speed T.P. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4(2):249–264. doi: 10.1093/biostatistics/4.2.249.
  • 13. Webster A.F., Williams A., Recio L., Yauk C.L. Gene expression analysis of livers from female B6C3F1 mice exposed to carcinogenic and non-carcinogenic doses of furan, with or without 5 days Bromodeoxyuridine (BrdU) treatment. Genomics Data. 2014;2:117–122. doi: 10.1016/j.gdata.2014.05.013.

Articles from Genomics Data are provided here courtesy of Elsevier
