Newman et al. 10.1073/pnas.0510909103.
Supporting Figure 5
Supporting Table 1
Supporting Text
Supporting Figure 6
Supporting Table 2
Supporting Table 3
Supporting Table 4
Supporting Figure 7
Supporting Table 5
Supporting Figure 8
Supporting Table 6
Supporting Table 7
Fig. 5.
Phenotypic rescue of CSB-null cells by expression of wild-type CSB. (a) Western blot for CSB shows that transient expression of wild-type protein in CSB-null cells is similar to endogenous expression in other fibroblast cell lines (HT1080 and WI-38/hTERT). Western blots of the stable rescued lines showed similar CSB expression (not shown). (b) Stable expression of wild-type CSB rescues the chromosomal fragility phenotype of CSB-null cells. (c) Cells stably expressing wild-type CSB (squares) are significantly less sensitive to UV irradiation than CSB-null cells (circles).
Fig. 6.
Overabundance of significant changes in CSB microarray data. Number of probe sets in the actual data (blue) that had a given number of identical change calls among the nine microarray comparisons, compared with the results of a label-randomization simulation (red) or of binomial calculations that assume random distribution of the total change calls only among probe sets with at least one change call in the actual data (green) or among all probe sets called "present" in the actual data (yellow).
Fig. 7.
Validation of microarray results by quantitative real-time PCR (Q-RT-PCR). Microarray signal log2 ratio is plotted on the abscissa against log2 of the Q-RT-PCR fold change on the ordinate. Genes with identical fold changes by microarray and RT-PCR would fall on the diagonal. RNA was prepared with either TRIzol reagent (a and b) or RNeasy (c and d). Relative expression was normalized using GAPD (a and c) or total weight of RNA (b and d). All combinations of preparation and normalization showed fold changes similar to the microarrays for at least 21 of the 26 genes tested.
Fig. 8.
Other lists from the L2L Microarray Database that overlap significantly (P < 0.02) with genes up- or down-regulated by CSB. Symbols and nomenclature are as in Fig. 2. (a) Chromatin-disrupting conditions overlap consistently with CSB. (b) Various DNA-damaging agents overlap inconsistently with CSB. (c) Genes related to inflammation and hypoxia are down-regulated by CSB, whereas genes related to growth and proliferation are up-regulated by CSB. (d) IFN-induced genes are up-regulated by CSB. (e) Genes regulated by CSB overlap with several lists of genes regulated by normal aging.
Supporting Text
Cockayne syndrome group B (CSB) Western blot.
GM00739B/hTERT cells were transfected in 100-mm tissue culture plates with 10 µg of plasmid constructs by using 15 µl of Fugene 6 reagent (Roche Applied Sciences, Indianapolis). After 48 h, cells were washed with PBS and harvested by scraping. Cell pellets were resuspended in 100 µl of SDS loading buffer (25 mM Tris, pH 6.8/2% SDS/0.1% bromophenol blue/10% sucrose/0.12 M 2-mercaptoethanol), sonicated to shear DNA, and denatured by heating at 95°C for 10 min. Nontransfected plates of HT1080 and WI-38/hTERT cells were harvested in the same manner. Proteins were separated by SDS/PAGE on a 6% gel by using the MiniProtean 3 Cell (BioRad) in a Tris/glycine/SDS buffer (1.5 g/liter Tris base/7.2 g/liter glycine/1% SDS). Proteins were transferred to a poly(vinylidene difluoride) (PVDF) membrane in 25 mM Tris/192 mM glycine/20% methanol buffer by using a Mini Trans-Blot Cell (BioRad). After transfer, the PVDF membranes were blocked for 2 h at room temperature in TBST (50 mM Tris, pH 7.4/150 mM NaCl/0.05% Tween 20) plus 5% nonfat dry milk. The membrane was then incubated for 2 h at room temperature in TBST plus 5% nonfat dry milk with a 1:1,000 dilution of primary antibody, washed twice for 10 min each, incubated for 1 h with a 1:5,000 dilution of horseradish peroxidase-conjugated secondary antibody (Santa Cruz Biotechnology), and finally washed four times for 10 min each in TBST alone. Chemiluminescent detection was performed by using the ECL Plus Western Blotting Detection System (Amersham Pharmacia) and Kodak X-Omat Blue film. The anti-CSB antibody is a rabbit polyclonal antibody raised against the C-terminal 158 amino acids of CSB expressed as a GST fusion protein; anti-GST antibodies were removed from the serum by passage over a GST column.
Verifying rescue of CSB cell lines.
Chromosomal fragility at the RNU2 locus was assayed as described (1). To determine UV sensitivity, CSB-null and CSB-wt cells at 40–50% confluence were exposed to 2.5 and 20 J/m² of broad-spectrum UV light. Plates were then washed with Puck's EDTA and refed. After 3 d, cells were collected by trypsinization, fixed in ethanol, and stained with propidium iodide by using a standard protocol (2). DNA content was measured on a BD FACScan flow cytometer in the Cell Analysis Facility of the Department of Immunology, University of Washington. UV sensitivity was calculated as the enrichment of cells with sub-G1 DNA content relative to unirradiated controls.
Statistical validation of microarray results.
To calculate false discovery rates (3) for each number of identical change calls observed within a set of pairwise comparisons, we performed a series of data-randomization calculations and a label-randomization simulation. The probability that a given probe set will accumulate seven or more identical change calls, out of a possible nine, can be calculated from the binomial distribution by using the overall frequency of change calls in the data. We first performed this calculation under the assumption that change calls would be randomly distributed among all probe sets on the array but rejected this assumption (and the resulting P value of 4 × 10⁻⁷) as unrealistically permissive: many probe sets on the array are not detectable at all in the data, and certain probe sets might be predisposed to artifactual change calls. We therefore repeated the calculation under two stricter assumptions: that change calls are distributed only (i) among probe sets that are called "present" at least once in the actual CSB data or (ii) among probe sets with at least one change call in the actual data. The resulting P values were 0.00005 and 0.003, respectively. The corresponding false discovery rates can be calculated by dividing the number of probe sets expected to accumulate each number of change calls (based on the binomial calculations) by the number of probe sets in the actual data that accumulated the same number of change calls. This yields false discovery rates of ≈0.5% and 30%, respectively.
To determine which of these assumptions was most realistic, we also performed a label-randomization simulation using the actual CSB data. A set of six biological samples (such as our three CSB-wt and three CSB-null replicates) can be compared in 30 (6 × 5) different directional pairwise combinations. In each of 12,543 trials, we randomly selected nine of these comparisons and determined the frequency of change calls for each probe set.
True random numbers for the selection were obtained from Random.org. The simulation yielded a P value of ≈0.0002 for probe sets with seven or more identical change calls, corresponding to a false discovery rate of 1%. Fig. 6 summarizes the results of the three methods: the number of probe sets expected to accumulate each number of change calls under each method, and the number of probe sets with the same number of change calls in the actual data.
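The binomial and label-randomization calculations described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code used for the analysis: the function names are hypothetical, change calls are assumed to be coded per probe set and comparison as +1 (increase), -1 (decrease), or 0 (no change), and Python's pseudorandom generator stands in for the true random numbers from Random.org.

```python
from itertools import permutations
from math import comb
import random


def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tail(n, k, p):
    """P(X >= k): chance that a probe set gets k or more identical calls out of n."""
    return sum(binom_pmf(n, j, p) for j in range(k, n + 1))


def per_count_fdr(n_probe_sets, p_call, observed, n_comp=9):
    """Expected / observed number of probe sets for each count k = 0..n_comp.

    n_probe_sets is the pool the calls are assumed to be spread over
    (all probe sets, 'present' probe sets, or probe sets with >= 1 call);
    observed[k] is the number of real probe sets with exactly k identical calls.
    """
    ratios = []
    for k in range(n_comp + 1):
        expected = n_probe_sets * binom_pmf(n_comp, k, p_call)
        ratios.append(expected / observed[k] if observed[k] else float("nan"))
    return ratios


def label_randomization_trial(calls, n_pick=9, rng=None):
    """One simulation trial: draw n_pick directional comparisons at random and
    return, for each probe set, its number of identical change calls
    (the larger of its increase and decrease counts).

    calls: hypothetical mapping from an ordered sample pair (i, j) to a list of
    per-probe-set calls (+1 increase, -1 decrease, 0 no change).
    """
    if rng is None:
        rng = random.Random()
    chosen = rng.sample(sorted(calls), n_pick)
    n_probes = len(next(iter(calls.values())))
    counts = []
    for idx in range(n_probes):
        ups = sum(calls[pair][idx] == 1 for pair in chosen)
        downs = sum(calls[pair][idx] == -1 for pair in chosen)
        counts.append(max(ups, downs))
    return counts


# Six samples (e.g., three CSB-wt and three CSB-null) give 6 x 5 = 30 directional pairs.
pairs = list(permutations(range(6), 2))
```

Here `binom_tail(9, 7, p)` is the probability that a probe set accumulates seven or more identical change calls out of nine when each comparison independently yields that call with frequency `p`; choosing `n_probe_sets` as all probe sets, "present" probe sets, or probe sets with at least one change call corresponds to the three pools discussed above. Repeating `label_randomization_trial` many times and tabulating the counts gives the simulated distribution.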
Accounting for cell-cycle status.
Importantly, using a cutoff of seven identical change calls mitigates the spurious effects of cell-cycle status. Our first microarray replicate of CSB-wt cells showed a significant overabundance of many ribosomal protein transcripts compared to CSB-null cells. Although this might have been explained by the role of CSB in RNA polymerase I transcription (4), we wanted to rule out any confounding effects of cell-cycle status by analyzing the DNA content of the cell harvests used in the two subsequent replicates. In one of these, the CSB-wt sample had a higher proportion of S phase cells, along with an overabundance of ribosomal protein transcripts. In the second, however, the fraction of S phase cells was similar in CSB-wt and -null cells, and there was no difference in ribosomal protein transcript levels. We conclude that differential expression of ribosomal protein genes reflects cell-cycle status. Using seven identical change calls as the cutoff, rather than six, eliminates any cell-cycle-related transcripts from our list of reliable changes.
Statistical validation of L2L results.
The L2L program uses the binomial distribution to calculate a P value for the significance of the overlap between the data and each list in the database. In our initial description of L2L (5), we validated these nominal P values by random-data simulation for the analysis of a published diabetic nephropathy data set. We created more than 10,000 random data sets of the same size as the diabetic nephropathy data set, analyzed all of them with L2L, and calculated how frequently binomial P values of a given significance occurred in the random data. This analysis suggested that the cutoff for true significance among the binomial P values should lie roughly between 0.01 and 0.05, depending on the particular data set. Here, we performed a similar random-data simulation to validate our CSB results, but one that accounts for the nonrandomness of real data caused by concordance of multiple probe sets corresponding to a single gene. Many genes on the Affymetrix arrays are represented by multiple probe sets; these generally represent variant transcripts that may be differentially regulated, and L2L therefore treats each probe set individually when calculating P values. However, this can inflate the apparent significance of an overlap (that is, produce an artificially low P value) if several probe sets representing the same gene are all found in the data.
To determine how to account for concordance of related probe sets in our validation experiments, we examined the pattern of concordance in our CSB data. Unexpectedly, we found that concordance was relatively uncommon and was largely independent of the number of probe sets per gene. Among all of the genes with multiple probe sets that appear in a given data set, 70% were represented by only one probe set; the remaining 30% had more than one probe set in the data.
Surprisingly, this ratio remained constant as the number of probe sets per gene increased. If each additional probe set had an equal and independent chance of appearing in the data, the number of a gene's probe sets in the data would follow a binomial distribution, and the odds of having only one probe set in the data would decrease substantially as the number of probe sets per gene increased. For genes with three, four, five, or more probe sets, the additional probe sets are substantially underrepresented relative to this expectation. Among the 30% of genes with multiple probe sets in the data, however, the likelihood of finding each additional probe set in the data did appear to follow a binomial distribution, with odds that varied somewhat between data sets but were generally similar to the odds that a gene with two probe sets on the array had both probe sets appear in the data (the "two-probe concordance odds"). In other words, genes with multiple probe sets appear to follow two rules: first, 70% contribute only one probe set to the data; second, for the remaining 30%, each probe set has an equal chance of appearing in the data. A possible explanation for this behavior is that although ≈30% of genes appear to have two abundant variants that are similarly regulated, additional probe sets may often represent variants that are less abundant, less likely to be coregulated, or less well characterized.
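The independence expectation that the observed 70% ratio departs from can be made concrete. As a sketch, assume each of a gene's m probe sets enters the data independently with the same probability q (q and the function name are hypothetical, not values or code from this study). The fraction of represented genes that contribute exactly one probe set is then m·q·(1 - q)^(m - 1) / [1 - (1 - q)^m], which falls as m grows:

```python
def frac_single_given_any(m, q):
    """Fraction of represented genes (>= 1 probe set in the data) that contribute
    exactly one probe set, if each of the gene's m probe sets enters the data
    independently with probability q (a hypothetical parameter)."""
    p_any = 1 - (1 - q) ** m            # at least one probe set appears
    p_one = m * q * (1 - q) ** (m - 1)  # exactly one probe set appears
    return p_one / p_any


for m in range(2, 6):
    print(m, round(frac_single_given_any(m, 0.5), 3))
```

With q = 0.5, this fraction is 2/3 for m = 2 but only about 0.16 for m = 5, whereas in the CSB data the single-probe-set fraction stayed near 70% regardless of the number of probe sets per gene.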
The two observed concordance parameters were incorporated into the iterative algorithm we used to create random data sets. A probe set was selected from the named genes on the array using true random numbers from Random.org and added to the random data set. If the probe set corresponded to a gene with other probe sets on the array, we gave it a 70% chance of remaining the sole representative of that gene in the data. For the remaining 30%, we gave each of the other probe sets for that gene an equal and independent chance of being included in the data, using the two-probe concordance odds. If no additional probe sets were selected, we repeated the contest until at least one probe set was selected. All of the probe sets for this gene were then excluded from participating in the contest again. Another probe set was then randomly selected from the array and the process repeated until the random data set grew to the same size as the real data set being modeled. The random data sets generated by this process closely approximated the pattern of probe set concordance in the real data set.
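The generator described above can be sketched as follows. This is a hypothetical reconstruction, not the original implementation: the function name and arguments are illustrative, `q_pair` stands for the two-probe concordance odds, scanning a shuffled list of probe sets replaces repeated draws of true random numbers (the two are equivalent ways of drawing uniformly from the remaining pool), and trimming any overshoot to exactly the target size is an assumption, because the text does not say how overshoot was handled.

```python
import random


def make_random_dataset(gene_to_probes, target_size, q_pair, p_sole=0.7, rng=None):
    """Hypothetical concordance-aware random data set generator.

    gene_to_probes: mapping from each named gene to its probe-set IDs on the array.
    q_pair: chance that each extra probe set joins (the two-probe concordance
            odds); must be > 0 so the redraw loop terminates.
    p_sole: chance that a drawn probe set stays its gene's only representative.
    """
    if not 0 < q_pair <= 1:
        raise ValueError("q_pair must be in (0, 1]")
    if rng is None:
        rng = random.Random()
    probe_to_gene = {p: g for g, probes in gene_to_probes.items() for p in probes}
    order = list(probe_to_gene)
    rng.shuffle(order)  # scanning a shuffled list = uniform draws without replacement
    used_genes, dataset = set(), []
    for probe in order:
        if len(dataset) >= target_size:
            break
        gene = probe_to_gene[probe]
        if gene in used_genes:
            continue  # all probe sets of a gene leave the pool once it is drawn
        used_genes.add(gene)
        dataset.append(probe)
        others = [p for p in gene_to_probes[gene] if p != probe]
        if others and rng.random() >= p_sole:
            # Concordant case (probability 1 - p_sole): each other probe set joins
            # independently; redraw until at least one of them does.
            extra = []
            while not extra:
                extra = [p for p in others if rng.random() < q_pair]
            dataset.extend(extra)
    return dataset[:target_size]
```

Setting `p_sole=1.0` removes concordance entirely (every gene contributes one probe set), which gives a simple baseline for judging how much concordance changes the distribution of binomial P values.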
To validate the binomial P values generated by L2L, we created 10,000 random data sets for each real data set (CSB-Up and CSB-Down), analyzed all with L2L, and tabulated the results (Table 5). We used two techniques to validate the nominal binomial P values from this simulation data. First, we determined how frequently a specific list from the L2L database generated, in the random data, a P value equal to or less than the P value for that list from our real data. This is the list-specific simulation-adjusted P value. Each list from the database performs differently with random data, because of its gene composition and especially how many genes are represented by multiple probe sets on the array. The list-specific P value is therefore the best measure of true significance, but because it is specific to each data point, it does not lend itself readily to calculation of a false-discovery rate. Second, we determined how frequently a particular binomial P value occurred in the random data, regardless of which list produced it. This is the global simulation-adjusted P value. Although this number is inferior to the list-specific P value for determining the true significance of a particular data point, it permits us to easily calculate the false-discovery rate. We proceed down the list of binomial P values, first determining the number of expected false positives by multiplying the current global simulation-adjusted P value by the total number of lists in the database. We then divide the number of expected false positives by the number of all positives (lists with a binomial P value less than or equal to the current binomial P value). The result is the expected proportion of false positives in the data, or the false-discovery rate.
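The two simulation-based adjustments and the false-discovery-rate walk can be sketched as follows, assuming hypothetical inputs: `real_p` maps every database list to its nominal binomial P value in the real data, and `sim_p` holds one such mapping per random data set. In this sketch the global simulation-adjusted P value is the fraction of all pooled random P values (across data sets and lists) at or below a given value, so multiplying it by the number of lists gives the expected number of false-positive lists per data set. Function names are illustrative.

```python
from bisect import bisect_right


def list_specific_p(real_p, sim_p, name):
    """Fraction of random data sets in which list `name` scored a P value
    at or below its P value in the real data."""
    hits = sum(1 for sim in sim_p if sim[name] <= real_p[name])
    return hits / len(sim_p)


def global_adjusted_p(x, pooled_sorted):
    """Fraction of all pooled random P values (any list, any data set) <= x."""
    return bisect_right(pooled_sorted, x) / len(pooled_sorted)


def fdr_table(real_p, sim_p):
    """Walk down the real binomial P values; return (P, global adjusted P, FDR) rows."""
    pooled = sorted(p for sim in sim_p for p in sim.values())
    n_lists = len(real_p)  # assumes real_p covers every database list
    ordered = sorted(real_p.values())
    rows = []
    for x in ordered:
        g = global_adjusted_p(x, pooled)
        expected_false = g * n_lists           # expected false-positive lists
        n_positive = bisect_right(ordered, x)  # real lists with P <= x
        rows.append((x, g, expected_false / n_positive))
    return rows
```

The list-specific value compares each list only against its own behavior in random data, which is why it is sensitive to list composition; the pooled global value discards that information but makes the false-discovery-rate calculation a simple running ratio.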
Our simulation data suggested 0.02 as an appropriate cutoff for true significance among the nominal binomial P values generated by L2L. This value corresponded approximately to a global simulation-adjusted P value of 0.03 and a list-specific simulation-adjusted P value of 0.05. The false discovery rate for this cutoff is 10% (see Table 5 for details). We did not perform a similar simulation analysis for Gene Ontology terms but elected to use the same cutoff for significance.
Throughout the manuscript, we organized our discussion of significant L2L results according to what appear to be biologically meaningful themes. To put the significance of these themes on a quantitative basis, we tabulated all of the related database lists that might fit into each theme (aging, chromatin disruption, UV damage, etc.), distinguishing up- and down-regulated lists where possible. We then asked how many lists within each theme overlapped significantly with our CSB data (P < 0.02). Finally, we mined our 10,000-trial simulation data to determine how frequently that many or more lists within a theme achieved significant P values among the random data sets. The results are summarized in Table 6. For the chromatin-disruption theme, we also tabulated all of the probe sets that were found on any of the lists for the theme, determined how many were found in the CSB data, and mined the simulation data to determine how frequently that many or more probe sets from the theme were found in random data sets.
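The theme-level test can be sketched in a similar hypothetical setting, where `real_p` maps each database list to its binomial P value in the real data and `sim_p` is a list of such mappings, one per random data set (names are illustrative, not from the L2L software): count the theme's lists below the cutoff in the real data, then report the fraction of random data sets in which at least that many of the theme's lists reached the cutoff.

```python
def theme_enrichment_p(theme_lists, real_p, sim_p, cutoff=0.02):
    """Number of the theme's lists with P < cutoff in the real data, and the fraction
    of random data sets in which at least that many of the theme's lists did so."""
    def n_significant(pvals):
        return sum(1 for name in theme_lists if pvals[name] < cutoff)

    observed = n_significant(real_p)
    as_extreme = sum(1 for sim in sim_p if n_significant(sim) >= observed)
    return observed, as_extreme / len(sim_p)


# Hypothetical toy example: two lists in a theme, four random data sets.
real = {"A": 0.01, "B": 0.5}
sims = [{"A": 0.2, "B": 0.9}, {"A": 0.005, "B": 0.4},
        {"A": 0.3, "B": 0.6}, {"A": 0.7, "B": 0.01}]
print(theme_enrichment_p(["A", "B"], real, sims))  # prints (1, 0.5)
```

The probe-set-level analysis for the chromatin-disruption theme follows the same pattern, counting the theme's probe sets found in each data set in place of significant lists.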
Real-time RT-PCR.
Total RNA was harvested either directly from adherent cells with TRIzol reagent (Ambion, Austin, TX) or from trypsinized cells with RNeasy columns (Qiagen, Valencia, CA). For inhibitor studies, cells were treated with 1 µM trichostatin A (TSA) or 5 mM 3-aminobenzamide (3AB) for 24 h before harvest. cDNA synthesis was primed with oligo(dT) and carried out by using Superscript II reverse transcriptase (Invitrogen). Each real-time reaction contained cDNA template from 50 ng of total RNA, 300 nM 5' and 3' gene-specific primers, and 1× SYBR green master mix (Applied Biosystems) in a total reaction volume of 20 µl. All reactions were performed in triplicate by using a DNA Engine Opticon real-time PCR system (MJ Research, Cambridge, MA). Relative differential expression was calculated from the difference in mean threshold cycle across the three replicate reactions, normalized either to the total weight of template or to the mean GAPD threshold cycle (both for microarray validation; the latter for investigation of TSA and 3AB effects).
Colony-forming assay.
To test survival after drug treatment, cells were plated at a density of ≈100 per 10-cm plate in medium containing TSA (50, 200, 500, or 1,000 nM), 3AB (0.5 or 1.0 mM), or 0.1% ethanol, added from 1,000× stocks (all drugs from Sigma). After 2 d, cells were fed with fresh medium and then refed every 4 d until harvest on day 16. 3AB was replenished at each feeding; because prolonged TSA exposure prevented colony formation at all doses, TSA treatment was limited to the first 2 d. Culture plates were fixed with 10% formaldehyde, stained with 0.1% crystal violet, and scanned as full-color 300-dpi TIFF images. Images were imported into IMAGEJ (National Institutes of Health), and colonies containing at least 100 pixels were counted on three replicate plates.
1. Yu, A., Fan, H. Y., Liao, D., Bailey, A. D. & Weiner, A. M. (2000) Mol. Cell 5, 801-810.
2. Robinson, J. P. (2001) Curr. Protocols Cytom. (Wiley, New York).
3. Benjamini, Y. & Hochberg, Y. (1995) J. R. Stat. Soc. B 57, 289-300.
4. Bradsher, J., Auriol, J., Proietti de Santis, L., Iben, S., Vonesch, J. L., Grummt, I. & Egly, J. M. (2002) Mol. Cell 10, 819-829.
5. Newman, J. C. & Weiner, A. M. (2005) Genome Biol. 6, R81.