Abstract
Background
Microarray Comparative Genomic Hybridization (array CGH) provides a means to examine DNA copy number aberrations. Various platforms, brands and underlying technologies are available, confronting the user with many choices regarding platform sensitivity and the number, localization, and density distribution of probes.
Results
We evaluate three different platforms presenting different nature and arrangement of the probes: The Agilent Human Genome CGH Microarray 44 k, the ROMA/NimbleGen Representational Oligonucleotide Microarray 82 k, and the Illumina Human-1 Genotyping 109 k BeadChip, with Agilent being gene oriented, ROMA/NimbleGen being genome oriented, and Illumina being genotyping oriented. We investigated copy number changes in 20 human breast tumor samples representing different gene expression subclasses, using a suite of graphical and statistical methods designed to work across platforms. Despite substantial differences in the composition and spatial distribution of probes, the comparison revealed high overall concordance. Notably however, some short amplifications and deletions of potential biological importance were not detected by all platforms. Both correlation and cluster analysis indicate a somewhat higher similarity between ROMA/NimbleGen and Illumina than between Agilent and the other two platforms. The programs developed for the analysis are available from http://www.ifi.uio.no/bioinf/Projects/.
Conclusion
We conclude that platforms based on different technology principles reveal similar aberration patterns, although we observed some unique amplification or deletion peaks at various locations that were detected by only one of the platforms. The correct platform choice for a particular study depends on whether the research aim is gene, genome, or genotype oriented.
Background
Microarray technology has become a powerful tool for many scientific and diagnostic applications. In cancer research the detection of genomic aberrations is crucial for associating copy number changes with cancer phenotypes or critical genes. For array Comparative Genomic Hybridization (array CGH), several methods and platforms have been developed (see reviews [1,2]). Microarray copy number detection systems differ in their probe origin (BAC, cDNA or oligonucleotides [3-6]), production (spotting, polymerization or microbeads), gene density (coverage of probes per gene or physical intercept), hybridization (digestion, hybridization to reference), and labeling technique (single or two-color systems). Laboratories are often required to evaluate the diverse microarray formats, considering different biological questions, experimental designs, material restrictions, and resolutions or data processing challenges. Comparability and reproducibility of results have always been important issues. Hence, it is important to evaluate microarray platforms not only based on their production characteristics but also using a variety of analytical and statistical methods. A comparative analysis of expression platforms has previously been performed for gene expression measurements [7-10]. However, to our knowledge this is one of the first publications validating different array CGH formats using tumors as material.
In this report, we compare three major DNA microarray platforms: The Agilent Human Genome CGH Microarray 44 k, the ROMA/NimbleGen Representational Oligonucleotide Microarray 82 k, and the Illumina Human-1 Genotyping 109 k BeadChip. Oligonucleotide probes used for the Agilent array cover both coding and non-coding sequences, and most reporters are located in genes (gene oriented arrangement). Oligonucleotides in the ROMA/NimbleGen technology are based on Bgl II cutting sites, hence reporters are more or less randomly distributed across the entire genome providing a detailed picture of the structure and organization of the complete genome (genome orientated arrangement). The Illumina platform on the other hand provides a dense, exon-centric view of the genome (genotyping arrangement). The three platforms were tested with a set of 20 primary breast tumor samples. The samples are part of a larger cohort of stage I and II primary tumors [11], including several distinct expression subtypes expected to present with a number of common aberrations for human breast cancer [12]. Results achieved were validated using different graphical and statistical methods, many performed with the CGH-Explorer analysis tool [13]. One goal of our study was to investigate to what extent platforms of different nature and design perform differently in terms of detecting aberrant structures, regarding both size and amplitude of copy number changes. The results of the analysis were evaluated to investigate whether the number of probes, density distribution, probe localization, sensitivity and aberration calling method had any effect on the overall performance of the platform. Overall, we found that all platforms included in this study give a similar general picture of the DNA rearrangements in the tumors, including genomic instability profiles, although some details differ substantially.
Results
Whole genome analysis reveals overall similarity between platforms
The overall pattern is similar for all three platforms, as confirmed by the results displayed in Figure 1. Here, the overall frequency of amplification and deletion events in the tumor samples was estimated for the three platforms using the PCF (Piecewise Constant Fit) algorithm (see Methods). Analysis of copy number changes using a method based on a different principle (ACE – Analysis of Copy Number Errors [13]) showed the same overall pattern (see Additional file 1). All platforms detect the same previously described common aberrations [14-16], like high frequency of amplification of chromosomal regions 1q, 8q, 17q and deletions at 16q and 17p (Figure 1 and see Additional file 1). Figure 2 illustrates that the platforms reveal similar results with respect to amplification peaks, though with some minor differences in amplitude height and/or number of events, amplification region size or pattern.
Figure 3 shows histograms for the distributions of the PCF-fitted copy numbers and scatterplots comparing all three platforms. There is strong correlation between platforms, though with a notable difference in scale between Agilent and the two other platforms, in accordance with what other authors have reported on cell lines [17]. Computing correlation coefficients between the platforms for the tumor samples confirms this picture, with median Pearson correlation of 0.77 for Illumina versus ROMA/NimbleGen and close to 0.6 for Agilent versus the other two platforms (see Additional file 2). We also applied Total Least Squares (TLS) regression to fit regression lines with zero intercept to the data in Figure 3. This analysis yielded a median slope of 0.47 (IQR = 0.45) for Agilent versus ROMA/NimbleGen, 0.55 (IQR = 0.26) for Agilent versus Illumina, and 0.99 (IQR = 0.52) for ROMA/NimbleGen versus Illumina. In Figure 4 the PCF estimated log ratios are used to cluster the 60 cases obtained from the 3 platforms used on the 20 tumor samples. For 14 of the 20 tumors, the three platforms are clustered together at the lowest possible cluster level and for 4 of the remaining tumors Illumina and ROMA/NimbleGen cluster together.
Figure 5 shows ROC curves comparing the platforms pairwise with respect to aberration calling. Not surprisingly, the curves reveal an increasing correspondence as the threshold for calling an aberration is increasing. This is confirmed by inspecting the Area Under Curves (AUCs) for the ROC analysis (see Additional file 3). A somewhat better correspondence for Illumina versus ROMA/NimbleGen than for Agilent versus the other platforms is seen.
Additional file 4 shows the degree of concordance between the platforms with respect to the classification of probes as amplifications, deletions or normal. The table is based on one particular selection of detection thresholds, the relative size of the thresholds reflecting the relative scale of the log ratios of the different platforms. In all platforms, the majority of probes are classified as being normal (i.e. neither amplified nor deleted). Probes that are called as amplified (or deleted) on one platform are most often called likewise or as normal on the platform considered for comparison. Opposite decisions, in the sense that one platform calls an event as amplification and the other platform calls a deletion, are very rare. Hence, in terms of the direction of aberrations, the platforms are in large agreement with each other. Nevertheless, there is a substantial proportion of probes that are called on one platform and not on another. Note that the detection thresholds used here are not optimized with respect to the number of concordant classifications of probes.
Instability indices, scoring the presence of localized regions of clustered, narrow amplification peaks on a chromosome arm, were determined (see Methods) for each platform in each of the 20 tumor samples. For chromosomes 10 to 18 (Figure 6), all platforms showed very high instability indices for patient 085 (green) on chromosomes 11 and 12, and for patient 053 (red) on chromosome 17. Additionally, high instability indices were detected for patient 053 on chromosome 11, for patient 148 on chromosome 15, and for patient 263 on chromosome 14.
Differences between the various platforms due to gene density or other factors
Despite the general consistency of the platforms, specific variations in frequency and/or in aberration length are visible (see PCF analysis in Figure 1 and the detailed plot for chromosome 8 in Additional file 5). Most of these differences are due to variance in probe location or density. Probe location depends on design (evenly distributed probes or clusters of several probes per gene), which may be based on an automatic or manual strategy. Reporters are not evenly spaced in any of the platforms, accounting for differences in genome structure and natural variance in gene density. The distribution of probes for the three different platforms is illustrated for the complete genome (see Additional file 6) and in a close-up for chromosome 8 (see Additional file 7). Of all platforms, ROMA/NimbleGen comes closest to a uniform probe distribution, while Agilent shows high local variation in the number of reporters, particularly for areas in 1q, 3p, 6p, 11q centromeric, 12q centromeric, 16, 17 and 19. The Illumina platform shows high-density islands of reporters at 6p, 11p telomeric, 12q centromeric, 17p telomeric, 19p, and 19q (see Additional file 6).
Interestingly, platform specific small high frequency amplification or deletion peaks were found, as indicated by (*) in Figure 1. Many were near the centromeric or telomeric regions. The Agilent platform shows unique small high frequency aberrations of amplification at 3p and of deletions at 4p, twice at 5q, and 9q, and a larger amplification increase towards the telomere of 20q. ROMA/NimbleGen exhibits a unique small high frequency deletion at 14q and an amplification at 15q. When the aberration detection sensitivity is increased, some additional platform dependent unique small high frequency amplifications or deletions are observed, e.g. at 14q for the Illumina platform.
We suspect some of the observed unique differences in copy number aberration calling to be of biological importance. We therefore examined examples of these features (Figure 1, indicated by blue or brown bars over chromosomes 3, 14, 15, and 20) in more detail (close-ups in Figure 7; detailed information in Additional files 8, 9, 10, 11, 12). At chromosome 3, between positions 50.36 and 50.64 Mb, the Agilent platform identifies a region with small amplifications in over 75% of all samples (Figure 7a, for exact probe localization see Additional file 8). Two strongly amplified reporters covering the genes CACNA2D2 (H.s. calcium channel, alpha 2/delta subunit 2) and CISH (H.s. cytokine inducible SH2-containing protein) cause the amplification detection in the Agilent platform. The Agilent platform further detects a larger unique region with amplifications for telomeric region 20q, between 60.00 and 60.50 Mb (Figure 7b). Genes in this region (see Additional file 12) include SS18L1 (H.s. synovial sarcoma translocation gene on chromosome 18-like), OSBPL2 (H.s. oxysterol binding protein-like 2) and LAMA5 (H.s. laminin, alpha 5), a gene of potential importance as it is found in the intrinsic gene list used for the classification of breast cancer subtypes [18]. Further, a unique short deletion is detected in over 50% of all samples using the ROMA/NimbleGen platform for the centromeric region of chromosome 14, at 19.00–20.00 Mb (see Additional file 9), covering a region of genes including CCNB1IP1 (H.s. cyclin B1 interacting protein 1), APEX1 (H.s. nuclease multifunctional DNA repair enzyme 1) and TEP1 (H.s. telomerase-associated protein 1). Interestingly, an adjacent unique region detected solely by the Illumina platform in chromosome 14 stretches from 21.20 to 22.00 Mb (see Additional file 10).
Unique small amplification peaks exclusively detected by one of the platforms are likely to be due to differences in reporter density, as seen for the ROMA/NimbleGen platform at chromosome 15, between 18.40 and 20.40 Mb (Figure 7c). This centromeric peak is densely covered by 19 ROMA/NimbleGen reporters, whereas the Agilent platform has a single reporter and the Illumina platform provides only 2 reporters for this area (see Additional file 11).
Discussion
Microarray-based comparative genomic hybridization allows the construction of high-resolution maps of genome-wide copy number alterations. Array CGH enables localization of genomic aberrations in tumors, identification of critical genes, and classification of chromosomal changes [1], indicating susceptibility or activation of tumor initiation and progression [19]. Different arrays have been used for CGH studies, starting with cDNA arrays, followed by BAC arrays, and more recently by high density oligonucleotide arrays [3-5,20,21]. Only a few comparisons of the various array CGH platforms have been performed, and those we are aware of [17,22,23] were based on other platforms and/or cell lines rather than on tumor data. For example, in [17] the focus is primarily on reproducibility, signal-to-noise ratio and resolution differences.
We compared the Agilent Human Genome CGH 44 k oligonucleotide Microarray, the Representational Oligonucleotide Microarray ROMA/NimbleGen 82 k array, and the Illumina SNP-CGH Human-1 (109 k) BeadChip array platforms using human breast tumor samples. Whole genome analysis of called copy number alterations reveals a great overall similarity and strong correlation between platforms. However, in concordance with [17] we detected a notable difference in the scale of the log ratios between Agilent and the two other platforms. In their cell line study with known relative copy number, [17] found markedly higher signals for the Agilent 44 k than for the high-resolution ROMA/NimbleGen 1500 k. Numerically, the factors detected in their study correspond well to the factor of about 0.5 found in our TLS analysis of our tumor samples. Agilent 44 k CGH arrays have a gene oriented arrangement being enriched particularly for cancer relevant genes with local high variation of number of reporters. The combination of these two features may be the reason for the high number of specific small high frequency amplification or deletion peaks particular in the Agilent platform (Figure 1 and see Additional file 6). The ROMA technology, invented at Cold Spring Harbor Laboratory and by NimbleGen [20], is based on representative oligonucleotide probes designed for fragments of the human genome sequence, which are more or less randomly distributed across the genome. The ROMA/NimbleGen arrays provide a gene-independent arrangement of the structure of the complete genome at a high resolution. The bead-based Human-1 109 k arrays from Illumina provide an exon-centric view of the genome through their 109 k SNP markers, of which 70% are located in gene exons or within 10 kb of transcripts. This platform employs an allele-specific primer extension assay using two probes (bead types) in one color channel to score Single-Nucleotide Polymorphisms (SNPs) [24,25].
It is important to be able to distinguish between real copy number variation and variation related to the technical processing and analysis of the arrays. Various programs, web-based tools and statistical software packages are available for exploring and analyzing array CGH data, e.g. [13,26,27], with different accuracy for the estimation of aberrations callings [26]. A major advantage of the aberration calling method used here (PCF), as compared to many other aberration calling algorithms, is that its mathematical form makes it easy to adjust parameters to adapt to the specific platforms to obtain an appropriate comparison.
In comparing platforms, one should be aware that a higher density of probes does not necessarily imply an improved effective resolution. The effective resolution of a platform, defined as the smallest genomic range for which a reliable classification can be made with respect to aberration status (deleted, normal or amplified), clearly depends on the platform's probe density and on the platform's SNR (Signal-to-Noise Ratio). Let D denote the absolute change in true intensity log-ratio corresponding to the smallest copy number alteration that we want to be able to detect, and suppose the observed intensity log-ratios are normally distributed around the true log-ratio with standard deviation SD. Then, we may define SNR = D/SD. To be able to detect the smallest alteration while controlling the Type I and Type II error rates, the number of probes in the region must exceed a number that scales as the inverse square of SNR. Thus, the effect of doubling the signal-to-noise ratio is comparable to that of increasing the probe density by a factor of four. Of course, an added effect of increasing the probe density, as opposed to improving the SNR, is that local aberration details may be revealed that are partly or completely absent in lower resolution scans.
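The inverse-square relation between SNR and the required number of probes can be illustrated with a minimal sketch. It assumes a one-sided z-test on the mean of n independent, normally distributed probe log-ratios, which is one common way to formalize the argument and not necessarily the calculation behind the statement above. The function names and the default error rates of 0.05 are illustrative assumptions.

```python
import math
from statistics import NormalDist


def probe_bound(snr, alpha=0.05, beta=0.05):
    """Lower bound on the number of probes in a region (assumed z-test model).

    snr   : D / SD, smallest log-ratio shift to detect over the probe noise
    alpha : Type I error rate (assumed default, not from the paper)
    beta  : Type II error rate (assumed default, not from the paper)
    """
    z = NormalDist()
    return ((z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)) / snr) ** 2


def min_probes(snr, alpha=0.05, beta=0.05):
    """Smallest whole number of probes meeting the bound."""
    return math.ceil(probe_bound(snr, alpha, beta))


if __name__ == "__main__":
    for snr in (0.5, 1.0, 2.0):
        print(snr, min_probes(snr))
```

Under these assumptions, doubling the SNR divides the bound by exactly four, which mirrors the trade-off between noise level and probe density described above.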
The content and performance characteristics of a particular platform influence its applicability for a certain study type. We conclude that for gene detection and gene-oriented research the Agilent or the Illumina platform are to be preferred. On the other hand, the ROMA/NimbleGen approach, showing a compact picture of the entire genome structure, is the method of choice for exploration of the various mechanisms leading to different types of genomic instability such as Chromosomal Instability (CIN) or Microsatellite Instability (MIN). The Illumina SNP-CGH arrays can be utilized for detecting LOH and allelic ratios in detecting aberrations. We are aware that several newer platforms with increased reporter numbers exist. However, the aim of this study was to assess to what extent platforms of different nature and design can result in differences in detection of some copy number aberration patterns.
In general, when comparing samples hybridized to different platforms the following steps should be taken, supposing the data have been normalized according to the platform used: Fit a regression model suitable for copy number data, such as PCF, to obtain estimated log ratios for a desired set of genomic loci. Parameters (for PCF: the penalty parameter and the lower limit on the size of a plateau) should reflect the probe density and noise level. If some samples have been hybridized to both (all) platforms, frequency plots (like Figure 1) should be used to estimate the difference in scale. Then perform aberration calling on the basis of the above fit, compensating for scale differences when estimates are available. Finally, caution should be taken when comparing the small-scale structure of aberrations, since our analysis indicates that the similarities found by different platforms in the higher-level structure are not necessarily accompanied by correspondence in the detailed structure.
Conclusion
Using 20 breast tumor samples and adjusted analytical methods, we observed high overall concordance between the three platforms evaluated, despite substantial differences in the platform composition. Both correlation and cluster analysis indicate a somewhat higher similarity in results obtained by ROMA/NimbleGen and Illumina than between Agilent and the other two platforms. Some short amplifications or deletions of potential biological importance were revealed by only one of the platforms. Detailed examination of these sites indicated that the discrepancy was mainly due to the density and spatial distribution of probes, and other platform specific features. Further studies are needed to verify the potential biological importance of the sites. The correct platform for a particular study is dependent on the research intention, whether it is gene, genome, or genotype oriented, and on region or location of interest. A complete genomic tiling array including high density gene oriented reporters may be the ultimate goal for the study of genomic alterations in cancer.
Methods
Patient samples
From May 1995 to December 1998, 920 patients referred for surgical treatment of breast cancer were included from five different hospitals in the Oslo region in a large study on detection of isolated tumor cells in bone marrow [28]. From these 920 patients, tissue samples from 20 breast carcinomas were selected for this study. All 20 breast carcinomas contained >40% tumor cells; the majority of the tumor specimens represent tumor size T1/T2, node status N0/N1 (9/11) and histological grade 2 or 3. The 20 samples have been classified into five clinically relevant tumor subclasses [18]. Tumor DNA was extracted using an ABI 341 Nucleic Acid Purification System (Applied Biosystems) according to the manufacturer's protocol.
Agilent platform
Agilent's Human Genome CGH Microarray 44 k contains 44,255 in situ synthesized 60-mer probes (3,877 controls) designed for studying copy number changes and representing most of the known or predicted human genes. According to the manufacturer's description, the probes are enriched for cancer relevant genes and represent both coding and non-coding sequences on the chromosomes [29]. Experiments using Agilent arrays were performed at the Department of Genetics at The Norwegian Radium Hospital, Oslo, Norway using female human genomic DNA (Promega) as reference and following the Agilent recommended standard protocol (see Additional file 13). The arrays were scanned using an Agilent scanner; data extraction, filtering and normalization were conducted using the Feature Extraction software version A.7.5.1 (Agilent Technologies). The CGHAnalytics program version 3.4.40 (Agilent Technologies) was used to export the array CGH data for use in other analytic programs.
ROMA/NimbleGen platform
The Representational Oligonucleotide Microarray Analysis (ROMA) has been developed at Cold Spring Harbor Laboratory [20]. The procedure is based on representative oligonucleotide probes designed by analyzing Bgl II restriction fragments of the human genome sequence. Approximately 85,000 70-mer probes are randomly combined on a single chip, providing a more or less even distribution across the human genome. ROMA experiments were prepared and analyzed at the Cold Spring Harbor Laboratory, New York, USA using male reference DNA (CHP-SKN-1 = 46, XY male) and following the standard ROMA/NimbleGen protocol (see Additional file 13). Arrays were immediately scanned using an Axon GenePix 4000b scanner (pixel size set to 5 μm). The GenePix Pro 4.0 software was used for identification and quantification of probe intensities. Measured intensities without background subtraction were used to calculate ratios. ROMA/NimbleGen data were normalized using an intensity-based lowess curve fitting algorithm [30].
Illumina platform
The SNP-CGH experiments were performed using the Infinium™ I assay on the Human-1 Genotyping BeadChip representing 109,365 loci (~109 k) [21,24]. Each allele is represented by two unique beads, having an average of 30-fold redundancy per unique bead. The BeadChips are constructed by attaching 50-mer probes to 3 μm-diameter beads, which are randomly assembled onto the chips containing ~3 μm diameter wells. In addition to the 50-mer probe sequence, a ~30-mer address sequence is present on each bead to allow identification of each bead by decoding [24] (http://www.illumina.com/home.ilmn). The Illumina experiments were prepared and analyzed at Uppsala University, Uppsala, Sweden. Samples for the Illumina Human-1 BeadChip (109,365 loci) were prepared and processed according to the manufacturer's protocol (see Additional file 13). Signal detection was conducted using the Illumina BeadArray Reader (Infinium I FastScan scanner protocol) while identification of bead positions and raw-data extraction were performed using the BeadScan software. Following data acquisition, data from patient blood samples (of 112 corresponding blood-tumor pairs) were subjected to clustering using the algorithm supplied in the BeadStudio application. These clusters were then applied to the tumor arrays, and manual review of peripheral GenCall (GC) and Cluster Separation (CS) scores was performed. After clustering and QC review, we extracted the log R-ratios for the tumor data. This ratio results from dividing the normalized R-value (observed) by the expected normalized R-value [21].
Statistical methods and analytical tools
The PCF algorithm
The PCF algorithm is an extension of the Potts filter method described by Winkler et al. [31] and seeks the best possible fit to the data using one or more constant plateaus. Let $D = \{(x_i, y_i),\ i = 1, \ldots, n\}$ be copy number data for one chromosome in one individual, where $a = x_1 \le x_2 \le \cdots \le x_n = b$ are the locations of the probes along the chromosome and $y_1, \ldots, y_n$ are the corresponding log-transformed copy number ratios. Then the PCF filtering algorithm computes the solution to the penalized optimization problem
$$\hat{f} = \arg\min_{f} \left\{ \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2 + \lambda \cdot |J(f)| \right\},$$
where the minimum is taken over piecewise constant functions $f$ on $[a, b]$ and $|J(f)|$ denotes the number of discontinuities of $f$, the first term in braces being the goodness of fit and the second term being a penalty proportional to the number of discontinuities (jumps) in the function. The constant $\lambda > 0$ controls the trade-off between the two terms. Observe that the transformation $(y, \lambda) \to (\sigma y, \sigma^2 \lambda)$ induces a corresponding transformation of the solution. Letting the penalty coefficient be $\lambda = \tau\sigma^2$ where $\tau > 0$ is a constant and $\sigma^2$ is the variance of the log ratios, the number of discontinuities or their locations will not be scale dependent. To compare different platforms we select a platform-independent value of $\tau$ (say $\tau = 9$) and for any particular chromosome and array let $\lambda = \tau\hat{\sigma}^2$, where $\hat{\sigma}^2$ is the estimated variance of the log ratios. The PCF algorithm used in this paper also allows the user to specify a lower limit on the size (number of probes) of a plateau in the piecewise constant function to be determined. To compensate for the platform differences in average probe density, the limit was set to 10 probes for Agilent, 18 for ROMA/NimbleGen and 25 for Illumina.
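A penalized piecewise constant fit of this kind can be sketched with a simple dynamic program. The code below is an illustrative O(n²) sketch and not the authors' implementation. The function names are hypothetical, and using the plain population variance of the log ratios as the variance estimate is an assumption, since the text does not specify the estimator.

```python
import math
import statistics


def scale_free_penalty(y, tau=9.0):
    """lambda = tau * variance; the variance estimator is an assumption."""
    return tau * statistics.pvariance(y)


def pcf(y, lam, kmin=1):
    """Least-squares piecewise constant fit with a penalty lam per jump.

    y    : log ratios ordered by genomic position
    lam  : penalty for each discontinuity
    kmin : minimum number of probes in a plateau
    """
    n = len(y)
    s1 = [0.0] * (n + 1)  # prefix sums of y
    s2 = [0.0] * (n + 1)  # prefix sums of y squared
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(i, j):
        # squared error of y[i:j] around its own mean
        d = s1[j] - s1[i]
        return (s2[j] - s2[i]) - d * d / (j - i)

    best = [math.inf] * (n + 1)
    best[0] = -lam  # so that the first plateau carries no jump penalty
    start = [0] * (n + 1)
    for j in range(kmin, n + 1):
        for i in range(0, j - kmin + 1):
            if best[i] == math.inf:
                continue  # prefix y[:i] cannot be split into valid plateaus
            cost = best[i] + lam + sse(i, j)
            if cost < best[j]:
                best[j] = cost
                start[j] = i

    # backtrack the optimal plateaus and fill in their means
    fit = [0.0] * n
    j = n
    while j > 0:
        i = start[j]
        mean = (s1[j] - s1[i]) / (j - i)
        for k in range(i, j):
            fit[k] = mean
        j = i
    return fit
```

Scaling the data by a factor σ and the penalty by σ² scales the fitted plateaus by σ without moving the jumps, which is the invariance the choice λ = τσ² relies on.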
Cross-platform copy number comparison
Several of the analyses in this paper involve the comparison of copy number measurements across platforms. As the actual measurement probes for one platform differ in number and genomic locations from those of another platform, some assumptions must be made about the copy number ratio between neighboring probes in order to carry out a meaningful comparison. The PCF algorithm provides a useful starting point, as it eliminates (or reduces) through smoothing the random variability owing to the measurement process, while at the same time it fits a piecewise constant regression function to the log ratios which is defined everywhere on the genomic range of the data. Specifically, the PCF solution may be extended to a function defined on the whole range of the data:
$$\hat{f}(x) = \begin{cases} \hat{f}(x_1), & a \le x \le u_1, \\ \hat{f}(x_i), & u_{i-1} < x \le u_i,\quad i = 2, \ldots, n-1, \\ \hat{f}(x_n), & u_{n-1} < x \le b, \end{cases}$$
where $\hat{f}(x_i)$ denotes the PCF-fitted value at probe location $x_i$ and $u_i = (x_i + x_{i+1})/2$. This means that solutions obtained for different platforms (with different probe locations) can be directly compared through interpolation of the PCF curves at genomic loci chosen to be identical for all platforms. For the analyses in this paper, the loci were chosen to be uniformly spaced across the whole genome, with a distance of 1 Mb between neighboring loci (leading to a total of 2936 genomic loci at which the PCF function value was determined by the above interpolation formula). Thus, for every array a vector z of PCF function values of length m = 2936 was found and was used to produce scatterplots comparing different platforms, to cluster the arrays, to compare aberrations across platforms, etc.
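As a minimal sketch of how fitted values might be read off on a common grid, the function below assigns each grid locus the fitted value of the probe whose midpoint interval contains it, using the midpoints between neighboring probes as breakpoints. The names `extend_fit` and `uniform_grid` are hypothetical, and the convention that a locus falling exactly on a midpoint takes the left probe's value is an assumption.

```python
from bisect import bisect_left


def extend_fit(positions, fitted, loci):
    """Evaluate a probe-level piecewise constant fit at arbitrary loci.

    positions : sorted probe locations on one chromosome
    fitted    : fitted log ratio at each probe (same length as positions)
    loci      : genomic positions at which to read off the fit
    """
    # breakpoints halfway between neighboring probes
    mids = [(positions[i] + positions[i + 1]) / 2
            for i in range(len(positions) - 1)]
    # bisect_left counts the midpoints strictly below x, i.e. the probe index
    return [fitted[bisect_left(mids, x)] for x in loci]


def uniform_grid(start, end, step=1_000_000):
    """Loci spaced `step` bases apart (1 Mb by default) from start up to end."""
    return list(range(start, end + 1, step))


if __name__ == "__main__":
    pos = [1_200_000, 3_400_000, 7_900_000]
    fit = [0.1, 0.5, -0.2]
    grid = uniform_grid(1_000_000, 8_000_000)
    print(list(zip(grid, extend_fit(pos, fit, grid))))
```

Because every platform is evaluated at the same loci, the resulting vectors have equal length and can be compared element by element.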
Aberration calling
To call aberrations in a tumor, a two-step procedure is applied to the log-transformed copy number ratios. First, a piecewise constant regression function is fitted to the log ratios, using the PCF algorithm described above. Next, a gene is called if the fitted regression value for the gene is above a certain positive threshold T (amplification call) or below -T (deletion call). To validate the results obtained from PCF, we also applied an unrelated aberration calling algorithm called ACE [13] to the data. Overall, the aberration patterns found by the two methods are very similar.
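The second step of this procedure amounts to a symmetric threshold rule, sketched below. The function name and the string labels are illustrative choices, not taken from the analysis software.

```python
def call_aberrations(fitted, threshold):
    """Label each fitted log ratio using a symmetric threshold T > 0.

    fitted    : PCF-fitted log ratios (one per gene or locus)
    threshold : positive threshold T
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    calls = []
    for value in fitted:
        if value > threshold:
            calls.append("amplification")
        elif value < -threshold:
            calls.append("deletion")
        else:
            calls.append("normal")
    return calls


if __name__ == "__main__":
    print(call_aberrations([0.4, 0.05, -0.3, 0.0], threshold=0.2))
```

Since the platforms differ in the scale of their log ratios, a platform-specific threshold would be passed for each platform.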
ROC curves
To assess the degree of similarity between the aberration patterns found with different platforms, we consider the ability of each platform to mimic the other two platforms' classification of genomic loci. For any platform and any sample, we first apply the PCF algorithm to the log-transformed copy number ratios to fit a piecewise constant function. Let T be a fixed positive threshold. For each of 2936 uniformly spaced genomic loci across the genome, we classify the locus as a deletion if the corresponding PCF value is less than -T, and as an amplification if the PCF value is larger than T. Otherwise, the genomic locus is classified as normal. This is done for various choices of T, for every one of the 20 samples and for every platform. For any given pair of platforms, we consider the classification based on one platform as the correct (true) classification and calculate an ROC curve for the classification based on the other platform relative to the first one. The points on an ROC curve correspond to different threshold values T for the classification based on the second platform (T ranging from 0 to the maximal PCF value for that platform). Different ROC curves may be produced by varying the threshold used to define the correct classification.
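A pairwise ROC comparison of this kind can be sketched as follows. For simplicity the sketch treats amplification calls only (value larger than the threshold) as a binary classification; this simplification, the function names, and the trapezoidal AUC are assumptions and not necessarily how the curves in Figure 5 were computed. Deletions could be handled symmetrically by testing for values below the negative threshold.

```python
def roc_points(ref_values, test_values, t_ref, thresholds):
    """ROC points (FPR, TPR) for amplification calls on one platform pair.

    ref_values  : fitted values of the platform taken as the truth
    test_values : fitted values of the other platform at the same loci
    t_ref       : threshold defining the reference classification
    thresholds  : thresholds to sweep on the test platform
    """
    truth = [v > t_ref for v in ref_values]
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    points = []
    for t in thresholds:
        calls = [v > t for v in test_values]
        tp = sum(c and r for c, r in zip(calls, truth))
        fp = sum(c and not r for c, r in zip(calls, truth))
        fpr = fp / n_neg if n_neg else 0.0
        tpr = tp / n_pos if n_pos else 0.0
        points.append((fpr, tpr))
    return points


def auc(points):
    """Trapezoidal area under the ROC points, anchored at (0,0) and (1,1)."""
    pts = sorted(points + [(0.0, 0.0), (1.0, 1.0)])
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

Repeating the computation for several values of `t_ref` yields the family of curves obtained by varying the threshold that defines the reference classification.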
Total least squares regression
To investigate possible differences in scale between the log ratio measurements obtained for the different platforms, we fitted for every pair of platforms regression lines with no intercept to PCF interpolation values, obtained on a regular genomic grid consisting of 2936 loci. Total Least Squares regression was used for this purpose. While an ordinary least squares regression fit takes into account measurement errors in the dependent variable only, TLS also accounts for measurement errors in the independent variable. This leads to an improved estimate of the slope in cases where both platforms are subject to substantial measurement error. Estimates of slopes were found separately for each array, and the median slope is reported together with a robust measure of spread called Interquartile Range (IQR), defined as the difference between the 3rd and 1st quartile of the 20 slopes.
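For a line through the origin, the TLS slope has a closed form obtained by minimizing the sum of squared perpendicular distances to the line. The sketch below uses that closed form; the function names are hypothetical, and the quartile convention of Python's `statistics.quantiles` is an assumption, since the text does not state which quartile definition was used.

```python
import math
import statistics


def tls_slope(x, y):
    """Slope b of the zero-intercept line y = b*x minimizing orthogonal distances."""
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("slope undefined when the cross-product sum is zero")
    diff = syy - sxx
    # root of sxy*b^2 + (sxx - syy)*b - sxy = 0 with the same sign as sxy
    return (diff + math.sqrt(diff * diff + 4 * sxy * sxy)) / (2 * sxy)


def summarize(slopes):
    """Median and interquartile range (Q3 - Q1) of per-sample slopes."""
    q1, _, q3 = statistics.quantiles(slopes, n=4)
    return statistics.median(slopes), q3 - q1


if __name__ == "__main__":
    x = [0.1, 0.4, -0.3, 0.8]
    y = [0.2, 0.9, -0.5, 1.7]
    print(tls_slope(x, y), tls_slope(y, x))
```

Unlike ordinary least squares, swapping the two platforms simply inverts the slope, so the estimate does not depend on which platform is treated as the independent variable.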
Detection of firestorms
Hicks et al. (2006) observed in some tumors what they referred to as a firestorm in the array CGH profile. A firestorm is the presence of at least one localized region of clustered, relatively narrow peaks of amplification, with each cluster being confined to a single chromosome arm [30]. In order to detect firestorms, we define a mathematical measure called a firestorm index, designed to quantify the degree of instability in a chromosome. As before, let $D = \{(x_i, y_i),\ i = 1, \ldots, n\}$ be the copy number observations for one particular chromosome and let $\hat{f}$ be the piecewise constant fit found by the PCF algorithm described above. Suppose $\hat{\sigma}^2$ is an estimate of the variance of the data around the true mean. Select the points $x_i$ for which and at least one of the values are outside the interval , and denote these points $t_1 < \ldots < t_d$. Let $a_1 < \ldots < a_d$ be the corresponding log ratios. For a window ω spanning s bases of the chromosome (this paper uses s = 35 Mb), define
where N(ω) is the number of turning points (i.e. local minima and maxima) in the sequence {(x, ), xi ∈ ω}. The firestorm index is defined as the maximum of γ(ω) as ω ranges over all windows that cover s bases of the chromosome.
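The turning-point count N(ω) and the maximization over windows can be illustrated with a short sketch. The formula for γ(ω) itself is not reproduced here, so the code below only counts local minima and maxima within each window and reports the largest count. The function names are hypothetical, positions are assumed to be sorted, and windows are anchored at each observed position as a simplification.

```python
def count_turning_points(values):
    """Number of strict local minima and maxima in a sequence."""
    count = 0
    for prev, cur, nxt in zip(values, values[1:], values[2:]):
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            count += 1
    return count


def max_turning_points(positions, values, span):
    """Largest turning-point count over windows covering `span` bases.

    positions must be sorted in increasing order; each window starts at
    an observed position and includes points up to, but not including,
    start + span.
    """
    best = 0
    for i, start in enumerate(positions):
        window = [v for p, v in zip(positions[i:], values[i:]) if p < start + span]
        best = max(best, count_turning_points(window))
    return best
```

With positions given in bases, `span = 35_000_000` would correspond to the 35 Mb window used in the paper.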
Authors' contributions
LOB: carried out the Agilent experiments together with JA, participated in the analytical method development, performed data analysis, and wrote the manuscript, including preparation of the majority of the figures. JA: carried out the Agilent experiments together with LOB, drafted the methods section, and critically read the manuscript. FEJ: carried out the Illumina experiments, participated in the discussions, and drafted the Illumina part of the methods section in the manuscript. JH: carried out the ROMA/NimbleGen experiments and participated in the discussions. HS: provided support and discussion of the Agilent array experiments, and critical revision of the manuscript. LKB: provided support and discussion of the Agilent array experiments, and critical revision of the manuscript. KG: provided support and discussion of the Illumina experiments, and critical revision of the manuscript. BN: provided the breast cancer material and clinical data, and critical revision of the manuscript. VNK: participated in the design of this study, coordinated the Illumina collaboration, and critical revision of the manuscript. KL: participated in the development of the analysis methods, as well as in discussing, writing, and critical reading of the manuscript. ALBD: conceived the study, participated in its design and coordination, and participated in the discussions and critical reading of the manuscript. OCL: participated in the development of the analysis methods, as well as in discussing, writing, and critical reading of the manuscript.
All authors read and approved the final manuscript.
Acknowledgements
The authors wish to thank Silje H. Nordgard and Grethe I. G. Alnæs for their contributions in the Illumina experiments and for clustering of the Illumina data. Illumina, Infinium, and BeadArray are registered trademarks or trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.
The project has received research funding from The National Programme for Research in Functional Genomics in Norway (FUGE) in The Research Council of Norway and from the Communities Sixth Framework Programme, project: DISMAL, contract no.: LSHC-CT-2005-018911. The publication reflects only the authors' views and the Community is not liable for any use that may be made of the information contained therein.
Contributor Information
LO Baumbusch, Email: lars.o.baumbusch@rr-research.no.
J Aarøe, Email: jorgen.aaroe@rr-research.no.
FE Johansen, Email: Fredrik.Ekeberg.Johansen@rr-research.no.
J Hicks, Email: hicks@cshl.edu.
H Sun, Email: hailing_sun@agilent.com.
L Bruhn, Email: laurakay_bruhn@agilent.com.
K Gunderson, Email: KGunderson@illumina.com.
B Naume, Email: Bjorn.Naume@radiumhospitalet.no.
VN Kristensen, Email: vessela@ulrik.uio.no.
K Liestøl, Email: knut@ifi.uio.no.
A-L Børresen-Dale, Email: a.l.borresen-dale@medisin.uio.no.
OC Lingjærde, Email: ole@ifi.uio.no.
References
- Albertson DG, Collins C, McCormick F, Gray JW. Chromosome aberrations in solid tumors. Nat Genet. 2003;34:369–376. doi: 10.1038/ng1215.
- van Beers EH, Nederlof PM. Array-CGH and breast cancer. Breast Cancer Res. 2006;8:210. doi: 10.1186/bcr1510.
- Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet. 1998;20:207–211. doi: 10.1038/2524.
- Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, Williams CF, et al. Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat Genet. 1999;23:41–46. doi: 10.1038/14385.
- Snijders AM, Nowak N, Segraves R, Blackwood S, Brown N, Conroy J, et al. Assembly of microarrays for genome-wide measurement of DNA copy number. Nat Genet. 2001;29:263–264. doi: 10.1038/ng754.
- Solinas-Toldo S, Lampel S, Stilgenbauer S, Nickolenko J, Benner A, Dohner H, et al. Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances. Genes Chromosomes Cancer. 1997;20:399–407. doi: 10.1002/(SICI)1098-2264(199712)20:4<399::AID-GCC12>3.0.CO;2-I.
- Barnes M, Freudenberg J, Thompson S, Aronow B, Pavlidis P. Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms. Nucleic Acids Res. 2005;33:5914–5923. doi: 10.1093/nar/gki890.
- Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, et al. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol. 2006;24:1151–1161. doi: 10.1038/nbt1239.
- Shippy R, Sendera TJ, Lockner R, Palaniappan C, Kaysser-Kranich T, Watts G, et al. Performance evaluation of commercial short-oligonucleotide microarrays and the impact of noise in making cross-platform correlations. BMC Genomics. 2004;5:61. doi: 10.1186/1471-2164-5-61.
- Tan PK, Downey TJ, Spitznagel EL, Jr, Xu P, Fu D, Dimitrov DS, et al. Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Res. 2003;31:5676–5684. doi: 10.1093/nar/gkg763.
- Naume B, Zhao X, Synnestvedt M, Borgen E, Russnes EG, Lingjærde OC, et al. Presence of bone marrow micrometastasis is associated with different recurrence risk within molecular subtypes of breast cancer. Molecular Oncology. 2007;1:160–171. doi: 10.1016/j.molonc.2007.03.004.
- Sørlie T, Wang Y, Xiao C, Johnsen H, Naume B, Samaha RR, et al. Distinct molecular mechanisms underlying clinically relevant subtypes of breast cancer: gene expression analyses across three different platforms. BMC Genomics. 2006;7:127. doi: 10.1186/1471-2164-7-127.
- Lingjærde OC, Baumbusch LO, Liestøl K, Glad IK, Børresen-Dale AL. CGH-Explorer: a program for analysis of array-CGH data. Bioinformatics. 2005;21:821–822. doi: 10.1093/bioinformatics/bti113.
- Bergamaschi A, Kim YH, Wang P, Sørlie T, Hernandez-Boussard T, Lønning PE, et al. Distinct patterns of DNA copy number alteration are associated with different clinicopathological features and gene-expression subtypes of breast cancer. Genes Chromosomes Cancer. 2006;45:1033–1040. doi: 10.1002/gcc.20366.
- Naylor TL, Greshock J, Wang Y, Colligon T, Yu QC, Clemmer V, et al. High resolution genomic analysis of sporadic breast cancer using array-based comparative genomic hybridization. Breast Cancer Res. 2005;7:R1186–R1198. doi: 10.1186/bcr1356.
- Richard F, Pacyna-Gengelbach M, Schlüns K, Fleige B, Winzer KJ, Szymas J, et al. Patterns of chromosomal imbalances in invasive breast cancer. Int J Cancer. 2000;89:305–310. doi: 10.1002/1097-0215(20000520)89:3<305::AID-IJC15>3.0.CO;2-8.
- Greshock J, Feng B, Nogueira C, Ivanova E, Perna I, Nathanson K, et al. A comparison of DNA copy number profiling platforms. Cancer Res. 2007;67:10173–10180. doi: 10.1158/0008-5472.CAN-07-2102.
- Sørlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA. 2001;98:10869–10874. doi: 10.1073/pnas.191367098.
- Lengauer C, Kinzler KW, Vogelstein B. Genetic instabilities in human cancers. Nature. 1998;396:643–649. doi: 10.1038/25292.
- Lucito R, Healy J, Alexander J, Reiner A, Esposito D, Chi M, et al. Representational oligonucleotide microarray analysis: a high-resolution method to detect genome copy number variation. Genome Res. 2003;13:2291–2305. doi: 10.1101/gr.1349003.
- Peiffer DA, Le JM, Steemers FJ, Chang W, Jenniges T, Garcia F, et al. High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Res. 2006;16:1136–1148. doi: 10.1101/gr.5402306.
- Coe BP, Ylstra B, Carvalho B, Meijer GA, Macaulay C, Lam WL. Resolving the resolution of array CGH. Genomics. 2007;89:647–653. doi: 10.1016/j.ygeno.2006.12.012.
- Hehir-Kwa JY, Egmont-Petersen M, Janssen IM, Smeets D, van Kessel AG, Veltman JA. Genome-wide copy number profiling on high-density bacterial artificial chromosomes, single-nucleotide polymorphisms, and oligonucleotide microarrays: a platform comparison based on statistical power analysis. DNA Res. 2007;14:1–11. doi: 10.1093/dnares/dsm002.
- Gunderson KL, Kruglyak S, Graige MS, Garcia F, Kermani BG, Zhao C, et al. Decoding randomly ordered DNA arrays. Genome Res. 2004;14:870–877. doi: 10.1101/gr.2255804.
- Gunderson KL, Steemers FJ, Lee G, Mendoza LG, Chee MS. A genome-wide scalable SNP genotyping assay using microarray technology. Nat Genet. 2005;37:549–554. doi: 10.1038/ng1547.
- Lai WR, Johnson MD, Kucherlapati R, Park PJ. Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics. 2005;21:3763–3770. doi: 10.1093/bioinformatics/bti611.
- Lipson D, Aumann Y, Ben-Dor A, Linial N, Yakhini Z. Efficient calculation of interval scores for DNA copy number data analysis. J Comput Biol. 2006;13:215–228. doi: 10.1089/cmb.2006.13.215.
- Wiedswang G, Borgen E, Kåresen R, Kvalheim G, Nesland JM, Qvist H, et al. Detection of isolated tumor cells in bone marrow is an independent prognostic factor in breast cancer. J Clin Oncol. 2003;21:3469–3478. doi: 10.1200/JCO.2003.02.009.
- Barrett MT, Scheffer A, Ben Dor A, Sampas N, Lipson D, Kincaid R, et al. Comparative genomic hybridization using oligonucleotide microarrays and total genomic DNA. Proc Natl Acad Sci USA. 2004;101:17765–17770. doi: 10.1073/pnas.0407979101.
- Hicks J, Krasnitz A, Lakshmi B, Navin NE, Riggs M, Leibu E, et al. Novel patterns of genome rearrangement and their association with survival in breast cancer. Genome Res. 2006;16:1465–1479. doi: 10.1101/gr.5460106.
- Winkler G, Liebscher V. Smoothers for discontinuous signals. J Nonpar Stat. 2002;14:203–222. doi: 10.1080/10485250211388.