Abstract
In the 2007 Association of Biomolecular Resource Facilities Microarray Research Group project, we analyzed HL-60 DNA with five platforms: Agilent, Affymetrix 500K, Affymetrix U133 Plus 2.0, Illumina, and RPCI 19K BAC arrays. Copy number variation was analyzed using circular binary segmentation (CBS) of log ratio scores from four independent hybridizations on each platform. Data from these platforms were assessed for reproducibility and for the ability to detect previously reported copy number variations in HL-60. All of the tested platforms detected genomic DNA amplification of the 8q24 locus, trisomy 18, and monosomy X, as well as deletions at loci 5q11.2~q31, 9p21.3~p22, 10p12~p15, 14q22~q31, and 17p12~p13.3. In addition, at least two of the five platforms detected five novel losses and five novel gains in the HL-60 genome. Based on this broad evaluation of available CGH platforms, this report provides guidance on platform selection.
Keywords: arrayCGH, BAC array, HL-60
INTRODUCTION
Comparative genomic hybridization (CGH) measures DNA copy number differences between a reference genome and a test genome. The DNA samples are differentially labeled and hybridized to an immobilized substrate. In early CGH experiments, the DNA targets were hybridized to metaphase chromosome spreads in fluorescent in situ hybridization (FISH) assays. This technology later evolved so that the DNA targets are hybridized to microarrays containing cDNA fragments or bacterial artificial chromosomes (BACs). Recent commercial offerings from Agilent, Affymetrix, and Illumina derive copy number differences using oligonucleotide microarrays representing 500,000 or more loci. In most commercial assays, genomic DNA is labeled and hybridized to microarrays designed for single nucleotide polymorphism (SNP) genotyping analyses. Interestingly, Auer et al.1 have recently shown that expression microarrays, such as the Affymetrix U133 series, can also be used to identify copy number differences.
It has become apparent that copy number variants are quite common in the human genome and can have dramatic phenotypic consequences as a result of altering gene dosage, disrupting coding sequences, or perturbing long-range gene regulation. These DNA anomalies are associated with many genetic diseases including congenital anomalies, developmental delay, and mental retardation. As a result, many arrays have been designed to diagnose these DNA alterations as well as to detect gains and losses of tumor suppressor genes and oncogenes.2,3 Identifying the specific segmental genomic alterations and the genes they contain will yield molecular targets for diagnostics and therapy. For these treatments to be effective, a reliable and accurate identification of genomic alterations associated with a given disease is essential. The following paragraphs summarize recent studies that characterize the reliability and accuracy of various CGH technologies.
Huang et al.4 identified multiple regions of amplification and deletion using whole genome sampling analysis (WGSA) on a panel of human breast cancer cell lines. Their WGSA simultaneously genotyped over 10,000 SNPs by allele-specific hybridization to perfect match and mismatch probes synthesized on a single array. The resolution they obtained, with a mean inter-SNP distance of 244 kb, was attributed primarily to the high-density oligonucleotide array.4
A research group including Wellcome Trust, Affymetrix, the University of Tokyo, and others employed Affymetrix GeneChip Human Mapping 550K Early Access arrays and clone-based CGH on 270 HapMap samples. They identified 1447 copy number variations (CNVs) ranging in size from 960 bp to 3.4 Mb. These CNVs contained hundreds of genes, disease loci, functional elements, and segmental duplications, and provided the framework for the first comprehensive global map of human CNVs.5,6
By hybridizing genomic representations of breast and lung carcinoma cell lines and lung tumor DNA to SNP arrays, and measuring locus-specific hybridization intensity, Zhao et al.7 detected both known and novel genomic amplifications and homozygous deletions in these cancer samples. Comparison with BAC and cDNA array analysis showed that the three platforms gave generally comparable results. The BAC arrays showed the highest signal-to-noise ratios, making them better suited to detect single-copy alterations. However, the SNP arrays allow copy number changes and genotype to be measured in the same experiment.7
Recent advances in array-based CGH technology have refined the determination of chromosomal gains and losses. These refinements are dependent on improved arrayCGH performance characteristics that have been evaluated in recent review articles.8–10 Coe et al.9 have defined a “functional resolution” for arrayCGH technology that incorporates the size and uniformity of element spacing on the array as well as the sensitivity of each platform to single-copy alterations. They propose that the detection sensitivity of an array is best described by the probability of detecting any alteration of a given size.
The goal of the 2007 Association of Biomolecular Resource Facilities (ABRF) Microarray Research Group (MARG) project was to assess the ability of current technologies to detect chromosomal aberrations. For this assessment we selected five CGH platforms, a test genome with a variety of known gains and losses, and analysis software that would facilitate comparison of the resolution of each platform. At the time of the study, the five platforms represented the state of the art for detecting chromosomal aberrations: Agilent CGH 44B Microarray, Illumina HumanHap 550 BeadChip, Affymetrix GeneChip Human Mapping 500K Array Set, a human BAC19K array developed by Roswell Park Cancer Institute, and the Affymetrix Human Genome U133 Plus 2.0 gene expression array. Each platform was assessed on its repeatability between replicates and on detection of the reported gains and losses in the HL-60 cell line compared with reference DNA.
MATERIALS AND METHODS
Genomic DNA was isolated from HL-60 leukemia cells11 and normal human female lung DNA was purchased from the Biochain Institute (Hayward, CA). DNA purity was assessed by measurement of the 260/280-nm absorption ratio. The HL-60 tumor line was derived from the bone marrow cells of a patient with acute myelogenous leukemia.12
We relied on two CGH studies to compile the reference set of HL-60 genetic alterations. To characterize the HL-60 cell line genetic alterations, Ulger et al.13 employed microarrays with 1003 non-overlapping BAC and PAC clones that provided an average resolution of 3 Mb. These investigators detected 10 copy number changes in the HL-60 cell line: amplification at 8q24, trisomy 18, and monosomy X, as well as deletions at loci in 5q11.2~q31, 6q12, 9p21.3~p22, 10p12~p15, 14q22~q31, 16q21, and 17p12~p13.3. A more recent paper by Peiffer et al.,14 which used whole-genome BeadChip microarrays (Illumina, Hayward, CA) to assay up to 317,000 SNP loci, detected 8 of the 10 chromosomal aberrations reported by Ulger et al.13 The losses at 6q12 and 16q21 were not detected by Peiffer et al.14 and were therefore not included in our reference set. The nucleotide start-and-stop positions of the 8 reference variants used in this study are listed in Supplemental Material.
Detailed descriptions of each of the selected platforms are provided below and summarized in Table 1. The physical locations for all probes are reported relative to the NCBI Build 35 genome assembly, also referred to as HG17 (http://genome.ucsc.edu). For the Affymetrix (Santa Clara, CA) U133 platform, the physical position is reported as the central nucleotide for a probe set, after performing a reannotation based on probe sequences.1 No location is reported for probe sets with less than 50% homology to Build 35.
TABLE 1.
ArrayCGH Platform Descriptions
| Microarray | No. Probes or Probe Sets | Probe Types | Test Site | Protocol | Number of Hybridizations | Reference | 
|---|---|---|---|---|---|---|
| Custom BAC 19K | 17,006 | PCR products from BACs, spotted in duplicate | RPCI | Two color | 4 HL-60/lung | Cohybridized lung sample |
| Agilent 44B | 38,752 | 60-mer oligos, 1–2 probes/locus | MSK | Two color | 4 HL-60/lung | Cohybridized lung sample |
| Affymetrix 500K | 500,506 | 25-mer oligos, sets of probes/locus | MSK | One color | 4 HL-60 | 270 HapMap data set |
| Illumina Hap550 | 555,298 | 50-mer oligos on replicate beads | MSK | One color | 4 HL-60 | 120 HapMap data set |
| Affymetrix U133 | 51,387 | 25-mer oligos, sets of probes/locus | CPH | Modified one color | 4 HL-60 | 4 lung samples |
The hybridizations were performed at either Roswell Park Cancer Institute (RPCI), Memorial Sloan-Kettering Institute Genomic Core (MSK), or Columbus Pediatric Hospital (CPH).
BAC 19K Microarrays and Analysis
DNA printing solutions were prepared from sequence-connected Roswell Park Cancer Institute-11 BAC clones (RPCI; Buffalo, New York) by ligation-mediated polymerase chain reaction (PCR) as described previously.15–17 The minimal tiling RPCI BAC array contains ~19,000 BAC clones that were chosen by virtue of their sequence tagged site (STS) content, paired BAC end-sequence, and association with heritable disorders and cancer. The backbone of the array consists of ~4600 BAC clones that were directly mapped to specific, single-chromosomal positions by FISH.15 Each clone is printed in duplicate on amino-silanated glass slides (Schott Nexterion type A+) using a MicroGrid II TAS arrayer (Genomic Solutions, Ann Arbor, MI). The BAC DNA products have ~80 μm diameter spots with 150 μm center-to-center spacing creating an array of ~39,000 elements. The printed slides dry overnight and are UV-crosslinked (350 mJ) in a Stratalinker 2400 (Stratagene, La Jolla, CA) immediately before hybridization. A complete list of the RPCI-11 BAC clones spotted on the 19K microarray can be found at http://microarrays.roswellpark.org.
Genomic DNA (1 μg) was fluorescently labeled using random primers and Klenow in the BioArray CGH Labeling System (Enzo Life Sciences, Farmingdale, NY). The four HL-60 targets were labeled with Cy3-modified nucleotides. The four normal lung targets were labeled with Cy5-modified nucleotides. Prior to hybridization, each HL-60 target was coprecipitated with one normal lung target in the presence of 100 μg of human fluorimetric Cot-1 DNA (Invitrogen, Carlsbad, CA) to block repetitive elements. The four combined targets were hybridized to four 19K BAC microarrays in SlideHyb Buffer #3 (Ambion, Austin, TX) including yeast tRNA (Invitrogen). Hybridization and washing occurred in a GeneMachine hybridization station (Genomic Solutions) as described.18 The slides were scanned immediately at 5-μm resolution using a GenePix 4200AL laser scanner (Molecular Devices, Sunnyvale, CA) in both Cy3 (HL-60) and Cy5 (normal lung) channels.
Image analysis was performed using the ImaGene version 7.0.1 software (BioDiscovery, El Segundo, CA). Low-intensity and low-quality spots were flagged and excluded from further processing. Copy number was estimated using the log2 ratios of the HL-60/normal lung signals, which were normalized using a subgrid LOESS procedure, with the clones on the sex chromosomes given a weight of 0. Replicate values on the same microarray were averaged. Regions of segmental duplications and regions of large-scale variation in the human genome were identified as previously described.19–22
Affymetrix U133 Microarrays and Analysis
The protocol for CGH analysis using gene expression microarrays has been described elsewhere.1 Briefly, four HL-60 targets and four normal targets were generated by DNaseI fragmentation of genomic DNA (10 μg) followed by biotin labeling using terminal transferase. The eight targets were individually hybridized to eight GeneChip® Human Genome U133 Plus 2.0 arrays (Affymetrix). The hybridized arrays were washed, stained with streptavidin phycoerythrin, and scanned for fluorescent signals using standard gene expression protocols (Affymetrix). Signals for individual probe sets were calculated using the robust multiarray average algorithm.23 After log2 transformation, the median signal for each set of four replicates was determined. Copy number was estimated from the difference between the individual replicate log2 HL-60 signals and the median log2 normal lung signal, which is equivalent to the log2 of the HL-60/normal lung ratio. Statistical significance was defined as p < 0.05 using a two-tailed test assuming equal variance.
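A minimal sketch of the U133 copy-number estimate described above, assuming the RMA-summarized, log2-transformed probe-set signals are already available as plain Python lists; the function and variable names and the example signals are hypothetical. Subtracting the median log2 normal signal from each HL-60 replicate's log2 signal yields log2(HL-60/normal), because log2(a) − log2(b) = log2(a/b).

```python
import math
from statistics import median

def u133_log2_ratios(hl60_log2, normal_log2):
    """Per-replicate log2(HL-60/normal) estimates for one probe set.

    hl60_log2   -- log2 RMA signals for the HL-60 replicates
    normal_log2 -- log2 RMA signals for the normal lung replicates
    """
    ref = median(normal_log2)            # median of the normal replicates
    return [x - ref for x in hl60_log2]  # log2(a) - log2(b) == log2(a / b)

# Hypothetical linear-scale signals for one probe set, log2-transformed first.
normal = [math.log2(v) for v in (800, 1000, 1000, 1200)]
hl60 = [math.log2(v) for v in (500, 480, 520, 500)]
ratios = u133_log2_ratios(hl60, normal)
print([round(r, 3) for r in ratios])
```

The median, rather than the mean, keeps a single aberrant normal hybridization from shifting the reference value.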
Affymetrix 500K Microarrays and Analysis
The Affymetrix GeneChip Human Mapping 500K Array Set consists of two arrays, each capable of genotyping on average 250,000 SNPs (approximately 262,000 for Nsp arrays and 238,000 for Sty arrays). The probe design is based on NspI and StyI restriction enzyme digests of genomic DNA. The resulting arrays have a nonuniform probe distribution with a median physical distance between SNPs of 2.5 kb and an average distance between SNPs of 5.8 kb. Eighty-five percent of the human genome is within 10 kb of a SNP. The assay requires a sample size of 250 ng of raw genomic DNA. Four replicates of the HL-60 sample were hybridized to individual arrays. The platform-specific analysis relied on Partek Genomics Suite software (http://www.affymetrix.com/products/software/compatible/index.affx).
For the CBS analysis, a random set of 40 samples taken from the HapMap cohort were used as a reference. Copy number was estimated from log2 ratios which were calculated for each HL-60 replicate relative to the mean of the HapMap controls. All initial processing including normalization and construction of ratios was performed in Genotyping Console v. 2.1 (Affymetrix) using default parameters for the 500K microarray.
Agilent 44B Microarray and Analysis
Agilent Technologies has developed a 44B arrayCGH platform with 60-mer oligonucleotide probes synthesized in situ using inkjet technology. It includes 40,000 probes that span the human genome, including coding and noncoding sequences, with an average spatial resolution of approximately 75 kb. It includes one probe per gene for RefSeq and GenBank known genes and three probes for each of approximately 1100 known cancer genes. This platform requires 25 ng of total genomic DNA to detect chromosomal changes across the entire genome.
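As a rough consistency check on the quoted average spacing, assuming a haploid human genome of about 3.0 × 10^9 bp divided evenly among the ~40,000 probes:

$$\frac{3.0\times10^{9}\ \text{bp}}{4.0\times10^{4}\ \text{probes}} = 7.5\times10^{4}\ \text{bp} \approx 75\ \text{kb per probe}$$

Actual spacing is not uniform, since part of the probe budget is concentrated in the cancer-gene set.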
Agilent Human Genome CGH Microarray 44B
Three pairs of HL-60 and normal lung targets were prepared from 0.3 μg of input DNA, while the fourth pair was prepared on a separate day from 3.0 μg of DNA. Copy number was estimated from the normalized log ratios for each replicate, which were obtained from Agilent CGH Analytics software using the aberration detection method 1 (ADM-1 or “adamone”) algorithm.
ADM-1 is an aberration algorithm that identifies all aberrant intervals in a given sample with consistently high or low log ratios based on the statistical score. The ADM algorithms search for intervals in which the average log ratio of the sample and reference channels exceeds a user-specified threshold. In contrast to the Z-score algorithm, the ADM algorithms do not rely upon a set window size, instead sampling adjacent probes to arrive at a robust estimation of the true range of the aberrant segment. The output differs from that of the CBS algorithm by reporting statistically significant aberrant regions, allowing rapid genomic assessment. The ADM-1 statistical score is computed as the average normalized log ratios of all probes in the genomic interval multiplied by the square root of the number of these probes. It represents the deviation of the average of the normalized log ratios from its expected value of zero. The ADM-1 score is proportional to the height h (absolute average log ratio) of the genomic interval, and to the square root of the number of probes in the interval. Roughly, for an interval to have a high ADM-1 score, it should have high height and/or include a large number of probes.
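A minimal sketch of the ADM-1 statistical score as defined above: the mean normalized log ratio of an interval multiplied by the square root of its probe count. The exhaustive interval search, the function names, the example log ratios, and the threshold value are hypothetical illustrations; Agilent's implementation, which also refines interval boundaries, is not reproduced here.

```python
import math

def adm1_score(log_ratios):
    """ADM-1 score of an interval: mean log ratio times sqrt(number of probes)."""
    n = len(log_ratios)
    return (sum(log_ratios) / n) * math.sqrt(n)

def best_interval(log_ratios, threshold):
    """Return (start, end, score) of the interval with the largest |score|, or None.

    Exhaustive O(n^2) search over contiguous probe intervals [start, end);
    intervals whose |score| does not exceed `threshold` are ignored.
    """
    best = None
    for start in range(len(log_ratios)):
        for end in range(start + 1, len(log_ratios) + 1):
            score = adm1_score(log_ratios[start:end])
            if abs(score) > threshold and (best is None or abs(score) > abs(best[2])):
                best = (start, end, score)
    return best

# Hypothetical normalized log ratios with a loss at indices 3-6 (0-based).
probes = [0.05, -0.02, 0.01, -0.9, -1.1, -1.0, -0.95, 0.03, 0.0]
print(best_interval(probes, threshold=1.5))
```

Because the mean times √n equals the sum divided by √n, a long interval with a modest shift can outscore a short interval with a larger one, consistent with the "height and/or number of probes" description above.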
Illumina
The Illumina Infinium Whole-Genome Genotyping assay provides high-resolution profiling of both loss of heterozygosity and DNA copy number changes. This technology relies on BeadChips that are constructed by random assembly of bead pools into microwell patterned stripes on a silicon substrate. Each stripe is loaded with a unique bead pool composed of tens of thousands of different bead types for a total complexity of hundreds of thousands of bead types across the BeadChip. Processed measurements from these BeadChips provide normalized intensity and allelic ratio for each SNP. The platform performs a parallel assay of approximately 550K locations in the genome. Similar to the Affymetrix platform, a single sample is hybridized to each array. Four replicates of the HL-60 sample were hybridized to individual arrays. Copy number was estimated using the log2 ratio of each hybridization relative to a control dataset included in the Illumina BeadStudio software. This software was utilized for all normalization and ratio construction for the four Illumina replicates using manufacturer recommended settings.
CBS Analysis of All Platforms
Software for data visualization and statistical analysis has been developed by many groups in response to the increase in arrayCGH applications and the diversity of platform technologies. A recent article has surveyed the breadth of this software development2 and another has evaluated several analytical techniques.24 Lai et al. compared 11 different algorithms for analyzing arrayCGH data, including both segment detection methods and smoothing methods.24 They found that some segmentation methods, especially CGHseg25 and circular binary segmentation (CBS),26 appear to perform consistently well.24 For consistency in comparisons, we chose CBS to identify gains and losses from all of our selected platforms.
All CBS analyses were performed using the DNAcopy package in R/Bioconductor. Log ratios were used as input, and each replicate on each platform was run independently with the default CBS settings. The final set of segments, segment scores (smoothed log ratios for the segment), and associated markers were output from each run. Copy number calls (gain or loss) were assigned to a segment after applying a threshold to its segment score (described below).
Reproducibility of CBS segment scores across the four replicates, summarized in Figure 1, was examined as follows. Because the magnitude of a CBS segment score indicating a potential variant differed from platform to platform, segment scores were first normalized across platforms by scaling. Scaled CBS segment scores were then mapped to each marker associated with the segment, and the mapped scores were averaged across the four replicates for each marker and treated as the “fitted” value. Root mean squared error (RMSE) was computed for each individual segment from each array by comparing the segment score with the fitted value for each marker in the segment. Log10(RMSE for a segment) was then plotted versus log10(number of markers in the segment). Because this produces thousands of points for the Illumina and Affymetrix array systems, we applied a lowess smoothing procedure per platform to visualize the results.
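A minimal sketch of the per-segment RMSE calculation described above. It assumes the segment scores have already been scaled (the scaling step is not detailed in the text and is omitted) and that segments are given as marker index ranges; the function and argument names are hypothetical.

```python
import math

def segment_rmse(replicates, n_markers):
    """RMSE of each CBS segment relative to the across-replicate fitted value.

    replicates -- one list per replicate of (start, end, score) segments, where
                  markers start..end-1 belong to the segment and score is the
                  (already scaled) CBS segment score
    n_markers  -- number of ordered markers on the platform
    Returns a list of (replicate index, markers in segment, RMSE).
    """
    # Map each segment score onto every marker it covers, per replicate.
    per_marker = []
    for segments in replicates:
        values = [0.0] * n_markers
        for start, end, score in segments:
            for m in range(start, end):
                values[m] = score
        per_marker.append(values)

    # "Fitted" value: mean of the mapped scores across replicates, per marker.
    fitted = [sum(rep[m] for rep in per_marker) / len(per_marker)
              for m in range(n_markers)]

    # Compare each segment's score with the fitted value at each of its markers.
    results = []
    for r, segments in enumerate(replicates):
        for start, end, score in segments:
            sq = [(score - fitted[m]) ** 2 for m in range(start, end)]
            results.append((r, end - start, math.sqrt(sum(sq) / len(sq))))
    return results

# Two hypothetical replicates whose loss segment ends one marker apart.
print(segment_rmse([[(0, 2, -1.0), (2, 4, 0.0)], [(0, 3, -1.0), (3, 4, 0.0)]], 4))
```

log10(RMSE) would then be plotted against log10 of the marker count; segments with an RMSE of exactly zero have no finite logarithm and would need to be excluded or offset before plotting.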
FIGURE 1.
Repeatability of circular binary segmentation (CBS) segment score. Segment scores for individual hybridizations of the same array-CGH platform were derived from common CBS analysis and then scaled as described in Materials and Methods. Root mean squared error (RMSE) values for each segment are plotted as a function of either the number of markers per segment (A) or the length of the segment (B). All values are plotted on log10 scale.
To determine optimal thresholds for calling a segment a putative variant, we examined plots of CBS segment scores relative to segment size as well as cumulative distribution plots of CBS segment scores (see Supplemental Material). Change point analysis of ordered CBS segment scores was relatively straightforward for losses on all platforms. For gains, however, change point analysis did not have the same clarity, and some platforms (such as Agilent 44B) had no obvious threshold for gains. We therefore assumed integer gains and losses and used a simple model of the segment score,

$$s = \log_2\frac{d + B_g}{2 + B_g},$$

where $B_g$ is the general background and $d$ is the copy number of the segment: the smallest loss ($d = 1$) or gain ($d = 3$) relative to the typical two copies. If $l_t$ is the threshold for loss, the corresponding threshold for gain is

$$g_t = \log_2\left(2 - 2^{l_t}\right),$$

because $2^{l_t} = (1 + B_g)/(2 + B_g)$ and $(3 + B_g)/(2 + B_g) = 2 - (1 + B_g)/(2 + B_g)$. This simple model was used for Agilent 44B, Affymetrix U133 Plus 2.0, and Affymetrix 500K. However, the data strongly suggested different gain thresholds for the BAC and Illumina arrays. The final thresholds chosen, as well as CBS segment scores relative to segment length, are shown in Supplemental Material.
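A short sketch of the integer-copy threshold model above, assuming the model exactly as stated (loss d = 1, gain d = 3, typical 2 copies). It shows how a chosen loss threshold fixes both the implied background and the gain threshold; the function names are hypothetical, and the example thresholds are illustrative rather than the values chosen in the study.

```python
import math

def gain_threshold(loss_threshold):
    """Gain threshold g_t = log2(2 - 2**l_t) implied by the integer-copy model.

    Model: segment score = log2((d + Bg) / (2 + Bg)), with d = 1 (loss) or d = 3
    (gain). With r = 2**l_t = (1 + Bg) / (2 + Bg), the gain ratio is
    (3 + Bg) / (2 + Bg) = 1 + (1 - r) = 2 - r, so g_t = log2(2 - r).
    """
    r = 2.0 ** loss_threshold
    if not 0.0 < r < 1.0:
        raise ValueError("loss threshold must be negative")
    return math.log2(2.0 - r)

def background_from_loss(loss_threshold):
    """Background Bg implied by a loss threshold (model solved for d = 1)."""
    r = 2.0 ** loss_threshold
    return (2.0 * r - 1.0) / (1.0 - r)

for lt in (-1.0, math.log2(0.75)):
    print(round(lt, 3), round(background_from_loss(lt), 3), round(gain_threshold(lt), 3))
```

With no background (Bg = 0) the loss threshold is −1 and the gain threshold is log2(1.5) ≈ 0.585; a larger background compresses both thresholds toward zero, which is why the gain threshold can be derived from a clearly visible loss threshold.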
The value for each replicate in a reference region was assigned by calculating the mean of segments that surpassed the threshold, weighted by their segment length (number of markers in the segment). This value was retained along with the percentage of the region that is covered after applying the threshold. Both measures are presented in Figure 2.
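A minimal sketch of the per-replicate summary for one reference region. It assumes segments are given as genomic intervals with their marker counts, and that a segment counts toward the region if it overlaps the region and its score passes either the loss or the gain threshold. These inclusion rules and all names are illustrative assumptions, not necessarily the exact rules used in the study.

```python
def region_summary(segments, region, loss_t, gain_t):
    """Length-weighted mean score and percent coverage of a reference region.

    segments -- (start_bp, end_bp, n_markers, score) CBS segments for one replicate
    region   -- (start_bp, end_bp) of the reference CNV
    Returns (weighted_mean or None, percent_covered).
    """
    r_start, r_end = region
    passing = [s for s in segments
               if s[0] < r_end and s[1] > r_start        # overlaps the region
               and (s[3] <= loss_t or s[3] >= gain_t)]   # surpasses a threshold
    if not passing:
        return None, 0.0  # replicate omitted from the score plot
    total_markers = sum(s[2] for s in passing)
    weighted_mean = sum(s[2] * s[3] for s in passing) / total_markers
    # CBS segments from one replicate do not overlap, so overlaps can be summed.
    covered = sum(min(s[1], r_end) - max(s[0], r_start) for s in passing)
    return weighted_mean, 100.0 * covered / (r_end - r_start)

segs = [(0, 1200, 10, -0.1), (1200, 1800, 30, -0.6), (1800, 2500, 10, -0.4)]
print(region_summary(segs, (1000, 2000), loss_t=-0.3, gain_t=0.3))
```

In this hypothetical example the first segment fails the loss threshold, so only the two deeper segments contribute to the weighted mean and to the covered fraction of the region.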
FIGURE 2.
Reference region using circular binary segmentation (CBS) analysis. Reference regions of copy number variant (CNV) in HL-60 cells were examined using CBS analysis of log ratio scores from four replicate hybridizations of each platform in the study. Thresholds for identifying CNV are described in Materials and Methods. The plot on the left illustrates the weighted segment mean (weighted by segment length) of all CBS segments per region per chip surpassing thresholds indicative of CNV. If no segments passed the threshold for a given hybridization in a given region, then that hybridization was omitted. This is indicated by the absence of a dot in the far right portion of the graph. Each dot represents a different array. Dots in the same relative position and the same color come from the same array. Colored lines in the left panel indicate the range of weighted CBS segment scores across the hybridizations that surpassed the threshold. Negative weighted scores indicate a loss while positive scores indicate a gain. CBS scores for region 8q24 were truncated at 1 for many platforms. The plot on the right indicates the percentage of filtered segments that map to the reference region for each replicate. Colored lines in the right panel represent the range in the proportion of filtered segments that map to the reference region for each replicate.
RESULTS
The data files generated from CGH consist of log ratios of signal intensities from disease or selected DNA compared with a control DNA, indexed by physical location of each probe on the array. The goal is to detect discontinuities in signal intensities that represent break points in copy number of physically contiguous genomic regions. In our study, five platforms were assessed for repeatability of the log-ratio signal and CNV detection. The probe annotation on all platforms was standardized to sequence information in NCBI Build 35 / HG17 [http://genome.ucsc.edu].
The CGH platforms were selected to represent both traditional microarrays containing large BAC clones and more recent high-density oligonucleotide microarrays (Table 1). Most of the oligonucleotide arrayCGH platforms were designed for DNA analysis. However, we included a novel approach that calculates DNA copy number using microarrays designed for RNA expression profiling.1 Genomic DNA from HL-60 leukemia cells11,27 and normal human female lung (Biochain Institute) was distributed to three sites for replicate CGH analyses on five different platforms. Each site relied on platform-specific protocols for target preparation, hybridization, quality assessment, and image analysis methods.
The number of hybridizations and the source of normal reference data varied between platforms (Table 1). Two platforms used traditional (two-color) CGH methods, where both HL-60 and normal lung targets were hybridized to the same microarray. Three platforms hybridized HL-60 or lung targets to separate (one-color) microarrays. For the Illumina and Affymetrix genotyping microarrays, gains and losses were evaluated by comparing HL-60 signals with reference datasets provided by the manufacturers (see Materials and Methods). For the novel approach using Affymetrix gene expression microarrays, HL-60 signals were compared with a reference value based on the median of four hybridizations from normal human DNA.
Repeatability Using Platform-Specific Analysis
We initially compared the repeatability of the five arrayCGH platforms by calculating standard deviations in the HL-60 ratios for each probe on the microarrays across the four replicate hybridizations (Table 2). The BAC technology showed an approximately three- to fivefold lower median standard deviation than the oligonucleotide arrayCGH platforms. However, when the standard deviation was normalized by the square root of the number of probes (under certain assumptions, a proxy for a standard error of the mean), all of the platforms had similar levels of variation per probe.
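A minimal sketch of the two repeatability measures in Table 2, assuming the replicate log ratios are grouped by probe and that the sample (n − 1) standard deviation is used; the text does not state which estimator was applied, and the function name and toy data are hypothetical.

```python
import math
from statistics import median, stdev

def repeatability(ratios_by_probe):
    """Median per-probe SD across replicates, and that median divided by sqrt(N).

    ratios_by_probe -- one list of replicate log ratios per probe (N probes)
    """
    sds = [stdev(reps) for reps in ratios_by_probe]  # sample SD for each probe
    med = median(sds)
    return med, med / math.sqrt(len(ratios_by_probe))

# Four hypothetical probes, each measured in four replicate hybridizations.
toy = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0],
       [0.0, 2.0, 0.0, 2.0], [1.0, 3.0, 1.0, 3.0]]
print(repeatability(toy))
```

For example, a median SD of 0.019 over 17,006 probes gives 0.019/√17,006 ≈ 1.46 × 10⁻⁴, in line with the BAC entry in Table 2.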
TABLE 2.
Repeatability of HL-60 Ratio Using Platform-Specific Analyses
| ArrayCGH Platform | No. Probes or Probe Sets (N) | Median SD | Median SD/√N |
|---|---|---|---|
| Custom BAC 19K | 17,006 | 0.019 | 1.467E-04 | 
| Agilent 44B | 38,752 | 0.083 | 4.238E-04 | 
| Affymetrix 500K | 500,506 | 0.101 | 0.995E-04 | 
| Illumina Hap550 | 555,298 | 0.059 | 0.798E-04 | 
| Affymetrix U133 | 51,387 | 0.073 | 3.212E-04 | 
Reproducibility is measured as the standard deviation (SD) between four replicate hybridizations. The median SD across all probes on the microarray is presented, along with the median SD divided by the square root (SR) of the number (N) of probes on the microarray.
CNV Detection using Platform-Specific Analysis
Next we determined the ability of the five arrayCGH platforms to detect a reference set of HL-60 gains and losses. This evaluation used circular binary segmentation (CBS) as well as the manufacturer-recommended analysis methods for each platform (see Materials and Methods). The Supplemental Material contains chromosome depictions for some of the arrayCGH platforms. With the exception of the gene expression platform, individual hybridizations were evaluated separately. Previous CGH analyses of the HL-60 cell line detected monosomy X, trisomy 18, five regions of genetic loss, and one region of amplification in 8q24.13,14,28 Each of these reference changes was detected on all arrayCGH platforms by both platform-specific and CBS analyses (data not shown). In addition, the lower density microarrays reported further chromosome alterations (see Supplemental Material) that were sometimes detected by a single platform. The absence of additional changes on the high-density platforms with the platform-specific analysis methods may be related to stringent thresholds and/or data filtering (see Materials and Methods). For example, the Genomics Suite (Partek) software used for Affymetrix 500K analysis included repeated data smoothing and was set to exclude copy number variations smaller than 0.5 Mb.
Repeatability Using CBS Analysis
While the platform-specific analyses are tailored for individual platforms, the differences in data filtering and smoothing approaches limited the extent of our arrayCGH comparison. Therefore, we also compared arrayCGH results derived from a common analysis method, CBS.26 Based on signal ratios of HL-60 to either reference data or cohybridized normal lung DNA, the CBS algorithm examines regions across multiple consecutive probes and divides chromosomes into contiguous segments with similar copy number ratios. The score for each chromosome segment reflects both the direction and the magnitude of any copy number change. Chromosome regions with no copy number changes have a CBS score near 0. Negative scores reflect chromosome loss and positive scores indicate gain.
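A simplified, self-contained sketch of the circular binary segmentation idea, for illustration only. It assumes a known noise standard deviation `sigma` and replaces the permutation-based significance test of the DNAcopy package with a fixed cutoff `t_crit`; the function name, parameters, and example data are hypothetical. For each candidate arc (a contiguous block of markers, with the segment's two ends treated as joined), it compares the arc mean with the mean of the remaining markers, splits at the best arc if the statistic exceeds the cutoff, and recurses. Each returned segment mean plays the role of the CBS segment score.

```python
import math

def cbs_segments(x, sigma, t_crit=5.0, min_len=2):
    """Simplified circular binary segmentation of ordered log ratios.

    Returns (start, end, mean) tuples that tile x; each covers markers start..end-1.
    Assumed simplifications (not DNAcopy's procedure): known noise SD `sigma`,
    a fixed cutoff `t_crit` instead of a permutation p-value, no pruning.
    """
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)  # prefix[k] = sum of x[:k]

    def best_arc(lo, hi):
        """Arc [i, j) of [lo, hi) with the largest |T|, ends treated as joined."""
        n = hi - lo
        total = prefix[hi] - prefix[lo]
        best = (0.0, None, None)
        for i in range(lo, hi):
            for j in range(i + min_len, hi + 1):
                k = j - i                  # markers inside the arc
                if n - k < min_len:        # too few markers outside the arc
                    continue
                inside = (prefix[j] - prefix[i]) / k
                outside = (total - (prefix[j] - prefix[i])) / (n - k)
                t = abs(inside - outside) / (sigma * math.sqrt(1.0 / k + 1.0 / (n - k)))
                if t > best[0]:
                    best = (t, i, j)
        return best

    def recurse(lo, hi, out):
        t, i, j = best_arc(lo, hi)
        if i is None or t < t_crit:
            out.append((lo, hi, (prefix[hi] - prefix[lo]) / (hi - lo)))
            return
        # Cut at both arc boundaries; an arc touching an end is a plain binary split.
        for a, b in ((lo, i), (i, j), (j, hi)):
            if b > a:
                recurse(a, b, out)

    segments = []
    if x:
        recurse(0, len(x), segments)
    return segments

# Hypothetical chromosome: normal, a loss-like drop in log ratio, normal again.
ratios = [0.0] * 10 + [-1.0] * 10 + [0.0] * 10
print(cbs_segments(ratios, sigma=0.2))
```

The circular (arc) search lets a single pass find an interior change flanked by two breakpoints, which a plain binary split can only find in two steps. The exhaustive search is O(n²) per segment, adequate for a toy example but not for arrays with hundreds of thousands of markers; the DNAcopy package should be used for real data.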
For each hybridization on the five arrayCGH platforms, CBS scores and chromosome segments were calculated using the R/Bioconductor DNAcopy package. The numbers of CBS segments per hybridization are listed in Table 3 and the distributions of segment sizes are presented in Figure 1B. Individual hybridizations using the same arrayCGH platform generated a similar number of chromosome regions. Interestingly, the number of probes on the arrayCGH platform did not predict the number of chromosome segments. Low-density platforms divided the genome into larger chromosome segments. For example, the BAC and Agilent platforms lacked any chromosome segments less than 100,000 kb, while the segment size for other platforms varied from 1000 kb to over 100 Mb (Fig. 1B).
TABLE 3.
Repeatability of CBS Chromosome Segment Analysis
| arrayCGH Platform | No. CBS Segments, Replicate 1 | Replicate 2 | Replicate 3 | Replicate 4 | Total |
|---|---|---|---|---|---|
| Custom BAC 19K | 316 | 322 | 349 | 320 | 1307 | 
| Agilent 44B | 160 | 120 | 120 | 123 | 523 | 
| Affymetrix 500K | 297 | 305 | 264 | 251 | 1117 | 
| Illumina Hap550 | 2217 | 2135 | 1996 | 2016 | 8364 | 
| Affymetrix U133 | 124 | 126 | 147 | 167 | 564 | 
While probes in the same CBS chromosome segment have the same CBS score, the endpoints of that segment may vary slightly between individual hybridizations. Therefore, we developed a novel RMSE approach to evaluate repeatability of CBS scores between replicates using the same arrayCGH platform (see Materials and Methods). Log10 plots of the RMSE results illustrate the relative variance in segments with the same number of probes (Fig. 1A) or segments of the same length (Fig. 1B). Large chromosome segments and those containing the most probes tended to be less variable. Given the same number of markers in a segment, the BAC arrayCGH platform generally had 10-fold lower RMSE scores, indicating the highest repeatability between replicates (Fig. 1A). This result is perhaps expected, given that the BAC microarrays had the lowest median standard deviation in the platform-specific analyses. However, all platforms had similar RMSE scores when segments of equal length were considered (Fig. 1B).
For each arrayCGH platform, the distribution of CBS scores for all segments identified in the four hybridizations is presented in the Supplemental Material. Interestingly, all platforms reported more negative CBS scores, indicative of chromosome losses, than positive scores. This finding coincides with previous findings of more losses than gains in the HL-60 cells.13,14
CNV Detection Using CBS Analysis
Detection of the reference set of HL-60 alterations was investigated using CBS analysis of log ratios from four technical replicates from each platform. Unique thresholds for defining chromosome loss or gain were developed for each platform based on its distribution of CBS scores as described in Materials and Methods and Supplemental Material.
Figure 2 shows, for each replicate and reference region, the length-weighted mean of the threshold-passing segment scores and the percentage of the region covered by those segments (see Materials and Methods).
All platforms detected all eight reference gains and losses in most of the individual hybridizations. However, even in regions where segment scores agreed strongly, the platforms differed in the estimated proportion of the region affected. For example, for the 5q11.2–5q31 loss, all but one platform indicated that approximately 90% of the region was lost. The lone exception was the Affymetrix U133 Plus 2.0 system, which indicated a wide range in the proportion of the region showing a loss (0%–75%). We saw similar behavior for 9p21.3–9p22. At 17p12–17p13.3, however, the Illumina array showed the widest range. In general, across the eight HL-60 regions examined, the BAC array was the most repeatable in estimating the portion of the reference region affected, followed closely by the Affymetrix 500K.
Additional genetic alterations were detected by some, but not all, of the platforms studied. Five gain and five loss events are reviewed in the Supplemental Material (S11, CBS Analysis). Because these novel regions were identified with the same procedure that detected the reference regions, several of them likely represent true alterations in the HL-60 cell line.
Practical Metrics
Other platform features should also be considered when selecting a platform for arrayCGH research; several practical features are detailed in Table 4. The protocol, from sample to completion, took no more than 2 days on any platform (1 day for the RPCI BAC array). Similarly, target labeling prices ranged from $125 to $180 per sample (a factor of about 1.4), depending on an institution's pricing schedule. One feature that may distinguish these platforms is the volume of data generated from one sample and each laboratory's capacity to process these data into copy number values. The highest density microarrays, Illumina Hap550 and Affymetrix 500K, each contain more than 500,000 probes, resulting in relatively large data files for downstream analysis.
TABLE 4.
Practical Considerations
| Platform and Microarray | Price/Microarray (relative) | Price/Target Labeling (relative) | Process Time (d) | 
|---|---|---|---|
| RPCI BAC19K | 1.0 | 1.2 | 1 | 
| Agilent 44K | 1.5 | 1.2 | 2 | 
| Affymetrix 500K | 1.7 | 0.8 | 2 | 
| Affymetrix U133 | 2.7 | 0.8 | 2 | 
| Illumina 550K | 2.7 | 1.0 | 2 | 
Prices per microarray and per target labeling are expressed in relative units (based on X and 1.5X) because the cost of these steps varies according to each institution's price schedule with the respective vendors.
DISCUSSION
This paper is the first to compare CGH results from a BAC platform and three genotyping oligonucleotide microarray platforms. It also includes a novel arrayCGH platform that uses an RNA expression microarray to detect DNA copy number variation.1 Using both platform-specific analyses and a common CBS approach, we found that all five arrayCGH platforms detected all eight previously reported CNVs in almost all replicates. Our results suggest that, at this level of resolution, the selection of an arrayCGH platform may depend more on practical considerations such as price than on substantial differences in technical performance.
CGH analysis methods are still evolving and are often optimized for particular platform features or applications. To provide a less biased comparison, the CGH results for each platform in this study were analyzed by two different approaches. The platform-specific analyses detected all eight reference regions on every platform, as did the common CBS analysis. Note that the common analysis method was based on the well-characterized CBS algorithm26 and included platform-specific detection thresholds, but it has not been optimized or repeatedly validated. The same default software settings were used for both oligonucleotide and BAC platforms, which resulted in approximately 2000 segments per Illumina Hap550 hybridization but fewer than 350 segments per hybridization on the other arrayCGH platforms, including the Affymetrix 500K microarray, whose probe density is similar to that of the Illumina microarray.
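To illustrate how a segmentation cutoff governs the number of segments, the following is a simplified sketch of circular binary segmentation in the spirit of ref. 26. It is not the published algorithm or any package's implementation: it replaces the permutation-based significance test with a fixed cutoff (`threshold`, assumed to be 4.0) on a noise-scaled two-sample statistic, estimates noise from successive differences, and omits any post hoc merging of segments. The function names are hypothetical.

```python
import math
from statistics import median


def _best_arc(x):
    """Find the arc x[i:j] whose mean differs most from the remaining points.

    Returns (t, i, j), where t = (mean_in - mean_out) / sqrt(1/k + 1/(n - k))
    and k = j - i; t is not yet divided by the noise estimate.
    """
    n = len(x)
    s = [0.0]
    for v in x:
        s.append(s[-1] + v)  # prefix sums give O(1) arc means
    best = (0.0, 0, n)
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            if k == n:
                continue  # the arc must leave at least one point outside
            mean_in = (s[j] - s[i]) / k
            mean_out = (s[n] - s[j] + s[i]) / (n - k)
            t = (mean_in - mean_out) / math.sqrt(1.0 / k + 1.0 / (n - k))
            if abs(t) > abs(best[0]):
                best = (t, i, j)
    return best


def simple_cbs(x, threshold=4.0):
    """Recursively split log ratios x; return (start, end, mean) with half-open indices."""
    if len(x) < 2:
        return [(0, len(x), sum(x) / len(x))] if x else []
    diffs = [abs(b - a) for a, b in zip(x, x[1:])]
    # Robust noise estimate: for Gaussian noise, median |difference| = 0.6745 * sqrt(2) * sigma.
    sigma = max(median(diffs) / (0.6745 * math.sqrt(2)), 1e-12)
    out = []

    def recurse(lo, hi):
        n = hi - lo
        if n < 2:
            out.append((lo, hi, sum(x[lo:hi]) / n))
            return
        t, i, j = _best_arc(x[lo:hi])
        if j - i == n or abs(t) / sigma < threshold:
            out.append((lo, hi, sum(x[lo:hi]) / n))  # no significant change point
            return
        cuts = sorted({0, i, j, n})  # split into up to three pieces
        for a, b in zip(cuts, cuts[1:]):
            recurse(lo + a, lo + b)

    recurse(0, len(x))
    return out


# Synthetic log ratios: a one-copy-like loss at probes 20-29 plus alternating noise.
x = [0.05 * (-1) ** k + (-1.0 if 20 <= k < 30 else 0.0) for k in range(50)]
print([(a, b) for a, b, _ in simple_cbs(x)])  # [(0, 20), (20, 30), (30, 50)]
```

Lowering `threshold`, or underestimating the noise, produces more and shorter segments; this is one plausible way identical default settings could yield very different segment counts on platforms with different noise characteristics.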
Unlike microarray-based expression profiling, most arrayCGH studies do not include replicate hybridizations of the same DNA sample. Therefore, the repeatability of individual probes and of the overall analysis results is a key consideration in platform selection. This study included four replicate hybridizations for each platform, and the reproducibility of the results was confirmed at multiple levels. As shown in previous research,29 the BAC platform showed the smallest probe variation when the overall standard deviation (Table 2) or RMSE values (Fig. 1A) were considered. However, all platforms showed a similar level of variation per probe when normalized to the number of measures on each platform (Table 2 and Fig. 1B). Detection of reference CNV regions and the number of CBS segments per hybridization (Table 3) were generally consistent between replicates on all platforms, although the weighted segment scores and the percentages of each region detected in the common CBS analysis sometimes fluctuated (Fig. 2). Overall, these results are encouraging for the use of single hybridizations on any of the arrayCGH platforms.
Other arrayCGH studies have relied on male versus female differences in X chromosome probes to model known copy number differences.30 In contrast, we compared normal DNA with tumor DNA derived from HL-60 cells. This comparison may have been especially challenging because the HL-60 cell line contains multiple CNVs and because subpopulations within the cell line may produce noninteger changes in copy number.
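The effect of subpopulations can be made concrete with a simple mixing model. As a sketch under stated assumptions (a diploid, copy-neutral reference; a single altered subclone making up a fraction f of the cells; a copy change Δ; no platform-specific compression of ratios), the expected log2 ratio is log2((2 + f·Δ)/2). The function name `expected_log2_ratio` is hypothetical.

```python
import math


def expected_log2_ratio(fraction, copy_change, baseline=2):
    """Expected log2(test/reference) when `fraction` of cells carry `copy_change` copies.

    Assumes a copy-neutral reference at `baseline` copies and no signal compression.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    mean_copies = baseline + fraction * copy_change
    return math.log2(mean_copies / baseline)


# Single-copy loss present in all, 60%, or 30% of cells.
for f in (1.0, 0.6, 0.3):
    print(f, round(expected_log2_ratio(f, -1), 3))
```

Under these assumptions a clonal single-copy loss gives -1, whereas the same loss in 60% of cells gives log2(0.7), about -0.51, a value that falls between integer copy-number states.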
The ability of arrayCGH to detect changes throughout the genome depends on the size and positioning of the clones on the array. BAC arrays, with a resolution in the 150-kb range, typically have large segments of 100 to 160 kb, whereas oligonucleotide arrayCGH platforms have shorter segment sizes, on the order of 50 to 100 kb. High-density oligonucleotide arrays theoretically offer higher resolution but have been reported to be less robust than the BAC platform.24
Theoretically, oligonucleotide array platforms should detect gains and losses better than BAC arrays because their spatial resolution is in the 35-kb range and they contain more probes per array. Because of their small target size, however, oligonucleotide arrays have poorer signal-to-noise ratios, which often result in a significant number of false-positive outliers. Typically, 20–50 adjacent oligonucleotides are necessary for a reliable call (i.e., >90%). Identifying regions of CNV therefore requires statistical tools and more complex algorithms. In this report, the oligonucleotide arrays (Agilent, Illumina 550K, Affymetrix 500K, and Affymetrix U133 Plus 2.0) and a custom BAC array all showed good detection of the previously reported gains and losses in the HL-60 cell line and identified a number of novel CNVs.
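Why many adjacent probes are needed can be seen from the standard error of a segment mean. Assuming independent per-probe noise with standard deviation σ and a true log-ratio shift Δ (a textbook approximation, not a model fitted in this study), a segment of n probes is distinguishable from zero at a cutoff z* when

$$\mathrm{SE}(\bar{x}_n)=\frac{\sigma}{\sqrt{n}},\qquad z=\frac{|\Delta|}{\sigma/\sqrt{n}}\ \ge\ z^{*}\quad\Longleftrightarrow\quad n\ \ge\ \left(\frac{z^{*}\,\sigma}{|\Delta|}\right)^{2}.$$

With purely illustrative values, z* = 5 and σ = |Δ| give n ≥ 25, while σ = 1.4|Δ| gives n ≥ 49. The noisier each individual probe, the more adjacent probes a reliable call requires.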
In the absence of substantial technical differences, platform selection will also depend on practical considerations and the intended application. In this study, novel HL-60 deletions were detected by some, but not all, platforms (see Supplemental Material). This difference in detection may be related to platform resolution and/or analysis settings. Therefore, some arrayCGH platforms may be best suited for discovery research, while others may be preferable for diagnostic applications. The gene expression platform1 generated results that were unexpectedly comparable to those of the other arrayCGH methodologies. This alternative approach may be particularly useful for genomes for which genotyping arrays are not yet available, or in combined expression-genotype studies in which a single set of probes could serve both purposes.
This study was initiated in December 2006 and therefore used microarray products available at that time. Significant advances in array products and CGH software have since been made; for example, commercial arrays with more than 1 million probes (or probe sets) are now available. However, the methods and results in this study should remain useful in evaluating these new arrayCGH products, because the basic microarray hybridization technology remains the same.
Identifying the specific segmental genomic alterations and the genes they contain will enrich our understanding of disease processes and may identify molecular targets for improved diagnosis and subsequent therapeutic strategies. As CGH technologies continue to evolve, it is important to evaluate their performance so that practitioners with limited resources may have guidance in platform selection.
SUPPLEMENTAL MATERIAL
TABLE S1.
Position of Reference Regions Used in CBS Analysis
| Reference Region | Chromosome | Start Nucleotide | End Nucleotide | Variant Type | 
|---|---|---|---|---|
| 5q11.2–5q31 | 5 | 50500001 | 143100000 | Loss | 
| 8q24 | 8 | 117700001 | 146274826 | Gain | 
| 9p21.3–p22 | 9 | 14100001 | 25500000 | Loss | 
| 10p12–p15 | 10 | 1 | 670000 | Loss | 
| 14q22–q31 | 14 | 48300001 | 88900000 | Loss | 
| 17p12–p13.3 | 17 | 1 | 1590000 | Loss | 
| Trisomy 18 | 18 | 1 | 76117153 | Gain | 
| Monosomy X | X | 1 | 154913754 | Loss | 
TABLE S2.
Detection of Additional CNV in HL-60 Cells Using Platform-Specific Analysis
| ArrayCGH Platform | Loss 2q | Loss 4p | Loss 9p12~q21 | Loss 16q23 | Loss 16 | Gain 17 | Gain 19 | 
|---|---|---|---|---|---|---|---|
| Custom BAC 19K | + | + | + | + | − | − | − | 
| Agilent 44B | − | − | + | +/− | − | − | − | 
| Affymetrix 500K | − | − | − | − | − | − | − | 
| Illumina Hap550 | − | − | − | − | − | − | − | 
| Affymetrix U133 | − | − | − | − | + | + | + | 
HL-60 CGH Results Using BAC 19K Microarray. Regions with gains and losses in more than one replicate are listed. Red lines mark CBS segments.
HL-60 CGH Results Using Affymetrix 500K Microarray. Regions with statistically significant gains and losses using Partek software are highlighted.
HL-60 CGH Results for Chromosome 9 Using Agilent 44K Microarray. Regions with gains and losses using ADM-1 algorithm are shaded and marked by bottom line.
HL-60 CGH Results for Chromosome 9 Using Illumina Hap550 Microarray.
HL-60 CGH Results for Chromosome 7.
Distribution of CBS Segments for the Affymetrix U133 Microarray.
Distribution of CBS Segments for the Agilent Microarray.
Distribution of CBS Segments for the BAC 19K Microarrays.
Distribution of CBS Segments for the Affymetrix 500K Microarrays.
Distribution of CBS Segments for the Illumina Hap550 Microarrays.
Additional Regions Using Common CBS Analysis.
Acknowledgments
The authors gratefully acknowledge the key assistance of the following individuals: Agnes Viale for processing Agilent, Affymetrix, and Illumina arrays; Herbert Auer for DNA preparation and hybridization to the Affymetrix U133 Plus microarrays; Devin McQuaid and Jeff Conroy at RPCI for their guidance and CGH technical advice; Xiaowen Wang and Tom Downey for assistance with CNV analysis using the Partek software; as well as Anniek De Witte for analysis of Agilent microarrays. We also appreciate financial support provided by ABRF and supplies generously donated by Affymetrix. The research described in this article has been reviewed by the National Health and Environmental Effects Research Laboratory, US Environmental Protection Agency, and approved for publication. Approval does not signify that the contents necessarily reflect the views and the policies of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.
REFERENCES
- 1. Auer H, Newsom DL, Nowak NJ, et al. Gene-resolution analysis of DNA copy number variation using oligonucleotide expression microarrays. BMC Genomics. 2007;8:111. doi: 10.1186/1471-2164-8-111.
- 2. Lockwood W, Chari R, Chi B, Lam W. Recent advances in array comparative genomic hybridization technologies and their applications in human genetics. Eur J Hum Genet. 2005;14:139–148. doi: 10.1038/sj.ejhg.5201531.
- 3. Stranger BE, Forrest MS, Dunning M, et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science. 2007;315:848–853. doi: 10.1126/science.1136678.
- 4. Huang J, Wei W, Zhang J, et al. Whole genome DNA copy number changes identified by high density oligonucleotide arrays. Hum Genomics. 2004;1:287–299. doi: 10.1186/1479-7364-1-4-287.
- 5. Komura D, Shen F, Ishikawa S, et al. Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays. Genome Res. 2006;16:1575–1584. doi: 10.1101/gr.5629106.
- 6. Redon R, Ishikawa S, Fitch KR, et al. Global variation in copy number in the human genome. Nature. 2006;444(7118):444–454. doi: 10.1038/nature05329.
- 7. Zhao X, Li C, Paez JG, et al. An integrated view of copy number and allelic alterations in the cancer genome using single nucleotide polymorphism arrays. Cancer Res. 2004;64:3060–3071. doi: 10.1158/0008-5472.can-03-3308.
- 8. Albertson DG, Pinkel D. Genomic microarrays in human genetic disease and cancer. Hum Mol Genet. 2003;12(Spec No 2):R145–152.
- 9. Coe BP, Ylstra B, Carvalho B, Meijer GA, Macaulay C, Lam WL. Resolving the resolution of array CGH. Genomics. 2007;89:647–653. doi: 10.1016/j.ygeno.2006.12.012.
- 10. Davies JJ, Wilson IM, Lam WL. Array CGH technologies and their applications to cancer genomes. Chromosome Res. 2005;13:237–248. doi: 10.1007/s10577-005-2168-x.
- 11. Okazaki Y, Okuizumi H, Sasaki N, et al. An expanded system of restriction landmark genomic scanning (RLGS Ver. 1.8). Electrophoresis. 1995;16:197–202. doi: 10.1002/elps.1150160134.
- 12. Birnie GD. The HL60 cell line: a model system for studying human myeloid cell differentiation. Br J Cancer Suppl. 1988;9:41–45.
- 13. Ulger C, Toruner GA, Alkan M, et al. Comprehensive genome-wide comparison of DNA and RNA level scan using microarray technology for identification of candidate cancer-related genes in the HL-60 cell line. Cancer Genet Cytogenet. 2003;147:28–35. doi: 10.1016/s0165-4608(03)00155-9.
- 14. Peiffer DA, Le JM, Frank J, et al. High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Res. 2006;16:1136–1148. doi: 10.1101/gr.5402306.
- 15. Nowak NJ, Snijders AM, Conroy JM, Albertson DG. The BAC resource: tools for array CGH and FISH. In: Haines JL, Korf BR, Morton CC, et al., editors. Current Protocols in Human Genetics. New York: Wiley; 2005. Unit 4.13.
- 16. Nowak NJ, Gaile D, Conroy JM, et al. Genome-wide aberrations in pancreatic adenocarcinoma. Cancer Genet Cytogenet. 2005;161:36–50. doi: 10.1016/j.cancergencyto.2005.01.009.
- 17. Snijders AM, Nowak N, Segraves R, et al. Assembly of microarrays for genome-wide measurement of DNA copy number. Nat Genet. 2001;29:263–264. doi: 10.1038/ng754.
- 18. Miliaras D, Conroy J, Pervana S, Meditskou S, McQuaid D, Nowak N. Karyotypic changes detected by comparative genomic hybridization in a stillborn infant with chorioangioma and liver hemangioma. Birth Defects Res A Clin Mol Teratol. 2007;79:236–241. doi: 10.1002/bdra.20332.
- 19. Iafrate AJ, Feuk L, Rivera MN, et al. Detection of large-scale variation in the human genome. Nat Genet. 2004;36:949–951. doi: 10.1038/ng1416.
- 20. Sebat J, Lakshmi B, Troge J, et al. Large-scale copy number polymorphism in the human genome. Science. 2004;305:525–528. doi: 10.1126/science.1098918.
- 21. Sharp AJ, Locke DP, McGrath SD, et al. Segmental duplications and copy-number variation in the human genome. Am J Hum Genet. 2005;77:78–88. doi: 10.1086/431652.
- 22. Tuzun E, Sharp AJ, Bailey JA, et al. Fine-scale structural variation of the human genome. Nat Genet. 2005;37:727–732. doi: 10.1038/ng1562.
- 23. Irizarry RA, Hobbs B, Collin F, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4(2):249–264. doi: 10.1093/biostatistics/4.2.249.
- 24. Lai WR, Johnson MD, Kucherlapati R, Park PJ. Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics. 2005;21:3763–3770. doi: 10.1093/bioinformatics/bti611.
- 25. Picard F, Robin S, Lavielle M, Vaisse C, Daudin JJ. A statistical approach for array CGH data analysis. BMC Bioinformatics. 2005;6:27. doi: 10.1186/1471-2105-6-27.
- 26. Olshen AB, Venkatraman ES. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004;5:557–572. doi: 10.1093/biostatistics/kxh008.
- 27. Gallagher R, Collins S, Trujillo J, et al. Characterization of the continuous, differentiating myeloid cell line (HL-60) from a patient with acute promyelocytic leukemia. Blood. 1979;54:713–733.
- 28. Roschke AV, Tonon G, Gehlhaus KS, et al. Karyotypic complexity of the NCI-60 drug-screening panel. Cancer Res. 2003;63:8634–8647.
- 29. Wicker N, Carles A, Mills IG, et al. A new look towards BAC-based array CGH through a comprehensive comparison with oligo-based array CGH. BMC Genomics. 2007;8:84. doi: 10.1186/1471-2164-8-84.
- 30. Bauters M, Van Esch H, Marynen P, Froyen G. X chromosome array-CGH for the identification of novel X-linked mental retardation genes. Eur J Med Genet. 2005;48:263–275. doi: 10.1016/j.ejmg.2005.04.008.