Abstract
The standard (STD) 5 × 5 hybrid median filter (HMF) was previously described as a nonparametric local background estimator of spatially arrayed microtiter plate (MTP) data. As such, the HMF is a useful tool for mitigating global and sporadic systematic error in MTP data arrays. Presented here is the first known HMF correction of a primary screen suffering from systematic error best described as gradient vectors. Application of the STD 5 × 5 HMF to the primary screen raw data reduced background signal deviation, thereby improving the assay dynamic range and hit confirmation rate. While this HMF can correct gradient vectors, it does not properly correct periodic patterns that may be present in other screening campaigns. To address this issue, two alternative kernels, a 1 × 7 median filter (1 × 7 MF) and a row/column 5 × 5 hybrid median filter (RC 5 × 5 HMF), were designed ad hoc to better fit periodic error patterns. The correction data show that periodic error in simulated MTP data arrays is reduced by these alternative filter designs and that multiple corrective filters can be combined in serial operations for progressive reduction of complex error patterns in an MTP data array.
Introduction
Variations in robotic handling, instrumentation, and experimental conditions can contribute to a number of systematic errors in the assay system of primary screens. Depending on the source, the distortions result in patterns consistent over all screening data or vary spatially across plate data arrays. These factors make it difficult to redress the entire sample population with a uniform and unbiased approach while producing the favorable result of optimizing the dynamic range of affected microtiter plates (MTPs). As systematic errors represent spatial deviation of assay metrics over the MTP surface, they can often be characterized using regional statistics and other pattern recognition algorithms.1–9
Interplate systematic error (variation from plate to plate) can result from imprecise execution of the assay schedule with regard to reagent addition, incubation time, and end point. Intraplate systematic error (variation across wells of the same plate) can result from the robotic devices and instrumentation used to address the MTP or from regional environmental differences across the plate during preparation.10–15 Thus, systematic error can be discrete or result from multiple sources such as row bias imparted by a fluid dispenser compounded with edge effects accumulated during a long assay time course. Together these error sources account for much of the systematic distortions observed in primary screening campaigns and epitomize two discrete classes of patterning error: periodic (row/column) and gradient vector (continuous directional sloping) distortion of assay data. Each contribution can be thought of as discretely additive to the total MTP data variation. The resulting variation in data background values can significantly reduce statistical confidence in primary screen hit identification.5,9,15,16
The validation of the 5 × 5 hybrid median filter (HMF) on a 236 441–point primary screen described here represents the first such correction applied and submitted to the public domain (see Materials and Methods). The first step in the data correction process involves identification and classification of the systematic error afflicting the screen.5,9,15 To this end, the experimental primary screen data and mathematically simulated MTP data were profiled using regional statistics based on assay metrics across the MTP surface area. Systematic errors were successfully identified in both the experimental primary screening data and simulated MTPs and classified as gradient vectors, periodic patterns, or a combination of both. Although the 5 × 5 HMF was an effective tool in the primary screen data set tested here, it will not remedy all systematic errors in primary screening campaigns, such as specific row and column bias in assay data. Addressing this issue, alternative median filter (MF) kernels were designed—the 1 × 7 MF and a row/column 5 × 5 HMF (RC 5 × 5 HMF)—and applied respectively to simulated striping and quadrant patterns to test for corrective performance. Finally, a workflow is devised to illustrate serial application of different MFs to correct a simulated MTP data array with a complex systematic error produced by combining gradient vectors and periodic patterns. Each applied filter addressed discrete components of the complex distortion pattern, resulting in progressive improvement of the MTP data dynamic range and background variation. Taken together, these data support the role of median filters as corrective tools for primary screens suffering from systematic error and underscore the flexibility of median filter kernel design for targeting specific distortion patterns revealed by descriptive statistics of MTP raw data. 
A summary of the correction process and derivation of the hybrid median and non–hybrid median statistics can be found in Supplementary Figures S1 to S3 and in Bushway et al.16
Materials And Methods
Primary screening data correction
The 5 × 5 HMF was applied to mitigate systematic distortions observed in a primary screening campaign using a high-throughput imaging assay for hepatic lipid droplet formation. In total, 236 441 compounds were tested in this cell-based high-content imaging screen performed in a 384-well format (690 MTPs in 384-well plates). A detailed description of the primary screen assay protocol can be found on PubChem under BioAssay ID 1656 (http://www.ncbi.nlm.nih.gov/pcassay?term=1656). Briefly, hepatocytes were seeded in 384-well plates and incubated overnight. Compounds were added followed by addition of oleic acid, and cells were incubated for ~24 h. Cells were fixed and stained with the lipid dye BODIPY 493/503 to visualize lipid droplet formation and with the nuclear dye DAPI (both Invitrogen, Carlsbad, CA). Plates were prepared two to three times per week in batches of up to 120 plates/day. Imaging was done using the Opera QEHS system (PerkinElmer, Waltham, MA) with a 20× 0.45NA air objective. Image analysis was performed using CyteSeer software (Vala Sciences, San Diego, CA; http://www.valasciences.com/) with the “Lipid Droplets” algorithm. The primary assay readout used the “TIILiLm” parameter measuring the integrated intensity of the lipid droplets per cell. Data were deposited, and % inhibition was calculated and normalized to each plate’s positive and negative control wells using CBIS software (Cheminnovation Software, San Diego, CA; http://www.cheminnovation.com/). Statistical evaluation of the MTP data on the plate and screen level was performed in the analytics software Spotfire (TIBCO Software, Palo Alto, CA; http://www.tibco.com/). HMF correction was done in a customized batch process in Matlab (The Mathworks, Natick, MA; http://www.mathworks.com/). The previously described16 STD 5 × 5 HMF was applied to all compound wells and negative control wells (columns 2–24 of the MTP). 
The positive control wells (column 1 of the MTP) were corrected using a modified 5 × 5 HMF, which was designed to exclude elements belonging to the control group in the estimation of the background as they had inherently extreme values. This special HMF kernel was constructed similarly to the STD HMF (spatial layout seen in Fig. 6B, Suppl. Fig. S1), but with the median of the cross-elements replaced by the median of the elements not belonging to the center, diagonal, or cross in the kernel (i.e., corresponding to the unsampled elements in the STD 5 × 5 HMF). Both uncorrected and HMF-corrected data were deposited to PubChem under BioAssay ID 1656 with “assayMetric_HMF” indicating corrected data columns. Calculations to obtain the Z′ factor and Z factor for the screening data statistics shown in Table 1 are defined elsewhere.17
Fig. 6.
Compound systematic error resulting from multiple sources can be addressed by serial application of appropriate filters for progressive improvement in data array background coefficient of variation (CV) and outlier dynamic range. High-amplitude (white, gray text) and low-amplitude hit locations (black, white text) are mapped (A) along with periodic (discontinuous dark gray stripes) and gradient vector patterns (bold outlined wells, lower half of array) to track location before and after correction. The filter set and process priority for the two-step correction are depicted in B (bottom). Statistical reduction of the compound error by the filter application is tracked in B (top). Note that the decrease in array background CV (bkCV) is concordant with increased dynamic range. The combined error in C was first addressed with the 1 × 7 median filter (MF), which corrected the periodic error and a significant contribution of the imposed gradient vector. Residual edge-proximal gradients persisted and were addressed with subsequent application of the 5 × 5 hybrid median filter (HMF) in a secondary correction (D). Array dynamic range and bkCV improve progressively after successive filter implementations (B, top). Row and column binned median values (horizontal tick) and interquartile range (25%–75% of data range; vertical line; IQR) before correction (E) profile the simulated systematic error by subdivision over the array area. After sequential application of the 1 × 7 MF and STD 5 × 5 HMF (F), the median and IQR suggest smoothing of the array background, as indicated by IQR convergence around a common median value in all binned subdivisions. Q1 to Q4 represent subsequent 25% grouping of rows or columns (see Materials and Methods).
Table 1.
Primary Screen Statistics for the Hepatocyte Lipid Droplet Assay
| % inhibition | Compounds Mean | Compounds SD | (–) Controls Mean | (–) Controls SD | (+) Controls Mean | (+) Controls SD | Z′ | Z |
|---|---|---|---|---|---|---|---|---|
| Uncorrected | 9.33 | 25.25 | 0 | 13.79 | 100 | 5.32 | 0.43 | –0.01 |
| HMF corrected | –1.15 | 16.67 | 0 | 9.65 | 100 | 5.58 | 0.54 | 0.34 |
HMF, hybrid median filter; SD, standard deviation; (–), negative; (+), positive.
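The Z′ and Z values in Table 1 can be checked against the tabulated means and SDs. The source defers the definitions to reference 17, so the sketch below assumes the widely used form 1 − 3(SD_a + SD_b)/|mean_a − mean_b|, with Z′ comparing positive and negative controls and Z comparing positive controls and compounds. The function names `z_factor` and `screen_quality` are hypothetical.

```python
def z_factor(mean_a, sd_a, mean_b, sd_b):
    """Separation of two populations: 1 - 3*(sd_a + sd_b) / |mean_a - mean_b|."""
    return 1.0 - 3.0 * (sd_a + sd_b) / abs(mean_a - mean_b)


# (mean, SD) pairs in % inhibition, copied from Table 1.
table1 = {
    "Uncorrected": {
        "compounds": (9.33, 25.25), "neg": (0.0, 13.79), "pos": (100.0, 5.32),
    },
    "HMF corrected": {
        "compounds": (-1.15, 16.67), "neg": (0.0, 9.65), "pos": (100.0, 5.58),
    },
}


def screen_quality(row):
    """Return (Z', Z) rounded to two decimals for one row of the table."""
    z_prime = z_factor(*row["pos"], *row["neg"])   # controls only (assumed)
    z = z_factor(*row["pos"], *row["compounds"])   # controls vs. compounds (assumed)
    return round(z_prime, 2), round(z, 2)


for label, row in table1.items():
    print(label, screen_quality(row))
```

Under these assumed definitions the computed pairs agree with the tabulated Z′ and Z columns, which also supports reading the corrected compound mean as −1.15.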
Model data arrays
Normal simulated arrays were generated in Microsoft Excel using a random number generator fixed to a narrow range (95–105) yielding a standard deviation of ±5 for the quadrant error simulation and ±15 for all other simulated arrays. Multiplying regions of random number arrays with linear scalars was sufficient to generate array gradients, periodic patterns, and outlier groupings. Low- and high-amplitude outliers were not co-clustered to avoid balancing the median filter rank order (see Suppl. Fig. S4). Charts and spatial maps rendered in two and three dimensions were generated in Microsoft Excel.
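The array construction described above can be sketched in a few lines. This is an illustration only: the original arrays were built in Microsoft Excel, and the uniform distribution, the seed, the scalar values, and the helper names `simulate_plate` and `scale_region` are assumptions.

```python
import random


def simulate_plate(n_rows=16, n_cols=24, low=95.0, high=105.0, seed=0):
    """Background array of random values confined to a narrow range."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(n_cols)] for _ in range(n_rows)]


def scale_region(plate, rows, cols, scalar):
    """Multiply a rectangular selection of wells by a scalar, in place."""
    for r in rows:
        for c in cols:
            plate[r][c] *= scalar
    return plate


plate = simulate_plate()

# Periodic pattern: odd-numbered rows (1-based), left half of the plate only.
scale_region(plate, rows=range(0, 16, 2), cols=range(0, 12), scalar=0.7)

# Gradient: bottom four rows scaled by linearly decreasing factors.
for step, r in enumerate(range(12, 16), start=1):
    scale_region(plate, rows=[r], cols=range(24), scalar=1.0 - 0.05 * step)

# One low-amplitude outlier (hit) in an otherwise unaffected well.
scale_region(plate, rows=[5], cols=[14], scalar=0.2)
```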
Filter design
Median filters were constructed based on median sample size, median component localization, and the systematic error pattern to be addressed. The requirement for synchrony between error pattern and filter pattern dictated the design of the 1 × 7 MF and RC 5 × 5 HMF–corrective filters targeting periodic error patterns. The operation of median and hybrid median filters was previously described, and a spatial rendering of the STD 5 × 5 HMF is shown later in Figure 6B as the secondary correction.16 Briefly, a global median (G) is obtained for the total MTP data set and remains constant throughout the corrections. A hybrid median or median (Mh) is the middle value obtained from the component medians (m1, m2, m3) in the case of the STD 5 × 5 HMF and RC 5 × 5 HMF or single rank-order list (m1) in the case of the 1 × 7 MF (see Suppl. Figs. S1–S3). For each element (n) of the MTP data array, the corrected value (Cn) is expressed as
$$C_n = \frac{G}{M_h} \cdot n \tag{1}$$
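As a rough illustration of the operation described above, the following sketch applies a 5 × 5 hybrid median correction to a plate held as a list of rows. Several details are assumptions rather than the published implementation (which was written in Matlab): the corrected value is taken as Cn = (G/Mh)·n, the cross and diagonal component medians include the center element, the third component is the center value itself, and kernel elements falling outside the plate are simply dropped. The function name `hmf_5x5` is hypothetical.

```python
from statistics import median

STEPS = (-2, -1, 1, 2)
CROSS = [(d, 0) for d in STEPS] + [(0, d) for d in STEPS]   # row/column arms
DIAG = [(d, d) for d in STEPS] + [(d, -d) for d in STEPS]   # diagonal arms


def hmf_5x5(plate):
    """Return a corrected copy of `plate`; assumes no hybrid median is zero."""
    n_rows, n_cols = len(plate), len(plate[0])
    G = median(v for row in plate for v in row)  # global median, held constant

    def sample(r, c, offsets):
        # Kernel elements that fall inside the plate (edge truncation, assumed).
        return [plate[r + dr][c + dc] for dr, dc in offsets
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols]

    corrected = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            n = plate[r][c]
            m1 = median(sample(r, c, CROSS) + [n])  # cross component
            m2 = median(sample(r, c, DIAG) + [n])   # diagonal component
            m3 = n                                  # center element
            Mh = median([m1, m2, m3])               # hybrid median
            out_row.append(G / Mh * n)
        corrected.append(out_row)
    return corrected


# A left-to-right linear gradient is flattened to the global median.
gradient = [[100.0 + 2.0 * c for c in range(24)] for _ in range(16)]
flattened = hmf_5x5(gradient)
```

Because the center value is one of the three components, an isolated outlier never becomes its own background estimate: the hybrid median falls back to a neighborhood median, so the outlier is rescaled rather than erased.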
Statistical calculations
Primary screening “hits” are points of interest nested within a uniform and unresponsive data population and thus analogous to “point noise” or “outliers” in the language of image processing. Furthermore, the row–column dimensions of an MTP array are analogous to an image with equivalent pixel dimensions and organization. For this reason, the terms outliers and hits refer to the same data. Absolute Deviation (AbsDev) was calculated as the mean outlier (AVGhit) distance from the data array background mean (AVGbk), such that
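$$\text{Outlier AbsDev} = \frac{\lvert \text{AVG}_{bk} - \text{AVG}_{hit,Lo} \rvert + \lvert \text{AVG}_{hit,Hi} - \text{AVG}_{bk} \rvert}{2} \tag{2}$$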
where Lo and Hi refer to outliers (hits) considered to be inhibitors or enhancers of the background signal. As such, the mean distance between outliers (hits) below and above the data array background mean is a measure of the total data range. This metric is a useful diagnostic indicator for outlier amplitude distortion after HMF corrections, which could happen if the HMF filter was overwhelmed by outliers. The dynamic range was calculated by dividing the data array Outlier AbsDev by the background standard deviation (bkSD) devoid of outliers, yielding a multiple of the bkSD such that
$$\text{Dynamic range} = \frac{\text{Outlier AbsDev}}{\text{bkSD}} \tag{3}$$
The array background CV (bkCV) was calculated in the absence of outliers and is by definition bkCV = (bkSD/AVGbk) × 100.
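The three array metrics can be computed together once the outliers have been separated from the background. In this sketch the Outlier AbsDev is read as the average of the low-hit and high-hit distances from the background mean, and bkSD is taken as the sample standard deviation. Both choices are assumptions, as is the function name `array_stats`.

```python
from statistics import mean, stdev


def array_stats(background, hits_lo, hits_hi):
    """Outlier AbsDev, dynamic range, and bkCV for one data array.

    background: well values with outliers removed
    hits_lo / hits_hi: outliers below / above the background mean
    """
    avg_bk = mean(background)
    bk_sd = stdev(background)  # sample SD (assumed; population SD is also plausible)
    abs_dev = (abs(avg_bk - mean(hits_lo)) + abs(mean(hits_hi) - avg_bk)) / 2
    return {
        "AbsDev": abs_dev,
        "DynamicRange": abs_dev / bk_sd,   # expressed as a multiple of bkSD
        "bkCV": bk_sd / avg_bk * 100.0,    # percent
    }


stats = array_stats(background=[95.0, 100.0, 105.0, 100.0],
                    hits_lo=[50.0], hits_hi=[150.0])
```

A filter that lowers bkSD without distorting outlier amplitudes leaves AbsDev unchanged and raises the dynamic range, which is the behavior these metrics are meant to detect.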
384-Well MTP data array profiling with regional statistics
The median statistics of odd and even row and column numbers were obtained to visualize row/column bias in background values. The medians of the first to fourth 25% of columns and rows (Q1–Q4) track spatial trends over the array in the sequentially relevant columns and rows not previously sampled in a quartile grouping. This bins the array values into four horizontal and four vertical regions for assessment. For example, medians for the first 25% grouping (first quartile; Q1) of rows are derived from row numbers 1 to 4, and the column-wise median would reference columns 1 to 6. The medians for the second 25% grouping (second quartile; Q2) are then derived from rows 5 to 8 and columns 7 to 12. This would continue to the fourth 25% grouping, where medians would be obtained from rows 13 to 16 and columns 19 to 24, respectively. The median and interquartile range (IQR) of these regional partitions, or bins, permit visualization of unusual spatial patterns in the data distribution.
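The binning scheme above can be sketched as follows for a 16 × 24 array stored as a list of rows. The function name `regional_profile` and the quartile interpolation used by `statistics.quantiles` are assumptions; the original profiling was done in Spotfire.

```python
from statistics import median, quantiles


def regional_profile(plate):
    """Median and IQR for odd/even rows and columns and for the Q1-Q4 bins."""
    n_rows, n_cols = len(plate), len(plate[0])
    columns = [[row[c] for row in plate] for c in range(n_cols)]

    def summarize(lines):
        values = [v for line in lines for v in line]
        q1, _, q3 = quantiles(values, n=4)
        return {"median": median(values), "IQR": q3 - q1}

    profile = {
        "odd rows": summarize(plate[0::2]),     # rows 1, 3, 5, ... (1-based)
        "even rows": summarize(plate[1::2]),
        "odd cols": summarize(columns[0::2]),
        "even cols": summarize(columns[1::2]),
    }
    for q in range(4):
        # Consecutive 25% groupings: rows 1-4, 5-8, ...; columns 1-6, 7-12, ...
        r0, r1 = q * n_rows // 4, (q + 1) * n_rows // 4
        c0, c1 = q * n_cols // 4, (q + 1) * n_cols // 4
        profile[f"rows Q{q + 1}"] = summarize(plate[r0:r1])
        profile[f"cols Q{q + 1}"] = summarize(columns[c0:c1])
    return profile


# Example: odd-numbered rows depressed to 80, even-numbered rows at 100.
striped = [[80.0 if r % 2 == 0 else 100.0] * 24 for r in range(16)]
profile = regional_profile(striped)
```

In this example the odd-row and even-row medians separate (80 versus 100) while every quartile bin reports the same median, which is the signature of a row-wise periodic bias rather than a gradient.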
Results
STD 5 × 5 HMF application to primary screening data affected by systematic assay distortion
The first application of the STD 5 × 5 HMF to high-content primary screening data was performed at the Conrad Prebys Center for Chemical Genomics at the Sanford-Burnham Medical Research Institute and deposited to the NCBI PubChem Assay database (AID: 1656). This hepatocyte lipid droplet screen was performed on 236 441 compounds and yielded 953 hits as determined from the HMF-corrected data (corresponding to a hit rate of 0.4%). Results of the primary assay readout as shown by percent inhibition versus MTP identifier (Fig. 1A,B) and count versus binned percent inhibition (Fig. 1C,D) suggest the data distribution was more tightly focused about the assay background value after HMF correction (Fig. 1B,D), as shown by the convergence of compound and negative control values (for STD 5 × 5 HMF correction, see Suppl. Fig. S1). Furthermore, the compound values do not trend symmetrically on both sides of the histogram in Figure 1C. For comparison, the binned data distribution of a different primary screen unaffected by systematic error (an image-based high-content G-protein-coupled receptor assay; PubChem AID 2058) is shown (inset Fig. 1C), highlighting a symmetric Gaussian distribution tightly focused around the negative control values. The binned percent inhibition of the uncorrected data shows convergence of positive controls (shown at 100% inhibition) and compound values, apparent as a bulge on the right side of the histogram, which resulted in a hit rate of 3.6%. Under the hit acceptance criteria of 50% inhibition in MTP wells containing >300 cells, the number of hits decreased from 8606 in the uncorrected data to 953 after HMF treatment. The Z′ and Z factors suggest significant improvement in the primary screen data quality after correction with the STD 5 × 5 HMF (Z′ factor improvement from 0.43 to 0.54; Z factor improvement from −0.01 to 0.34), as shown in Table 1.
In addition, the assay variability decreased after correction, as indicated by the standard deviations in the negative controls (reduced from 13.79 to 9.65) and compounds tested (reduced from 25.25 to 16.67).
Fig. 1.
Visual statistics characterize the overall performance of a primary screen for inhibitors of lipid droplet formation in hepatocytes. Raw data statistics indicating percent inhibition versus microtiter plate (MTP) ID and data point count versus binned percent inhibition (A and C, respectively) show a high frequency of compounds (black) mixing with the values of positive controls (light gray). Negative controls (mid-gray) indicate the range of values expected in the absence of inhibition and the expected value of 99% of the screening data. The 5 × 5 hybrid median filter (HMF) correction focuses the compound values around the negative control cluster, suggesting the correction of systematic error (B, D). For the raw (uncorrected) data (C), the compound population histogram (black) is broadening toward the positive controls (light gray, centered on 100% inhibition), indicating insufficient separation of the compound population from the hits. After HMF correction (D), the compound population histogram is well separated from the positive control population. For comparison, the normal Gaussian distribution of an image-based high-content screen without detectable spatial patterning is shown in the inset in C. HMF correction reduced the hit frequency based on 50% inhibition and 300 cells/well from ~3.5% in the uncorrected data to 0.4%, in line with normal primary screen hit rates.
The MTP spatial patterns identified and HMF corrected in the lipid droplet screen were suggestive of variable assay kinetics resulting from the incubation time and temperature of wells over the MTP area. The patterns manifested as general depressions in assay metrics in central regions of the MTP and mimicked biological inhibition. The systematic error was detected in the raw data using descriptive statistics and MTP heat maps. Row and column trends suggested the lipid mask total integrated intensity was diminished in the central region of many MTPs with respect to the sample population as a whole ( Fig. 2A,C). Importantly, the values described here as “depressed” appear falsely as regions where lipid droplet formation is inhibited and, as a consequence, potential screening hits. Severe systematic error was common in columns 9 to 16 and rows 5 to 12, indicating a continuous downward slope of background values from the MTP periphery to center. Thus, the distortion pattern was classified as a gradient vector—a class of systematic error that can be addressed with the 5 × 5 HMF. Trends observed in the row- and column-wise analysis of the raw data were resolved by application of the correction filter (compare Fig. 2A,C with Fig. 2B,D). Importantly, the systematic error present in the screening population did not affect all plates to the same extent, and some plates were unaffected. Visualization of two MTP raw data arrays as heat maps illustrates an array weakly affected ( Fig. 2E , top) and an array with extreme systematic distortion ( Fig. 2E , bottom). The 5 × 5 HMF correction of both arrays resolves the systematic error present in the corrupt array while making only modest adjustment to the array with negligible distortion (compare Fig. 2E with Fig. 2F). This demonstrates the capacity of the filter to preserve the data integrity of plates unaffected or only weakly affected by systematic distortions (see Suppl. Fig. S6 for larger panel). 
Taken together, systematic error patterns can be visualized in both the screening population with descriptive statistics and on individual MTPs with the aid of heat maps. This is a critical first step toward the proper selection of a corrective median filter.
Fig. 2.
Profiling raw primary screen assay metrics with visual statistics highlights systematic error in the data set. On the x-axis, row- and column-wise data bins (slightly jittered for display) describe the assay performance of positive controls (light gray), negative controls (mid-gray), and compound values (black) as a function of % inhibition based on the total integrated intensity of the lipid mask shown on the y-axis (A, C). A bell-shaped curve in rows and columns suggests a general depression in background values best described as a vector gradient of decreasing values (corresponding to increasing % inhibition) from the microtiter plate (MTP) periphery to center. After correction with the 5 × 5 hybrid median filter (HMF), the row and column gradients are resolved (B, D). MTP heat maps of a plate weakly affected (E, top) and strongly affected (E, bottom) by systematic error were taken as examples from the set of primary screen arrays to illustrate that all MTPs were not affected equally. The MTP-corrected values show the weakly affected plate retains data integrity (F, top), and the extreme gradient is removed from the strongly affected plate (F, bottom). Note that the heat map (E–F) range is reduced to four grayscales for pattern clarity and display purposes and as a consequence bins the data into broad quantiles. Visually, this overrepresents the number of wells with 100% inhibition (hits; black).
The confirmation screening showed 75% conservation of hits from the HMF-corrected data (data available on PubChem, Bioassay AID 463183). To address more specifically the quality of the hits and the intersection of the 953 HMF-corrected hits with the 8606 uncorrected hits, the hit frequency was mapped to a 384-well MTP coordinate map (Fig. 3A−C). The HMF-corrected hits were further qualified by a minimum cell number to avoid issues with cell toxicity, and some hits were flagged due to fluid dispense failures (discussed below). The uncorrected data hits accumulated in central regions of the primary screen MTPs (Fig. 3A), whereas the HMF-corrected data appeared as a near-random arrangement of hit frequencies over the MTP surface (Fig. 3B). The hit intersection of the HMF-corrected and uncorrected data indicated 95.0% agreement (901 total or 804 qualified hits) with only 5.0% of hits (52 total or 41 qualified hits) unique to the HMF-corrected data (Fig. 3D). HMF-corrected unique hit locations showed a 41% confirmation rate and mapped largely to the MTP periphery (Fig. 3C). Interestingly, MTP peripheral areas are more responsive to rapid temperature change as compared to central regions where cell metabolism kinetics contributed to the majority of observed systematic error. Thus, the overwhelming number of robust hits clustering in central MTP regions of the uncorrected data appears to have displaced more subtle hits located at the MTP periphery to a data range below the hit threshold. Further improving the data quality, the HMF correction resolved an unusual hit accumulation (56 hits) at MTP well location M14 (row 13, column 14) measuring >33 standard deviations from the mean MTP hit frequency (see Suppl. Fig. S7 and Fig. 3B).
Consequently, hits from point M14 were removed from the HMF-corrected data shown in Figure 3B (953 hits – 56 artifacts = 897 hits) because (1) it was a statistically separable artifact, and (2) the hit frequency at M14 dwarfed the representation of all remaining MTP coordinates (Suppl. Fig. S7). The uncorrected data hit accumulation at M14 was only 2.5 standard deviations from the mean hit accumulations in MTP central regions ( Fig. 3A ). This aberrant hit accumulation resulted from liquid aspirate/dispense failures at one position on the fluid handler’s 384-pin manifold used to dispense the assay substrate. Fifty-three of 56 total hits located at position M14 in the HMF-corrected MTP data were subjected to further testing, resulting in only two confirmations—a number identical to the average 75% confirmation rate for any single well in the HMF-corrected data. In contrast, if the same criteria are observed for the uncorrected data, the mean theoretical expected 75% confirmation is about 9 hits per well (for rationale, see Suppl. Fig. S7). These calculations suggest the uncorrected hit data confirmation rate is less than 75%. The HMF-corrected and uncorrected data relating to this screen can be retrieved at http://www.ncbi.nlm.nih.gov/pcassay?term=1656. The entire MTP collection was corrected equally with a version of the HMF standalone application optimized for batch processing.16 The standalone program for single-plate correction can be found at http://bccg.burnham.org/HTS/HMF_Download_Page.aspx.
Fig. 3.
Hit profiling and intersection of uncorrected and hybrid median filter (HMF)–corrected data. The hit frequency per microtiter plate (MTP) well location is described in 3D MTP surface maps (A−C). The uncorrected data hits accumulate in central MTP regions (A), whereas HMF-corrected hit data adopt a more random arrangement across the MTP surface (B). Note that the 56 hit artifacts located at MTP position M14 were identified and removed (953 hits – 56 artifacts = 897 hits) to better display the data range of the HMF-corrected hits (see Suppl. Fig. S7 for comparison). The 5.0% or 41 qualified hits unique to the HMF-corrected data trend toward the MTP periphery as a consequence of the hit bias toward central regions of the MTP in the uncorrected data (C). A Venn diagram (D) describes the 95.0% hit intersection of the HMF-corrected and uncorrected data (gray). The 7705 hits unique to the uncorrected data are shown in the large open circle, and the 52 hits (41 qualified hits) unique to the HMF-corrected data are shown just right of the gray area as an open circle associated with the HMF-corrected hits. The hit areas are drawn to scale.
Application of a 1D median filter for discontinuous periodic striping patterns
Periodic error patterns are also common in primary screen raw data arrays and cannot be addressed with the 5 × 5 HMF, so a simulated error model was developed to test the capacity of alternative median filter designs to correct periodic spatial patterns. Simulated row-wise periodic error was imposed with frequency (2n + 1), where n is a whole-number series [0−7] indicating affected odd-numbered rows. Importantly, the row-wise error was made discontinuous with respect to its coverage spanning the MTP column series, thus challenging the responsiveness of a given filter to localized value changes. A wide array of filter designs was tested, and a one-dimensional (1D) filter kernel composed of seven sample elements appeared to be the most robust tool for addressing vertical or horizontal “striping” patterns, partly because single-dimension filters are insensitive to period frequency. Consequently, a 1 × 7 MF (row × column; Fig. 4B, bottom) was applied to the row-wise “striping” pattern (for correction process, see Suppl. Fig. S2). The spatial map outlining error and hit locations (Fig. 4A) shows significant improvement after treatment with the 1 × 7 MF (Fig. 4C,D). Furthermore, the data array coefficient of variation (CV) decreased from 32.0% in the raw data array to 10.6% after correction with the 1 × 7 MF (Fig. 4B, top). The dynamic range of the raw data array improved 3.1-fold after correction with the 1 × 7 MF. Some correction artifacts are present at border regions of the “striping” pattern and are indicated by black arrows (Fig. 4C,D; for filter corruption notes, see Suppl. Figs. S2 and S5). Regional median values and IQR of the error model before and after correction were plotted to help define the error and demonstrate its removal (Fig. 4E,F). The regional statistics from the striping error model show that the spatial distortion resides mainly in odd-numbered rows and is discontinuous, as shown by trends in the column-wise quartiles (Fig. 4E; Q1–Q4), where Q3 representing columns 13 to 18 is unaffected by the pattern. After correction, the variance in the IQR is minimized and regional median values are in close agreement over the array area (Fig. 4F). The data suggest that the 1 × 7 MF is an effective tool for correction of periodic row-wise striping errors given moderate hit densities (median filter samples containing fewer than two hits [outliers] at peripheral columns and fewer than four in central columns; see Suppl. Fig. S5).
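A minimal sketch of a 1 × 7 row-wise median correction is given below to make the single rank-order list concrete. It assumes the same scaling used for the hybrid filters (each element multiplied by G/Mh) and truncates the window at the plate edges. Both points, and the function name `mf_1x7`, are assumptions rather than the published implementation.

```python
from statistics import median


def mf_1x7(plate, half_width=3):
    """Row-wise 1 x 7 median correction; returns a corrected copy of `plate`."""
    G = median(v for row in plate for v in row)  # global median, held constant
    n_cols = len(plate[0])
    corrected = []
    for row in plate:
        out_row = []
        for c, n in enumerate(row):
            # Single rank-order list: the element and up to three neighbors
            # on each side within the same row (window truncated at edges).
            window = row[max(0, c - half_width):min(n_cols, c + half_width + 1)]
            Mh = median(window)
            out_row.append(G / Mh * n)
        corrected.append(out_row)
    return corrected


# Discontinuous striping: odd-numbered rows depressed in columns 1-12 only,
# plus one low-amplitude hit in an unaffected row.
striped = [[100.0] * 24 for _ in range(16)]
for r in range(0, 16, 2):
    for c in range(12):
        striped[r][c] = 70.0
striped[5][10] = 20.0
restored = mf_1x7(striped)
```

Because the window never leaves its own row, the stripe period does not enter the calculation. In this idealized, noise-free example the stripe boundary is also preserved exactly; with realistic background noise and nearby hits, edge artifacts of the kind marked in Figure 4D would be expected.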
Fig. 4.
Discontinuous periodic error characterized as row or column striping can be addressed with a 1D 1 × N or N × 1 median filter (MF), respectively. A spatial map of the data (A) defines exact locations of high-amplitude (light gray) and low-amplitude (black) hits, as well as periodic depression in background values (dark gray “stripes” with white text). The 1 × 7 MF obtains its median value from a single rank-order list composed of all sampled elements (B, bottom; peripheral elements gray; central element black). The 3D heat map of the raw data array (C) is corrected with the 1 × 7 MF (D), with each grayscale representing about 20% of the array dynamic range. Correction artifacts are indicated with black arrows and define points on the data array where outliers were lost or introduced (D). Statistics summarizing the correction efficiency of the 1 × 7 MF (B, top) show an improved dynamic range and a lower coefficient of variation (CV). Row and column binned median values (horizontal tick) and interquartile range (25%–75% of data range; vertical line; IQR) before correction (E) profile the simulated systematic error by subdivision over the array area. After 1 × 7 MF correction (F), the median and IQR suggest smoothing of the array background as indicated by IQR convergence around a common median value in all binned subdivisions. Q1 to Q4 represent subsequent 25% grouping of rows or columns (see Materials and Methods).
Application of a modified 5 × 5 HMF to correct discontinuous quadrant-based periodic error
Less common than row- or column-wise periodic patterns are gradients over a quadrant of wells, such as when a 96-pin liquid handling manifold is used to address a 384-well plate. Although the effects of consecutive aspirate/dispense operations to a plate quadrant result in a quadrant value gradient, this is a confined and repeated pattern, making it a periodic systematic error. A discontinuous quadrant-based error model was generated to challenge median filter correction of the pattern. The error is shown descriptively with regional median and interquartile ranges ( Fig. 5E ). The median values of even rows are less than odd rows, odd columns, and even columns, suggesting a top to bottom gradient in array quadrants. Furthermore, the median value of odd columns is greater than even columns, suggesting a left to right bias. Together these observations suggest a quadrant value gradient following the well orientation, A1 > A2 > B1 > B2. The quartile grouping of columns (Q1–Q4) shows that the error is not uniform over the MTP array area, as indicated by the bell-shaped curve of median values. Note the row-wise quartiles are by comparison more stable with respect to the IQR and median values. In particular, the IQR of column group Q3 (columns 13–18) suggests that this region of the array is least affected by the simulated error as indicated by the narrow IQR. A modified 5 × 5 HMF was designed to sample the simulated MTP array by quadrant and named the row/column 5 × 5 HMF ( Fig. 5B, bottom; RC 5 × 5 HMF; for correction process, see Suppl. Fig. S3). The simulated error map ( Fig. 5A) defines the error coverage and hit locations for tracking before and after array correction. The 3D maps of the array surface contours show clear mitigation of the simulated quadrant bias with minor deficiencies in the correction indicated by black arrows ( Fig. 5C,D). 
The RC 5 × 5 HMF reduced the raw data CV from 20.8% to 7% and improved the dynamic range of the simulated array data by threefold ( Fig. 5B , top). An alternative view of the correction shows the stabilization of MTP median values over binned regions and the collapse of the IQR (compare Fig. 5E with Fig. 5F). Collectively, the RC 5 × 5 HMF ( Fig. 5 and Suppl. Fig. S3) and 1 × 7 MF ( Fig. 4 and Suppl. Fig. S2) corrections demonstrate median filters can be designed ad hoc to address discrete systematic errors relating to assay design and equipment used to conduct a high-throughput screen.
Fig. 5.
Quadrant-based systematic error can be introduced by repetitive aspirate/dispense operations of a liquid handler and can be corrected with filters designed to sample on a quadrant basis. A spatial map of the data (A) defines exact locations of high-amplitude (white, gray text, bold outline) and low-amplitude (black, white text) hits. The periodic quadrants are shown in a gradient of white (with black text) to dark gray. The row/column 5 × 5 hybrid median filter (RC 5 × 5 HMF) is shown (B, bottom) with center element (black), lateral elements (light gray), diagonal elements (dark gray), and unsampled elements (white). The RC 5 × 5 HMF correction (D) of the raw data 3D contour (C) leaves behind minor correction artifacts, indicated with black arrows in D. Each scale on the 3D contour represents ~20% of the array dynamic range. Statistics summarizing the efficiency of the RC 5 × 5 HMF correction (B, top) indicate an increase in array dynamic range and a decreased coefficient of variation (CV). Row and column binned median values (horizontal tick) and interquartile range (25%–75% of data range; vertical line; IQR) before correction (E) profile the simulated systematic error by subdivision over the array area. After RC 5 × 5 HMF correction (F), the median and IQR suggest smoothing of the array background, as indicated by IQR convergence around a common median value in all binned subdivisions. Q1 to Q4 represent successive 25% groupings of rows or columns (see Materials and Methods).
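The regional statistics used to profile the error (odd/even rows and columns, and four consecutive row or column groups) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the NumPy dependency, and the toy plate with depressed even rows are all assumptions, and the Q1 to Q4 groups are simply taken as four equal consecutive blocks (for 24 columns, Q3 is columns 13–18, matching the text).

```python
import numpy as np

def median_iqr(values):
    """Return (median, IQR) over all wells in the given region."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    return med, q3 - q1

def regional_stats(plate):
    """Summarize a 2D plate array by odd/even rows and columns and by
    four consecutive row or column groups (Q1 to Q4)."""
    n_rows, n_cols = plate.shape
    stats = {
        # Plate row A / column 1 is index 0, so "odd" lines are 0::2.
        "odd rows": median_iqr(plate[0::2, :]),
        "even rows": median_iqr(plate[1::2, :]),
        "odd columns": median_iqr(plate[:, 0::2]),
        "even columns": median_iqr(plate[:, 1::2]),
    }
    for i, idx in enumerate(np.array_split(np.arange(n_rows), 4), start=1):
        stats[f"row Q{i}"] = median_iqr(plate[idx, :])
    for i, idx in enumerate(np.array_split(np.arange(n_cols), 4), start=1):
        stats[f"column Q{i}"] = median_iqr(plate[:, idx])
    return stats

# Hypothetical 384-well plate (16 x 24) with even rows depressed by 20%.
plate = np.full((16, 24), 100.0)
plate[1::2, :] = 80.0
stats = regional_stats(plate)
for region, (med, iqr) in stats.items():
    print(f"{region:>13}: median={med:6.1f}  IQR={iqr:5.1f}")
```

In this toy case the odd-row and even-row medians separate while the column statistics do not, which is the kind of signature the text uses to identify a row-wise bias.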
Serial median filter applications mitigate complex systematic error
Previously, the 5 × 5 HMF was shown to address the gradient vectors present in the hepatocyte lipid droplet screen, and the 1 × 7 MF was shown to mitigate simulated systematic error described as row-wise “striping.” To determine if such filters could be used in combination to address complex systematic error caused by multiple error sources, an error model was developed that overlaid the discontinuous striping observed in Figure 4A with an unbalanced gradient vector imposed on the bottom of the simulated array (Fig. 6A). Regional statistics describe the combined error in the simulated array (Fig. 6E). The median value of odd rows suggests the row-wise bias, with odd and even columns equally affected by the error. The third column grouping (Q3) IQR shows the central region of the array is only weakly affected by the combined error. The bell-shaped curve of the four column groupings (Q1–Q4) suggests the systematic error is more focused at the array periphery, and the fourth row grouping (Q4) shows a depression in its median value, indicating severe systematic error at the array bottom. Taken together, these measures indicate a row-wise systematic error with a gradient vector focused at the bottom left and right corners of the array. The serial correction of combined periodic and gradient vector error applies the filter targeting the periodic error first, with any remaining gradient vector corrected second using an appropriate filter (Fig. 6B, bottom). Addressing the row-wise striping pattern, the 1 × 7 MF is applied first, achieving a result similar to that seen in Figure 4D. In the first applied correction, the 1 × 7 MF reduced the raw data CV from 37.0% to 12.1% and improved the data dynamic range 3.3-fold (Fig. 6B, top). The second correction with the STD 5 × 5 HMF removed unresolved gradient vectors at the MTP periphery (3D contour not shown for 1 × 7 MF correction alone).
Thus, the secondary correction further reduced the CV to 10.5% and improved the dynamic range 3.8-fold as compared to the uncorrected data. The 3D contours and the median statistics of the serial median filter correction showed significant reduction of the combined error ( Fig. 6C−F), although correction artifacts similar to those seen in Figure 4 appear and are indicated with black arrows ( Fig. 6D; Suppl. Figs. S2 and S5). Collapse of the IQR around the median value of binned MTP regions after serial correction shows that the background values of the array have converged toward a common value (~100; Fig. 6E−F). The correction of combined periodic and gradient vector patterns by serial MF applications suggests that progressive corrections with appropriately designed median filters are a viable approach to mitigating systematic error patterns too complex for a single correction.
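The serial workflow (periodic-error filter first, gradient filter second) can be sketched as below. Several points are assumptions rather than details taken from this section: the correction is written as well value × global median / local estimate, as inferred from the Discussion's description of equation (1); the 5 × 5 hybrid median uses the conventional definition (median of the cross median, the diagonal median, and the center value); edges are handled by reflection; and the simulated plate, with continuous rather than discontinuous striping, is purely illustrative.

```python
import numpy as np

def row_median_1x7(plate):
    """Local background estimate: median of a 1 x 7 window along each row."""
    padded = np.pad(plate, ((0, 0), (3, 3)), mode="reflect")  # edge rule is an assumption
    windows = np.lib.stride_tricks.sliding_window_view(padded, 7, axis=1)
    return np.median(windows, axis=-1)

def hybrid_median_5x5(plate):
    """Local background estimate: median of (cross median, diagonal median, center)."""
    padded = np.pad(plate, 2, mode="reflect")
    estimate = np.empty(plate.shape, dtype=float)
    steps = (-2, -1, 1, 2)
    for r in range(plate.shape[0]):
        for c in range(plate.shape[1]):
            pr, pc = r + 2, c + 2  # position in the padded array
            center = padded[pr, pc]
            cross = ([center] + [padded[pr + s, pc] for s in steps]
                     + [padded[pr, pc + s] for s in steps])
            diag = ([center] + [padded[pr + s, pc + s] for s in steps]
                    + [padded[pr + s, pc - s] for s in steps])
            estimate[r, c] = np.median([np.median(cross), np.median(diag), center])
    return estimate

def correct(plate, estimator):
    """Scale each well by global median / local background estimate (assumed form)."""
    return plate * np.median(plate) / estimator(plate)

def cv_percent(plate):
    """Coefficient of variation of all wells, in percent."""
    return 100.0 * plate.std() / plate.mean()

# Hypothetical 16 x 24 plate: noisy background, depressed even rows,
# plus a depression that deepens toward the bottom corners.
rng = np.random.default_rng(0)
raw = rng.normal(100.0, 2.0, size=(16, 24))
raw[1::2, :] *= 0.6
rows = np.arange(16)[:, None]
cols = np.arange(24)[None, :]
raw *= 1.0 - 0.4 * (rows / 15.0) ** 2 * (np.abs(cols - 11.5) / 11.5)

after_mf = correct(raw, row_median_1x7)          # periodic error first
after_hmf = correct(after_mf, hybrid_median_5x5)  # remaining gradient second

for label, data in [("raw", raw), ("1 x 7 MF", after_mf), ("then 5 x 5 HMF", after_hmf)]:
    print(f"{label:>15}: CV = {cv_percent(data):.1f}%")
```

Passing the estimator as a function keeps the correction step identical for every filter, so further kernels could be chained in the same way.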
Discussion
Primary screen correction
The STD 5 × 5 HMF was used here for the first time to correct systematic error in all MTP data arrays (690 MTPs in 384-well format) of a large primary chemical library screen. This corrective filter was chosen because it was among the most efficient filters tested for the correction of gradient vectors. The 1 × 7 MF and RC 5 × 5 HMF could also improve the data resolution of MTP arrays suffering from gradient vectors but would yield less efficient corrections due to their construction. For example, the 1 × 7 MF samples 2D data in only one dimension and is more susceptible to outlier corruption. The RC 5 × 5 HMF is weighted less about its center element and would result in more conservative estimates of local variations in the MTP array. In either case, the corrections would likely improve data resolution but are less effective than the STD 5 × 5 HMF at addressing the gradient vectors presented in the primary screen MTPs.
A large number of plates in the primary screen presented with gradient vectors of varying degree, but some MTPs did not show measurable systematic error (see Suppl. Fig. S6 for a larger panel of MTPs). Importantly, all MTP data arrays of the screening campaign were HMF corrected ( Fig. 2E−F). As was shown in the Results section ( Fig. 2 ), median filter correction of MTPs unaffected by systematic error did not significantly erode the data array integrity. Consequently, the detection of systematic error in any fraction of the screening population justifies the use of an appropriate median filter for the correction of all MTPs in the screen.
Most impressive in the HMF-corrected data was the reduction in qualified hits from 8606 in the uncorrected data to 953, implying reduced cost and labor for confirmation and evaluation in secondary assays. Furthermore, if the acceptance criteria were made more stringent for the uncorrected data, the hit population could become more biased toward artifacts of small molecules affecting cell survival and signal detection.18 Of additional interest was the improvement in hit distribution from the uncorrected to the HMF-corrected data: the former showed a dense cluster of hits in the MTP central regions and the latter a seemingly random distribution of hits over the MTP surface (Fig. 3). The uncorrected data hit profile suggests possible underestimation or overestimation of hits in the periphery and central MTP regions, respectively. A close analysis of the HMF-corrected hit population permitted further improvement in data quality. The MTP hit per well frequency indicated an unusual accumulation of hits at a single well location, and this hit count was separated with statistical confidence, allowing further refinement of the primary screen confirmation rate. Such hit artifacts would appear as a normal occurrence on a MTP basis but were separable at the level of the hit population in the HMF-corrected data (see Suppl. Fig. S7).
Spatial Pattern Recognition
Not all primary screens present a near-random array of chemicals or biomolecules on the assay MTP. For example, small-molecule screens of low diversity are more likely to generate high hit rates and patterns consistent with a structure–activity relationship and are not good candidates for MF correction. This work adopted a method of systematic error pattern recognition based on descriptive statistics from individual MTPs and the MTP population of a primary screen. Pattern recognition of MTP data using descriptive statistics from localized assay metrics has been demonstrated by others and is an effective way to view error trends in primary screening data.3,5–9,15 Secondary to pattern visualization is classification of systematic error patterns as periodic or gradient vector, an issue critical to proper MF selection. The extreme convolution of the systematic error model shown in Figure 6 illustrates that multiple overlaid spatial patterns can be described by regional MTP statistics, thus not limiting pattern visualization to a discrete error class.
Systematic error continuity
Simulated systematic error models were developed to challenge median filters with breaks in pattern continuity and duration. This feature of the error models addresses an important quality of median filters—sample space. Models presented in Figures 4 and 5 break periodic pattern continuity for variable duration in row/column orientation. The filters selected were appropriately sized to “sense” the value change at “step-edges” created by the irregular pattern breaks. Larger filter kernels (e.g., 1 × 11 MF, Fig. 4 ; RC 9 × 9 HMF, Fig. 5 ) would have a greater tendency to bridge a break in pattern continuity and consequently become less sensitive to a tightly nested region of shifted array values. This is not particular to periodic error patterns as gradient vectors also suffer from conservative selection of filter kernel size (a large kernel filter). In practice, kernel filter row/column dimensions from 20% to 30% of the 384-well MTP row/column sample space provide an adequate balance of sensitivity and resistance to corruption. As indicated by black arrows in Figures 4D and 5D, periodic pattern border regions present difficult correction scenarios. In most cases, the correction was made properly, but on occasion, the correction filter did break down, as shown in well M19 from Figures 3D and 5D. In this case, a low-amplitude outlier (hit) nested at M17 created a bridge to periodic error (depressed row-wise values) resuming at M20, ultimately corrupting the filter matrix bounded by coordinates M16 to M22 (see Suppl. Figs. S3 and S5). The resulting high-amplitude correction of locus M19 is inversely proportional to the value depression returned by the filter kernel for the MTP area bound by M16 to M22, with respect to the global median. For example, if the 1 × 7 MF returns a local median one-half the “ideal” value, then the value at M19 is doubled (equation (1)). 
Ultimately, the choice of filter size and sample space is a balance between sensitivity to localized trends in array values and resistance to outlier corruption.
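The breakdown at well M19 can be reproduced with a small worked example. The well values below are hypothetical (a background of 100, a low-amplitude hit of 40 at M17, and depressed values of 50 at M20 to M22), and the correction is assumed to take the form value × global median / local median, as implied by the description of equation (1) above.

```python
import numpy as np

global_median = 100.0  # assumed "ideal" background value for the plate

# Hypothetical 1 x 7 window centered on M19 (wells M16 to M22).
window = {
    "M16": 100.0,
    "M17": 40.0,   # low-amplitude hit
    "M18": 100.0,
    "M19": 100.0,  # center well, itself unaffected
    "M20": 50.0,   # periodic error resumes here
    "M21": 50.0,
    "M22": 50.0,
}

# Four of the seven sampled wells are depressed, so the median is captured.
local_median = np.median(list(window.values()))
corrected_m19 = window["M19"] * global_median / local_median

# Same window without the hit at M17: only three wells are depressed.
clean = dict(window, M17=100.0)
clean_median = np.median(list(clean.values()))
clean_corrected_m19 = clean["M19"] * global_median / clean_median

print(local_median, corrected_m19)        # corrupted estimate and overcorrection
print(clean_median, clean_corrected_m19)  # estimate holds without the bridging hit
```

With the hit in place the local median falls to half the ideal value and M19 is doubled; without it, three depressed wells are not enough to move the median. For scale, a 7-well window spans 7/24 (about 29%) of a row on a 16 × 24 plate, within the 20% to 30% range suggested above.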
Periodic error and process priority in serial median filter application
Serial application of two discrete MFs was required for the best resolution of the multiple patterning errors presented in Figure 6. This experiment outlined three important principles: separation of error contributions, median filter designs targeting the discrete error contributions, and a rational order of correction. Spatial systematic error observed in a primary screen can be classified in two categories: (1) periodic as in row and column bias and (2) gradient vectors as in continuous sloping of data array values. The two categories could present additively to a variable extent or discretely on the same or different MTPs. The devised workflow presented a specific order—1 × 7 MF followed by 5 × 5 HMF—to properly correct the complex error presented in Figure 6. More generally, median filter correction of periodic row, column, and quadrant bias should precede correction of continuous sloping or edge-effects on the array background when both are present. This is because the kernel pattern of the former will not convolute the latter and in most cases will contribute to the correction of vector gradients; however, primary correction of the complex error with a MF designed to target gradient vectors would further convolute the periodic error due to the unusual density of outlier values in rows or columns.
Median filter corrections require a broader test audience
Although MFs are promising data correction tools, they require further scrutiny by a broader audience. The row-column-specific median filters ( Figs. 4−5) have never been challenged by a large primary screen data set. Furthermore, HMF application to 1536-well MTP screening assays with sampling filters scaled appropriately to the area (e.g., STD 5 × 5 scaled to a 9 × 9 HMF) would bring attention to the issue of median filter correction portability across MTP formats. It is expected that MFs would perform similarly in higher density formats, but this is not adequately supported by the data presented in this article. Critical to a broader test audience is the development of software to guide a user through the median filter correction process. Two items of particular interest for future MF applications are (1) an automated guide for error pattern classification and filter matrix selection and (2) streamlined batch processing of large numbers of plates for a naive user of the correction technique. The data here suggest that common statistical tests for data spatial correlations could be used to classify systematic error and navigate a library of established median filters appropriate to the correction. Indeed, median filter selection presents a point of some difficulty to anyone unfamiliar with the correction methodology. In addition, the tolerance of any median filter to outlier corruption can be calculated, and this metric could be useful in an MF correction program dually designed to flag outlier clusters that might foul the performance of a given filter during correction. This work is both a tutorial and a starting point for median filter correction of systematic error in primary screening data, which might bring additional attention to the methodology and promote the development of user-friendly tools for a broader test audience. 
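As one possible starting point for such a program, the sketch below computes the outlier tolerance of a plain median window and flags wells whose 1 × 7 row window contains more outliers than that. Everything here is an assumption for illustration: the function names, the boolean outlier mask as input, and zero padding at the plate edge. The flag is conservative, because outliers on opposite sides of the background partly cancel, and hybrid kernels would need their own tolerance calculation.

```python
import numpy as np

def median_tolerance(n_samples):
    """Largest number of outlying samples a median of n_samples can absorb
    while still returning a non-outlying value."""
    return (n_samples - 1) // 2

def flag_fouled_windows(outlier_mask, width=7):
    """Flag wells whose 1 x width row window holds more outliers than
    the median can tolerate (wells beyond the plate edge count as clean)."""
    half = width // 2
    padded = np.pad(outlier_mask.astype(int), ((0, 0), (half, half)), mode="constant")
    windows = np.lib.stride_tricks.sliding_window_view(padded, width, axis=1)
    counts = windows.sum(axis=-1)
    return counts > median_tolerance(width)

# Hypothetical outlier mask for a 16 x 24 plate: a hit at M17 and
# depressed wells at M20 to M22 (row M is index 12, column n is index n - 1).
mask = np.zeros((16, 24), dtype=bool)
mask[12, [16, 19, 20, 21]] = True

flags = flag_fouled_windows(mask)
print("tolerance of a 7-well median:", median_tolerance(7))
print("flagged wells (row, column index):", np.argwhere(flags).tolist())
```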
Ultimately, the results shown here suggest median filters are useful tools for mitigating the impact of systematic errors on spatially arrayed MTP primary screening data.
Acknowledgments
This work was funded in part by NIH Screening Grants R03-MH083261 to Patrick McDonough (principal investigator for the hepatic lipid droplet formation assay, Vala Sciences Inc.) and NIH Roadmap Program Grant U54HG005033. We thank members of the Conrad Prebys Center for Chemical Genomics and particularly members of the high-content and high-throughput screening teams for the data derived from their efforts on the primary screen and secondary confirmation. Special thanks to Drs. Mark Mercola and Jeffrey Price, whose support and critical review of the prior manuscript16 inspired further exploration of the median filter correction methodology.
Footnotes
Supplementary material for this article is available on the Journal of Biomolecular Screening Web site at http://jbx.sagepub.com/supplemental.
References
- 1. Kelley BP, Lunn MR, Root DE, Flaherty SP, Martino AM, Stockwell BR. A Flexible Data Analysis Tool for Chemical Genetic Screens. Chem. Biol. 2004;11:1495–1503. doi: 10.1016/j.chembiol.2004.08.026.
- 2. Makarenkov V, Kevorkov D, Zentilli P, Gagarin A, Malo N, Nadon R. HTS-Corrector: Software for the Statistical Analysis and Correction of Experimental High-Throughput Screening Data. Bioinformatics. 2006;22:1408–1409. doi: 10.1093/bioinformatics/btl126.
- 3. Makarenkov V, Zentilli P, Kevorkov D, Gagarin A, Malo N, Nadon R. An Efficient Method for the Detection and Elimination of Systematic Error in High-Throughput Screening. Bioinformatics. 2007;23:1648–1657. doi: 10.1093/bioinformatics/btm145.
- 4. Root DE, Kelley BP, Stockwell BR. Detecting Spatial Patterns in Biological Array Experiments. J. Biomol. Screen. 2003;8:393–398. doi: 10.1177/1087057103254282.
- 5. Brideau C, Gunter B, Pikounis B, Liaw A. Improved Statistical Methods for Hit Selection in High-Throughput Screening. J. Biomol. Screen. 2003;8:634–647. doi: 10.1177/1087057103258285.
- 6. Kevorkov D, Makarenkov V. Statistical Analysis of Systematic Errors in High-Throughput Screening. J. Biomol. Screen. 2005;10:557–567. doi: 10.1177/1087057105276989.
- 7. Gubler H. Methods for Statistical Analysis, Quality Assurance and Management of Primary High-Throughput Screening Data. In: Hüser J, editor. High-Throughput Screening in Drug Discovery: Methods and Principles in Medicinal Chemistry. Vol. 35. Wiley-VCH; Weinheim, Germany: 2007.
- 8. Coma I, Herranz J, Martin J. Statistics and Decision Making in High-Throughput Screening. In: Janzen WP, Bernasconi P, editors. High Throughput Screening: Methods and Protocols. 2nd ed. Humana; Totowa, NJ: 2009.
- 9. Gunter B, Brideau C, Pikounis B, Liaw A. Statistical and Graphical Methods for Quality Control Determination of High-Throughput Screening Data. J. Biomol. Screen. 2003;8:624–633. doi: 10.1177/1087057103258284.
- 10. Berg M, Undisz K, Thiericke R, Zimmermann P, Moore T, Posten C. Evaluation of Liquid Handling Conditions in Microplates. J. Biomol. Screen. 2001;6:47–56. doi: 10.1177/108705710100600107.
- 11. Dong H, Ouyang Z, Liu J, Jemal M. The Use of a Dual Dye Photometric Calibration Method to Identify Possible Sample Dilution from an Automated Multichannel Liquid-handling System. Clin. Lab. Med. 2007;27:113–122. doi: 10.1016/j.cll.2007.01.002.
- 12. Hayashi Y, Matsuda R, Maitani T, Ito K, Nishimura W, Imai K, Maeda M. An Expression of Within-Plate Uncertainty in Sandwich ELISA. J. Pharm. Biomed. Anal. 2004;36:225–229. doi: 10.1016/j.jpba.2004.05.017.
- 13. Rhode H, Schulze M, Renard S, Zimmermann P, Moore T, Cumme GA, Horn A. An Improved Method for Checking HTS/uHTS Liquid-Handling Systems. J. Biomol. Screen. 2004;9:726–733. doi: 10.1177/1087057104269496.
- 14. Stone D, Marine S, Majercak J, Ray WJ, Espeseth A, Simon A, Ferrer M. High-Throughput Screening by RNA Interference: Control of Two Distinct Types of Variance. Cell Cycle. 2007;6:898–901. doi: 10.4161/cc.6.8.4184.
- 15. Parham F, Austin C, Southall N, Huang R, Tice R, Portier C. Dose-Response Modeling of High-Throughput Screening Data. J. Biomol. Screen. 2009;14:1216–1227. doi: 10.1177/1087057109349355.
- 16. Bushway PJ, Azimi B, Heynen-Genel S, Price JH, Mercola M. Hybrid Median Filter Background Estimator for Correcting Distortions in Microtiter Plate Data. Assay Drug Dev. Technol. 2010;8:238–250. doi: 10.1089/adt.2009.0242.
- 17. Zhang JH, Chung TD, Oldenburg KR. A Simple Statistical Parameter for Use in Evaluation and Validation of High Throughput Screening Assays. J. Biomol. Screen. 1999;4:67–73. doi: 10.1177/108705719900400206.
- 18. Wu X, Sills MA, Zhang JH. Further Comparison of Primary Hit Identification by Different Assay Technologies and Effects of Assay Measurement Variability. J. Biomol. Screen. 2005;10:581–589. doi: 10.1177/1087057105275628.