Model-Based Background Correction (MBCB): R Methods and GUI for Illumina Bead-array Data

Jeffrey D Allen; Min Chen; Yang Xie

doi:10.4172/1948-5956.1000004

Author manuscript; available in PMC: 2010 May 24.

Published in final edited form as: J Cancer Sci Ther. 2009;1(1):25–27. doi: 10.4172/1948-5956.1000004


PMCID: PMC2874975 NIHMSID: NIHMS172311 PMID: 20502629

Abstract

Summary

The Illumina BeadArray platform (Illumina Inc.) is playing an increasing role in cancer research. MBCB, an R package designed for Illumina BeadArray data, pre-processes microarray data using several model-based statistical methods. These model-based background-correction methods have proven to be a significant improvement over the traditional method provided by Illumina in its BeadStudio software. MBCB accepts summarized bead-type data, which can then be background-corrected in a statistically efficient manner and, optionally, normalized. Compared with the popular robust multi-array average (RMA) background-correction approach and the default background-correction method provided by Illumina, MBCB has been shown to yield more precise estimates of gene expression and better biological interpretation of Illumina BeadArray data. The software will facilitate molecular biomedical research, especially cancer research.

Availability

This package will soon be available from Bioconductor. Instructions for use are included with the package.

Keywords: Background correction, Microarray, BeadArray

Introduction

Illumina has produced a novel microarray platform, BeadArray, for use in multiple applications, including gene expression studies, comparative genomic hybridization (CGH), and SNP genotyping. Illumina expression BeadArrays often generate high-quality data at relatively low cost and with less input RNA. These features make BeadArray an increasingly popular microarray platform.

One distinguishing feature of the BeadArray platform is that each array contains thousands of non-specific negative control bead types. Because these beads measure only non-specific signal, they offer great potential for estimating and correcting background noise.

The background-correction method in the BeadStudio software provided by Illumina Inc. does not use the negative control beads efficiently: it takes the mean of all negative control beads and subtracts that value from every other bead type. Unfortunately, this method tends to produce a large number of negative expression values, which are typically discarded because they cannot be log-transformed. In certain cases, more than half of the beads on the chip have been negative under this method. Some studies (Barnes et al., 2005) have suggested that the pre-processing offered by Illumina causes such massive data loss that the raw values should be used instead. However, using only the raw values has also been shown to be problematic: significant data attenuation is observed when expression ratios are calculated between two expression-level data sets (Ding et al., 2008).
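The mean-subtraction step described above can be illustrated with a minimal Python sketch. The intensities are made up for illustration, and `subtract_negative_mean` is a hypothetical helper, not BeadStudio code; it only shows how any bead type whose signal falls below the negative-control mean ends up with a negative value.

```python
from statistics import mean

def subtract_negative_mean(regular, negative_controls):
    """Subtract the mean negative-control intensity from every regular bead type."""
    background = mean(negative_controls)
    return [x - background for x in regular]

# Hypothetical summarized intensities (illustrative numbers, not real data).
negative_controls = [70.0, 80.0, 90.0, 100.0, 110.0]   # mean = 90.0
regular = [60.0, 85.0, 95.0, 150.0, 400.0]

corrected = subtract_negative_mean(regular, negative_controls)
n_negative = sum(1 for v in corrected if v < 0)
print(corrected)    # [-30.0, -5.0, 5.0, 60.0, 310.0]
print(n_negative)   # 2
```

Two of the five bead types become negative and could not be log-transformed, which is the data-loss mechanism the model-based methods are meant to avoid.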

Ding et al. (2008) suggested an alternative, model-based background-correction method to address this problem, and Xie et al. (2009) proposed three different statistical methods to estimate the parameters of the model. We developed MBCB, an R package, to make these methods easy to use. Because it is written in R, the package is inherently cross-platform and easily distributable, and it can be integrated into existing R tools.
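As we understand it, the model of Ding et al. (2008) and Xie et al. (2009) is a normal-exponential convolution: the observed intensity of a bead type is X = S + B, where the background B ~ N(μ, σ²) and the true signal S is exponential (parameterized here by its mean α). Instead of subtracting a constant, the corrected value is the posterior mean E(S | X = x) = a + σφ(a/σ)/Φ(a/σ), with a = x − μ − σ²/α, which is always positive. The Python sketch below computes that quantity with the standard library only; the function names and parameter values are our own illustrative assumptions, not the MBCB API.

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal CDF Phi(z); erfc keeps precision in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def mills_ratio(z):
    """phi(z) / Phi(z), using an asymptotic series in the far lower tail."""
    if z < -30.0:
        # Phi(z) ~ phi(z) / |z| * (1 - 1/z^2 + 3/z^4) as z -> -infinity
        return -z / (1.0 - 1.0 / z**2 + 3.0 / z**4)
    return norm_pdf(z) / norm_cdf(z)

def normexp_signal(x, mu, sigma, alpha):
    """Posterior mean E(S | X = x) for X = S + B,
    with B ~ Normal(mu, sigma^2) and S ~ Exponential(mean alpha)."""
    a = x - mu - sigma * sigma / alpha
    return a + sigma * mills_ratio(a / sigma)

# Illustrative parameter values (not estimates from real data).
mu, sigma, alpha = 100.0, 10.0, 200.0
corrected = [normexp_signal(x, mu, sigma, alpha) for x in (60.0, 100.0, 150.0, 1000.0)]
print([round(c, 2) for c in corrected])
```

Even an intensity well below the background mean (x = 60 here) is mapped to a small positive value, so no bead types have to be discarded; for large x the correction approaches x − μ − σ²/α. The asymptotic branch in `mills_ratio` is a numerical safeguard we added for extremely low intensities.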

Description

Input

The user provides MBCB with summarized bead-type data, which can be exported from BeadStudio. The summarized file condenses the raw, bead-level data into the average intensity and variance of each bead type.
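The summarization itself is simple; a minimal Python sketch is shown below, assuming a hypothetical bead-level input of `(bead_type, intensity)` pairs with made-up bead-type IDs. BeadStudio performs this step (and may apply additional processing, such as outlier removal, that the sketch omits), so MBCB users do not need to write it themselves.

```python
from collections import defaultdict
from statistics import mean, variance

def summarize_beads(bead_level):
    """Collapse (bead_type, intensity) pairs into a per-bead-type mean and variance.

    Assumes every bead type has at least two replicate beads, which
    statistics.variance requires.
    """
    groups = defaultdict(list)
    for bead_type, intensity in bead_level:
        groups[bead_type].append(intensity)
    return {
        bead_type: {"mean": mean(values), "variance": variance(values), "n_beads": len(values)}
        for bead_type, values in groups.items()
    }

# Hypothetical bead-level readings; the bead-type IDs are made up.
bead_level = [
    ("GENE_A", 100.0), ("GENE_A", 110.0), ("GENE_A", 120.0),
    ("NEG_01", 50.0), ("NEG_01", 54.0),
]
summary = summarize_beads(bead_level)
print(summary)
```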

Background-correction

The primary contribution of this package is the ability to background-correct the given data in a statistically efficient manner. These algorithms avoid the massive data loss and attenuation described above and lead to more accurate measurement of gene expression levels.

The user can select from a list of background-correction methods:

Maximum likelihood estimation

This is one of the more accurate methods in the package. Assuming a Gaussian distribution for the noise term, MLE iteratively updates the parameter estimates, making use of the non-specific negative control beads on the array.
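The sketch below is a simplified, two-stage reading of the MLE approach, written in Python under our own assumptions rather than taken from MBCB: μ and σ are set to the sample mean and standard deviation of the negative-control beads, and α is then chosen to maximize the normal-exponential log-likelihood of the regular bead types, whose density is f(x) = (1/α) exp(−(x − μ)/α + σ²/(2α²)) Φ((x − μ − σ²/α)/σ). The package's own procedure, which updates the estimates iteratively, may differ, and the data here are simulated.

```python
import math
import random
from statistics import mean, stdev

def log_norm_cdf(z):
    """log Phi(z), switching to an asymptotic form in the far lower tail."""
    if z > -30.0:
        return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))
    return -0.5 * z * z - math.log(-z) - 0.5 * math.log(2.0 * math.pi)

def normexp_loglik(alpha, xs, mu, sigma):
    """Log-likelihood of X = S + B, B ~ N(mu, sigma^2), S ~ Exponential(mean alpha)."""
    s2 = sigma * sigma
    return sum(
        -math.log(alpha) - (x - mu) / alpha + s2 / (2.0 * alpha * alpha)
        + log_norm_cdf((x - mu - s2 / alpha) / sigma)
        for x in xs
    )

def fit_alpha(xs, mu, sigma, lo=1.0, hi=1e5, iters=60):
    """Maximize the log-likelihood over alpha by golden-section search on log(alpha)."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(lo), math.log(hi)
    c, d = b - g * (b - a), a + g * (b - a)
    fc = normexp_loglik(math.exp(c), xs, mu, sigma)
    fd = normexp_loglik(math.exp(d), xs, mu, sigma)
    for _ in range(iters):
        if fc > fd:            # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = normexp_loglik(math.exp(c), xs, mu, sigma)
        else:                  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = normexp_loglik(math.exp(d), xs, mu, sigma)
    return math.exp((a + b) / 2.0)

# Simulated data (true mu = 100, sigma = 10, alpha = 200), for illustration only.
rng = random.Random(0)
negatives = [rng.gauss(100.0, 10.0) for _ in range(1000)]
regular = [rng.gauss(100.0, 10.0) + rng.expovariate(1.0 / 200.0) for _ in range(5000)]

# Stage 1: background parameters from the negative controls.
mu_hat, sigma_hat = mean(negatives), stdev(negatives)
# Stage 2: signal parameter by maximum likelihood on the regular bead types.
alpha_hat = fit_alpha(regular, mu_hat, sigma_hat)
print(round(mu_hat, 1), round(sigma_hat, 1), round(alpha_hat, 1))
```

A golden-section search on log α is used only because it needs no third-party optimizer; any one-dimensional maximizer would do.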

Gamma maximum likelihood estimation

This method is similar to maximum likelihood estimation, except that the noise term is assumed to follow a Gamma distribution. It is preferred when the distribution of the non-specific negative control intensities is not symmetric (that is, skewed).
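Checking whether the negative controls are asymmetric, and what a Gamma shape for them would look like, can be sketched in a few lines of Python. This is an illustration under our own assumptions: `sample_skewness` is a standard adjusted skewness statistic, and `gamma_moments` is a quick method-of-moments fit used as a stand-in, not the Gamma maximum likelihood estimation implemented in MBCB (which may, for example, also include a location offset). The input values are hypothetical.

```python
import math
from statistics import mean, variance

def sample_skewness(xs):
    """Adjusted Fisher-Pearson sample skewness; about 0 for symmetric data.
    Requires at least 3 values that are not all equal."""
    n = len(xs)
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)

def gamma_moments(xs):
    """Method-of-moments Gamma fit: shape k = mean^2 / var, scale theta = var / mean."""
    m, v = mean(xs), variance(xs)
    return m * m / v, v / m

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
skewed = [1.0, 1.0, 1.0, 1.0, 10.0]   # hypothetical, right-skewed negative controls
print(sample_skewness(symmetric), sample_skewness(skewed))
print(gamma_moments(symmetric))
```

Skewness near zero suggests the Gaussian noise model is adequate; a clearly positive value is a hint to try the Gamma variant. We do not suggest any particular cutoff.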

Bayesian method

This method is potentially more extensible than the others, because it can incorporate additional prior information, but it has not consistently outperformed the other methods, despite being by far the most computationally intensive.
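To show where prior information enters, and why this approach is expensive, here is a minimal random-walk Metropolis sampler in Python for the exponential mean α of the normal-exponential model, with μ and σ held fixed. The normal prior on log α (`prior_median`, `prior_sd`) and all tuning values are hypothetical choices of ours, not those of Xie et al. (2009) or MBCB, and the data are simulated. Every iteration re-evaluates the likelihood over all bead types, which is the source of the computational cost.

```python
import math
import random

def log_norm_cdf(z):
    """log Phi(z), with an asymptotic form in the far lower tail."""
    if z > -30.0:
        return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))
    return -0.5 * z * z - math.log(-z) - 0.5 * math.log(2.0 * math.pi)

def normexp_loglik(alpha, xs, mu, sigma):
    """Log-likelihood of X = S + B, B ~ N(mu, sigma^2), S ~ Exponential(mean alpha)."""
    s2 = sigma * sigma
    return sum(-math.log(alpha) - (x - mu) / alpha + s2 / (2.0 * alpha * alpha)
               + log_norm_cdf((x - mu - s2 / alpha) / sigma) for x in xs)

def log_prior(theta, prior_median=150.0, prior_sd=1.0):
    """Hypothetical prior: theta = log(alpha) ~ Normal(log(prior_median), prior_sd^2)."""
    return -0.5 * ((theta - math.log(prior_median)) / prior_sd) ** 2

def metropolis_alpha(xs, mu, sigma, n_iter=2000, step=0.05, alpha0=100.0, seed=1):
    """Random-walk Metropolis on theta = log(alpha); returns the sampled alpha values.
    The prior is placed directly on theta, so no Jacobian term is needed."""
    rng = random.Random(seed)
    theta = math.log(alpha0)
    log_post = normexp_loglik(alpha0, xs, mu, sigma) + log_prior(theta)
    draws = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        prop_post = normexp_loglik(math.exp(proposal), xs, mu, sigma) + log_prior(proposal)
        log_ratio = prop_post - log_post
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            theta, log_post = proposal, prop_post
        draws.append(math.exp(theta))
    return draws

# Simulated regular bead types (true alpha = 200); mu and sigma treated as known.
data_rng = random.Random(0)
xs = [data_rng.gauss(100.0, 10.0) + data_rng.expovariate(1.0 / 200.0) for _ in range(500)]
draws = metropolis_alpha(xs, mu=100.0, sigma=10.0)
kept = draws[500:]                      # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 1))
```

Changing `prior_median` or `prior_sd` is how external knowledge would be injected; with many bead types the likelihood dominates and the prior has little effect.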

Non-parametric

This method avoids parametric distributional assumptions when estimating the model parameters. It is the fastest method and one of the most accurate.
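One moment-based reading of the non-parametric approach, which we believe matches the description in Xie et al. (2009) but present here as an assumption, estimates μ and σ by the mean and standard deviation of the negative controls, and α from E(X) = μ + α. A Python sketch with hypothetical intensities follows; because it needs only sample means and a standard deviation, it is clear why this method is the fastest.

```python
from statistics import mean, stdev

def nonparametric_estimates(regular, negative_controls):
    """Moment-based estimates of (mu, sigma, alpha) for X = S + B.

    mu and sigma come from the negative controls (pure background); alpha uses
    E(X) = mu + alpha for an exponential signal with mean alpha.
    """
    mu = mean(negative_controls)
    sigma = stdev(negative_controls)
    alpha = mean(regular) - mu
    if alpha <= 0:
        raise ValueError("mean regular intensity must exceed the background mean")
    return mu, sigma, alpha

# Hypothetical summarized intensities.
negatives = [90.0, 100.0, 110.0]
regular = [150.0, 250.0, 350.0]
print(nonparametric_estimates(regular, negatives))   # (100.0, 10.0, 150.0)
```

The resulting estimates would then be plugged into the model's background-correction step in place of likelihood-based ones.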

Robust multi-array average

This method, modeled after the RMA background correction in the affy package for Affymetrix arrays, can be used when negative control information is not available.

Normalization

For convenience, the package also lets users normalize the data, if desired, using either quantile normalization (from the affy package) or global (median) normalization. Normalization is optional: users can instead apply their own preferred normalization approach after using MBCB for background correction.
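Both normalization options can be sketched in Python as follows. The package takes quantile normalization from the affy package in R; the functions below are illustrative stand-ins, and the exact form of global median normalization used here (rescaling each array so its median equals the mean of the array medians) is our assumption. Ties in quantile normalization are broken by position, a simplification.

```python
from statistics import mean, median

def quantile_normalize(arrays):
    """Give every array the same distribution: the rank-wise mean of the sorted arrays."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of bead types")
    sorted_cols = [sorted(a) for a in arrays]
    reference = [mean(col[i] for col in sorted_cols) for i in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # ties broken by position
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        result.append(out)
    return result

def median_normalize(arrays):
    """Rescale each array so its median equals the mean of all array medians.
    Assumes every array has a positive median."""
    medians = [median(a) for a in arrays]
    target = mean(medians)
    return [[x * target / m for x in a] for a, m in zip(arrays, medians)]

print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
print(median_normalize([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]))
# [[1.5, 3.0, 4.5], [1.5, 3.0, 4.5]]
```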

Output

The package writes the background-corrected data to output files (one file per background-correction method used). In addition, the user receives a file detailing the parameter estimates for each correction method.

Graphical user interface

To ensure ease of use for non-technical audiences, we provide a graphical interface through which users can do everything that is possible with the command-line functions. A screenshot is shown in Figure 1. The user can browse to the data files, select one or more model-based background-correction methods from a list, and then select the normalization method(s). Finally, the user can browse to the location where the output should be saved.

Figure 1. A screenshot of the background-correction options in MBCB.

Discussion

BeadArray technology has great potential for molecular profiling, especially in cancer research; however, because of unresolved issues in background-correcting the data, the platform has suffered significant data loss and attenuation. These new model-based methods have proven to be more accurate and efficient (Ding et al., 2008; Xie et al., 2009), and this user-friendly R package will simplify the use of this data-processing approach.

Because the package is written exclusively in R, it can be used on any Windows or UNIX-based operating system. Many previously written tools related to this research are also written in R, and because most such packages are open source, the tools can easily be combined or modified to meet specific needs.

Acknowledgements

This work was supported by NIH UL1 RR024982, NNJ05HD36G, and 1R21DA027592.

Footnotes

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

References

1. Barnes M, Freudenberg J, Thompson S, Aronow B, Pavlidis P. Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms. Nucleic Acids Res. 2005;33:5914–5923. doi: 10.1093/nar/gki890.
2. Ding LH, Xie Y, Park S, Xiao G, Story MD. Enhanced identification and biological validation of differential gene expression via Illumina whole-genome expression arrays through the use of the model-based background correction methodology. Nucleic Acids Res. 2008;36:e58. doi: 10.1093/nar/gkn234.
3. Xie Y, Wang X, Story M. Statistical methods of background correction for Illumina BeadArray data. Bioinformatics. 2009;25:751–757. doi: 10.1093/bioinformatics/btp040.
