Abstract
Advances in multiplex qRT-PCR have enabled increasingly accurate and robust quantification of RNA, even at low concentrations, facilitating RNA expression profiling in clinical and environmental samples. Here we describe a data-driven qRT-PCR normalization method, the minimum variance method, and evaluate it on clinically derived Mycobacterium tuberculosis samples with variable transcript detection percentages. For moderate to high levels of non-detection (~50% of transcripts), our minimum variance method consistently produces the lowest false discovery rates compared to commonly used data-driven normalization methods.
Keywords: Multiplex qRT-PCR, Data-driven normalization
Accurate quantification of very low concentrations of RNA via multiplex qRT-PCR has enabled large-scale RNA expression profiling beyond traditional laboratory settings, including the longitudinal analysis of clinical, pathogen, and environmentally derived samples. Data normalization in these sample types poses three major challenges not typically encountered in carefully controlled laboratory experiments: 1) the absence of validated endogenous reference genes (housekeeping genes), 2) marked variation in the abundance of target material, and 3) differences between the reference sample and the test sample (between-sample variability) in the number of detectable transcripts, particularly in samples with low abundance of target material. As a result, these studies are often normalized using data-driven methods, but between-sample variability in the number of non-detectable transcripts (i.e., missing transcripts) may bias these normalizations, potentially leading to false discovery.
Non-detected transcripts, defined as transcripts with a zero or very low abundance as evaluated by a qRT-PCR threshold, may result from two experimentally indistinguishable possibilities: a biological state in which a gene is not transcribed (true negative result), or conversely, an experimental error in which a gene is transcribed but not detected (false negative result) due to the detection limits of the assay. False negative results increase as the abundance of target material decreases and low-concentration transcripts can no longer be amplified efficiently.
In our own work, we have encountered these challenges in the global analysis of Mycobacterium tuberculosis (Mtb) gene expression profiles from pathogen RNA derived from the sputum of patients treated for tuberculosis (TB). We observed that as antibiotic treatment kills the bacteria, the burden of live Mtb decreases, and therefore the number of transcripts detected declines progressively during longitudinal treatment.
In this manuscript, we systematically evaluate data-driven normalization methods and introduce a novel method, minimum variance normalization, designed for data with variable transcript non-detection. We first discuss circumstances in which standard reference-based normalizations are inadequate and data-driven normalization may be preferable. Using an experimental dilution series of mRNA derived from TB patient samples to simulate decreasing abundance of input target material and increasing proportions of non-detection, we evaluate three existing data-driven normalization methods (quantile, median, and mean) alongside our novel minimum variance method. To determine whether bias resulting from between-sample variability in transcript non-detection increases false discovery rates, we compare the proportion of genes erroneously classified as differentially expressed under all four methods.
The most commonly used method for normalizing qRT-PCR analysis in laboratory-based and controlled environments is the identification of a set of housekeeping genes whose expression is demonstrated or hypothesized to be invariant relative to all other genes under the conditions being studied [1]. Expression is reported as a ratio relative to the expression of the set of reference genes in that sample. Experimental validation of stable expression of housekeeping genes is imperative, since choosing the wrong housekeeping genes for normalization may lead to erroneous conclusions [2–4].
Unfortunately, experimental validation of reference genes for normalization is frequently not possible for complex samples such as human clinical specimens, particularly those collected in observational field settings. In our study of Mtb expression, for example, bacterial phenotypes and sputum conditions change substantially with increasing duration of antibiotic treatment, transitioning from unencumbered bacterial growth prior to antibiotic initiation, to rapid killing in the early days of treatment, to an antibiotic-tolerant “persister” bacterial state as therapy progresses [5].
We considered whether other sample characteristics – including sputum volume, quantitative culture, or nucleic acid abundance – might provide an alternative reference for normalization; an ideal measure would enumerate the viable bacteria present. Unfortunately, no optimal sample-based referent was identified. Sputum sample volume is difficult to measure and does not directly correlate with bacterial burden. Culture fails to enumerate viable but non-culturable bacilli, which may be transcriptionally active. Quantity of Mtb DNA does not measure viable bacteria, since DNA from killed Mtb has a long half-life in sputum [6]. Ribosomal RNA subunits appear to be regulated by antibiotic exposure [7] and are therefore not a stable reference.
One alternative to sample-specific endogenous controls is to normalize to an exogenous control (spike-in) transcript of known quantity that is not present in the sample of interest. Exogenous controls enable adjustment for variability in PCR quantification [3]; however, because they are added after initial sample processing, they cannot control for variation in input biological material, which is frequently a central challenge in complex samples.
Data-driven normalization is fundamentally different from normalization to endogenous or exogenous controls. Data-driven methods assume that mathematical or statistical properties of the data itself (such as the mean, the median, quantiles, or the variance) are invariant across samples and conditions. These properties are used as a reference to make the overall signal distributions of samples as similar to one another as possible. Long standard in the analysis of cDNA microarrays [8, 9], data-driven normalization methods have been adapted for PCR because multiplex platforms can simplify quantification of hundreds or thousands of transcripts [10]. Although data-driven normalization methods have been widely adopted [3, 10], insufficient attention has been paid to whether between-sample variability in the proportion of non-detectable transcripts can bias these methods, leading to false discovery and false inference of differential expression.
We compared data-driven normalization methods in dilution experiments designed to simulate the progressive increase in non-detection of low-abundance transcripts that often occurs as the abundance of target RNA decreases. Since there are no true biological differences between our dilutions, any transcripts identified as significantly differentially expressed between dilutions represent false discoveries, providing a direct benchmark for evaluating each method.
To evaluate data-driven normalization methods, we performed dilution experiments and then applied three commonly used data-driven methods and a novel alternative, the minimum variance method, designed to minimize false discovery in such datasets. We compared false discovery rates among these normalization methods at different levels of transcript non-detection.
Using the PCR platform described in the supplemental methods, we quantified expression of 768 Mtb genes from sputum samples collected from two TB patients prior to antibiotic treatment. Three technical replicates were performed at each of the 10- and 100-fold dilutions, and six replicates at each of the 1,000- and 10,000-fold dilutions.
As expected, increasing dilution was associated with an increasing proportion of non-detected transcripts. Non-detection affected 1% of transcripts at 10-fold dilution, 10% at 100-fold dilution, 49% at 1,000-fold dilution, and 62% at 10,000-fold dilution (Figure 1). Non-detection was non-random, as probes with higher cycle threshold (CT) values prior to dilution were preferentially not detected (Supplemental Figure S1).
Figure 1. Simulating effects of low abundance on non-random non-detection of transcripts using serial dilution experiments.

Each point represents a transcript detected in the diluted sample, plotted against its cycle threshold (CT) value from the undiluted sample. Distributions become more divergent at greater CT values, making quantile-based approaches unreliable.
We normalized pre-dilution and diluted samples with quantile, mean, median, and our novel minimum variance methods. Quantile normalization assumes that the distribution of expression values itself is invariant; it ranks each sample’s expression values and partitions the ordered values into intervals (quantiles) [10]. Expression values are then shifted so that the quantiles of one sample match the quantiles of the reference sample. Non-random non-detection skews a sample’s quantiles away from the reference quantiles, and increasingly so as non-detection grows. Because quantile normalization is standard practice for microarrays, we evaluated the effect of this non-random non-detection skew on our dataset (Figure 1 and Supplemental Figure S1). The quantile method was tested with a bin size of 1 percent using all detectable transcripts; each quantile was shifted by the difference between that bin’s median and the corresponding reference bin’s median. Distributions become more divergent with greater non-detection, making quantile-based approaches less reliable.
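For concreteness, a minimal Python sketch of this bin-wise quantile shifting follows. It is an assumed implementation, not our production code: CT values are NumPy arrays with NaN marking non-detected transcripts, and the sample is assumed to contain at least as many detected transcripts as bins.

```python
import numpy as np

def quantile_normalize(sample_ct, reference_ct, n_bins=100):
    """Shift a sample onto a reference, quantile bin by quantile bin.

    Sketch: ranks the detected CT values, splits them into n_bins
    equal-count bins (1%-wide bins for n_bins=100), and shifts every
    value in a bin by the difference between the reference bin's
    median and the sample bin's median. NaN marks non-detection.
    """
    out = sample_ct.copy()
    detected = ~np.isnan(sample_ct)
    s_sorted = np.sort(sample_ct[detected])
    r_sorted = np.sort(reference_ct[~np.isnan(reference_ct)])

    # Per-bin shift: reference bin median minus sample bin median.
    s_bins = np.array_split(s_sorted, n_bins)
    r_bins = np.array_split(r_sorted, n_bins)
    shifts = np.array([np.median(rb) - np.median(sb)
                       for sb, rb in zip(s_bins, r_bins)])

    # Map each detected CT value to its quantile bin by rank.
    ranks = np.searchsorted(s_sorted, sample_ct[detected], side="left")
    bin_of = np.minimum(ranks * n_bins // len(s_sorted), n_bins - 1)
    out[detected] = sample_ct[detected] + shifts[bin_of]
    return out
```

When non-detection is non-random, the lowest-abundance bins of the diluted sample no longer correspond to the same transcripts as the reference bins, which is the skew evaluated above.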
Median normalization assumes that the median expression value does not vary across samples; it shifts each sample’s distribution of expression values so that all samples end with the same median. This is equivalent to a simplified quantile approach that normalizes on only the 50th percentile. Two possible implementations of median normalization in data with variable non-detection are based on: (1) the median of all detected transcripts, and (2) the median of only the transcripts detectable in both the reference and the sample of interest (referred to as shared transcripts). We chose both implementations to demonstrate the effect of non-detection on false discovery rates, and to demonstrate a simple way to mitigate that effect for moderate amounts of non-detection.
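A sketch of both median variants (again an assumed implementation, with NaN marking non-detection) illustrates how restricting to shared transcripts changes the computed shift:

```python
import numpy as np

def median_normalize(sample_ct, reference_ct, shared_only=True):
    """Shift a sample so its median CT matches the reference median.

    With shared_only=True, medians are computed over transcripts
    detected in both arrays (paired variant); otherwise each median
    uses all transcripts detected in its own array (unpaired variant).
    NaN marks non-detected transcripts.
    """
    if shared_only:
        shared = ~np.isnan(sample_ct) & ~np.isnan(reference_ct)
        shift = np.median(reference_ct[shared]) - np.median(sample_ct[shared])
    else:
        shift = (np.median(reference_ct[~np.isnan(reference_ct)])
                 - np.median(sample_ct[~np.isnan(sample_ct)]))
    return sample_ct + shift
```

In the unpaired variant, losing low-abundance transcripts in the diluted sample pulls its median toward higher-abundance transcripts, biasing the shift; the shared variant compares like with like.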
Mean normalization shifts expression using the geometric mean when working with expression values or fold changes, or the arithmetic mean when working with CT values. Mean-based normalization often incorporates correction factors that account for differences in distributions between a reference set and a test set; we used one such factor in this analysis [11]. Our mean-based approach used the arithmetic mean of CT values, corrected for the difference in standard deviation [11] between the undiluted and diluted samples. Only transcripts detected in both the diluted and undiluted samples were used in this calculation.
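One way to implement a corrected mean shift is sketched below; this loosely follows the standard-deviation correction of Willems et al. [11], whose published procedure includes additional steps, so the exact form here is an assumption.

```python
import numpy as np

def mean_normalize(sample_ct, reference_ct):
    """Center a sample on the reference mean and rescale its spread.

    Sketch of an arithmetic-mean shift with a standard deviation
    correction, computed over transcripts detected in both samples.
    NaN marks non-detected transcripts, which are left unchanged.
    """
    shared = ~np.isnan(sample_ct) & ~np.isnan(reference_ct)
    s, r = sample_ct[shared], reference_ct[shared]
    out = sample_ct.copy()
    # Match the reference mean, then correct for the difference in
    # standard deviation between the two shared-transcript sets.
    out[shared] = r.mean() + (s - s.mean()) * (r.std(ddof=1) / s.std(ddof=1))
    return out
```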
In the above methods, between-sample variability in the proportion of non-detected transcripts can alter the statistical property used for normalization, potentially inflating false discovery rates. To minimize this bias, we designed a minimum variance method that minimizes the between-sample variance over only the shared transcripts. A reference sample is selected and treated as the true distribution of transcripts. The variance between the reference and a sample shifted by a candidate value $s$ is then calculated (equation 1), where $T_i$ is the CT value of the ith shared transcript in the sample, $T_i^{\mathrm{ref}}$ is the corresponding value in the reference, and $n$ is the total number of shared transcripts.
$$V(s) = \frac{1}{n}\sum_{i=1}^{n}\left(T_i^{\mathrm{ref}} - T_i - s\right)^2 \qquad (1)$$
The optimal shift used for normalization is the value of $s$ that minimizes this variance, thereby making the overall distribution of a sample’s expression values as similar to the reference as possible.
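Under the reconstruction of equation (1) above, the objective is quadratic in $s$, so the minimizing shift can be obtained in closed form as the mean paired difference; a minimal sketch, assuming NaN marks non-detection, is:

```python
import numpy as np

def minimum_variance_normalize(sample_ct, reference_ct):
    """Shift a sample by the value of s minimizing equation (1).

    V(s) = (1/n) * sum_i (T_ref_i - T_i - s)**2 over the n shared
    detected transcripts; setting dV/ds = 0 gives s equal to the mean
    paired difference, so no numerical search is needed.
    """
    shared = ~np.isnan(sample_ct) & ~np.isnan(reference_ct)
    s_opt = (reference_ct[shared] - sample_ct[shared]).mean()
    return sample_ct + s_opt
```

Because the shift is fit only to transcripts detected in both samples, it is shielded from the distributional distortion that non-random non-detection imposes on methods that use all detected transcripts.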
We tested for significant differential expression between the undiluted and diluted samples using unpaired t-tests for methods that used all detectable transcripts and paired t-tests for methods that used only shared detectable transcripts. Each dilution replicate was treated as an independent sample with an undiluted reference, simulating the effect of having independent samples. A Benjamini-Hochberg multiple testing correction with a 0.05 false discovery rate set the threshold for differential expression.
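One plausible arrangement of this testing step (an assumption, using SciPy and statsmodels, with normalized CT values organized as replicates × transcripts arrays) is:

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_rel
from statsmodels.stats.multitest import multipletests

def differential_expression_calls(undiluted, diluted, paired=True, alpha=0.05):
    """Per-transcript t-tests with Benjamini-Hochberg correction.

    undiluted and diluted are (replicates x transcripts) arrays of
    normalized CT values; paired tests suit the shared-transcript
    methods, unpaired tests the all-detected-transcript methods.
    Because the dilutions contain no true biological differences,
    every rejected transcript is a false discovery.
    """
    test = ttest_rel if paired else ttest_ind
    pvals = np.array([
        test(undiluted[:, j], diluted[:, j], nan_policy="omit").pvalue
        for j in range(undiluted.shape[1])
    ])
    reject = multipletests(pvals, alpha=alpha, method="fdr_bh")[0]
    return reject  # True where a transcript is called differentially expressed
```

The false discovery rate for a normalization method is then simply the fraction of transcripts for which this mask is True.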
We considered a method valid if it produced a false discovery rate <0.05 (Figure 2). For samples with low average non-detection (<10%), all methods performed well. For samples with high average non-detection (62%), no method achieved a false discovery rate below 0.05 after multiple testing correction. All paired methods performed adequately up to an average non-detection of 49%, with minimum variance consistently performing best among them. By contrast, at 49% average non-detection, false discovery rates were 23% with quantile normalization and 30% with unpaired median normalization, indicating that these methods are inappropriate for samples with moderate to high non-detection. The quantile method performed slightly better than the unpaired median approach across all dilutions, as expected given that the median approach is a simplified quantile approach.
Figure 2. False discovery rates for different data-driven normalization methods.
The false discovery rate is the percentage of transcripts that were significantly differentially expressed after the Benjamini-Hochberg correction for multiple hypothesis testing. Average non-detection represents the average percent of transcripts that were non-detected across the dilution replicates.
Our dilution series shows that even at a moderate level of non-detection (~50%), normalization techniques that use only the shared detectable genes avoid high false discovery rates, with the minimum variance approach consistently performing best. Differential expression testing therefore remains valid for samples with moderate non-detection and variable target material due to death of bacilli, allowing elucidation of novel biology in clinical samples with low template material.
Supplementary Material
Acknowledgments
BG acknowledges support from an NLM Institutional Training Grant, NIH 5T15LM009451, and from NICTA, funded by the Australian Government through the Department of Communications and the Australian Research Council through the ICT Centre of Excellence Program. MS acknowledges support from the Eppley Foundation, the Potts Foundation, and the Boettcher Foundation Webb-Waring Biomedical Research Program.
Abbreviations
- CT: cycle threshold
- TB: tuberculosis
- Mtb: Mycobacterium tuberculosis
- qRT-PCR: quantitative real-time polymerase chain reaction
References
1. Bustin SA, Benes V, Garson JA, Hellemans J, Huggett J, Kubista M, Mueller R, Nolan T, Pfaffl MW, Shipley GL, et al. The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clinical Chemistry. 2009;55:611–622. doi:10.1373/clinchem.2008.112797.
2. Dheda K, Huggett JF, Chang JS, Kim LU, Bustin SA, Johnson MA, Rook GA, Zumla A. The implications of using an inappropriate reference gene for real-time reverse transcription PCR data normalization. Analytical Biochemistry. 2005;344:141–143. doi:10.1016/j.ab.2005.05.022.
3. Huggett J, Dheda K, Bustin S, Zumla A. Real-time RT-PCR normalisation; strategies and considerations. Genes and Immunity. 2005;6:279–284. doi:10.1038/sj.gene.6364190.
4. Nolan T, Hands RE, Bustin SA. Quantification of mRNA using real-time RT-PCR. Nature Protocols. 2006;1:1559–1582. doi:10.1038/nprot.2006.236.
5. Mitchison D, Davies G. The chemotherapy of tuberculosis: past, present and future. The International Journal of Tuberculosis and Lung Disease. 2012;16:724–732. doi:10.5588/ijtld.12.0083.
6. Desjardin LE, Chen Y, Perkins MD, Teixeira L, Cave MD, Eisenach KD. Comparison of the ABI 7700 system (TaqMan) and competitive PCR for quantification of IS6110 DNA in sputum during treatment of tuberculosis. Journal of Clinical Microbiology. 1998;36:1964–1968. doi:10.1128/jcm.36.7.1964-1968.1998.
7. Stallings CL, Stephanou NC, Chu L, Hochschild A, Nickels BE, Glickman MS. CarD is an essential regulator of rRNA transcription required for Mycobacterium tuberculosis persistence. Cell. 2009;138:146–159. doi:10.1016/j.cell.2009.04.041.
8. Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19:185–193. doi:10.1093/bioinformatics/19.2.185.
9. Park T, Yi SG, Kang SH, Lee S, Lee YS, Simon R. Evaluation of normalization methods for microarray data. BMC Bioinformatics. 2003;4:33. doi:10.1186/1471-2105-4-33.
10. Mar JC, Kimura Y, Schroder K, Irvine KM, Hayashizaki Y, Suzuki H, Hume D, Quackenbush J. Data-driven normalization strategies for high-throughput quantitative RT-PCR. BMC Bioinformatics. 2009;10:110. doi:10.1186/1471-2105-10-110.
11. Willems E, Leyns L, Vandesompele J. Standardization of real-time PCR gene expression data from independent biological replicates. Analytical Biochemistry. 2008;379:127–129. doi:10.1016/j.ab.2008.04.036.