Abstract
Quantitative PCR (qPCR) is the method of choice in gene expression analysis. However, the number of groups or treatments, target genes and technical replicates quickly exceeds the capacity of a single run on a qPCR machine, so the measurements have to be spread over more than one plate. Such multi-plate measurements often show similar proportional differences between experimental conditions but different absolute values, even though the measurements were carried out with technically identical procedures. Removal of this between-plate variation will enhance the power of the statistical analysis of the resulting data. Inclusion and application of calibrator samples, with replicate measurements distributed over the plates, assumes a multiplicative difference between plates. However, random and technical errors in these calibrators will propagate to all samples on the plate. To avoid this effect, the systematic bias between plates can be removed with a correction factor based on all overlapping technical and biological replicates between plates. This approach removes the requirement for all calibrator samples to be measured successfully on every plate. This paper extends an already published factor correction method for use in multi-plate qPCR experiments. The between-run correction factor is derived from the target quantities, which are calculated from the quantification threshold, PCR efficiency and observed Cq value. To enable further statistical analysis in existing qPCR software packages, an efficiency-corrected Cq value is reported, based on the corrected target quantity and a PCR efficiency per target. The latter is calculated as the mean of the PCR efficiencies, taking the number of reactions per amplicon per plate into account. Export to the RDML format completes an RDML-supported analysis pipeline of qPCR data, ranging from raw fluorescence data, through amplification curve analysis and application of reference genes, to statistical analysis.
Keywords: qPCR, Between-run variation, Between-plate correction, Software, RDML, Multi-plate experiment
1. Introduction
In experimental biology, replicating a series of measurements under presumably identical circumstances often leads to results that show the same proportional differences between experimental groups, disease states or treatments, but different absolute values within each of the conditions in the different measurement sessions [1]. A quantitative PCR (qPCR) analysis also often requires more than one run, each run consisting of one plate with measurements in every well of the plate. In such a multi-plate qPCR analysis, between-run variation may result from small but systematic differences in cDNA, primer and reagent concentrations, reaction temperatures, and the timing of the denaturation, annealing and elongation phases. In addition, the yield of the RT reaction and the handling of the plates prior to the PCR run systematically affect the results per run. The fact that a logarithmic plot of replicated measurements shows parallel lines per run (Fig. 1A) indicates that these sources of variation together proportionally increase or decrease the outcome of the amplification reaction for all samples on the plate. Therefore, this between-run variation is commonly removed using calibrator samples. In this correction, every observed target quantity on the plate is divided by the geometric mean of the target quantities of the calibrator samples on that plate [2]. Consequently, all observations on a plate are divided by a constant, the so-called calibration factor; the between-run variation is thus assumed to result from a multiplicative factor that affects the whole plate in the same way. A drawback of the calibrator sample approach is that it requires the successful measurement of all calibrator samples, because missing values will bias the calibration factor. Moreover, the calibrator samples cannot be used in further statistical analysis because their residual error is artificially reduced compared with that of the other samples.
In the extreme case, with only one calibrator sample per plate, the calibrator samples have no variation at all after the correction [1]. Because no measurement is free of error, the error in the calibrator samples is propagated to all samples on the plate and is, therefore, still present as between-run variation in all non-calibrator samples.
Factor correction was proposed as a method to determine the multiplicative factors that enable the removal of the systematic bias between measurement sessions without the use of calibrators [1]. As stated in that paper, “for a correction method to be effective, the correction factors should be based on all observations in the session and the estimation of these factors should not be affected by incomplete data sets”. The current paper describes the use of factor correction in the analysis of qPCR data. For quantitative PCR, this quote implies that an optimal estimation of the factors that correct between-run differences requires maximum overlap between plates. This overlap can be achieved with respect to conditions, i.e. the unique combinations of targets, experimental treatments or biological variables in the study. The statistical model is based not only on technical replicates, as is the case for calibrator samples, but also on biological replicates. It will be shown that all target quantities that have technical or biological replicates between plates can serve to estimate the factor per run. The described program, Factor-qPCR, imports analysed qPCR data, consisting of Cq, PCR efficiency, quantification threshold and target quantity values, and performs the correction between plates. The corrected target quantities can directly be used to calculate relative gene expression levels [3], which can further be analysed with standard statistical software. Alternatively, to calculate efficiency-corrected relative gene expression ratios [4] and to perform further statistical analysis with software commonly used in qPCR analysis [2], the corrected target quantities can be converted into efficiency-corrected Cq values, using a PCR efficiency per target. Factor-qPCR supports import of data and export of corrected values in spreadsheet or RDML format [5].
2. Methods and results
2.1. Factor correction model
As described previously, measurements that result from multi-session experiments can be considered to follow a mixed additive and multiplicative model [1]. This is also true for multi-plate qPCR experiments. The equations in this paper refer to a multi-plate experiment with N runs, J conditions and K measurements per target; the corresponding lowercase letters (n, j, k) are used as indices in the equations.
The multiplicative nature of the between-run variation in the data set is illustrated by the approximately parallel lines that connect the data points per run in a logarithmic plot of the data (Fig. 1A). In a multi-plate experiment with such a multiplicative between-run variation, the observations can, therefore, be described with Eq. (1)
$$Y_{nj} = F_n \left( Y_{\mathrm{mean}} + C_j + \varepsilon_{nj} \right) \qquad (1)$$
The additive part of this model, between parentheses, states that the result of a measurement in condition j is the sum of the population mean (Ymean), the effect of condition j (Cj), and a technical error and/or biological variation. Note that the condition effect C in this model represents the effect of a combined condition consisting of the target and the biological conditions under which the samples were collected. For each run n, the additive part of the observation Ynj is multiplied by the plate factor Fn. This factor affects every target and sample on a plate in the same way.
In this model, the error term is normally distributed with mean 0 and standard deviation σ. This error reflects the variance within a condition, whereas the condition effects reflect the differences between conditions. As in standard statistics, the condition effects sum to 0. The product of the run factors equals 1, which, together with the zero sum of the condition effects, ensures that in a complete and balanced design the mean of all observations Ynj is equal to the overall Ymean.
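To make the multiplicative model of Eq. (1) concrete, the following Python sketch simulates a multi-plate data set in which every observation is the additive part (population mean, condition effect, normal error) multiplied by its plate factor. The function name `simulate_plates` and the `(run, condition, value)` tuple layout are hypothetical illustrations chosen for this example, not part of Factor-qPCR.

```python
import math
import random


def simulate_plates(y_mean, condition_effects, plate_factors, k, sigma, seed=0):
    """Simulate observations under the multiplicative model of Eq. (1).

    Y_nj = F_n * (Y_mean + C_j + eps), with eps ~ N(0, sigma).
    Returns a list of (run, condition, value) tuples, k replicates per cell.
    """
    # Model constraints: condition effects sum to 0, run factors multiply to 1.
    if abs(sum(condition_effects)) > 1e-9:
        raise ValueError("condition effects must sum to 0")
    if abs(math.prod(plate_factors) - 1.0) > 1e-9:
        raise ValueError("plate factors must multiply to 1")

    rng = random.Random(seed)
    data = []
    for n, f_n in enumerate(plate_factors):
        for j, c_j in enumerate(condition_effects):
            for _ in range(k):
                eps = rng.gauss(0.0, sigma)  # biological/technical error
                data.append((n, j, f_n * (y_mean + c_j + eps)))
    return data


# Noise-free example: for every condition, run 2 / run 0 equals F_2 / F_0 = 4.
data = simulate_plates(10.0, [-3.0, 0.0, 3.0], [0.5, 1.0, 2.0], k=2, sigma=0.0)
```

With `sigma=0.0` the log-transformed values of each run differ by a constant offset, which is the "parallel lines" pattern of Fig. 1A in its idealised form.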
2.2. Estimation of the run factors with the ratio approach
The run factors F can be determined with the previously described ratio approach [1]. This approach is based on the fact that the between-run ratio for a pair of observations from different runs (a and b) but for the same condition (Cj) can be written as Eq. (2):
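$$\frac{Y_{aj}}{Y_{bj}} = \frac{F_a \left( Y_{\mathrm{mean}} + C_j + \varepsilon_{aj} \right)}{F_b \left( Y_{\mathrm{mean}} + C_j + \varepsilon_{bj} \right)} = \frac{F_a}{F_b} \left( 1 + \frac{\varepsilon_{aj} - \varepsilon_{bj}}{Y_{\mathrm{mean}} + C_j + \varepsilon_{bj}} \right) \qquad (2)$$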
In this between-run ratio, the ratio of the two normally distributed additive parts of the multi-run model (Eq. (1)) has a Cauchy-like distribution. Theoretically speaking, the Cauchy distribution has no mean, but it has a median of zero and a symmetrical bell shape [6]. As more pairs of observations overlap between plate a and plate b, the average of the last term in Eq. (2) will approach zero and cancel out. This makes every between-run ratio an unbiased estimate of the ratio of the two run factors [1]. Consequently, the best estimate of the ratio of two run factors is the geometric mean of the between-run ratios of all pairs of observations that have a condition in common in that pair of runs.
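The construction of the between-run ratio matrix can be sketched in Python as follows. This is a minimal illustration, assuming observations are stored as `(run, condition, value)` tuples with positive values and that, within a shared condition, every observation in one run is paired with every observation in the other; the function name `ratio_matrix` is hypothetical. Following the convention used for the matrix in this section, cell `R[n][i]` estimates F_i / F_n (column run over row run), and pairs of runs without a shared condition are left as `None`.

```python
import math
from collections import defaultdict
from itertools import product


def ratio_matrix(data, n_runs):
    """Between-run ratio matrix; R[n][i] estimates F_i / F_n.

    data: iterable of (run, condition, value) tuples with value > 0.
    Each off-diagonal cell is the geometric mean of y_i / y_n over all pairs
    of observations that share a condition between run i (column) and run n (row).
    Cells for run pairs without a shared condition are left as None.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for run, cond, value in data:
        groups[run][cond].append(value)

    R = [[None] * n_runs for _ in range(n_runs)]
    for n in range(n_runs):
        R[n][n] = 1.0  # a run compared with itself
        for i in range(n_runs):
            if i == n:
                continue
            shared = set(groups[n]) & set(groups[i])
            logs = [math.log(y_i / y_n)
                    for cond in shared
                    for y_i, y_n in product(groups[i][cond], groups[n][cond])]
            if logs:
                # geometric mean = exp(arithmetic mean of the log ratios)
                R[n][i] = math.exp(sum(logs) / len(logs))
    return R
```

Working on log ratios makes the geometric mean a plain arithmetic mean, which keeps the estimate symmetric: `R[n][i]` is the reciprocal of `R[i][n]`.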
To derive run factors from this between-run ratio matrix, the matrix has to be complete. When a pair of runs has no conditions in common, the between-run ratio will be missing (e.g. runs 5 and 6 in Figs. 1 and 2). Such a missing between-run ratio can be substituted by an estimate. To this end, the quotients of the other ratios in the column of the missing ratio and those in each of the other columns of the matrix are calculated. Their geometric mean is the fold difference between the columns in the matrix (Fig. 2B). An estimate of the missing ratio can then be calculated by dividing the observed ratio in the row of the missing ratio (Fig. 2A) by this fold difference. The geometric mean of these N − 1 estimates, one per column, is then the best substitute for the missing ratio (Fig. 2C). Eq. (3) shows the nested geometric means that are applied in this calculation.
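$$R_{rc} = \left( \prod_{i \neq c} \frac{R_{ri}}{\left( \prod_{n \neq r} \dfrac{R_{ni}}{R_{nc}} \right)^{1/(N-1)}} \right)^{1/(N-1)} \qquad (3)$$
where R_{ni} is the between-run ratio in row n and column i of the matrix, and R_{rc} is the missing ratio in row r and column c.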
In Eq. (3), N is the number of runs; n runs over the N − 1 rows other than row r, and i over the N − 1 columns other than column c. The inner geometric mean calculates the fold difference between matrix columns. The observed ratios in the affected row are divided by this difference to obtain substitutes, and the outer geometric mean then yields the best substitute. Substitution can be applied repeatedly. However, when more than two rounds of substitution are required, the design of the experiment is clearly incomplete and it is recommended to run another plate to fill in the missing overlap.
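The nested geometric means of the substitution step can be sketched in Python as below. This is an illustration under stated assumptions, not the Factor-qPCR code: the function name `substitute_missing` is hypothetical, the matrix is a list of lists in which `R[n][i]` estimates F_i / F_n, and any cell that is still `None` is simply skipped in both the inner and the outer geometric mean.

```python
import math


def substitute_missing(R, r, c):
    """Estimate the missing ratio R[r][c] with nested geometric means.

    Assumes R[n][i] ~ F_i / F_n; cells equal to None are skipped.
    """
    N = len(R)
    estimates = []
    for i in range(N):
        if i == c or R[r][i] is None:
            continue
        # Inner geometric mean: fold difference between column i and column c,
        # taken over the rows other than r (estimates F_i / F_c).
        logs = [math.log(R[n][i] / R[n][c])
                for n in range(N)
                if n != r and R[n][i] is not None and R[n][c] is not None]
        if not logs:
            continue
        fold = math.exp(sum(logs) / len(logs))
        # Observed ratio in row r divided by the fold difference ~ F_c / F_r.
        estimates.append(math.log(R[r][i] / fold))
    if not estimates:
        raise ValueError(f"not enough overlap to substitute R[{r}][{c}]")
    # Outer geometric mean over the per-column estimates.
    return math.exp(sum(estimates) / len(estimates))
```

For example, with run factors 0.5, 1 and 2 and the ratio between runs 0 and 2 removed from an otherwise exact matrix, the sketch recovers 2 / 0.5 = 4 from the overlap that both runs have with run 1.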
From the matrix in which the missing between-run ratios are substituted (Fig. 2D), the run factors can be determined. In the matrix of between-run ratios, every cell is an estimate of the factor of the run in the column divided by the factor of the run in the row (Fig. 2D). Because the product of all run factors in the factor correction model (Eq. (1)) equals 1, the geometric mean of column i in this between-run ratio matrix is an estimate of the correction factor for run i (Fig. 2E; as shown in Eq. (4), in which n ranges over the N rows in the matrix).
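$$F_i = \left( \prod_{n=1}^{N} R_{ni} \right)^{1/N} \qquad (4)$$
where R_{ni}, the between-run ratio in row n and column i, estimates F_i / F_n.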
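Because every cell of column i estimates F_i / F_n and the product of all F_n is 1, the column geometric mean reduces to F_i. A minimal Python sketch of this step, assuming a complete list-of-lists matrix with `R[n][i]` ≈ F_i / F_n (the function name `run_factors` is hypothetical):

```python
import math


def run_factors(R):
    """Run factor F_i as the geometric mean of column i of a complete ratio matrix."""
    N = len(R)
    factors = []
    for i in range(N):
        column = [R[n][i] for n in range(N)]
        if any(v is None for v in column):
            raise ValueError("ratio matrix must be complete; substitute missing cells first")
        # (prod_n F_i / F_n)^(1/N) = F_i / (prod_n F_n)^(1/N) = F_i, since prod F_n = 1
        factors.append(math.exp(sum(math.log(v) for v in column) / N))
    return factors
```

Applied to an exact matrix built from factors 0.5, 1 and 2, the sketch returns those same factors, whose product is again 1.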
The between-run variation in the original data set can now be removed by dividing each measured target quantity (N0) in each plate by the corresponding run factor (Eq. (5)).
$$\mathrm{r\_N}_0 = \frac{N_0}{F_n} \qquad (5)$$
The corrected data of Fig. 1A are shown in Fig. 1B. After factor correction, the average of the between-run ratios has become 1.
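The correction of Eq. (5) and the check that the corrected between-run ratios average to 1 can be sketched as follows. Both function names are hypothetical, and the `(run, condition, N0)` tuple layout is an assumption made for the example.

```python
import math
from collections import defaultdict


def correct_quantities(data, factors):
    """Eq. (5): divide each target quantity N0 by the factor of its run.

    data: list of (run, condition, N0); factors: run factors indexed by run.
    Returns (run, condition, corrected N0) tuples.
    """
    return [(run, cond, n0 / factors[run]) for run, cond, n0 in data]


def mean_between_run_ratio(data, a, b):
    """Geometric mean of condition-matched ratios y_b / y_a between runs a and b."""
    groups = defaultdict(lambda: defaultdict(list))
    for run, cond, value in data:
        groups[run][cond].append(value)
    logs = [math.log(y_b / y_a)
            for cond in set(groups[a]) & set(groups[b])
            for y_a in groups[a][cond]
            for y_b in groups[b][cond]]
    if not logs:
        raise ValueError("runs a and b share no condition")
    return math.exp(sum(logs) / len(logs))
```

In a toy data set where run 1 reads four times higher than run 0 (factors 0.5 and 2), the between-run ratio drops from 4 to exactly 1 after correction.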
2.3. Application of factor correction to qPCR data: Factor-qPCR
Analysis of qPCR data starts with raw fluorescence values. These can either be processed by the software of the qPCR system, resulting in a PCR efficiency per target derived from a dilution series and a Cq value per target and sample, or be exported as raw data. In the latter case, the amplification curves can be analysed with other programs, which usually yield a PCR efficiency per target and a target quantity per sample and target [7], [8] (Fig. 3). The qPCR systems or analysis programs export these data per run to a table, in text or spreadsheet format, or to the XML-based hierarchical tree structure defined as RDML (www.rdml.org) [5], [9]. While importing the data of every run, Factor-qPCR creates a variable that identifies the plate. The user has to select the variables that identify the targets and the group and treatment annotations, which together define the combined condition. When the target quantity to be corrected (N0) is not reported in the input file, it has to be calculated from the quantification threshold (Nq) and the PCR efficiency (E), both per target, and the Cq value per sample (Eq. (6)).
$$N_0 = \frac{N_q}{E^{C_q}} \qquad (6)$$
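The calculation of Eq. (6) can be sketched in Python as below. This assumes that the PCR efficiency E is expressed as the fold increase per cycle (a value between 1 and 2) rather than as a percentage, and that the rows of one run are `(sample, target, Cq)` tuples with Nq and E looked up per target; both function names are hypothetical.

```python
def target_quantity(cq, efficiency, nq):
    """Eq. (6): N0 = Nq / E**Cq.

    cq: observed Cq of the sample; efficiency: PCR efficiency E of the target,
    assumed here to be a fold increase per cycle (between 1 and 2);
    nq: quantification threshold of the target, in the units of N0.
    """
    return nq / efficiency ** cq


def quantities_for_run(rows, efficiency_by_target, nq_by_target):
    """Apply Eq. (6) to (sample, target, cq) rows of one run; returns (sample, target, N0)."""
    return [(sample, target,
             target_quantity(cq, efficiency_by_target[target], nq_by_target[target]))
            for sample, target, cq in rows]
```

With E = 2 and Nq = 8, a sample with Cq = 3 gives N0 = 8 / 2^3 = 1, and each additional cycle halves N0.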
After these user choices, the correction factor per run can be determined (Fig. 3). Dividing each target quantity by the correction factor of its run then removes the between-run variation (Eq. (5)).
The corrected N0 values can then be saved in a spreadsheet format that can be read by standard statistical packages. However, the use of the corrected target quantity by statistical packages used in qPCR analysis requires its conversion into an efficiency-corrected Cq value. To this end, the PCR efficiency per target that was reported per plate has to be converted into a PCR efficiency per target that is representative of the multi-plate experiment. Because the individual PCR efficiencies, as reported by amplification curve analysis programs, are normally distributed when pooled over runs, the mean of those values can be used to estimate the mean PCR efficiency of each target for the combined runs (Eq. (7)).
$$E_{\mathrm{mean}} = \frac{\sum_{n=1}^{N} \sum_{k=1}^{K_n} E_{n,k}}{\sum_{n=1}^{N} K_n} \qquad (7)$$
In Eq. (7), N is the number of plates and Kn is the number of observations for the specific target on plate n. When, instead, only one PCR efficiency per target is reported for each plate, e.g. derived from a dilution series included on every plate, the sum of the PCR efficiency values for the multi-plate experiment is obtained by multiplying the reported value (En) by the number of observations for the target on that plate (Kn) and summing over plates; the mean PCR efficiency can then be calculated (Eq. (8)).
$$E_{\mathrm{mean}} = \frac{\sum_{n=1}^{N} K_n E_n}{\sum_{n=1}^{N} K_n} \qquad (8)$$
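Both ways of pooling the PCR efficiency of one target over plates can be sketched in Python. The function names and the input layouts (a list of per-plate lists of individual efficiencies, or one efficiency and one observation count per plate) are assumptions made for this illustration.

```python
def mean_efficiency_individual(efficiencies_per_plate):
    """Eq. (7): mean of the individual efficiencies E_{n,k} pooled over all plates.

    efficiencies_per_plate: one list of individual efficiencies per plate
    (for a single target).
    """
    total = sum(sum(plate) for plate in efficiencies_per_plate)
    count = sum(len(plate) for plate in efficiencies_per_plate)
    return total / count


def mean_efficiency_per_plate(plate_efficiencies, counts):
    """Eq. (8): mean efficiency from one value E_n per plate, weighted by K_n."""
    total = sum(e_n * k_n for e_n, k_n in zip(plate_efficiencies, counts))
    return total / sum(counts)
```

When every sample on a plate has the same efficiency, the two formulas give the same result; weighting by Kn ensures that a plate with many observations of the target contributes proportionally more to the mean.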
Export to RDML requires that a standard error (SE) is reported for these PCR efficiency values. This SE can be determined from the residual sum of squares (SSres). In the case of individual efficiencies, the residuals are determined with respect to the PCR efficiency per target per plate (Eq. (9)).
$$SS_{\mathrm{res}} = \sum_{n=1}^{N} \sum_{k=1}^{K_n} \left( E_{n,k} - E_n \right)^2 \qquad (9)$$
In Eq. (9), En is the PCR efficiency of the target on plate n and En,k are the individual efficiencies observed for the Kn samples in which the target was amplified. When only an SE of the PCR efficiency per plate is reported in the input files, these SEs have to be converted into a sum of squares per plate, which is then summed over plates (Eq. (10)).
$$SS_{\mathrm{res}} = \sum_{n=1}^{N} \left( K_n - 1 \right) K_n \, SE_n^2 \qquad (10)$$
The SE of the mean PCR efficiency for the combined runs can be calculated from the SSres as usual (Eq. (11)).
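$$SE_{\mathrm{mean}} = \sqrt{\frac{SS_{\mathrm{res}}}{\sum_{n=1}^{N} K_n - N}} \bigg/ \sqrt{\sum_{n=1}^{N} K_n} \qquad (11)$$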
In Eq. (11), ΣKn is the total number of observations of the target, summed over all plates.
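The standard-error calculation can be sketched in Python as follows. The sketch rests on two assumptions that the reader should check against Factor-qPCR itself: a per-plate SE is converted back to a sum of squares via SDn = SEn·√Kn and SSn = (Kn − 1)·SDn², and the residual degrees of freedom are ΣKn − N because one mean is estimated per plate. All function names are hypothetical.

```python
import math


def ss_res_individual(efficiencies_per_plate):
    """Eq. (9): squared deviations of E_{n,k} from the plate mean E_n, summed over plates."""
    ss = 0.0
    for plate in efficiencies_per_plate:
        e_n = sum(plate) / len(plate)  # PCR efficiency of the target on plate n
        ss += sum((e - e_n) ** 2 for e in plate)
    return ss


def ss_res_from_se(plate_ses, counts):
    """SS_res from per-plate SEs, assuming SD_n = SE_n * sqrt(K_n) and SS_n = (K_n - 1) * SD_n**2."""
    return sum((k_n - 1) * k_n * se_n ** 2 for se_n, k_n in zip(plate_ses, counts))


def se_mean_efficiency(ss_res, counts):
    """SE of the pooled efficiency, assuming sum(K_n) - N residual degrees of freedom."""
    total = sum(counts)
    df = total - len(counts)
    if df <= 0:
        raise ValueError("need more observations than plates")
    return math.sqrt(ss_res / df) / math.sqrt(total)
```

As a consistency check, a plate with individual efficiencies 1.8 and 2.0 has a plate SE of 0.1, and converting that SE back gives the same sum of squares (0.02) as computing it directly from the two values.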
An efficiency-corrected Cq value per sample can be calculated from the PCR efficiency per target (Emean) and the corrected target quantity (r_N0). To this end, Eq. (6) is log-transformed and solved for Cq (Eq. (12)).
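$$C_{q,\mathrm{corr}} = \frac{\log \left( N_q / \mathrm{r\_N}_0 \right)}{\log E_{\mathrm{mean}}} \qquad (12)$$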
In this conversion, the quantification threshold Nq is set to 1. Factor-qPCR enables export of the corrected data to a spreadsheet or RDML for further statistical analysis. Because the original plate differences have been removed, they can be ignored in these analyses. Factor-qPCR was specifically implemented to perform factor correction on multi-run quantitative PCR experiments. The program and a demonstration dataset can be downloaded from http://HFRC.nl.
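The conversion of a corrected target quantity back into an efficiency-corrected Cq value, as in Eq. (12), can be sketched in Python. The function name is hypothetical; the default `nq=1.0` mirrors the quantification threshold of 1 used for this conversion, and the efficiency is again assumed to be a fold increase per cycle.

```python
import math


def corrected_cq(r_n0, e_mean, nq=1.0):
    """Eq. (12): Cq = log(Nq / r_N0) / log(E_mean), the log form of Eq. (6) solved for Cq."""
    if r_n0 <= 0 or e_mean <= 1.0:
        raise ValueError("r_N0 must be positive and E_mean must exceed 1")
    return math.log(nq / r_n0) / math.log(e_mean)


# Round trip with Eq. (6): N0 = Nq / E**Cq  ->  Cq recovered from N0.
e, cq = 1.9, 25.3
recovered = corrected_cq(1.0 / e ** cq, e)
```

Because Nq is fixed at 1, the resulting Cq values are on a common, plate-independent scale, which is what downstream qPCR statistics packages expect when they compute efficiency-corrected ratios.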
3. Discussion
Between-run correction in qPCR data analysis assumes a multiplicative difference between plates in a multi-plate qPCR experiment. After division by a constant factor per plate, all between-run differences that are not multiplicative will still be present. This holds irrespective of the method used to determine the correction factor, whether it is restricted to calibrator samples or includes all overlapping technical and biological replicates. The advantage of using all overlap between plates is that measurements that have failed for technical reasons can easily be repeated in a new run and then added to the experiment.
Factor correction requires overlap between plates. Although this overlap can be reached by spreading replicate measurements over different plates, the required overlap is not restricted to such technical replicates. The statistical model on which the correction is based requires maximum overlap between plates with respect to targets and samples collected under the same biological or experimental conditions. In the ratio approach (Eq. (2)), the biological and technical errors, both normally distributed with a mean of zero, cancel out. Overlap between plates can, therefore, be based on both biological and technical replicates. The design of the experiment should preferably be balanced and complete, with a similar number of samples from every condition on each of the required plates. Because every target and condition then has the same influence on the between-run correction, loss of condition effects through the correction is avoided.
Plate design in qPCR experiments often aims at sample maximisation: the biological or medical samples that have to be compared are all measured on one plate, which limits the number of targets that can be measured on that plate [2]. This approach allows comparison between samples, but not between targets. Alternatively, target maximisation enables comparison between genes, but not between samples. The maximum overlap design that underlies Factor-qPCR allows comparison of both samples and genes and thus enables the study of gene expression pathways.
Competing interests
The authors declare that they have no competing interests.
Acknowledgement
ARV is supported by the Marie Curie CardioNeT grant (CA 289600).
References
1. Ruijter J.M., Thygesen H.H., Schoneveld O.J.L.M., Das A.T., Berkhout B., Lamers W.H. Factor correction as a tool to eliminate between-session variation in replicate experiments: application to molecular biology and retrovirology. Retrovirology. 2006;3:2. doi: 10.1186/1742-4690-3-2.
2. Hellemans J., Mortier G., De Paepe A., Speleman F., Vandesompele J. qBase relative quantification framework and software for management and automated analysis of real-time quantitative PCR data. Genome Biol. 2007;8:R19. doi: 10.1186/gb-2007-8-2-r19.
3. Ruijter J.M., Ramakers C., Hoogaars W.M., Karlen Y., Bakker O., van den Hoff M.J., Moorman A.F. Amplification efficiency: linking baseline and bias in the analysis of quantitative PCR data. Nucleic Acids Res. 2009;37:e45. doi: 10.1093/nar/gkp045.
4. Pfaffl M.W. A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res. 2001;29:e45. doi: 10.1093/nar/29.9.e45.
5. Lefever S., Hellemans J., Pattyn F., Przybylski D.R., Taylor C., Geurts R., Untergasser A., Vandesompele J. RDML: structured language and reporting guidelines for real-time quantitative PCR data. Nucleic Acids Res. 2009;37:2065–2069. doi: 10.1093/nar/gkp056.
6. Johnson N.L., Kotz S., Balakrishnan N. Continuous Univariate Distributions, vol. 1. John Wiley; New York: 1994. pp. 298–331.
7. Ruijter J.M., Pfaffl M.W., Zhao S., Spiess A.N., Boggy G., Blom J., Rutledge R.G., Sisti D., Lievens A., De Preter K., Derveaux S., Hellemans J., Vandesompele J. Evaluation of qPCR curve analysis methods for reliable biomarker discovery: bias, resolution, precision, and implications. Methods. 2012;59:32–46. doi: 10.1016/j.ymeth.2012.08.011.
8. Pabinger P., Rödiger S., Kriegner A., Vierlinger K., Weinhäusel A. A survey of tools for the analysis of quantitative PCR (qPCR) data. Biomol. Detect. Quantif. 2014;1:23–33. doi: 10.1016/j.bdq.2014.08.002.
9. Ruijter J.M., Lefever S., Anckaert J., Hellemans J., Pfaffl M.W., Benes V., Bustin S.A., Vandesompele J., Untergasser A.; RDML Consortium. RDML-Ninja and RDMLdb for standardized exchange of qPCR data. BMC Bioinform. 2015;16:197. doi: 10.1186/s12859-015-0637-6.