Abstract
Background
The use of reference genes for normalization of whole blood qRT-PCR data may be problematic in conditions such as stroke which induce alterations in white blood cell differential. In this study, we assessed the influence of stroke on the stability of commonly employed reference genes, and we evaluated data-driven normalization as an alternative.
Methods
Peripheral whole blood was sampled from 33 stroke patients and 29 controls, and qRT-PCR was used to measure the expression levels of 10 target genes whose transcripts are known stroke biomarkers. Target gene expression levels were normalized via those of 2 frequently cited reference genes (ACTB and B2M) as well as with the NORMA-Gene data-driven normalization algorithm.
Results
Whole blood expression levels of reference genes were significantly altered in stroke patients relative to controls. In comparison to normalization via reference genes, NORMA-Gene produced more robust target gene expression data in terms of differential expression dynamics, variance properties, and diagnostic performance.
Conclusions
Our findings suggest that whole blood expression levels of commonly used reference genes may be sensitive to changes in white blood cell differential, and that data-driven qRT-PCR normalization approaches offer a powerful alternative.
Keywords: housekeeping gene, polymerase chain reaction, RNA, β-actin, β-2 microglobulin, delta delta CT, ddCT, ΔΔCT, complete blood count, digital PCR, NORMA-Gene
The advent of whole transcriptome profiling along with the push toward personalized medical treatments has led to the discovery of countless diagnostically useful RNA biomarkers. The peripheral whole blood is an especially attractive source of such biomarkers, as it is relatively easy to sample, and exhibits altered gene expression in several disease states via the peripheral immune response. Quantitative reverse transcription PCR (qRT-PCR) remains one of the most sensitive methods for quantification of gene expression and is widely used for the testing of clinical specimens in biomarker studies; it is often employed to validate the results of whole transcriptome profiling or to evaluate previously identified candidate biomarkers in clinical trials. While measurement of gene expression via qRT-PCR is robust, the results can be influenced by interspecimen variations in RNA integrity, amplification efficiency, and liquid handling. As a result, internal normalization is required to control for such sources of experimental bias.1 Currently, the most commonly implemented methods of qRT-PCR normalization rely on the use of 1 or more reference genes. While reference genes can provide robust normalization in some applications, their use in biomarker studies for normalization of gene expression data generated from human whole blood can be problematic.
Two key assumptions need to be met in order for accurate normalization via reference genes: first, that the selected reference genes exhibit stable expression across all subjects, and second, that they are measured without error.1 The extent of heterogeneity present in human clinical populations already makes meeting this first assumption difficult; however, the dynamic plasticity in the cellular composition of peripheral blood adds an additional layer of complexity. It is becoming increasingly apparent that the numerous subpopulations of leukocytes that comprise the peripheral immune system are transcriptionally unique, to the extent that they can be distinguished from each other based on RNA expression alone.2 As a result, any change in white blood cell (WBC) differential has the potential to alter gene expression in whole blood, even in the absence of transcriptional changes at the level of individual cells.3 This phenomenon makes identifying stable whole blood reference genes difficult in the numerous conditions which induce alterations in WBC counts.
It is well established that the cellular composition of the peripheral blood is altered in a wide range of conditions including cancer, infectious disease, autoimmune disorders, and countless others. For example, acute brain injuries such as stroke result in a robust immune response characterized by dramatic increases in peripheral blood neutrophil counts and simultaneous decreases in lymphocyte counts.4 Due to the transcriptional heterogeneity of the peripheral immune system and the resultant influence of cellularity on whole blood gene expression, whole blood reference genes for biomarker studies in these conditions not only must be stable in terms of transcriptional regulation but also must exhibit homogeneous expression across the various peripheral leukocyte subpopulations.5 However, the potential effects of cellularity changes are rarely considered when selecting reference genes for gene expression quantification in whole blood. Identifying and validating reference genes suitable for specific experimental applications is often impractical due to time and cost constraints; reference genes that have been well characterized as transcriptionally stable in cellularly static solid tissues or isolated cells are often chosen for normalization of whole blood gene expression in biomarker studies out of convenience. Unfortunately, it is increasingly recognized that several of these canonical reference genes perform poorly for normalization of whole blood gene expression data, and such practices can negatively affect the fidelity of the final data and interpretation of results.6 Due to the shortcomings inherently associated with the use of reference genes for normalization of whole blood gene expression data in biomarker studies, alternative normalization strategies should be explored.
Data-driven qRT-PCR normalization approaches have been shown to outperform reference genes in a limited number of benchmarking studies that have analyzed gene expression data generated from cell lines and solid animal tissues,7,8 and they may offer a viable alternative to reference genes for normalization of whole blood gene expression data in human biomarker studies. Unlike reference gene-based approaches, data-driven normalization methods reduce systematic and artificial bias via normalization factors derived from analysis of the collective body of expression data generated within an experiment. Data-driven normalization methods are widely used for normalization of high-dimensional microarray data9; they have only recently, however, been developed for normalization of qRT-PCR data. The first data-driven qRT-PCR normalization algorithms were only applicable to large data sets containing expression data from hundreds of genes7 and thus were impractical for use in biomarker validation studies that only sought to analyze a small number of select targets. However, a data-driven normalization algorithm known as NORMA-Gene has been developed that is suitable for the normalization of data sets containing as few as 5 target genes.8 NORMA-Gene uses a least squares regression approach to generate normalization factors that optimally reduce variance across a data set by correcting for estimated bias between specimens within experimental groups. NORMA-Gene has been used successfully in an increasing number of studies that have analyzed gene expression in model organisms, such as Daphnia magna, Eisenia fetida, and Phodopus sungorus (species of plankton, earthworm, and hamster, respectively), for which there are no well-characterized reference genes.10-12 However, to our knowledge, NORMA-Gene has never been implemented as a normalization strategy for the assessment of gene expression in human specimens.
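To make the idea of a group-wise least squares correction concrete, the following is a minimal illustrative sketch in Python. It is not the NORMA-Gene macro and should not be read as its exact algorithm; it assumes that each specimen carries a single additive offset on the CT (log2) scale shared by all of its genes, and that the per-gene group mean is an adequate estimate of the true level. The function name `groupwise_offset_normalize` and the toy values are hypothetical.

```python
import numpy as np

def groupwise_offset_normalize(ct, groups):
    """Remove one shared per-specimen offset, estimated within each group.

    ct     : array, rows = specimens, columns = genes (CT values, log2 scale)
    groups : one group label per specimen
    Simplified illustration only; not the published NORMA-Gene implementation.
    """
    ct = np.asarray(ct, dtype=float)
    groups = np.asarray(groups)
    out = ct.copy()
    for g in np.unique(groups):
        idx = groups == g
        gene_means = ct[idx].mean(axis=0)  # per-gene mean CT within the group
        # The offset b_j minimizing sum_i (ct_ij - b_j - gene_mean_i)^2
        # is the mean residual of specimen j across genes.
        offsets = (ct[idx] - gene_means).mean(axis=1)
        out[idx] = ct[idx] - offsets[:, None]
    return out

# Toy data: three specimens that differ only by a uniform shift of -1, 0, +1 cycles.
toy_ct = np.array([[19.0, 24.0, 29.0],
                   [20.0, 25.0, 30.0],
                   [21.0, 26.0, 31.0]])
toy_norm = groupwise_offset_normalize(toy_ct, ["control"] * 3)
```

Because the offsets are estimated separately within each experimental group, a correction of this kind leaves differences between group means untouched by construction.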
Thus, the purpose of this study was to assess the performance of NORMA-Gene relative to reference genes for normalization of whole blood qRT-PCR data in a context relevant to clinical biomarker research.
In a previous investigation, our group used microarray in tandem with machine learning-based analysis to identify a panel of 10 target genes whose peripheral blood expression levels are differentially regulated in ischemic stroke, and can be used to discriminate between stroke patients and controls with levels of sensitivity and specificity approaching 100%.13 In the study reported here, we recruited a similar cohort of stroke patients and control subjects, and used qRT-PCR to measure the peripheral blood expression levels of these previously identified target genes in parallel with those of 2 commonly used reference genes, ACTB and B2M. We first assessed the influence of stroke-induced changes in WBC differential on the expressional stability of both reference genes in whole blood. We then normalized target gene expression levels using either these reference genes or the NORMA-Gene data-driven normalization algorithm, and examined the effects of normalization strategy on target gene differential expression and diagnostic accuracy.
Materials and Methods
Experimental Design
We first recruited 33 ischemic stroke patients and 29 neurologically asymptomatic controls (Table 1) who were demographically similar to the patient populations examined in our prior microarray analysis. Peripheral blood specimens were obtained at emergency department admission and were collected in an identical manner as in the aforementioned investigation. WBC differential was assessed via cytometry, and qRT-PCR was used to measure the whole blood expression levels of ACTB and B2M in parallel with those of the 10 previously identified target genes. The expression levels of ACTB and B2M were further measured in peripheral leukocyte subpopulations isolated from the blood of a subset of 3 control subjects using immunomagnetic negative selection.
Table 1.
Clinical and Demographic Characteristics
| | Control (n = 29) | Stroke (n = 33) | P |
|---|---|---|---|
| a Age (mean ± SD) | 60.2 ± 17.2 | 72.6 ± 12.8 | 3E-8* |
| b Female n (%) | 24 (82.8) | 20 (61.6) | 0.091 |
| a NIHSS (mean ± SD) | 0.0 ± 0.0 | 8.7 ± 7.8 | 1E-7* |
| b Family history of stroke n (%) | 16 (55.2) | 13 (39.4) | 0.308 |
| b Hypertension n (%) | 16 (55.2) | 28 (84.8) | 0.013* |
| b Dyslipidemia n (%) | 10 (34.5) | 16 (48.5) | 0.310 |
| b Diabetes n (%) | 2 (6.90) | 6 (18.2) | 0.264 |
| b Previous stroke n (%) | 1 (3.40) | 7 (19.4) | 0.057 |
| b Atrial fibrillation n (%) | 0 (0.00) | 11 (33.3) | 4E-4* |
| b Myocardial infarction n (%) | 0 (0.00) | 9 (27.3) | 0.002* |
| b Hypertension medication n (%) | 14 (48.3) | 24 (72.7) | 0.068 |
| b Diabetes medication n (%) | 1 (3.40) | 6 (18.2) | 0.109 |
| b Cholesterol medication n (%) | 6 (20.7) | 13 (39.4) | 0.168 |
| b Anticoagulant or antiplatelet n (%) | 0 (0.00) | 21 (63.6) | 1E-5* |
| b Current smoker n (%) | 1 (6.70) | 7 (21.2) | 0.012* |
Intergroup statistical comparisons were performed via 2-tailed Student's t-test (rows marked a) or 2x2 Fisher’s exact test (rows marked b); *statistically significant.
In order to explore the potential influence of stroke-induced changes in WBC differential on the expressional stability of reference genes in whole blood, we first examined the distribution of ACTB and B2M expression across isolated leukocyte subpopulations. We then evaluated the relationship between the whole blood expression levels of both reference genes and post-stroke neutrophil to lymphocyte ratio (NLR), and further compared the whole blood expression levels of both reference genes between groups.
In order to explore the potential influence of qRT-PCR normalization strategy on the ability to detect the differential expression of peripheral blood biomarkers, we normalized the expression levels of the 10 target genes using ACTB and B2M via common reference gene approaches, as well as with the NORMA-Gene data-driven normalization algorithm. We then compared the postnormalization levels of intergroup target gene differential expression resulting from each normalization strategy to those which we observed in our prior microarray analysis. In addition, we compared the reduction in variance achieved with each approach, as well as the influence of each approach on the diagnostic robustness of target gene expression levels.
Subjects
Ischemic stroke patients and neurologically asymptomatic controls were recruited at Ruby Memorial Hospital, Morgantown, West Virginia. Diagnosis was confirmed by magnetic resonance imaging (MRI) or computed tomography (CT) according to the criteria for diagnosis of acute ischemic cerebrovascular syndrome.14 All samples were collected within 24 hours of symptom onset and prior to thrombolytic intervention. Injury severity was determined according to the National Institutes of Health stroke scale (NIHSS) at the time of blood draw. All procedures were approved by the institutional review boards of Ruby Memorial Hospital and West Virginia University (IRB protocol #1410450461R001). Written informed consent was obtained from all subjects or their authorized representatives prior to any study procedures.
Blood Collection
Parallel peripheral venous blood specimens were collected from subjects via PAXgene RNA tubes (Qiagen) and K2EDTA Vacutainers (Becton Dickinson). PAXgene RNA tubes were frozen immediately and stored at -80 °C until RNA extraction, while K2EDTA tubes were stored at room temperature until white blood cell differential (less than 30 minutes) or leukocyte isolation (performed immediately).
White Blood Cell Differential and NLR
White blood cell differential was assessed in EDTA-treated blood via combined cellular impedance and optical flow cytometry using a clinical hematology analyzer (Cell-Dyn, Abbott Diagnostics). NLR was calculated as absolute neutrophil count divided by absolute lymphocyte count.
Isolation of Leukocyte Subpopulations
Leukocytes were isolated from EDTA-treated blood via immunomagnetic negative selection (EasySep Direct, StemCell Technologies). Neutrophils, monocytes, CD4+ T-lymphocytes, CD8+ T-lymphocytes, B-lymphocytes, and NK-cells were isolated from 4 mL of blood per leukocyte population according to the manufacturer-recommended protocol. Isolated cells were rinsed once in phosphate buffered saline, lysed in Qiagen buffer RLT, flash frozen in liquid nitrogen, and stored at -80 °C until RNA extraction.
RNA Extraction and Quantitative Reverse Transcription PCR
Total RNA extraction was performed via the PreAnalytiX PAXgene blood RNA kit (Qiagen) and automated using the QIAcube system (Qiagen). Isolated RNA was quantified fluorometrically via microplate reader (Synergy HT, BioTek) using the Quant-iT broad range RNA assay kit (Invitrogen). Purity of isolated RNA was determined via spectrophotometry (NanoDrop, Thermo Scientific), and quality of RNA was confirmed via chip capillary electrophoresis (Agilent 2100 Bioanalyzer, Agilent Technologies) prior to downstream experiments. cDNA was generated from purified RNA using the Applied Biosystems high capacity reverse transcription kit (Applied Biosystems).
qPCR was performed using the RotorGeneQ PCR platform (Qiagen). Target and reference sequences were amplified from 10 ng of cDNA template using sequence specific primers (Table 2) and the following thermocycling conditions: 40 cycles consisting of 15 seconds at 95 °C followed by 60 seconds at 60 °C. PCR products were detected using SYBR green (Power SYBR, Thermo Scientific). All reactions were performed in triplicate, and melting point analysis was performed to confirm the presence of a single PCR product. Raw amplification plots were background-corrected and crossing threshold (CT) values were generated via the RotorGeneQ software package. Analysis of individual reaction kinetics was performed to confirm similar amplification efficiencies between reference and target reactions.
Table 2.
Primers Used for qRT-PCR
| Gene | Transcripts (NCBI RefSeq) | Primers (5’-3’) | Product (bp) |
|---|---|---|---|
| B2M | NM_004048.2; XM_006725182.2; XM_005254549.2 | FOR: GAGGCTATCCAGCGTACTCCA REV: CGGCAGGCATACTCATCTTTT | 248 |
| ACTB | NM_001101.3; XM_006715764.1 | FOR: CATGTACGTTGCTATCCAGGC REV: CTCCTTAATGTCACGCACGAT | 250 |
| ANTXR2 | NM_058172.5; NM_001145794.1 | FOR: GATCTCTACTTCGTCCTGGACA REV: AAATCTCTCCGCAAGTTGCTG | 90 |
| STK3 | NM_006281.3; XM_011517258.1; XM_011517255.1; XM_011517254.1; XM_011517253.1; XM_011517252.1; XM_011517251.1; XM_011517250.1; XM_011517249.1; XM_011517247.1; NM_001256312.1; NM_001256313.1 | FOR: CGATGTTGGAATCCGACTTGG REV: GTCTTTGTACTTGTGGTGAGGTT | 105 |
| PDK4 | NM_002612.3 | FOR: GACCCAGTCACCAATCAAAATCT REV: GGTTCATCAGCATCCGAGTAGA | 82 |
| CD163 | NM_004244.5; XM_005253529.3; XM_005253528.3; NM_203416.3 | FOR: GCGGGAGAGTGGAAGTGAAAG REV: GTTACAAATCACAGAGACCGCT | 89 |
| MAL | NM_002371.3; NM_022439.2 | FOR: GCCCTCTTTTACCTCAGCG REV: GCAATGTTTTCATGGTAGTGCCT | 95 |
| PLXDC2 | NM_032812.8; XM_011519750.1 | FOR: ACTCAGATCGAGGAGGATACAGA REV: CCGGCTGGCAGAATCAGATG | 75 |
| ID3 | NM_002167.4 | FOR: GAGAGGCACTCAGCTTAGCC REV: TCCTTTTGTCGTTGGAGATGAC | 170 |
| CTSZ | NM_001336.3 | FOR: CAGCGGATCTGCCCAAGAG REV: CGATGACGTTCTGCACGGA | 198 |
| KIF1B | NM_015074.3; NM_183416.3 | FOR: AAACAAGGGTAATTTGCGTGTGC REV: GTAACTGCCAACTTGGACAGAT | 78 |
| GRAP | NM_006613.3 | FOR: AGCCCTTGCTCAAGTCACC REV: CGTAACTCCGTGGGAAGAAGC | 180 |
For normalization via reference genes, CT values of target genes were normalized by the CT values of either ACTB alone, B2M alone, or their geometric mean using the delta-delta CT (ΔΔCT) method.15 Data-driven normalization was performed via the NORMA-Gene V1.1 macro for Microsoft Excel developed by Heckman et al.8 using the collective CT values of all target and reference genes as input. A NORMA-Gene macro-enabled Excel workbook preloaded with raw data and instructions to reproduce our data-driven normalization analysis is provided as Supplementary File 1 (online).
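As an illustration of the reference gene calculation, the sketch below computes ΔΔCT-based fold differences in Python. It assumes perfect amplification efficiency (a doubling of product per cycle) and takes the control group mean as the calibrator. Averaging reference CT values arithmetically is equivalent to taking the geometric mean of the reference genes on the linear expression scale, which is how we read "geometric mean" here; that reading, the function name `ddct_fold_change`, and the toy CT values are assumptions for illustration only.

```python
import numpy as np

def ddct_fold_change(ct_target, ct_refs, is_control):
    """Per-specimen fold difference (2^-ddCT) relative to the control mean.

    ct_target  : CT values of one target gene, one per specimen
    ct_refs    : CT values of one or more reference genes,
                 shape (n_specimens,) or (n_refs, n_specimens)
    is_control : boolean flag per specimen
    Assumes 100% amplification efficiency.
    """
    ct_target = np.asarray(ct_target, dtype=float)
    ct_refs = np.atleast_2d(np.asarray(ct_refs, dtype=float))
    is_control = np.asarray(is_control, dtype=bool)
    # Arithmetic mean of CTs = log2 of the geometric mean of linear expression.
    ref_ct = ct_refs.mean(axis=0)
    dct = ct_target - ref_ct                # delta CT per specimen
    ddct = dct - dct[is_control].mean()     # delta-delta CT vs. control mean
    return 2.0 ** (-ddct)

# Toy example: two controls, two patients, two reference genes.
fold = ddct_fold_change(
    ct_target=[25.0, 25.0, 24.0, 24.0],
    ct_refs=[[20.0, 20.0, 20.0, 20.0],
             [22.0, 22.0, 22.0, 22.0]],
    is_control=[True, True, False, False],
)
```

In this toy case the target crosses threshold 1 cycle earlier in patients while the references are unchanged, so the patients show a 2-fold increase. If the reference CT values also shifted between groups, the same raw target data would yield a different fold difference.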
Microarray Data
Microarray data (HumanRef-8 expression bead chip, Illumina) were previously published by O’Connell, et al., and are available through the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) via accession number GSE16561.13 Raw probe intensities were background subtracted, quantile normalized, and summarized at the gene level using Illumina GenomeStudio default settings.
Calculation of Variance Reduction
The change in variance induced by each normalization method was determined by calculating the pre- to postnormalization fold change in expression level standard deviation for each target gene. The cumulative fold change in variance across all target genes was calculated via simple averaging and then compared between methods.
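The calculation can be sketched as follows in Python. The sketch treats a single group of specimens and uses the sample standard deviation; both choices, along with the function names and toy values, are assumptions made for illustration.

```python
import numpy as np

def sd_fold_change(pre, post):
    """Post/pre ratio of standard deviations for each gene.

    pre, post : arrays with rows = specimens and columns = genes,
                before and after normalization.
    Values below 1 indicate a reduction in spread.
    """
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    return post.std(axis=0, ddof=1) / pre.std(axis=0, ddof=1)

def cumulative_change(pre, post):
    """Simple average of the per-gene fold changes."""
    return sd_fold_change(pre, post).mean()

# Toy example: normalization halves the spread of both genes.
pre_expr = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
post_expr = pre_expr / 2.0
per_gene = sd_fold_change(pre_expr, post_expr)
overall = cumulative_change(pre_expr, post_expr)
```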
Subject Classification Via K-Nearest Neighbors
The diagnostic robustness of the postnormalization target gene expression values resulting from each normalization strategy was tested in terms of their ability to discriminate between stroke patients and controls using k-nearest neighbors (k-NN). Classification was performed using z-scored expression values, 5 nearest neighbors, and majority rule via the knn.cv() function of the “class” package for R (R project for statistical computing).16 The resultant prediction probabilities were used to generate receiver operating characteristic (ROC) curves using the roc() function of the “pROC” package for R.17 Areas under the curves were then statistically compared via the roc.test() function according to the method described by DeLong, et al.18
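For readers who want to follow the logic outside of R, the sketch below reimplements the general procedure in Python with NumPy only: z-scoring, leave-one-out k-NN (knn.cv() performs leave-one-out cross-validation), and area under the ROC curve. It is an approximation rather than a port of the R functions. In particular, it scores each specimen by the fraction of its 5 nearest neighbors labelled as stroke, which is an assumption about how the prediction probabilities might be oriented, and it does not perform the DeLong comparison. All function names and toy data are hypothetical.

```python
import numpy as np

def zscore(X):
    """Standardize each gene (column) to mean 0 and sample SD 1."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def knn_loocv_scores(X, y, k=5):
    """Leave-one-out k-NN score: fraction of the k nearest neighbors with label 1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a specimen cannot be its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y[nearest].mean(axis=1)

def auc(scores, y):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y, dtype=int)
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

# Toy data: one gene, six controls (label 0) and six patients (label 1), well separated.
toy_X = np.array([[0.0], [0.1], [0.2], [0.3], [0.4], [0.5],
                  [10.0], [10.1], [10.2], [10.3], [10.4], [10.5]])
toy_y = np.array([0] * 6 + [1] * 6)
toy_scores = knn_loocv_scores(zscore(toy_X), toy_y, k=5)
toy_auc = auc(toy_scores, toy_y)
```

With majority rule, a specimen would be classified as stroke when its score exceeds 0.5; with k = 5 and two classes, ties cannot occur.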
Statistics
All statistics were performed using R 2.14. Fisher’s exact test was used for comparison of dichotomous variables. Student's t-test or 1-way ANOVA was used for comparisons of continuous variables where appropriate. Pearson’s r was used to test the strength of correlational relationships. The null hypothesis was rejected when P < .05. In the case of multiple comparisons, P values were adjusted via Holm’s Bonferroni correction.19
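The Holm adjustment can be written in a few lines. The Python sketch below follows the standard step-down procedure: the smallest P value is multiplied by the number of tests, the next smallest by one fewer, and so on, with monotonicity enforced and values capped at 1. The function name `holm_adjust` and the example P values are illustrative only.

```python
import numpy as np

def holm_adjust(pvals):
    """Holm step-down adjusted P values, returned in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)                         # ascending P values
    adj = (m - np.arange(m)) * p[order]           # multipliers m, m-1, ..., 1
    adj = np.minimum(np.maximum.accumulate(adj), 1.0)  # monotone, capped at 1
    out = np.empty(m)
    out[order] = adj
    return out

adjusted = holm_adjust([0.01, 0.04, 0.03])
```

Here the raw values 0.01, 0.04, and 0.03 become 0.03, 0.06, and 0.06, so only the first comparison remains below the .05 threshold.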
Results
Stability of Reference Gene Expression in Peripheral Whole Blood
In terms of the transcriptional distribution of ACTB and B2M within the peripheral immune system, we observed little intersubject variability in expression levels within leukocyte populations. There were, however, several statistically significant differences in expression levels between leukocyte populations. Most notably, expression levels of both ACTB and B2M were nearly 10-fold higher in isolated neutrophils than in isolated lymphocytes (Figures 1A and 1B), a phenomenon that suggests that their expression levels in whole blood are likely prone to influence by alterations in WBC differential. In support of this premise, whole blood expression levels of both ACTB and B2M were positively associated with NLR following stroke (Figures 1C and 1D), and elevated in stroke patients relative to controls (Figures 1E and 1F).
Figure 1.
Stability of reference gene expression in peripheral whole blood. (A,B) mRNA expression levels of ACTB and B2M in isolated leukocyte populations harvested from the peripheral blood of control subjects. Expression values are presented as fold difference relative to the population of lowest expression. Expression levels were statistically compared via 1-way ANOVA; subsequent post hoc testing was performed via 2-sample 2-tailed Student's t-test using Holm’s Bonferroni method to correct for multiple comparisons. Full post hoc results can be found in Supplementary Table 1 (online). (C,D) Relationship between whole blood ACTB and B2M mRNA expression levels and NLR in stroke patients. Expression values are presented as fold difference relative to lowest observed expression. Strength of correlational relationships was tested via Pearson’s r. (E,F) Whole blood mRNA expression levels of ACTB and B2M in stroke patients and controls. Expression levels are presented as fold difference relative to control. Expression levels were statistically compared using 2-sample 2-tailed Student's t-test. All expression values were calculated from raw CT values.
Effects of Normalization Strategy on Target Gene Differential Expression
As we had expected, our findings suggested that the expressional instability of ACTB and B2M that we observed in whole blood negatively affected their effectiveness with regard to normalization of target gene expression levels. Normalization via ACTB and B2M using the ΔΔCT method failed to produce target gene data that recapitulated the pattern of differential expression between stroke patients and controls observed in our prior microarray analysis. However, data-driven normalization using NORMA-Gene resulted in normalized expression data that reproduced our prior results in the case of all 10 target genes (Figures 2A–2J).
Figure 2.
Effects of normalization strategy on target gene differential expression. (A–J) Postnormalization levels of target gene differential expression between stroke patients and controls resulting from each qRT-PCR normalization strategy, compared to those observed in our prior microarray analysis. Expression values are presented as fold difference relative to control. Expression levels were statistically compared between groups via 2-sample 2-tailed Student's t-test.
Effects of Normalization Strategy on Target Gene Variance Reduction
The superior performance of data-driven normalization via NORMA-Gene was quantifiably observable when we assessed variance reduction. While normalization using ACTB and B2M by way of the ΔΔCT method resulted in increased target gene variance in several instances, normalization via NORMA-Gene yielded a reduction in variance in the case of all 10 target genes, as well as a statistically greater cumulative reduction in variance (Figure 3).
Figure 3.
Effects of normalization strategy on target gene variance reduction. Pre- to postnormalization fold change in intragroup variance achieved by each normalization strategy. Individual data points represent the average change in variance observed across the stroke and control groups for each target gene. Box plots represent the cumulative change in variance observed across all 10 target genes with respect to each normalization strategy. Cumulative changes in variance were compared between normalization strategies using repeated measures 1-way ANOVA; subsequent post hoc tests were performed via 2-tailed paired Student's t-test using Holm’s Bonferroni method to correct for multiple comparisons.
Effects of Normalization Strategy on Target Gene Diagnostic Performance
Based on our aforementioned observations, it was unsurprising to find that target gene expression data generated via NORMA-Gene exhibited superior diagnostic performance relative to target gene expression data produced by reference gene-based normalization. Postnormalization target gene expression data generated via NORMA-Gene were better able to discriminate between stroke patients and controls using k-NN, as indicated by higher levels of sensitivity and specificity (Figure 4A) and a statistically greater area under ROC curve relative to postnormalization target gene expression data produced via ACTB and B2M using the ΔΔCT method (Figure 4B).
Figure 4.
Effects of normalization strategy on target gene diagnostic performance. A, ROC curves depicting the cumulative ability of target gene expression values resulting from each normalization strategy to discriminate between stroke patients and controls using k-NN. The sensitivities and specificities reported are those associated with the ROC cut-off that yielded the highest combined value. B, Areas under each ROC curve. Error bars depict 95% confidence intervals. Areas were statistically compared using the DeLong method and P values were adjusted using Holm’s Bonferroni correction to account for multiple comparisons.
Discussion
It is becoming increasingly evident that the immune system plays a significant role in the pathology of almost all human disease; as expected, gene expression in the peripheral blood is altered in numerous pathological conditions. This phenomenon makes the peripheral blood an ideal source of candidate RNA biomarkers in numerous disease states. qRT-PCR is often used for quantification of whole blood gene expression in biomarker studies to validate results obtained with genome-wide transcriptomic approaches, or to evaluate previously identified candidate biomarkers in clinical trials. These studies often employ reference genes as a normalization strategy; however, the dynamic cellularity of the peripheral immune system may confound the use of reference gene-based normalization approaches, especially in conditions that induce dramatic alterations in WBC counts, such as stroke. Data-driven normalization strategies such as NORMA-Gene may offer a viable alternative to the use of reference genes for the normalization of whole blood qRT-PCR data, but they have never been implemented as a normalization strategy for the assessment of whole blood gene expression in human biomarker research. In this study, we demonstrate that the whole blood expression levels of 2 commonly employed reference genes are sensitive to stroke-induced alterations in WBC differential, and that NORMA-Gene may offer a robust alternative to reference genes for the normalization of whole blood qRT-PCR data in human biomarker studies.
To our knowledge, this study is the first to explicitly demonstrate that variations in white blood cell differential can alter reference gene expression in human whole blood and have a negative impact on target gene normalization. We observed differences in the whole blood expression levels of 2 commonly employed reference genes, ACTB and B2M, between stroke patients and controls. Our results implied that this differential regulation was at least in part driven by stroke-induced alterations in WBC differential, supporting the idea that reference gene expression in whole blood can be sensitive to changes in cellularity. The instability we observed in these reference genes lessened their effectiveness in normalizing target gene expression levels, as normalization of target gene data using ACTB and B2M impaired our ability to reliably detect the differential expression of known stroke biomarkers.
It is pertinent to note that in the case of this study, we measured target genes that are differentially expressed at relatively low levels, thus magnifying the impact that variations in reference gene expression can have on results; similar variations in reference gene expression would likely have had a lesser impact on the fidelity of postnormalized expression data if we had assessed the expression levels of target genes that exhibit a higher degree of differential regulation. It is also important to note that not all reference genes may be as sensitive to alterations in WBC differential as the 2 we evaluated in this study; however, our results highlight the potential confounding effect that alterations in WBC differential can have on reference gene expression in whole blood. Thus, our findings emphasize the need for application-specific vetting of candidate reference genes in future biomarker studies that choose to employ reference genes for normalization of whole blood qRT-PCR data.
Interestingly, both ACTB and B2M have been widely used for the normalization of whole blood gene expression data in countless human RNA biomarker investigations, including several stroke biomarker discovery studies.20-22 Our findings suggest that the use of these genes as references may have to some degree confounded the results of these investigations. In the case of some of these stroke studies, candidate RNA biomarkers which were identified as differentially regulated in stroke using genome-wide transcriptional profiling failed to validate using qRT-PCR.20,21 It is possible that the reason these candidate biomarkers failed to validate in these studies lies in the use of ACTB and B2M as reference genes during validation qRT-PCR analysis.
We believe the potential for confounds highlighted in this study makes the use of reference genes less than ideal for normalization of whole blood gene expression data in biomarker studies; our results suggest that data-driven normalization strategies such as NORMA-Gene may offer a robust alternative approach. In this study, NORMA-Gene generated target gene expression data superior to those produced by reference gene-based normalization in terms of differential expression dynamics, variance properties, and diagnostic performance. These results are consistent with previous benchmarking tests comparing NORMA-Gene to reference genes using artificial data sets and real data sets generated from earthworm and plankton solid tissue samples.8
In addition to the potential advantages in terms of data fidelity, the implementation of data-driven qRT-PCR normalization approaches offers several logistical benefits over the use of reference genes in human biomarker studies. The proper vetting of candidate reference genes requires considerable time and resources; the use of data-driven normalization approaches eliminates the need to perform this arduous validation process, expediting the collection of experimental data. Furthermore, data-driven approaches allow for the time and resources that researchers would normally dedicate to the measurement of reference genes to be diverted to the study of biologically significant target genes. Thus, data-driven normalization approaches may be especially advantageous in situations where sample is limited, such as in the case of pediatric blood samples, because they eliminate the need to dedicate sample to the measurement of reference genes.
It is important to note that there may be caveats with regard to study design that need to be considered for successful implementation of data-driven qRT-PCR normalization approaches. In the case of NORMA-Gene specifically, the fact that artificial interspecimen bias is estimated and reduced in a group-wise manner necessitates the use of a block design with regard to specimen processing in order to minimize intergroup variance; valid normalization requires that specimens from all experimental conditions are processed in parallel with regards to RNA extraction, cDNA synthesis, and PCR detection.8 The number of genes used as input also requires consideration. Benchmarking studies have suggested that the minimum number of genes required for valid normalization using NORMA-Gene is 5; however, the use of additional expression data has been shown to incrementally improve normalization accuracy.8 This potential incremental improvement in performance underlies our decision to include the reference gene expression data as input in our normalization analysis; it is important to note, however, that their inclusion was not required to achieve valid target gene normalization and likely had a relatively small impact on the fidelity of the final data.
Collectively, these results suggest that whole blood expression levels of commonly employed reference genes can be sensitive to alterations in white blood cell differential and may not be optimal for normalization of whole blood qRT-PCR data, especially in conditions such as stroke, where alterations in the cellular composition of the peripheral immune system play a significant role in pathology. Furthermore, our findings suggest that data-driven normalization approaches such as NORMA-Gene may offer a robust alternative to the use of reference genes for whole blood qRT-PCR normalization in future human biomarker studies.
Funding
Work was funded via a Robert Wood Johnson Foundation nurse faculty scholar award to TLB (70319) and a National Institutes of Health CoBRE sub-award to TLB (P20 GM109098).
Disclosures
Dr. O'Connell and Dr. Barr have a patent pending re: markers of stroke and stroke severity. Dr. Barr serves as chief scientific officer for Valtari Bio Incorporated. Work by Dr. O'Connell is part of a pending licensing agreement with Valtari Bio Incorporated. The remaining authors report no potential conflicts of interest.
Acknowledgments
The authors would like to thank the subjects and their families, as this work was truly made possible by their selfless contribution. We also would like to thank the stroke team at Ruby Memorial Hospital for supporting the research effort.
Abbreviations
- qRT-PCR: quantitative reverse transcription PCR
- WBC: white blood cell
- NLR: neutrophil to lymphocyte ratio
- MRI: magnetic resonance imaging
- CT: computed tomography
- NIHSS: National Institutes of Health stroke scale
References
- 1. Huggett J, Dheda K, Bustin S, Zumla A. Real-time RT-PCR normalisation; strategies and considerations. Genes Immun. 2005;6(4):279–284.
- 2. Abbas AR, Baldwin D, Ma Y et al. Immune response in silico (IRIS): immune-specific genes identified from a compendium of microarray expression data. Genes Immun. 2005;6(4):319–331.
- 3. Whitney AR, Diehn M, Popper SJ et al. Individuality and variation in gene expression patterns in human blood. Proc Natl Acad Sci U S A. 2003;100(4):1896–1901.
- 4. Vogelgesang A, Grunwald U, Langner S et al. Analysis of lymphocyte subsets in patients with stroke and their influence on infection after stroke. Stroke. 2008;39(1):237–241.
- 5. Piek CJ, Brinkhof B, Rothuizen J, Dekker A, Penning LC. Leukocyte count affects expression of reference genes in canine whole blood samples. BMC Res Notes. 2011;4:36.
- 6. Stamova BS, Apperson M, Walker WL et al. Identification and validation of suitable endogenous reference genes for gene expression studies in human peripheral blood. BMC Med Genomics. 2009;2:49.
- 7. Mar JC, Kimura Y, Schroder K et al. Data-driven normalization strategies for high-throughput quantitative RT-PCR. BMC Bioinformatics. 2009;10:110.
- 8. Heckmann LH, Sørensen PB, Krogh PH, Sørensen JG. NORMA-Gene: a simple and robust method for qPCR normalization based on target gene data. BMC Bioinformatics. 2011;12:250.
- 9. Allison DB, Cui X, Page GP, Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 2006;7(1):55–65.
- 10. Meyer V, Lerchl A. Evidence for species-specific clock gene expression patterns in hamster peripheral tissues. Gene. 2014;548(1):101–111.
- 11. Qiu TA, Bozich JS, Lohse SE et al. Gene expression as an indicator of the molecular response and toxicity in the bacterium Shewanella oneidensis and the water flea Daphnia magna exposed to functionalized gold nanoparticles. Environ Sci Nano. 2015;2(6):615–629.
- 12. Hayashi Y, Engelmann P, Foldbjerg R et al. Earthworms and humans in vitro: characterizing evolutionarily conserved stress and immune responses to silver nanoparticles. Environ Sci Technol. 2012;46(7):4166–4173.
- 13. O’Connell GC, Petrone AB, Treadway MB et al. Machine-learning approach identifies a pattern of gene expression in peripheral blood that can accurately detect ischaemic stroke. npj Genomic Med. 2016;1:16038.
- 14. Kidwell CS, Warach S. Acute ischemic cerebrovascular syndrome: diagnostic criteria. Stroke. 2003;34(12):2995–2998.
- 15. Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) method. Methods. 2001;25(4):402–408.
- 16. Venables WN, Ripley BD. Modern Applied Statistics with S. New York, NY: Springer New York; 2002.
- 17. Robin X, Turck N, Hainard A et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77.
- 18. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837–845.
- 19. Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat. 1979;6(2):65–70.
- 20. Barr TL, Conley Y, Ding J et al. Genomic biomarkers and cellular pathways of ischemic stroke by RNA gene expression profiling. Neurology. 2010;75(11):1009–1014.
- 21. Petrone AB, O’Connell GC, Regier MD et al. The role of arginase 1 in post-stroke immunosuppression and ischemic stroke severity. Transl Stroke Res. 2016;7(2):103–110.
- 22. Raman K, O’Donnell MJ, Czlonkowska A et al. Peripheral blood MCEMP1 gene expression as a biomarker for stroke prognosis. Stroke. 2016;47(3):652–658.