Statistical Analysis of Repeated MicroRNA High-Throughput Data with Application to Human Heart Failure: A Review of Methodology

Shesh N Rai; Herman E Ray; Xiaobin Yuan; Jianmin Pan; Tariq Hamid; Sumanth D Prabhu

doi:10.2147/OAMS.S27907

Open Access Med Stat. Author manuscript; available in PMC: 2014 Apr 13.

Published in final edited form as: Open Access Med Stat. 2012 Apr 13;2012(2):21–31. doi: 10.2147/OAMS.S27907

PMCID: PMC3984897 NIHMSID: NIHMS435695 PMID: 24738042

Abstract

Complex experimental designs present unique challenges in the analysis of microRNA (miRNA) Cycle to Threshold (Ct) values. In this manuscript, we discuss various statistical techniques and their application in an analysis performed at the JG Brown Cancer Center. We consider data quality evaluation, data normalization, and statistical hypothesis testing procedures, all in the context of the example. The experiment used as the motivating example involved repeated sampling over time. The intra-subject correlation created by the repeated sampling should be incorporated into the analysis; doing so resulted in additional significant miRNAs. The statistical techniques used to analyze miRNA Ct values from qPCR should incorporate key features of the experimental design. The manuscript also discusses potential issues with the commonly used methodologies when an experiment collects multiple samples from the same individuals over time.

Keywords: miRNA, repeated measurements, normalization, hypothesis testing

Introduction

Several studies have examined the role of miRNAs in diseases such as cancer¹ and heart disease.² miRNAs are short, noncoding RNA molecules that affect gene expression. The clinical understanding of the role of miRNAs in disease is growing rapidly. Several techniques are used to obtain miRNA expression levels, including microarray analysis and TaqMan PCR from Applied Biosystems.³ The reproducibility of experiments performed with TaqMan PCR has also been investigated and found to be high.⁴ Several normalization techniques can be employed to remove systematic differences between samples that do not represent true biologic differences.⁵ The Ct value is the cycle number at which the fluorescent signal of the reporter dye crosses a threshold value.⁶ The threshold is placed so that it is crossed while the PCR is in its exponential phase.
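To make the definition of Ct concrete, the following sketch locates the fractional cycle at which a fluorescence curve first reaches a fixed threshold, using linear interpolation between adjacent cycles. It assumes a baseline-corrected signal with one reading per cycle; instrument software applies its own baseline and threshold algorithms, so this is an illustration rather than a reproduction of any vendor's method. The function name `ct_from_curve` is hypothetical.

```python
import numpy as np


def ct_from_curve(fluorescence, threshold):
    """Fractional cycle at which the signal first reaches `threshold`.

    Assumes fluorescence[c] is the baseline-corrected reporter signal
    recorded after cycle c + 1 (cycles are 1-based, indices 0-based).
    Returns nan when the threshold is never reached (an undetermined Ct).
    """
    f = np.asarray(fluorescence, dtype=float)
    above = np.nonzero(f >= threshold)[0]
    if above.size == 0:
        return np.nan
    c = above[0]
    if c == 0:
        return 1.0  # already at or above threshold at the first recorded cycle
    # Linear interpolation between cycle c (index c - 1) and cycle c + 1 (index c).
    frac = (threshold - f[c - 1]) / (f[c] - f[c - 1])
    return c + frac
```

Returning `nan` for a curve that never crosses the threshold corresponds to the undetermined Ct values that the data quality step has to deal with.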

Typically, a hypothesis testing procedure is applied once the Ct values are normalized. Student’s t-test appears to be a popular procedure for comparing the mean normalized Ct values between two groups.⁷⁻⁹ The p-values then need to be adjusted to control the type I error rate, using an appropriate method such as that of Benjamini and Hochberg.¹⁰
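The Benjamini-Hochberg adjustment takes only a few lines. The sketch below is a minimal NumPy version of the standard step-up procedure: multiply the i-th smallest of m p-values by m/i, then enforce monotonicity from the largest p-value downward. The helper name `bh_adjust` is our own.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest p-value down keeps the result monotone.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)  # cap at 1 and restore input order
    return adjusted
```

A miRNA is then declared significant when its adjusted value is at or below the chosen FDR level, for example 0.05.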

Experiments are becoming more complex as they are designed to examine the relationships among the disease process, treatments, and the expression of miRNAs. They often involve repeated measurements on the same subjects over time and require specialized statistical techniques to handle the additional correlation. Montenegro et al.¹¹ designed an experiment that examined the expression of miRNAs at different gestational ages. The authors used a Generalized Estimating Equation (GEE)¹² model

g\left(E[Y_{ijk} \mid x_{ij}]\right) = x_{ij}^{\top} \beta

(1)

with an exchangeable correlation structure, where Y_ijk is the k^th Ct value for the i^th subject at the j^th gestational age, x_ij is the covariate vector for the i^th subject at the j^th gestational age, g is the link function, and β is the vector of regression coefficients. The model included obstetric condition and gestational age as covariates. The GEE model belongs to the class of semi-parametric models, since it does not require full specification of the likelihood to calculate the parameter estimates. It is easily applied in the repeated sampling situation created by the qPCR experiment.

Such experimental designs present unique challenges to the analyst. The motivation for this manuscript is an analysis of miRNA Ct values performed at the James Graham Brown Cancer Center. The experiment was designed as an exploratory analysis of changes in the cardiac expression of miRNAs in patients with end-stage heart failure (HF) undergoing placement of a left ventricular assist device (LVAD) and subsequent heart transplantation. Its design presented some unique challenges for the analysis, which can only be fully appreciated with a description of the experiment. The remaining sections describe the experiment, compare various analysis techniques, discuss the results, and provide conclusions.

Motivating Example

The experiment is designed to analyze miRNA expression profiles in patients with advanced HF undergoing surgical implantation of an LVAD (a mechanical pump designed to assist in blood flow from the weakened heart) as a bridge to heart transplantation, i.e., to maintain the patient until the heart can be replaced.

The initial assessment of miRNA expression levels is an exploratory analysis of 384 unique miRNAs. The miRNAs were selected based on available resources and their known roles in cardiac function at the time of the experiment. Each subject had a sample of the left ventricle removed at the time of LVAD implantation (IMP), a sample of the left ventricle taken at the time of heart transplant and LVAD explantation (ELV), and a sample of the right ventricle taken at the time of LVAD explant (ERV). Therefore, each person receiving an LVAD in the study contributed three samples at two different time points. The term subject refers to the patient under study. The term plate refers to one of the three biologic samples that the subject contributed, that is, the specific set of miRNAs measured for a subject at a specific point in time and location in the heart. All participants signed an informed consent form for the use of the tissue, and the study was approved by the University of Louisville’s Institutional Review Board (IRB, IRB# 101.04JH). There are also archived control samples, which represent hearts not experiencing failure.

The experiment is intended to be an exploratory analysis, and the number of wells is limited. Therefore, no technical replicates were included, in order to maximize the number of miRNAs in the analysis. Each of the 384 wells contains a unique miRNA, except for the endogenous controls, which may be repeated a few times. Additional challenges arise from the time required to collect the data and from the multiple time points in the trial. These challenges are discussed in the following section.

Statistical Methodologies

The experimental design presents two unique challenges to the analyst. First, the experiment did not include technical replicates, in order to maximize the number of unique miRNAs in the experiment. This means the usual data quality techniques are not available and a different approach is required. Second, the repeated sampling of miRNAs from the same subjects over time induces intra-subject correlation. The typical normalization techniques are not designed to preserve these naturally occurring correlation structures, although statistical models that incorporate the correlation structure are available.

In this manuscript, we discuss the most commonly used methodologies or methodologies with readily available software. We apply them to the data discussed in the motivating example.

Data Quality

Data quality had to be assessed once the data were ready for analysis. Usually, technical replicates are used to assess the quality of the Ct values; they can be used to determine whether a value is truly missing or missing at random. We defined a value as missing if its Ct was greater than 35, even though the software can detect values up to 40. Values missing at random can be imputed, whereas truly missing values should be left unaltered. Because the experiment was constructed to include as many unique miRNAs as possible, without technical replicates, we had to approach the problem differently.

We addressed this problem in two steps. First, for each miRNA we counted the number of plates with Ct values larger than 35; all plates were included simultaneously, regardless of the time during treatment. Second, miRNAs that were missing on a large number of plates were excluded from the study. A further benefit is a reduction in the number of miRNAs included in the hypothesis testing, which lessens the multiple-testing burden.
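A minimal sketch of the two-step filter, assuming the Ct values are stored as a plates-by-miRNAs NumPy array and that a miRNA is dropped once it is missing on at least a chosen number of plates. The cutoff of 35 comes from the text; the function name `filter_mirnas` and the treatment of `nan` as missing are our assumptions.

```python
import numpy as np

UNDETERMINED_CUTOFF = 35.0  # Ct values above this are treated as missing


def filter_mirnas(ct, max_missing):
    """Drop miRNAs that are missing on `max_missing` or more plates.

    ct: array of shape (M plates, K miRNAs); nan also counts as missing.
    Returns (filtered array, boolean keep mask over the K miRNAs).
    """
    ct = np.asarray(ct, dtype=float)
    missing = np.isnan(ct) | (ct > UNDETERMINED_CUTOFF)
    n_missing = missing.sum(axis=0)  # step 1: plates per miRNA with Ct > 35
    keep = n_missing < max_missing   # step 2: exclude frequently missing miRNAs
    return ct[:, keep], keep
```

For the ELV vs ERV comparison described in the Results (18 plates, exclusion at 13 or more missing plates), the call would be `filter_mirnas(ct, max_missing=13)`.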

Although a fixed threshold is commonly recommended and routinely used, it is subject to selection bias. An alternative is to use a varying threshold for each plate, in which case the Ct values and the plate-specific thresholds must be combined in a parametric model; we consider this approach in another manuscript.

Normalization

The second issue encountered during the analysis of the Ct values was appropriate normalization. Normalization is required to remove unwanted technical variation present in the samples.¹³ Many normalization techniques come from the analysis of microarray datasets and may not be completely applicable: the number of measurements is much smaller in miRNA data, and the majority of miRNAs are not expressed or are expressed at very low levels.⁵

In this analysis, we considered the delta-Ct method,¹⁴ mean normalization,¹³ quantile normalization,¹⁵ and rank invariant normalization.¹⁶ These are commonly used techniques with readily available software. The coefficient of variation of the raw data is included as a reference against which to evaluate the normalization techniques.

Let N be the total number of subjects included in the study after filtering, and let i = 1, …, N index the individual patients. Let j = 1, 2, 3 index the repeated samples for each subject; in the motivating example, j = 1 corresponds to the IMP biologic sample, j = 2 to the ELV biologic sample, and j = 3 to the ERV biologic sample. Then M = 3N is the total number of plates in the experiment, indexed by m = 1, …, M, so that each plate m corresponds to one subject-sample pair (i, j). Let K be the number of unique miRNAs included in the analysis after filtering, indexed by k = 1, 2, …, K. Then Ct_ijk represents the Ct value from the i^th subject, j^th sample, and k^th miRNA. For simplicity, plate-specific calculations are written with the subscripts m and k (Ct_mk), while sample- and person-specific calculations use all three subscripts i, j, and k.
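As an illustration of this indexing, the sketch below assumes (this layout is our assumption, not stated in the study) that the Ct values are held in a NumPy array indexed as ct[i, j, k] with 0-based indices, and that plates are numbered subject by subject, so that plate m = 3i + j. The function name `to_plates` is hypothetical.

```python
import numpy as np


def to_plates(ct):
    """Reshape an (N, 3, K) array indexed [subject i, sample j, miRNA k] to (M, K).

    With 0-based indices, plate m = 3 * i + j, where j = 0, 1, 2 stands for
    IMP, ELV, and ERV respectively.
    """
    ct = np.asarray(ct, dtype=float)
    n_subjects, n_samples, n_mirnas = ct.shape
    return ct.reshape(n_subjects * n_samples, n_mirnas)
```

Keeping both views is convenient: plate-specific normalization works on the (M, K) matrix, while the repeated-measures tests work on the (N, 3, K) array.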

Delta-Ct

The delta-Ct method subtracts the mean of the endogenous controls from the remaining Ct values on each plate. Two endogenous controls, RNU24 and RNU48, were selected for the analysis. The algebraic equation representing this is

\Delta Ct_{mk} = Ct_{mk} - Ct_{e}

(2)

where Ct_e is the average of the Ct values from the endogenous controls on plate m and Ct_mk is the Ct value of the k^th remaining miRNA on that plate. The delta-Ct method is a popular normalization technique because of its natural biologic motivation, which is explained in the Appendix.
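Eq. (2) can be applied to a whole plates-by-miRNAs array at once. In this sketch the endogenous-control wells (for example RNU24 and RNU48, including any repeated wells) are identified by column indices supplied by the caller; the array layout and the function name `delta_ct` are our assumptions.

```python
import numpy as np


def delta_ct(ct, control_cols):
    """Delta-Ct normalization (Eq. 2): subtract each plate's mean control Ct.

    ct: (M, K) array of Ct values, one row per plate.
    control_cols: column indices of the endogenous-control wells.
    Returns the (M, K - len(control_cols)) array of Delta-Ct values.
    """
    ct = np.asarray(ct, dtype=float)
    ct_e = ct[:, control_cols].mean(axis=1, keepdims=True)  # Ct_e for each plate
    targets = np.delete(ct, control_cols, axis=1)           # all other miRNAs
    return targets - ct_e
```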

Mean Normalization

The mean normalization subtracts the average of plate m’s Ct values from all Ct values contained on plate m. The mathematical representation is

\Delta Ct_{mk} = Ct_{mk} - \frac{1}{K} \sum_{k'=1}^{K} Ct_{mk'}, \quad k = 1, \dots, K \text{ and } m = 1, \dots, M

(3)

where Ct_mk is the k^th miRNA from the m^th plate and M = 3N . The method is similar, in essence, to the Delta-Ct method but relies on an average of all Ct values to perform the normalization.
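Eq. (3) amounts to centering each plate at zero. A minimal NumPy sketch, assuming a plates-by-miRNAs array with no missing values (`np.nanmean` could replace the mean if undetermined wells were stored as `nan`); the function name is our own.

```python
import numpy as np


def mean_normalize(ct):
    """Mean normalization (Eq. 3): subtract each plate's average Ct value."""
    ct = np.asarray(ct, dtype=float)
    return ct - ct.mean(axis=1, keepdims=True)  # row m minus its own mean
```

Because every value on a plate is shifted by the same constant, the spread and ordering of the Ct values within a plate are unchanged.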

Quantile Normalization

Quantile normalization forces the distribution of Ct values to be the same across all plates. The method replaces the largest value on each plate with the mean of the largest values across plates, then repeats the process for the second-largest values, and so on for each subsequent rank. Formally, let

q_{k}^{*} = (q_{k} \cdot d)\, d = \left(\frac{1}{\sqrt{M}} \sum_{m=1}^{M} q_{mk}\right) d = \left(\frac{1}{M} \sum_{m=1}^{M} q_{mk}, \dots, \frac{1}{M} \sum_{m=1}^{M} q_{mk}\right)

(4)

where

d = \left(\frac{1}{\sqrt{M}}, \dots, \frac{1}{\sqrt{M}}\right)

(5)

and q_k = (q_1k, …, q_Mk) is the k^th row of ordered Ct values, so that q_mk is the k^th ordered Ct value on plate m. The Ct values are ordered for each plate independently of the other plates. Quantile normalization is commonly used in the analysis of microarray expression values, but the technique assumes that the distribution of the expression values is the same across samples.
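The sketch below implements Eqs. (4) and (5) with NumPy for a plates-by-miRNAs array: sort each plate, replace the k-th ordered values by their mean across plates, and return the averaged values to each plate's original positions. Ties are broken by sort order, a simplification relative to some implementations; the function name is our own.

```python
import numpy as np


def quantile_normalize(ct):
    """Quantile normalization of an (M plates, K miRNAs) Ct array."""
    ct = np.asarray(ct, dtype=float)
    order = np.argsort(ct, axis=1)                       # per-plate ordering
    sorted_vals = np.take_along_axis(ct, order, axis=1)  # column k holds q_k
    q_star = sorted_vals.mean(axis=0)                    # mean of the k-th ordered values
    out = np.empty_like(ct)
    # Put the averaged k-th value back where each plate's k-th value came from.
    np.put_along_axis(out, order, np.broadcast_to(q_star, ct.shape), axis=1)
    return out
```

After normalization every plate contains exactly the same set of values, which is why the plate-wise density curves become nearly identical.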

Rank Invariant Normalization

The rank invariant method attempts to identify miRNAs whose rank does not change across the plates. These rank invariant miRNAs are then used to build a smooth curve that is applied to the entire sample. Rank invariant normalization is completed in two steps. First, the k^th miRNA is considered rank invariant between plates m and m' if the absolute change in its relative rank r, scaled by its relative rank on plate m', is less than 0.05:

\frac{\lvert r_{mk} - r_{m'k} \rvert}{r_{m'k}} < 0.05

(6)

Second, a smooth curve is fit through the set of rank invariant miRNAs and applied to all miRNAs. The rank invariant method is another technique that originated in the analysis of microarray expression values.
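A simplified sketch of rank invariant normalization of one plate against a reference plate. Several choices here are our assumptions rather than details from the text: relative ranks are rank/K, the reference plate is supplied by the caller, and a straight line fitted with `np.polyfit` stands in for the smooth curve (a loess-type smoother would be closer to the published method). The function name is hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def rank_invariant_normalize(plate, reference, tol=0.05):
    """Map one plate's Ct values onto a reference plate's scale.

    Step 1: a miRNA is rank invariant when Eq. (6) holds, i.e.
    |r_plate - r_ref| / r_ref < tol for relative ranks in (0, 1].
    Step 2: a straight line (stand-in for a smooth curve) is fitted through
    the invariant miRNAs and applied to every miRNA on the plate.
    """
    plate = np.asarray(plate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    k = plate.size
    r_plate = rankdata(plate) / k
    r_ref = rankdata(reference) / k
    invariant = np.abs(r_plate - r_ref) / r_ref < tol
    if invariant.sum() < 2:
        raise ValueError("too few rank-invariant miRNAs to fit a curve")
    slope, intercept = np.polyfit(plate[invariant], reference[invariant], deg=1)
    return intercept + slope * plate, invariant
```

Only the invariant miRNAs determine the fitted curve, so a few miRNAs whose ranks change strongly between plates do not distort the normalization of the rest.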

Coefficient of Variation

The cumulative distribution of the coefficient of variation (CV) is used to compare the normalization techniques. We calculated the CV for each of the K miRNAs over all M plates:

CV_{k} = \frac{\operatorname{sd}(Ct_{1k}, \dots, Ct_{Mk})}{\operatorname{mean}(Ct_{1k}, \dots, Ct_{Mk})}, \quad k = 1, 2, \dots, K

(7)

Next we computed the empirical cumulative distribution of the CV as

\tilde{F}(t) = \frac{1}{K} \sum_{k=1}^{K} I\{CV_{k} \leq t\}

(8)
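Eqs. (7) and (8) can be sketched as follows for a plates-by-miRNAs array. Using the sample standard deviation (`ddof=1`) is our assumption, since the text does not state the divisor; the function names are our own. Note that the CV becomes unstable when a miRNA's mean is close to zero, which can happen for delta-Ct or mean-normalized values.

```python
import numpy as np


def cv_per_mirna(ct):
    """Eq. (7): coefficient of variation of each miRNA across the M plates."""
    ct = np.asarray(ct, dtype=float)
    return ct.std(axis=0, ddof=1) / ct.mean(axis=0)


def cv_ecdf(cv, t):
    """Eq. (8): fraction of miRNAs whose CV is at most t (t may be an array)."""
    cv = np.asarray(cv, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return (cv[:, None] <= t[None, :]).mean(axis=0)
```

Evaluating `cv_ecdf` on a common grid of t values for each normalization method gives curves that can be compared directly; a curve that reaches 1 sooner indicates smaller CVs overall.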

Hypothesis Testing

The final issue to consider during the analysis was the appropriate hypothesis testing procedure. A t-test, the Mann-Whitney U test, and a testing procedure proposed by Pounds and Rai¹⁸ were considered, as well as a model based on the GEE approach.

The experiment collected data on the same individuals at two different time points. The first sample was taken when the LVAD was implanted, when only the left ventricle could be sampled. The second set of samples was taken when the heart transplant was performed, and both the left and right ventricles were sampled.

GEE Model

A GEE model was constructed to account for the repeated sampling. Y_ijk denotes the repeated Ct values of the k^th miRNA taken at each of the sampling points described above for each individual. The x_ij represent the covariates, which identify the three groups: the sample taken at the time of LVAD implant (IMP), the left ventricle sample at the time of explant (ELV), and the right ventricle sample at the time of explant (ERV). Contrasts were constructed to test the difference between the Ct values taken at the time of implant and the left ventricle sample at the time of explant (IMP vs ELV), as well as the difference in Ct values at the time of explant between the left and right ventricles (ELV vs ERV). The analysis currently assumes an exchangeable correlation structure, which accounts for the correlation among the samples from each subject. The Gaussian distribution with the identity link function was selected because the normalized Ct values are approximately normally distributed.
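For readers who want to see the mechanics, the following is a from-scratch NumPy sketch of a Gaussian, identity-link GEE with an exchangeable working correlation, fitted to one miRNA at a time. It is not the software used in the study (the manuscript does not name it); the moment estimators for the scale and the common correlation ρ and the robust sandwich variance are standard GEE ingredients, and the function names are our own.

```python
import numpy as np
from scipy.stats import norm


def gee_exchangeable(y, X, n_iter=25, tol=1e-8):
    """Gaussian/identity-link GEE with an exchangeable working correlation.

    y: (N, J) array, one row per subject and one column per repeated sample.
    X: (N, J, p) array holding one (J, p) design matrix per subject.
    Returns (beta, robust sandwich covariance of beta, estimated rho).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, j, p = X.shape
    # Start from ordinary least squares (independence working correlation).
    beta = np.linalg.lstsq(X.reshape(-1, p), y.reshape(-1), rcond=None)[0]
    for _ in range(n_iter):
        resid = y - X @ beta                                  # (N, J) residuals
        phi = (resid ** 2).sum() / (n * j - p)                # scale; only used to scale rho
        # Sum over subjects of within-subject residual cross-products (pairs j < k).
        cross = ((resid.sum(axis=1) ** 2 - (resid ** 2).sum(axis=1)) / 2).sum()
        rho = cross / (phi * (n * j * (j - 1) / 2 - p))
        # Keep rho where the exchangeable matrix stays positive definite.
        rho = float(np.clip(rho, -1.0 / (j - 1) + 1e-6, 1.0 - 1e-6))
        R_inv = np.linalg.inv((1 - rho) * np.eye(j) + rho * np.ones((j, j)))
        bread = np.einsum("nja,jk,nkb->ab", X, R_inv, X)      # sum of X_i' R^-1 X_i
        score = np.einsum("nja,jk,nk->a", X, R_inv, y)        # sum of X_i' R^-1 y_i
        new_beta = np.linalg.solve(bread, score)
        done = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if done:
            break
    resid = y - X @ beta
    u = np.einsum("nja,jk,nk->na", X, R_inv, resid)           # per-subject score terms
    bread_inv = np.linalg.inv(bread)
    robust_cov = bread_inv @ (u.T @ u) @ bread_inv            # sandwich estimator
    return beta, robust_cov, rho


def contrast_pvalue(beta, cov, c):
    """Two-sided Wald p-value for the linear contrast c' beta."""
    c = np.asarray(c, dtype=float)
    z = (c @ beta) / np.sqrt(c @ cov @ c)
    return 2 * norm.sf(abs(z))
```

For one miRNA with complete IMP, ELV, and ERV samples, `X = np.broadcast_to(np.eye(3), (N, 3, 3))` fits one mean per sample type, and the contrasts `[1, -1, 0]` (IMP vs ELV) and `[0, 1, -1]` (ELV vs ERV) match the two comparisons described above. With only nine subjects the sandwich variance can be anti-conservative, so a small-sample correction is worth considering.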

t-test

If the correlation among all three samples from a subject is ignored, a paired t-test can be used to compare the average expression values from two of the three samples. The test statistic is

t_{d} = \frac{\sqrt{N}\, \bar{d}_{k}}{s_{d}}

(9)

where

\bar{d}_{k} = \frac{1}{N} \sum_{i=1}^{N} (Ct_{i1k} - Ct_{i2k})

(10)

and Ct_i1k is the Ct value for the k^th miRNA from the first sample for the i^th subject. The value s_d is the sample standard deviation of the paired differences Ct_i1k − Ct_i2k, so that s_d/√N is the standard error of the mean difference. Under the null hypothesis of no difference, the test statistic t_d follows a t-distribution with N − 1 degrees of freedom.
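Eqs. (9) and (10) translate directly into code for a single miRNA. This sketch (function name ours) computes t_d from the paired differences, with s_d as their sample standard deviation, and a two-sided p-value from the t distribution with N − 1 degrees of freedom; it should agree with SciPy's `scipy.stats.ttest_rel`.

```python
import numpy as np
from scipy import stats


def paired_t(ct_first, ct_second):
    """Paired t statistic (Eqs. 9-10) and two-sided p-value for one miRNA."""
    d = np.asarray(ct_first, dtype=float) - np.asarray(ct_second, dtype=float)
    n = d.size
    t_d = np.sqrt(n) * d.mean() / d.std(ddof=1)  # s_d = SD of the differences
    p = 2 * stats.t.sf(abs(t_d), df=n - 1)
    return t_d, p
```

Applied to each of the K miRNAs in turn, this yields the K p-values that are then passed to the multiplicity adjustment.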

Mann-Whitney U-test

The Mann-Whitney U test is a nonparametric test that compares the locations (often described as the medians) of two populations based on two samples. The procedure also ignores the intra-subject correlation, and it does not require the assumption that the data follow a normal distribution. With N observations in each sample, the Mann-Whitney U test statistic is

U = N^{2} + \frac{N(N+1)}{2} - R_{1}

(11)

where R₁ is the sum of the ranks, based on the entire combined sample, associated with just the first sample.
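Eq. (11) extends to unequal sample sizes as U = n₁n₂ + n₁(n₁ + 1)/2 − R₁, which counts the pairs in which the first-sample value is smaller (ties counting one half). The sketch below computes it with mid-ranks for ties via SciPy's `rankdata`; the function name is our own. Software packages differ in which of the two complementary statistics, U or n₁n₂ − U, they report, so values should be compared with care.

```python
import numpy as np
from scipy.stats import rankdata


def mann_whitney_u(x1, x2):
    """U = n1*n2 + n1*(n1 + 1)/2 - R1, with R1 the rank sum of the first sample."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = x1.size, x2.size
    ranks = rankdata(np.concatenate([x1, x2]))  # ranks in the combined sample
    r1 = ranks[:n1].sum()
    return n1 * n2 + n1 * (n1 + 1) / 2 - r1
```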

For the t-test, the Mann-Whitney U test, and the GEE model alike, there are a total of K test statistics and corresponding hypothesis tests. A multiplicity adjustment should therefore be applied to control the family-wise type I error rate or the false discovery rate.

Assumption Adequacy Averaging

The concept of Assumption Adequacy Averaging (AAA) was proposed by Pounds and Rai¹⁸ as a technique for developing more robust methods that incorporate assessments of assumption adequacy into the analysis. The technique uses empirical Bayesian principles described by Efron et al.,¹⁹ as well as by Pounds and Morris,²⁰ to average the results from different testing procedures with weights determined by tests of assumption adequacy. Here, the method combines results from the classical t-test and the rank-sum test, with weights determined by the Shapiro-Wilk test of the normality assumption.

Sample Size Justification

Many of these studies are designed on an ad hoc basis; unlike clinical trials, such experiments are usually not planned in advance. However, a post hoc justification of the sample size is essential. In high-throughput data analyses, where the number of hypotheses is very large, one of the primary objectives is to have a high probability of declaring a hypothesis (such as a miRNA) significant (differentially expressed) if it is truly significant, while keeping the probability of making false declarations low. There are two approaches for controlling the error rates: the False Discovery Rate (FDR) and the Family-Wise Error Rate (FWER). Following Benjamini and Hochberg (1995), the FDR is the expected proportion of non-prognostic genes (in our case, miRNAs) among the discovered genes (miRNAs). We use the FDR approach for sample size justification.

In our motivating example we have three repeated measurements. We plan to compare the overall effect and to make two pairwise comparisons, each based on a paired t-test. A two-sided test is preferable because we have no prior information about whether a miRNA is down-regulated or up-regulated, although this requires a slightly larger sample size.

Adjusted significance level

Following Chow et al. (2008), the adjusted significance level is given as

\alpha^{*} = \frac{r_{1} f}{m_{0} (1 - f)}

In the above expression, r₁ is the desired number of alternative hypotheses (number of miRNAs to be discovered) to be declared significant at false discovery rate f, out of m total hypotheses (total number of miRNAs), with m₁ potentially alternative hypotheses (potentially significant miRNAs) and m₀ (= m − m₁) null hypotheses (miRNAs that are not differentially expressed). Once the significance level is determined, it is straightforward to determine the remaining design parameters (power, effect size, or sample size).
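The adjusted significance level is simple arithmetic. This sketch (function name ours) evaluates α* = r₁f / (m₀(1 − f)) for m = 384 miRNAs, m₁ = 40, and r₁ = 10, the values used in the motivating example.

```python
def adjusted_alpha(r1, m1, m, fdr):
    """Per-test significance level alpha* = r1 * f / (m0 * (1 - f))."""
    m0 = m - m1  # number of true null hypotheses (not differentially expressed)
    return r1 * fdr / (m0 * (1.0 - fdr))


for f in (0.05, 0.10):
    # prints alpha* = 0.0015 for FDR 5% and alpha* = 0.0032 for FDR 10%
    print(f"FDR {f:.0%}: alpha* = {adjusted_alpha(r1=10, m1=40, m=384, fdr=f):.4f}")
```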

Demonstration with the Motivating Example

Based on our experience and collaboration, we expect m₁ = 40 (around 10% of 384), of which we expect to identify about r₁ = 10 miRNAs. The resulting adjusted significance levels are 0.0015 and 0.0032 at FDR = 5% and 10%, respectively. Using a one-sided paired t-test with n = 9, 80% power, and significance levels of 0.0015 and 0.0032, we can detect effect sizes of 1.69 and 1.52 SD (standard deviation) units, respectively. Assuming equal variances in the repeated measures, we can identify a fold change (ratio of means), the quantity most commonly used in collaborative research, of 2.69-fold for up-regulated or 0.37-fold for down-regulated miRNAs at an FDR of 5%.
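The detectable effect size for a given n, α*, and power can be obtained from the noncentral t distribution. The sketch below assumes the effect size is expressed in SD units of the paired differences and uses SciPy's `nct` distribution with Brent's root finder; the function names are our own, and we have not verified that it reproduces the 1.69 and 1.52 SD values quoted above, which may depend on how the SD is defined and on the software used.

```python
import numpy as np
from scipy import stats
from scipy.optimize import brentq


def paired_t_power(effect_size, n, alpha):
    """Power of a one-sided paired t-test.

    effect_size: mean difference divided by the SD of the paired differences.
    """
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha, df)
    # Under the alternative, t_d is noncentral t with ncp = effect_size * sqrt(n).
    return stats.nct.sf(t_crit, df, effect_size * np.sqrt(n))


def detectable_effect(n, alpha, power=0.80):
    """Smallest standardized effect reaching the target power (root finding)."""
    return brentq(lambda es: paired_t_power(es, n, alpha) - power, 1e-6, 20.0)
```

`detectable_effect(9, 0.0015)` and `detectable_effect(9, 0.0032)` give the detectable effects for the two FDR levels; passing `alpha / 2` instead gives approximately the two-sided version, ignoring the negligible opposite tail.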

Results

The methods described above were applied to the Ct values from the motivating example. The first item to consider is the quality of the data from the qPCR experiment. One component of the analysis compared the expression values from the left ventricle at the time of explant (ELV) with those from the right ventricle at the time of explant (ERV). There were nine subjects, with the PCR performed on the same platform. miRNAs with missing values on 13 or more of the 18 plates (72.2% or more) were excluded from further analysis.

A similar approach was used to reduce the number of Ct values greater than 35 in the analysis that considered all three samples (IMP, ELV, and ERV) in one model: miRNAs were excluded from further analysis if their Ct values were missing on 19 or more of the 27 plates (70.4% or more). After filtering, each individual contributes exactly the same miRNAs for each of the three samples, resulting in a balanced repeated measures design. Table 1 reports the percentages of Ct values at or below 35 and above 35 for the different comparisons, before and after filtering. In each case, the percentage of usable Ct values (≤ 35) increased.

Table 1.

Effect of Filtering on Percentage of Ct Values Deemed Undetermined

Experiment	Data	Before Filtering	After Filtering
ELV vs ERV	Ct values ≤ 35	61.1%	90.2%
	Ct values > 35	35.9%	9.8%
IMP vs ELV vs ERV	Ct values ≤ 35	69.5%	90.0%
	Ct values > 35	39.5%	10.0%

Notes: Abbreviations: Ct – Cycle to Threshold; HF – end-stage heart failure; LVAD – Left Ventricular Assist Device; GEE – Generalized Estimating Equations; IMP – the sample taken from the left ventricle at the time of implant; ERV – the sample taken from the right ventricle at the time of explant; ELV – the sample taken from the left ventricle at the time of explant.

The four normalization techniques were applied to the miRNA values after filtering. The cumulative distributions of the coefficient of variation for the various normalization techniques are depicted in Figure 1. Quantile normalization performs best in terms of reducing the coefficient of variation, since its curve reaches 1 at the fastest rate. The other methods introduce more variation than is seen in the raw data. The endogenous controls for the delta-Ct method were carefully selected based on a review of the literature, including analyses conducted by Applied Biosystems.¹⁷ The stability of the controls was also assessed, and the average of the triplicates was used as Ct_e.

Figure 1. Cumulative Distribution of the Coefficient of Variation

The effects of the normalization techniques are depicted in Figures 2 and 3. Figure 2 displays the density estimates of the raw Ct values for each plate. The distributions appear to be approximately normal, with a small deviation near 40. This deviation is caused by the miRNAs with Ct values of 40, which indicates that a Cycle to Threshold value was not determined. The figure also supports the choice of quantile normalization, whose assumption of a common distribution appears reasonable because all of the curves are very similar in shape, location, and scale. Figure 3 displays the density estimates based on the quantile normalized Ct values. There are still 27 curves (one for each plate), but the normalization technique forces the distributions to be nearly identical. The results appear very close to normal, although the small deviation from the normal curve still appears near a Ct value of 40.

Figure 2. Density Estimate of Ct Values for Each Plate over all miRNAs - No Normalization

Figure 3. Density Estimate of Ct Values for Each Plate over all miRNAs - Quantile Normalization

The distributions of the adjusted p-values are depicted in Figures 4 and 5. The false discovery rate adjustment described by Benjamini and Hochberg¹⁰ was used, and an adjusted p-value smaller than 5% was considered statistically significant. Figure 4 displays the distribution of the adjusted p-values from the contrast comparing the average Ct values from the left ventricle at the time of LVAD implantation (IMP) with the average Ct values from the left ventricle at the time of heart transplant (ELV). Many miRNAs have significantly different expression values. The GEE model was also used to compare the expression values between ELV and ERV, but this comparison yields fewer significantly expressed miRNAs. The results are displayed in Figure 5.

Figure 4. Histogram of P-values of Comparison between IMP and ELV Based on GEE.

Figure 5. Histogram of P-values of Comparison between ELV and ERV Based on GEE.

Figure 6 contains the histogram of FDR-adjusted p-values from the comparison between IMP and ELV based on the paired t-test. There are fewer significantly different miRNAs than in the same comparison based on the GEE model: although the t-test is paired, it cannot incorporate the additional correlation. The distribution of p-values for the comparison between ELV and ERV based on the paired t-test is similar to Figure 6; that figure is not included in the manuscript.

Figure 6. Histogram of P-values of Comparison between IMP and ELV based on t-test.

The results from the analysis utilizing the AAA methodology, as well as the Mann-Whitney U test results, are not included here since the distribution of adjusted p-values is similar to the previously discussed t-test.

Discussion

In reviewing the current methodologies, we note that there are statistical models that incorporate the intra-subject correlation created by repeated measurements on the same individuals. The GEE model is a popular method that incorporates this additional correlation. In the exploratory analysis, the GEE model yields more significantly expressed miRNAs than the paired t-test. Although the statistical model incorporates the additional correlation, what effect does the normalization technique have on the naturally occurring correlation structure? Should this effect concern the analyst? The delta-Ct and mean normalization techniques shift the mean expression value of each plate, thus preserving the original correlation structure. Quantile normalization is not a simple shift of the center; it changes the distribution of the Ct values, resulting in a correlation structure different from the naturally occurring one. The methodology also reduces the variance, as measured by the coefficient of variation. Based on readily available software and the cumulative distribution of the coefficient of variation, quantile normalization appears to be the best choice. What effect does the normalization technique have on the GEE model and the resulting significantly expressed genes? How does this effect compare with that of the shift-of-center normalization procedures, which do not reduce the variation in the Ct values? Should the analysis be performed on the raw, unnormalized data?

Answering these questions requires a method to simulate correlated Ct values. Once that problem is solved, one must evaluate the various combinations of normalization and hypothesis testing procedures to determine their impact on the results.

Conclusions

The motivating example brings to the surface many important questions facing analysts of miRNA Ct values. The increased accuracy and reproducibility of qPCR methods mean that more researchers are turning to the technology, and experimental designs are becoming more advanced, including repeated biologic sampling from the same individuals over time. In general, it is important that the analytic techniques consider the complexities of the experimental design in order to fully understand the results.

The analysis presented is not itself unique; what is unique is the presentation of a complete analysis, from data preparation through normalization and hypothesis testing. The analysis emphasizes the importance of considering the experimental design, and the theoretical formulations of many popular methods are now gathered in one place.

The analysis also brings up several important research questions. The questions represent some of the ideas currently under investigation in order to determine the most appropriate analysis methods for miRNA Ct values obtained from experiments collecting multiple samples on the same individuals over time. The sharing of such information will aid future researchers with the analysis of qPCR Ct values using freely available software and methods.

Acknowledgments/Disclosure

We thank Dr. D. Miller for advancing free use of these methods with future publication of the R routine and manual for public use at http://browncancercenter.louisville.edu/biostats. SN Rai was partly supported by the Wendell Cherry Chair in Clinical Trial Research Fund, and HE Ray was partly supported by Kennesaw State University. SD Prabhu was partially supported by a VA Merit Award, NIH grant HL-99014, the UofL CEGIB Director’s Award, and the UofL Department of Medicine Bales Fund (S.D.P.). T Hamid was partially supported by an AHA SDG award 0835456N.

Appendix

Typically, the delta-Ct normalization technique is derived from the ratio of the target gene efficiency (E_T) raised to the power

\Delta Ct_{T} = Ct_{T} - Ct_{e}

(A.1)

and the reference gene efficiency (E_R) raised to the power

\Delta Ct_{R} = Ct_{R} - Ct_{e}

(A.2)

where Ct_T is the Ct value for the target group and Ct_R is the Ct value for the reference group (or control group). The ratio is

\frac{E_{T}^{\Delta Ct_{T}}}{E_{R}^{\Delta Ct_{R}}}

(A.3)

If the PCR amplification efficiency achieves its maximum value, then E_T = E_R = 2 and the ratio can be written as

2^{-\Delta\Delta Ct}

(A.4)

where −ΔΔCt = ΔCt_T − ΔCt_R. The first step,

\Delta Ct = Ct - Ct_{e}

(A.5)

is a normalization technique that uses the endogenous controls as the reference.

References

1.Ryan BM, Robles AI, Harris CC. Genetic variation in microRNA networks: the implications for cancer research. Nat Rev Cancer. 2010;10:389–402. doi: 10.1038/nrc2867. [DOI] [PMC free article] [PubMed] [Google Scholar]
2.van Rooij E, Marshall WS, Olson EN. Toward MicroRNA-Based therapeutics for heart disease: the sense in antisense. Circ Res. 2009;103:919–928. doi: 10.1161/CIRCRESAHA.108.183426. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Ach RA, Wang H, Curry B. Measuring microRNAs: comparisons of microarray and quantitative PCR measurements, and of different total RNA prep methods. BMC Biotechnol. 2009;8:69. doi: 10.1186/1472-6750-8-69. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Chen Y, Gelfond JA, McManus LM, et al. Reproducibility of quantitative RT-PCR array in miRNA expression profiling and comparison with microarray analysis. BMC Genomics. 2009;10:407. doi: 10.1186/1471-2164-10-407. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Pradervand S, Weber J, Thomas J, et al. Impact of normalization on miRNA microarray expression profiling. RNA. 2009;15:493–501. doi: 10.1261/rna.1295509. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Schmittgen TD, Livak KJ. Analyzing real-time PCR data by the comparative CT method. Nat. Protocols. 2008;3:1101–1108. doi: 10.1038/nprot.2008.73. [DOI] [PubMed] [Google Scholar]
7.Yuan JS, Reed A, Chen F, et al. Statistical analysis of real-time PCR data. BMC Bioinformatics. 2006;7:85. doi: 10.1186/1471-2105-7-85. [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Schonrock N, Ke YD, Humphreys D, et al. Neuronal MicroRNA deregulation in response to Alzheimer’s Disease Amyloid. PLoS One. 2010;5:6. doi: 10.1371/journal.pone.0011070. [DOI] [PMC free article] [PubMed] [Google Scholar]
9. Melkamu T, Zhang X, Tan J, et al. Alteration of microRNA expression in vinyl carbamate-induced mouse lung tumors and modulation by the chemopreventive agent indole-3-carbinol. Carcinogenesis. 2010;31:252–258. doi: 10.1093/carcin/bgp208.
10. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B. 1995;57:289–300.
11. Montenegro D, Romero R, Pineles BL, et al. Differential expression of microRNAs with progression of gestation and inflammation in the human chorioamniotic membranes. Am J Obstet Gynecol. 2007;197:289.e1–6. doi: 10.1016/j.ajog.2007.06.027.
12. Zeger SL, Liang KY, Albert PS. Models for longitudinal data: a generalized estimating equation approach. Biometrics. 1988;44:1049–1060.
13. Mestdagh P, Van Vlierberghe P, De Weer A, et al. A novel and universal method for microRNA RT-qPCR data normalization. Genome Biol. 2009;10:R64. doi: 10.1186/gb-2009-10-6-r64.
14. Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods. 2001;25:402–408. doi: 10.1006/meth.2001.1262.
15. Bolstad BM, Irizarry RA, Astrand M, et al. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19:185–193. doi: 10.1093/bioinformatics/19.2.185.
16. Tseng GC, Oh MK, Rohlin L, Liao JC, et al. Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res. 2001;29:2549–2557. doi: 10.1093/nar/29.12.2549.
17. Wong L, Lee K, Russell I, et al. Endogenous controls for real-time quantitation of miRNA using TaqMan microRNA assays. Applied Biosystems. 2010 09/03/2011.
18. Pounds S, Rai SN. Assumption adequacy averaging as a concept for developing more robust methods for differential gene expression analysis. Comput Stat Data Anal. 2009;53:1604–1612. doi: 10.1016/j.csda.2008.05.010.
19. Efron B, Tibshirani R, Storey JD, et al. Empirical Bayes analysis of a microarray experiment. J Am Stat Assoc. 2001;96:1151–1160.
20. Pounds S, Morris SW. Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values. Bioinformatics. 2003;19:1236–1242. doi: 10.1093/bioinformatics/btg148.
21. Chow SC, Shao J, Wang H. Sample Size Calculations in Clinical Research. 2nd ed. Boca Raton, FL: CRC Press; 2008.

