Published in final edited form as: Res Synth Methods. 2020 Sep 13;12(2):148–160. doi: 10.1002/jrsm.1442

Estimating the Reference Range from a Meta-Analysis

Lianne Siegel1,*, M. Hassan Murad2, Haitao Chu1
PMCID: PMC7881056  NIHMSID: NIHMS1634740  PMID: 32790064

Abstract

Often clinicians are interested in determining whether a subject’s measurement falls within a normal range, defined as a range of values of a continuous outcome which contains some proportion (e.g. 95%) of measurements from a healthy population. Several studies in the biomedical field have estimated reference ranges based on a meta-analysis of multiple studies with healthy individuals. However, the literature currently gives no guidance about how to estimate the reference range of a new subject in such settings. Instead, meta-analyses of such normative range studies typically report the pooled mean as a reference value, which does not incorporate natural variation across healthy individuals in different studies. We present three approaches to calculating the normal reference range of a subject from a meta-analysis of normally or lognormally distributed outcomes: a frequentist random effects model, a Bayesian random effects model, and an empirical approach. We present the results of a simulation study demonstrating that the methods perform well under a variety of scenarios, though users should be cautious when the number of studies is small and between-study heterogeneity is large. Finally, we apply these methods to two examples: pediatric time spent awake after sleep onset and frontal subjective postural vertical measurements.

Keywords: Reference range, meta-analysis, prediction interval, normative data, random effects model

1. Introduction

The number of published meta-analyses has increased sharply over the past several decades1,2. While most meta-analyses aim to provide a more precise estimate of the effect of a treatment or a risk factor’s association with a disease1, the literature has many examples of meta-analyses of normative data3–13. These studies generally aim to establish “typical” or “normal” values for a measurement or outcome using “healthy” populations from multiple studies, to serve as a reference. However, most often these meta-analysis studies report the pooled mean as the “reference value,” which has limited interpretability when determining whether a measurement is “normal”. Although Bohannon5 noted that measurements lying outside the confidence interval for the pooled mean could be considered above or below “average”, a reference range would be more useful in determining whether an observed measurement was within the range of values measured on healthy individuals. Horn et al.14 define a reference range or interval as “a set of values within which some percentage, 95% for example, of the values of a particular analyte in a healthy population would fall.” In a meta-analysis, this requires accounting for the natural variability in the healthy population as reflected by variation both within and between studies.

Several medical systematic reviews have estimated and reported reference ranges using a meta-analysis of healthy individuals from multiple studies3,7,10,12,15–18. However, some of these studies have used the confidence interval for the pooled mean as the “reference range”3,16,17, which reflects uncertainty in the estimated mean, not natural variation in the population. Venner et al.18 used the measurement ranges reported in each study when available to construct reference ranges based on the overall minimum and maximum values across studies. While this better reflects natural variation across healthy individuals than the confidence interval for the pooled mean, only four out of the twelve studies included in the meta-analysis reported ranges, and this method requires setting the desired percentage of individuals captured in the reference range to 100%.

However, several studies estimate reference ranges containing a specified proportion of measurements from a healthy target population based on the observed mean, standard deviation, and sample size from each study7,10,12,15. Conceição et al.7 use a method similar to the empirical approach proposed later in this paper in order to estimate normal ranges for how accurately healthy participants perceive whether they are oriented vertically in space. Wyman et al.12 use the fixed effects model by Laird and Mosteller19 in order to establish normative ranges for non-invasive bladder function measurements in healthy women. Nemeth et al.10 estimate a reference range for normal concentrations of asymmetric dimethylarginine in the plasma of healthy individuals, though their method for estimating the marginal reference range across all studies is not clear. Finally, Khoshdel et al.15 simulate individual patient data based on the summary statistics from each study, then use fractional polynomials to estimate age-specific reference ranges for pulse wave velocity.

It is unknown what proportion of measurements from the “true” overall populations these reference ranges capture. Currently, the literature gives no guidance on how to approach the question of estimating reference ranges based on meta-analyses. Several authors have recently advocated reporting prediction intervals for a new study20–23, but they have not addressed prediction for an individual. To the best of our knowledge, the present paper is the first to propose methods for estimating reference ranges based on meta-analyses.

The first two proposed methods build on the commonly used random effects model. Section 2 motivates this choice and introduces notation. Section 3 proposes three approaches for estimating reference ranges. The first uses results from a frequentist method such as restricted maximum likelihood estimation (REML), while the second is a Bayesian approach using a posterior predictive interval. The final method is an empirical approach similar to that used by Conceição et al.7. We call these the frequentist, Bayesian, and empirical approaches. All three of these approaches use the means, standard deviations, and sample sizes reported in each study and do not require individual patient data. Section 4 presents simulation studies illustrating the performance of the three approaches, which we then apply to two examples in Section 5. Finally, Section 6 discusses some of the key distributional assumptions required by these approaches and potential areas of future work.

2. Random effects model

2.1. Choice of model

Three models are commonly used in meta-analysis: the common effect, random effects, and fixed effects models. The common effect model assumes the underlying true mean or effect is the same in each study and that variation between studies in the estimated means arises purely from sampling variation24,25. This is often called a fixed effect model, which is easily confused with the fixed effects model of Laird and Mosteller19. We follow Bender et al.24 by using the term “common effect model”. This model imposes a strong assumption that each study population has the same underlying true mean, which may not be appropriate for situations in which the observed means from the studies differ for reasons other than sampling variability26. When there is considerable heterogeneity between studies, as is often the case in meta-analyses of continuous outcomes27, it may instead be desirable to assume that each study has a different true mean and that these means are drawn from separate distributions, as in the fixed effects model of Laird and Mosteller19. However, because this model makes no assumptions about how the effects in the different studies are related, it may not be used to draw any conclusions about a new study measuring the same outcome, much less about a new individual. Therefore, we will instead focus on the random effects model. This model allows the true means to differ between study populations but assumes they follow some underlying distribution, which is most often a normal distribution24,25. It is common to interpret this assumption as meaning that each study in the meta-analysis was randomly sampled from a population of theoretically possible studies, including the population studied and methods of measurement. However, Higgins et al.28 point out that this stronger assumption is often violated, as later studies are often designed based on the results of previous studies. That critique concerns random effects models comparing treatment effects in two groups, whereas we focus on estimating normal ranges from a group of healthy subjects, so this assumption may be more reasonable in our setting.

2.2. Notation

Let $\bar{y}_i$ denote the observed mean for study $i = 1, \ldots, k$, $\theta_i$ be study $i$’s true mean, $\mu_{RE}$ be the overall mean of the distribution of study means, and $\sigma_i^2$ be study $i$’s within-study variance. Also, let $\tau^2$ be the variance of the $\theta_i$ across studies. Then, we have

$$\bar{y}_i \sim N(\theta_i, \sigma_i^2 / n_i), \qquad \theta_i \sim N(\mu_{RE}, \tau^2) \tag{1}$$

In the frequentist framework, the overall mean $\mu_{RE}$ is traditionally estimated as a weighted average of the study-specific means24,29

$$\hat{\mu}_{RE} = \frac{\sum_{i=1}^{k} \bar{y}_i \, \omega_{i,RE}}{\sum_{i=1}^{k} \omega_{i,RE}}, \qquad \text{for } \omega_{i,RE} = \frac{1}{s_i^2 / n_i + \hat{\tau}^2}, \tag{2}$$

where $s_i^2$ is study $i$’s within-study sample variance; there has been some debate about how to estimate $\tau^2$. Here, we use the restricted maximum likelihood (REML) estimate, as implemented in the meta package30. The commonly used estimate originally proposed by DerSimonian and Laird29 has been shown to underestimate the true between-study variance, particularly when the number of studies $k$ is small31–33. The variance of $\hat{\mu}_{RE}$ can be estimated by $V_{RE} = 1 / \sum_{i=1}^{k} \omega_{i,RE}$. The following is commonly used as an α-level confidence interval for $\mu_{RE}$: $\hat{\mu}_{RE} \pm z_{\alpha/2} \times \sqrt{V_{RE}}$, where $z_{\alpha/2}$ is the standard normal critical value for the chosen significance level (α).
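To make the calculation concrete, the following base R sketch implements Equation (2), assuming the study summaries and an REML estimate of τ² are already available; the function and object names (pooled_mean_re, ybar, s2, n, tau2.hat) are illustrative placeholders rather than part of the original analysis.

```r
# Minimal sketch of Equation (2): inverse-variance weighted pooled mean under
# the random effects model, given hypothetical study summaries and tau2.hat.
pooled_mean_re <- function(ybar, s2, n, tau2.hat, alpha = 0.05) {
  w  <- 1 / (s2 / n + tau2.hat)       # random effects weights w_{i,RE}
  mu <- sum(ybar * w) / sum(w)        # weighted pooled mean (Equation 2)
  V  <- 1 / sum(w)                    # estimated variance of the pooled mean
  z  <- qnorm(1 - alpha / 2)
  c(estimate = mu,
    lower = mu - z * sqrt(V),         # confidence interval for mu_RE
    upper = mu + z * sqrt(V))
}
```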

Alternatively, one can take a Bayesian approach28,34 and place prior distributions on $\mu_{RE}$ and τ, as described in Section 3.2. Because we consider the fixed mean assumption in the common effect model inappropriate in most situations, we use a random effects model for the two model-based approaches presented below. However, one could easily alter these methods to reflect a common effect assumption.

3. Methods for estimating the reference range from a meta-analysis

In estimating a 95% normal reference range, we aim to find an interval that contains approximately 95% of individuals in the target population35,36. Because the models described in Section 2 only allow inference on the pooled mean, we need additional methods and assumptions to estimate the 95% normal reference range for an individual. First, we must make an assumption about the distribution of the data within each study, assuming we do not have access to the studies’ individual patient data (IPD). In this paper, we assume the individual-patient data in each study were generated from either a normal or a lognormal distribution, with the same family of distributions across all studies. We present three approaches to estimating the normal reference range: a frequentist approach, a fully Bayesian approach, and an empirical approach. The frequentist and Bayesian methods assume the within-study distributions have the same variance in all studies, while the empirical approach does not. We present each of our three proposed methods under normality, and then show how to apply them under a log-normality assumption.

3.1. A frequentist approach

Under the random effects model, if we assume observations within each study are normally distributed, the within-study variances are the same in all studies, and the study-specific means follow a normal distribution, then we have $y_{ij} \sim N(\mu_{RE}, \sigma_T^2)$, where $\sigma_T^2 = \tau^2 + \sigma^2$ and $\sigma^2$ is the common within-study variance. We can estimate $\mu_{RE}$ and $\tau$ as described in Section 2, e.g., using REML, and estimate $\sigma^2$ as the unbiased pooled sample variance:

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{k} (n_i - 1) s_i^2}{\sum_{i=1}^{k} (n_i - 1)}. \tag{3}$$

Substituting $\hat{\mu}_{RE}$, $\hat{\tau}^2$, and $\hat{\sigma}^2$ into the marginal distribution of $y_{ij}$, the distribution of individuals, marginal to studies, can be estimated as $N(\hat{\mu}_{RE}, \hat{\sigma}_T^2)$, where $\hat{\sigma}_T^2 = \hat{\tau}^2 + \hat{\sigma}^2$. The α/2 and 1 − α/2 percentiles of this distribution can then be taken as the bounds of the α-level normal reference range: $\hat{\mu}_{RE} \pm z_{1-\alpha/2} \sqrt{\hat{\sigma}^2 + \hat{\tau}^2}$. Because $\hat{\sigma}^2$ and $\hat{\tau}^2$ are estimates, a t-distribution may be more appropriate than a normal. However, most meta-analyses will have a large enough total sample size across studies that the appropriate t-distribution will be closely approximated by a normal distribution. Alternatively, the fully Bayesian method presented in Section 3.2 accounts for the uncertainty in $\hat{\sigma}^2$ and $\hat{\tau}^2$.
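As an illustration only, a frequentist reference range could be computed along these lines, assuming hypothetical summary vectors ybar, s2, and n; the sketch uses the metagen() function from the meta package30 to obtain REML estimates, but the helper name freq_ref_range and its arguments are placeholders, not part of the original analysis.

```r
# Sketch of the frequentist reference range (Section 3.1).
library(meta)

freq_ref_range <- function(ybar, s2, n, alpha = 0.05) {
  fit <- metagen(TE = ybar, seTE = sqrt(s2 / n), method.tau = "REML")
  mu.hat     <- fit$TE.random                   # REML estimate of the pooled mean
  tau2.hat   <- fit$tau^2                       # REML estimate of between-study variance
  sigma2.hat <- sum((n - 1) * s2) / sum(n - 1)  # pooled within-study variance (Equation 3)
  z <- qnorm(1 - alpha / 2)
  c(lower = mu.hat - z * sqrt(sigma2.hat + tau2.hat),
    upper = mu.hat + z * sqrt(sigma2.hat + tau2.hat))
}
```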

3.2. A Bayesian approach

A fully Bayesian approach places prior distributions on $\mu_{RE}$ and τ. As in the frequentist approach, we assume that the true variances are the same in all studies, and now we use the normal-theory sampling distribution of the sample variance to capture uncertainty about the within-study variance σ², according to the following model:

$$\bar{y}_i \sim N(\theta_i, \sigma^2 / n_i), \qquad \theta_i \sim N(\mu_{RE}, \tau^2), \qquad (n_i - 1)\, s_i^2 \sim \text{gamma}\!\left(\frac{n_i - 1}{2}, \frac{1}{2\sigma^2}\right). \tag{4}$$

We place a N(0, 1000) prior on $\mu_{RE}$ and Unif(0, 100) priors on τ and σ, then sample from the posterior predictive distribution for a new individual, which incorporates uncertainty about each of the parameter estimates into the normal reference range:

$$y_{new} \sim N(\mu_{RE}, \sigma^2 + \tau^2), \tag{5}$$

where the predictive density of $y_{new}$ given the data $\{y_{ij}\}$ is given by:

$$f(y_{new} \mid \{y_{ij}\}) = \iiint f(y_{new} \mid \mu_{RE}, \sigma^2, \tau^2)\, f(\mu_{RE}, \sigma^2, \tau^2 \mid \{y_{ij}\})\, d\mu_{RE}\, d\sigma^2\, d\tau^2. \tag{6}$$

The limits of the α-level normal reference range can then be estimated by the α/2 and 1 − α/2 percentiles of the predictive distribution of $y_{new}$.
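A minimal sketch of this model in JAGS, called from R via rjags38, is given below. The summary data objects (ybar, s2, n) are hypothetical placeholders, and the N(0, 1000) prior is assumed to be stated in terms of variance, so it is written with precision 0.001.

```r
# Sketch of the Bayesian approach (Section 3.2): model of Equations (4)-(5).
library(rjags)

model_string <- "
model {
  for (i in 1:k) {
    ybar[i] ~ dnorm(theta[i], n[i] / sigma^2)          # ybar_i ~ N(theta_i, sigma^2 / n_i)
    theta[i] ~ dnorm(mu, 1 / tau^2)                    # theta_i ~ N(mu_RE, tau^2)
    ss[i] ~ dgamma((n[i] - 1) / 2, 1 / (2 * sigma^2))  # (n_i - 1) s_i^2 ~ gamma((n_i-1)/2, 1/(2 sigma^2))
  }
  mu ~ dnorm(0, 0.001)                                 # N(0, 1000) prior on mu_RE (precision scale)
  tau ~ dunif(0, 100)
  sigma ~ dunif(0, 100)
  ynew ~ dnorm(mu, 1 / (sigma^2 + tau^2))              # posterior predictive draw for a new individual
}"

jags_data <- list(ybar = ybar, ss = (n - 1) * s2, n = n, k = length(ybar))
fit  <- jags.model(textConnection(model_string), data = jags_data, n.chains = 2)
update(fit, 1000)                                      # burn-in
post <- coda.samples(fit, variable.names = "ynew", n.iter = 10000)
ref_range <- quantile(as.matrix(post)[, "ynew"], c(0.025, 0.975))
```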

3.3. An empirical approach

The third approach is a simple empirical approach that does not assume the studies all have the same within-study variances and does not specify the distribution of $y_{ij}$ within each study. However, like the frequentist approach in Section 3.1, it does not account for estimation uncertainty, and it assumes the population captured across all studies follows a normal distribution. First, estimate the overall mean across all studies, weighted by study sample size:

$$\hat{\mu}_{emp} = \frac{\sum_{i=1}^{N} n_i \bar{y}_i}{\sum_{i=1}^{N} n_i}. \tag{7}$$

This is equivalent to the pooled mean in Laird and Mosteller’s19 fixed effects model, weighted by sample size. Then estimate the marginal variance across studies using the conditional variance formula $\text{Var}(Y) = E[\text{Var}(Y_{ij} \mid S = i)] + \text{Var}[E(Y_{ij} \mid S = i)]$:

$$\hat{\sigma}_{T,emp}^2 = \frac{\sum_{i=1}^{N} (n_i - 1) s_i^2}{\sum_{i=1}^{N} (n_i - 1)} + \frac{\sum_{i=1}^{N} (n_i - 1)(\bar{y}_i - \hat{\mu}_{emp})^2}{\sum_{i=1}^{N} (n_i - 1)}. \tag{8}$$

The limits of the α-level normal reference range are then given by the α/2 and 1 − α/2 percentiles of a $N(\hat{\mu}_{emp}, \hat{\sigma}_{T,emp}^2)$ distribution: $\hat{\mu}_{emp} \pm z_{1-\alpha/2} \times \sqrt{\hat{\sigma}_{T,emp}^2}$. Conceição et al.7 used this method but weighted by n rather than n − 1 in the variance calculation. We prefer the unbiased estimate of the variance, but weighting by n will generally give similar results.
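Because the empirical approach uses only Equations (7) and (8), it can be written in a few lines of base R; the following sketch again assumes hypothetical summary vectors ybar, s2, and n.

```r
# Sketch of the empirical approach (Section 3.3): Equations (7) and (8).
emp_ref_range <- function(ybar, s2, n, alpha = 0.05) {
  mu.emp  <- sum(n * ybar) / sum(n)                         # sample-size weighted mean (Equation 7)
  var.emp <- sum((n - 1) * s2) / sum(n - 1) +               # weighted average within-study variance
             sum((n - 1) * (ybar - mu.emp)^2) / sum(n - 1)  # plus between-study spread (Equation 8)
  z <- qnorm(1 - alpha / 2)
  c(lower = mu.emp - z * sqrt(var.emp),
    upper = mu.emp + z * sqrt(var.emp))
}
```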

3.4. Lognormal distribution for yij

Each of the above methods can also be applied when each study’s observations are assumed to be drawn from a $\text{Lognormal}(\theta_i, \sigma_i^2)$ distribution, so that $\log(y_{ij}) \sim N(\theta_i, \sigma_i^2)$. In this case, first transform the observed study means and sample variances to the log scale using Equation (9) before estimating the reference range:

$$\bar{y}_i^* = \log\!\left(\frac{\bar{y}_i}{\sqrt{1 + s_i^2 / \bar{y}_i^2}}\right), \qquad s_i^{2*} = \log\!\left(1 + \frac{s_i^2}{\bar{y}_i^2}\right). \tag{9}$$

This transformation uses the method of moments estimators for the location and scale parameters of the lognormal distribution; for more details, see the Appendix. The normal reference range can then be estimated as before, substituting $\bar{y}_i^*$ and $s_i^*$ for the observed study-level means and standard deviations. Finally, exponentiate the limits of the resulting range to give the normal reference range: $(e^{\hat{\mu} - z_{1-\alpha/2}\hat{\sigma}_T}, e^{\hat{\mu} + z_{1-\alpha/2}\hat{\sigma}_T})$. This method requires that the $\bar{y}_i^*$ be normally distributed, an assumption that should be checked using a method such as a Q-Q plot. Depending on the distribution of $\bar{y}_i$, the distribution of the $\bar{y}_i^*$ can be quite skewed.
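A sketch of this lognormal adjustment is given below; it reuses the hypothetical freq_ref_range() helper sketched in Section 3.1, though the Bayesian or empirical approach could be applied to the transformed summaries in exactly the same way.

```r
# Sketch of the lognormal adjustment (Section 3.4): transform with Equation (9),
# estimate the range on the log scale, then exponentiate the limits.
lognormal_ref_range <- function(ybar, s2, n, alpha = 0.05) {
  ybar.star <- log(ybar / sqrt(1 + s2 / ybar^2))     # approximate mean of log(y) in each study
  s2.star   <- log(1 + s2 / ybar^2)                  # approximate variance of log(y) in each study
  exp(freq_ref_range(ybar.star, s2.star, n, alpha))  # back-transform the limits
}
```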

4. Simulations

4.1. Methods of simulation

To assess how well each of the three methods captures a true 95% normal reference range, we conducted simulations under a variety of different conditions. In all conditions, we assumed the true distributions within studies were normal and that the true study-specific means varied according to the random effects model in Section 2. For each condition, we then considered different values of the true between-study variation (τ²) as a proportion of total variability (τ² + σ²). In all conditions, each study had 50 subjects, with the total number of studies (N) being 5, 10, 20, or 30. The overall pooled mean (μ) was set to 8 and the true total variance (σ² + τ²) was 1.25 for all conditions. We conducted 1000 simulations for each condition. We considered scenarios where the true study-level variances were equal, as well as cases where they were not. For the frequentist approach, we used the R package metagen37 to fit the random effects model using REML. For the Bayesian approach, we used JAGS version 4.3.0 with the packages rjags38 and coda39, in R version 3.6.040. We ran two chains of 10,000 samples each, after discarding 1,000 samples per chain as burn-in.

4.1.1. Equal variances

For the equal variance scenario, we first generated the true study-level means (θi’s) according to a N(μ, τ²) distribution. For each study i, we then generated the individual-level data according to a N(θi, σ²) distribution, where σ² was constant across studies. We summarized the means and standard deviations for each study to give the observed summary data, applied each of the three approaches (frequentist, Bayesian, and empirical), and then found the area under the probability density function of a N(μ, σ² + τ²) distribution between the lower and upper limits of each estimated 95% normal reference range, as sketched below.
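For illustration, one replicate of this equal-variance scenario might look as follows, again relying on the hypothetical freq_ref_range() helper sketched in Section 3.1; the values shown (N = 10 studies, between-study variance equal to 25% of the total) are one example condition from the design above.

```r
# Sketch of one replicate of the equal-variance simulation (Section 4.1.1).
set.seed(1)
N <- 10; n.per <- 50; mu <- 8
tau2   <- 0.25 * 1.25            # between-study variance = 25% of the total variance 1.25
sigma2 <- 1.25 - tau2            # common within-study variance

theta <- rnorm(N, mu, sqrt(tau2))                          # true study means
dat   <- lapply(theta, function(t) rnorm(n.per, t, sqrt(sigma2)))
ybar  <- sapply(dat, mean)                                 # observed study means
s2    <- sapply(dat, var)                                  # observed study variances
n     <- rep(n.per, N)

rr <- freq_ref_range(ybar, s2, n)                          # estimated 95% reference range
coverage <- pnorm(rr["upper"], mu, sqrt(sigma2 + tau2)) -  # proportion of the true
            pnorm(rr["lower"], mu, sqrt(sigma2 + tau2))    # N(mu, sigma^2 + tau^2) captured
```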

4.1.2. Unequal variances

Data were simulated in the unequal variance scenario as in the equal variance scenario, except that we generated σi, the true within-study standard deviation, from a doubly-truncated normal distribution, with both the left truncation point and the mean equal to X and the right truncation point equal to X + 1, for X ranging from 0 to 0.64 in increments of 0.02. For each X, we estimated E[σi²] by simulating from the doubly-truncated normal distribution; these estimates ranged from 0.291 to 1.246. We let τ² = 1.25 − Ê[σi²], so as X increased, Ê[σi²] increased as a proportion of the total variance. Because the truncation points were defined relative to the mean, the variance of σi remained constant throughout all conditions. We approximated the true reference distribution for yij by simulating from the conditional distributions σi | X, θi | μ, τ, and yij | θi, σi.
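As a rough sketch, the study-specific standard deviations for this scenario could be drawn as follows by simple rejection sampling; the standard deviation of the untruncated normal is not stated above, so the value used here (sd = 1) is an assumption, and the function name rtrunc_sigma is hypothetical.

```r
# Sketch of drawing study-specific SDs from a normal with mean X, truncated to (X, X + 1).
rtrunc_sigma <- function(n.studies, X, sd = 1) {
  out <- numeric(0)
  while (length(out) < n.studies) {
    draws <- rnorm(n.studies, mean = X, sd = sd)
    out <- c(out, draws[draws > X & draws < X + 1])  # keep only draws inside (X, X + 1)
  }
  out[1:n.studies]
}

sigma.i <- rtrunc_sigma(10, X = 0.3)   # e.g., 10 studies at X = 0.3
```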

4.2. Simulation results

We first generated the data under the equal within-study variance scenario and measured the fraction of the true population distribution captured by each of the three reference range methods, which we call the “coverage” (Figure 1). For example, when the between-study variance comprised 25% of the overall variance, the frequentist reference ranges based on 1000 simulated meta-analyses containing 30 studies captured a median of 94.9% of a N(μ, σ² + τ²) distribution, the true distribution of individual measurements. This can also be interpreted as the frequentist reference ranges excluding a median of 5.1% of extreme values. The observed median and variability of coverage depended on the true ratio of between-study variance (τ²) to total variance (τ² + σ²) and the number of studies N included in the meta-analysis. For the frequentist and empirical methods, the median coverage decreased as τ² increased as a fraction of the total variance; this decrease was most pronounced when the number of studies included was small (N = 5 or 10). In these conditions, the empirical method’s coverage decreased more quickly than the frequentist method’s. Also, variation in coverage increased as τ² increased and decreased as N increased. While the variation in coverage also increased with τ² for the Bayesian posterior predictive interval, this effect was less dramatic. In contrast with the frequentist and empirical methods, the Bayesian method’s median coverage increased with τ². This increase began for smaller τ² and was more extreme for small N. This increase in variation with τ² appears to reflect the additional estimation uncertainty when τ² is large, particularly when N is small. Unlike the frequentist and empirical methods, the Bayesian method accounts for posterior uncertainty about each parameter and thus appears more conservative. The results for the unequal within-study variances case were qualitatively similar to the equal variance case (Figure 2).

Figure 1: Simulation Results, Equal Variances.


Median, 2.5th percentile, and 97.5th percentile of the proportion of the true population distribution captured by the estimated 95% reference range, for different numbers N of studies. The horizontal axis is τ² as a proportion of the total variance.

Figure 2: Simulation Results, Unequal Variances.


Median, 2.5th percentile, and 97.5th percentile of the proportion of the true population distribution captured by the estimated 95% reference range, for different numbers N of studies. The horizontal axis is τ² as a proportion of the total variance.

5. Examples

5.1. Example 1: Pediatric Nighttime Sleep

Galland et al.9 sought to establish “reference values” for pediatric nighttime sleep outcomes measured by actigraphy, based on a systematic review and subsequent meta-analysis of 79 studies. We focus on the outcome wake after sleep onset (WASO) time in hours. The authors found 24 studies reporting WASO, with most participants belonging to the same age group (9–11 years), so they focused on the pooled mean across all age groups. In our review of these studies, one study included in this meta-analysis41 did not appear to actually report WASO but rather reported the average length of wake bouts. We excluded this study from our analysis, which therefore contained 23 studies. Figure 3 shows the study means and corresponding standard errors. In this case, only one of the 95% confidence intervals for the study means overlapped with the point estimate of the pooled mean. Galland et al.9 explain that this variability reflects inconsistency across studies in how waking bouts were defined, as well as the low specificity of actigraphy for identifying wakefulness. The authors also used meta-regression to investigate regional differences in sleep as a source of variation but did not observe a difference across study regions. This large variation in estimated WASO time across studies provides further evidence that the pooled mean may not provide a full picture of what constitutes a “normal” WASO time and that a reference range may be more useful. To better visualize the heterogeneity in WASO time within and across studies, we also present in the same figure frequentist 95% prediction intervals for each study, based on a t-distribution20,21. Because Galland et al.9 only reported study means and standard errors, we obtained the standard deviations directly from each study’s paper. When a paper did not report the standard deviation, we estimated it using the standard error reported by Galland et al.9 and a normal approximation. Therefore, our results should be interpreted as merely an illustration of the proposed methods.

Figure 3: WASO.


Mean (95% CI) and 95% predictive interval for a new individual for each study, overall estimate of the pooled mean (95% CI) based on REML, 95% predictive interval for a new study mean, and 95% reference ranges based on the Bayesian, empirical, and frequentist methods.

We checked whether the study means deviated from normality using a Q-Q plot (see Supplementary Materials); no apparent departure from normality was observed except for a few points at the ends of both tails. As in the simulations, we used the R package metagen37 to fit the random effects model using REML. We also used this model to estimate the pooled mean across studies and to obtain the prediction interval for a new study28. For the Bayesian approach, we again used JAGS version 4.3.0 with the packages rjags38 and coda39, in R version 3.6.040. We ran two chains of 50,000 samples each, after discarding 5,000 samples per chain as burn-in. Convergence was assessed using the MCMC standard error and visual inspection of trace plots.

The estimated 95% normal reference ranges were (−0.47, 2.24), (−0.54, 2.32), and (−0.33, 2.34) for the frequentist, Bayesian, and empirical methods, respectively. We truncated these at zero because negative WASO values are meaningless, giving (0, 2.25), (0, 2.32), and (0, 2.34). Based on the frequentist result, we would expect about 95% of healthy children to have an actigraphy-measured WASO time between 0 and 2.25 hours. This reflects the large amount of variability between individuals included in the meta-analysis. Before truncation, the Bayesian reference range was widest, followed by the frequentist reference range; the empirical method gave the narrowest interval. This is consistent with the simulation results and is likely due to the Bayesian method accounting for uncertainty about the parameters σ, τ, and μ. The code and results for both case study examples are included in the Supplementary Materials.

5.2. Example 2: Frontal SPV

Accurate perception of verticality is an important part of everyday functioning and can be altered in individuals such as “aged people, patients with vestibular disorders, Parkinson’s disease, idiopathic scoliosis, and stroke patients” 7. Accurate perception of verticality has also been associated with better functioning in patients following a stroke42. A person’s subjective postural vertical (SPV) can be measured by placing them in a tilting chair while blindfolded and asking them to tell an examiner how to adjust the chair so they perceive that they are in an upright position. Frontal and sagittal SPV refer to deviation (in degrees) of the specified position from true verticality in the frontal and sagittal planes. Because SPV measurements can be used to assess neurological functioning, it is important to establish a reference range in healthy persons.

In their meta-analysis, Conceição et al.7 sought to establish reference ranges for frontal and sagittal SPV from 15 studies measuring frontal SPV and 5 studies measuring sagittal SPV. They estimated the reference range using the empirical approach, except that they weighted by n rather than n − 1 in the variance calculation. We re-analyzed the data for frontal SPV, using REML to estimate the pooled mean across all studies and the 95% predictive interval for a new study. We then used the same methods as in Section 5.1 to estimate the three reference ranges.

We again checked for non-normality of the study means using a Q-Q plot (see Supplementary Materials); no apparent departure from normality was observed except for a few points in the upper tail. Figure 4 presents the reference range results as well as the estimated pooled mean and predictive interval for a new study. Conceição et al.7 estimated the frontal SPV reference range as (−2.87°, 3.11°). The frequentist, Bayesian, and empirical methods gave estimated reference ranges of (−2.92°, 3.15°), (−3.07°, 3.20°), and (−2.89°, 3.13°), respectively. As expected, the empirical method’s results are quite similar to those reported by Conceição et al.7, while the frequentist and Bayesian methods give slightly wider intervals.

Figure 4: Frontal SPV.


Mean (95% CI) and 95% predictive interval for a new individual for each study, overall estimate of the pooled mean (95% CI) based on REML, 95% predictive interval for a new study mean, and 95% reference ranges based on the Bayesian, empirical, and frequentist methods.

6. Discussion

This paper proposes three methods of estimating reference ranges for an individual from a meta-analysis. The methods are simple to implement and can serve as a starting point for future development. Based on the simulations, all three methods tended to perform best when the number of studies was large and between-study variability was relatively small. However, while the frequentist and empirical methods tended to underestimate the width of the reference range as between-study heterogeneity increased (particularly for small N), the Bayesian posterior predictive interval did not. This is likely because the Bayesian method accounts for estimation uncertainty about the parameters, while the other methods do not. Instead, the posterior predictive interval more often overestimated the width of the interval. Depending on how the reference range is used, one might consider this behavior conservative. We recommend using caution when the number of studies is small, such as 5 or 10. If the number of studies is very small and the estimated between-study variation makes up more than 50% of total estimated variation, it may be more useful to report reference ranges specific to each study, rather than a pooled range.

As for which method might be most appropriate in which circumstance, the simulation results suggest that when the number of studies is large (at least 20) and the normality assumptions hold, the three methods will likely perform similarly. The fact that Conceição et al.7 implemented essentially the empirical approach (weighting by n instead of n − 1) suggests that the required calculations are intuitive and could easily be carried out by clinicians. The frequentist method, by contrast, requires familiarity with random effects models, although investigators are often interested in the pooled mean and are thus likely to fit such a model anyway. Finally, the Bayesian predictive interval requires familiarity with Bayesian methods and software such as JAGS, though the model is still simple to implement and can account for estimation uncertainty.

Each method makes distributional assumptions beyond those needed when estimating the pooled mean. Besides the usual assumption made when using likelihood methods to analyze random effects models, namely that the study means are normally distributed, the frequentist and Bayesian approaches also assume a normal distribution for individuals within a study. While this may appear problematic, prediction intervals for a new observation based on a single study regularly impose this assumption43. Unfortunately, if only study means and standard deviations are available, this assumption cannot be validated. Section 3.4 extended our approaches to allow individuals within studies to be lognormally distributed, but we caution that the transformed means on the log scale must be approximately normally distributed. This paper focuses on meta-analyses in which individual participant data (IPD) are not available; when IPD are available, non-parametric approaches using order statistics may be possible, as they are currently used to estimate reference ranges from single studies14,44.

Another key assumption of the frequentist and Bayesian methods is that the true within-study variances are the same in all studies and that any observed differences are due to sampling variability. Differences between studies in sampling methods or measurement techniques could render this assumption invalid. However, Section 4.2’s simulation results suggest that these methods may be robust to deviations from this assumption when the true study-specific standard deviations vary between studies according to a truncated normal distribution. Further work is needed to assess the models’ performance under other deviations from this assumption.

Finally, we reiterate that random effects models for estimating the pooled mean, on which we built the frequentist and Bayesian methods, require the study-specific means to be normally distributed. This is true of most random effects methods, except for the method of moments estimator developed by DerSimonian and Laird29, which is known to underestimate between-study variability and therefore give results with inappropriately high precision31–33. It is a common misconception that normality of the study means is guaranteed by the central limit theorem (CLT)23. At best, the CLT only ensures that the sampling distribution of an observed average for a single study has an approximate normal distribution, not that the true means of the collected studies follow a normal distribution. One way of assessing departures from this assumption is the use of a normal Q-Q plot.
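For example, given a hypothetical vector ybar of observed study means, such a check could be as simple as:

```r
# Minimal normal Q-Q plot of the observed study means (ybar is a placeholder).
qqnorm(ybar, main = "Q-Q plot of study means")
qqline(ybar)   # reference line; strong curvature suggests non-normal study means
```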

Although our methods make specific distributional assumptions, they do provide a starting point for additional development. Future work should generalize these methods to address instances where the assumptions used here are likely not met. This could involve cases with or without IPD. Section 4.2’s simulation results show that our approaches work well in cases with many studies and relatively low between-study variability. However, future studies should compare these methods with new methods incorporating IPD. Future methods should also improve performance when the number of studies is small or the between-study variation is large. The proposed methods may also be extended to cases where the data from each study are assumed to follow truncated normal distributions. Finally, these methods could be extended to a meta-regression setting to include characteristics such as age or sex, at either the individual or the study level.

Supplementary Material

Supplemental Material

9. Highlights.

What is already known?

  • Reference ranges allow clinicians to determine whether an individual’s measurement falls within a normal range.

  • Meta-analyses of normative data typically report the estimate of the pooled mean as a reference value. This only provides information on whether an individual might be above or below average.

  • A reference range reflecting variability among individuals would be more useful in clinical practice, but the previous literature gives no guidance on how to estimate this in meta-analysis.

What is new?

  • We present three methods for estimating the reference range for an individual from a meta-analysis. The first uses results from a frequentist method, such as REML, while the second is a Bayesian posterior predictive interval. The third is an empirical approach.

  • Our simulation results suggest that these methods perform best when the number of studies is large and the between-study variability is relatively small. As the between-study heterogeneity increased, the frequentist and empirical methods tended to underestimate the width of the reference range, while the Bayesian posterior predictive interval did not.

  • All three methods assume a normal distribution for individuals within a study. We also extended this approach to allow individuals within a study to follow a log-normal distribution.

  • The frequentist and Bayesian methods both assume equal within-study variances.

  • Our proposed methods provide a starting point for additional development. Future work can generalize these methods to cases where the distributional assumptions are likely not met.

7. Acknowledgements and Funding

The authors gratefully acknowledge the NIH National Heart, Lung, and Blood Institute (T32HL129956) and the NIH National Library of Medicine (R21LM012744, R01LM012982). The authors greatly appreciate the thoughtful comments and suggestions by Professor James Hodges, whose views are not necessarily represented by this manuscript.

Appendix A

A.1. Method of moments estimators for lognormal distribution

In (9), we use the method of moments estimators for the location and scale parameters of the lognormal distribution in order to transform the observed mean and variance to the log scale, where the observations would be normally distributed. Suppose $Y = \{y_1, \ldots, y_n\} \sim \text{Lognormal}(\mu, \sigma^2)$. Then the first two moments of the lognormal distribution are given by45:

$$E[Y] = e^{\mu + \frac{1}{2}\sigma^2}, \qquad E[Y^2] = e^{2\mu + 2\sigma^2}. \tag{A.10}$$

We can then set:

$$e^{\mu + \frac{1}{2}\sigma^2} = \frac{1}{n}\sum_{j=1}^{n} y_j = \bar{y}, \qquad e^{2\mu + 2\sigma^2} = \frac{1}{n}\sum_{j=1}^{n} y_j^2. \tag{A.11}$$

Solving for μ and σ², we have:

$$\hat{\mu}_{MM} = \log\!\left(\frac{\bar{y}^2}{\sqrt{\bar{y}^2 + s^2}}\right) = \log\!\left(\frac{\bar{y}}{\sqrt{1 + s^2/\bar{y}^2}}\right), \qquad \hat{\sigma}_{MM}^2 = \log\!\left(\frac{\bar{y}^2 + s^2}{\bar{y}^2}\right) = \log\!\left(1 + \frac{s^2}{\bar{y}^2}\right), \tag{A.12}$$

where $s^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2$.

Therefore, for each study i in a meta-analysis, we can let:

$$\bar{y}_i^* = \log\!\left(\frac{\bar{y}_i}{\sqrt{1 + s_i^2/\bar{y}_i^2}}\right), \qquad s_i^{2*} = \log\!\left(1 + \frac{s_i^2}{\bar{y}_i^2}\right). \tag{A.13}$$

We can then treat $\bar{y}_i^*$ and $s_i^{2*}$ as approximations of the sample mean and sample variance of the study on the log scale.
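As a quick sanity check of these estimators, one can simulate a large lognormal sample and confirm that the transformation approximately recovers the parameters on the log scale; the values μ = 1 and σ = 0.5 below are arbitrary illustrative choices, not taken from the paper.

```r
# Numerical check of the method of moments transformation in (A.12)-(A.13).
set.seed(1)
y <- rlnorm(1e6, meanlog = 1, sdlog = 0.5)
ybar <- mean(y); s2 <- var(y)
c(mu.hat     = log(ybar / sqrt(1 + s2 / ybar^2)),  # should be close to 1
  sigma2.hat = log(1 + s2 / ybar^2))               # should be close to 0.25
```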

8. Data Availability Statement

The data that support the findings in this study are available in Tables 1 and 2 of the Supplementary Materials.

References

  • [1]. Haidich AB. Meta-analysis in medical research. Hippokratia. 2010;14(Suppl 1):9.
  • [2]. Niforatos Joshua D., Weaver Matt, Johansen Michael E. Assessment of Publication Trends of Systematic Reviews and Randomized Clinical Trials, 1995 to 2017. JAMA Internal Medicine. 2019;179(11):1593.
  • [3]. Fateh Bazerbachi, Samir Haffar, Zhen Wang, et al. Range of Normal Liver Stiffness and Factors Associated With Increased Stiffness Measurements in Apparently Healthy Individuals. Clinical Gastroenterology and Hepatology. 2019;17(1):54–64.
  • [4]. Poliana do Amaral Benfica, Larissa Tavares Aguiar, Sherindan Ayessa Ferreira de Brito, Helena Nunes Bernardino Luane, Luci Fuscaldi Teixeira-Salmela, Danielli Coelho de Morais Faria Christina. Reference values for muscle strength: a systematic review with a descriptive meta-analysis. Brazilian Journal of Physical Therapy. 2018;22(5):355–369.
  • [5]. Bohannon Richard W. Reference Values for the Timed Up and Go Test: A Descriptive Meta-Analysis. Journal of Geriatric Physical Therapy. 2006;29:64–68.
  • [6]. Bohannon Richard W, Williams Andrews A. Normal walking speed: a descriptive meta-analysis. Physiotherapy. 2011;97(3):182–189.
  • [7]. Conceição Laila B, Baggio Jussara AO, Mazin Suleimy C, Edwards Dylan J, Santos Taiza EG. Normative data for human postural vertical: A systematic review and meta-analysis. PLoS One. 2018;13(9):e0204122.
  • [8]. Dodds Richard M, Syddall Holly E, Cooper Rachel, Kuh Diana, Cooper Cyrus, Sayer Avan Aihie. Global variation in grip strength: a systematic review and meta-analysis of normative data. Age and Ageing. 2016;45(2):209–216.
  • [9]. Galland Barbara C, Short Michelle A, Terrill Philip, et al. Establishing normal values for pediatric nighttime sleep measured by actigraphy: a systematic review and meta-analysis. Sleep. 2018;41(4).
  • [10]. Balázs Németh, Zánó Ajtay, László Hejjel, et al. The issue of plasma asymmetric dimethylarginine reference range - A systematic review and meta-analysis. PLOS ONE. 2017;12(5):e0177493.
  • [11]. Juxian Tang, Yihui Lin, Huachao Mai, et al. Meta-analysis of reference values of haemostatic markers during pregnancy and childbirth. Taiwanese Journal of Obstetrics and Gynecology. 2019;58(1):29–35.
  • [12]. Wyman Jean F, Zhou Jincheng, LaCoursiere DY, et al. Normative noninvasive bladder function measurements in healthy women: A systematic review and meta-analysis. Neurourology and Urodynamics. 2020;39(2):507–522.
  • [13]. Haoming Xu, Maira Fonseca, Zachary Wolner, et al. Reference values for skin microanatomy: A systematic review and meta-analysis of ex vivo studies. Journal of the American Academy of Dermatology. 2017;77(6):1133–1144.e4.
  • [14]. Horn Paul S, Pesce Amadeo J, Copeland Bradley E. A robust approach to reference interval estimation and evaluation. Clinical Chemistry. 1998;44(3):10.
  • [15]. Ali Reza Khoshdel, Ammarin Thakkinstian, Carney Shane L, John Attia. Estimation of an age-specific reference interval for pulse wave velocity: a meta-analysis. Journal of Hypertension. 2006;24(7):1231–1237.
  • [16]. Levy Philip T, Aliza Machefsky, Sanchez Aura A., et al. Reference Ranges of Left Ventricular Strain Measures by Two-Dimensional Speckle-Tracking Echocardiography in Children: A Systematic Review and Meta-Analysis. Journal of the American Society of Echocardiography. 2016;29(3):209–225.e6.
  • [17]. Staessen Jan A, Fagard Robert H, Lijnen Paul J, Thijs Lutgarde, Van Hoof Roger, Amery Antoon K. Mean and range of the ambulatory pressure in normotensive subjects from a meta-analysis of 23 studies. The American Journal of Cardiology. 1991;67(8):723–727.
  • [18]. Venner Allison A, Doyle-Baker Patricia K, Lyon Martha E, Fung Tak S. A meta-analysis of leptin reference ranges in the healthy paediatric prepubertal population. Annals of Clinical Biochemistry. 2009;46(1):65–72.
  • [19]. Laird Nan M, Mosteller Frederick. Some Statistical Methods for Combining Experimental Results. International Journal of Technology Assessment in Health Care. 1990;6(01):5–30.
  • [20]. Joanna IntHout, Ioannidis John P. A., Rovers Maroeska M., Goeman Jelle J. Plea for routinely presenting prediction intervals in meta-analysis. BMJ Open. 2016;6(7):e010247.
  • [21]. Lifeng Lin. Use of Prediction Intervals in Network Meta-analysis. JAMA Network Open. 2019;2(8):e199735.
  • [22]. Riley Richard D, Higgins Julian PT, Deeks Jonathan J. Interpretation of random effects meta-analyses. BMJ. 2011;342:d549.
  • [23]. Chia-Chun Wang, Wen-Chung Lee. A simple method to estimate prediction intervals and predictive distributions: Summarizing meta-analyses beyond means and confidence intervals. Research Synthesis Methods. 2019;10(2):255–266.
  • [24]. Ralf Bender, Tim Friede, Armin Koch, et al. Methods for evidence synthesis in the case of very few studies. Research Synthesis Methods. 2018;9(3):382–392.
  • [25]. Michael Borenstein, Hedges Larry V., Higgins Julian P. T., Rothstein Hannah R. A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods. 2010;1(2):97–111.
  • [26]. Kenneth Rice, Higgins Julian PT, Thomas Lumley. A re-evaluation of fixed effect(s) meta-analysis. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2018;181(1):205–227.
  • [27]. Alba Ana C, Alexander Paul E, Joanne Chang, John MacIsaac, Samantha DeFry, Guyatt Gordon H. High statistical heterogeneity is more frequent in meta-analysis of continuous than binary outcomes. Journal of Clinical Epidemiology. 2016;70:129–135.
  • [28]. Higgins Julian P T, Thompson Simon G, Spiegelhalter David J. A re-evaluation of random-effects meta-analysis. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2009;172(1):137–159.
  • [29]. Rebecca DerSimonian, Nan Laird. Meta-analysis in clinical trials. Controlled Clinical Trials. 1986;7(3):177–188.
  • [30]. Guido Schwarzer. meta: An R package for meta-analysis. R News. 2007;7(3):40–45.
  • [31]. Cornell John E, Mulrow Cynthia D, Russell Localio, et al. Random-Effects Meta-analysis of Inconsistent Effects: A Time for Change. Annals of Internal Medicine. 2014;160(4):267–270.
  • [32]. Areti Angeliki Veroniki, Dan Jackson, Wolfgang Viechtbauer, et al. Methods to estimate the between-study variance and its uncertainty in meta-analysis. Research Synthesis Methods. 2016;7(1):55–79.
  • [33]. Wolfgang Viechtbauer. Bias and Efficiency of Meta-Analytic Variance Estimators in the Random-Effects Model. Journal of Educational and Behavioral Statistics. 2005;30(3):261–293.
  • [34]. Smith Teresa C, Spiegelhalter David J, Andrew Thomas. Bayesian approaches to random-effects meta-analysis: A comparative study. Statistics in Medicine. 1995;14(24):2685–2699.
  • [35]. Hussam Alshraideh, Hazem Smadi, Jalal Abo-Taha, Obaidah Alomari. Reference Range Estimation: Accounting for Measurement System Errors. Quality and Reliability Engineering International. 2016;32(3):901–908.
  • [36]. Hoffmann Robert G. Statistics in the Practice of Medicine. JAMA: The Journal of the American Medical Association. 1963;185(11):864.
  • [37]. Möbius Thomas WD. metagen: Inference in Meta Analysis and Meta Regression. 2014. R package version 1.0.
  • [38]. Martyn Plummer. rjags: Bayesian Graphical Models using MCMC. 2019. R package version 4-9.
  • [39]. Martyn Plummer, Nicky Best, Kate Cowles, Karen Vines. CODA: Convergence Diagnosis and Output Analysis for MCMC. R News. 2006;6(1):7–11.
  • [40]. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria; 2019.
  • [41]. Royette Tavernier, Choo Sungsub B, Kathryn Grant, Adam Emma K. Daily Affective Experiences Predict Objective Sleep Outcomes among Adolescents. Journal of Sleep Research. 2016;25(1):62–69.
  • [42]. Baggio Jussara AO, Mazin Suleimy SC, Alessio-Alves Frederico F, et al. Verticality Perceptions Associate with Postural Control and Functionality in Stroke Patients. PLoS ONE. 2016;11(3):e0150754.
  • [43]. Seymour Geisser. Predictive Inference: An Introduction. Monographs on Statistics and Applied Probability 55. New York: Chapman & Hall; 1993.
  • [44]. Horn Paul S, Pesce Amadeo J. Reference intervals: an update. Clinica Chimica Acta. 2003;334(1–2):5–23.
  • [45]. George Casella. Statistical Inference. 2nd ed. Pacific Grove, Calif: Thomson Learning; 2002.
