Published in final edited form as: Stat Med. 2009 May 15;28(11):1620–1635. doi: 10.1002/sim.3563

Synthesis analysis of regression models with a continuous outcome

Xiao-Hua Zhou 1,2,*, Nan Hu 2, Guizhou Hu 3, Martin Root 3

SUMMARY

Estimating a multivariate regression model by combining multiple individual studies is challenging when the individual studies provide only univariate or incomplete multivariate regression information. Samsa et al. (J. Biomed. Biotechnol. 2005; 2:113–123) proposed a simple method, known as synthesis analysis, for combining coefficients from univariate linear regression models into a multivariate linear regression model. However, the validity of this method relies on a normality assumption for the data, and it does not provide variance estimates. In this paper we propose a new synthesis method that improves on the existing method by eliminating the normality assumption, reducing bias, and allowing for variance estimation of the estimated parameters.

Keywords: synthesis analysis, meta-analysis, linear models

1. INTRODUCTION

Meta-analysis is a statistical technique for amalgamating, summarizing, and reviewing previous quantitative research. A typical meta-analysis summarizes all the research results on one topic and discusses the reliability of this summary. It is based on the condition that each individual study reports a finding on the same research question. The potential advantages of meta-analysis are the increased sample size and the improved validity of statistical inference. It is difficult, however, to apply meta-analysis methodology when individual studies provide only partial findings.

As a practical example, meta-analysis could be used to build a comprehensive multivariate prediction model for the risk of chronic diseases such as coronary heart disease (CHD). A wide range of CHD risk factors have been reported in the literature, but a comprehensive multivariate CHD prediction model is still lacking. The Framingham CHD model is widely considered the most comprehensive model available, although many well-known CHD risk factors, such as body mass index (BMI), family history of CHD, and C-reactive protein, are not included in it [1–3].

We propose a new approach to solve several of the problems described above. This novel multivariate meta-analysis modeling method is called synthesis analysis. Using multiple study results reported in the scientific and medical literature, the objective of synthesis analysis is to estimate the multivariate relations between multiple predictors (Xs) and an outcome variable (Y) from the univariate relation of each X with Y and the pairwise correlations between the Xs. These inputs may come from different studies in the literature; for example, a single cross-sectional population survey may provide the correlations among the Xs. We reported the first method of synthesis analysis (the Samsa–Hu–Root or SHR method) [5], in which the partial regression coefficients are calculated using the following matrix equation:

$$B = \big(R^{-1}(B_u \,\#\, S)\big) / S$$

where B is the vector of partial regression coefficients (excluding the intercept, B0), Bu is the vector of univariate regression coefficients, R is the matrix of Pearson correlation coefficients among the independent variables, S is the vector of standard deviations of the independent variables, # stands for element-wise multiplication, and / stands for element-wise division. The intercept, B0, can then be calculated from the resulting multivariate formula using the means of the predictors and the outcome together with the newly calculated partial regression coefficients.
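As a concrete illustration, the SHR matrix equation can be implemented in a few lines. The following is a minimal sketch assuming NumPy; the function and variable names are ours, not taken from the original paper or its software.

```python
import numpy as np

def shr_slopes(b_uni, sd_x, corr_x):
    """SHR synthesis: B = (R^{-1} (B_u # S)) / S, with # and / element-wise.

    b_uni  : univariate slopes of Y on each X_i
    sd_x   : standard deviations of the X_i
    corr_x : Pearson correlation matrix R of the X_i
    """
    b_uni = np.asarray(b_uni, float)
    sd_x = np.asarray(sd_x, float)
    # solve R z = (B_u # S) rather than forming R^{-1} explicitly
    return np.linalg.solve(np.asarray(corr_x, float), b_uni * sd_x) / sd_x

def shr_intercept(b, mean_x, mean_y):
    """Intercept B0 from the synthesized slopes and the variable means."""
    return mean_y - float(np.dot(b, np.asarray(mean_x, float)))
```

Solving the linear system instead of inverting R is a standard numerical choice; it returns the same B as the matrix equation above.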

In the present study, we propose an improvement to the existing synthesis analysis. Compared with the previous method, this method has at least two advantages: (1) it includes a method to compute the variances for predicted outcomes and estimated regression coefficients and (2) the estimates of predicted outcomes and regression coefficients can be more robust when the independent variables are not normally distributed.

Our paper is organized as follows. In Section 2, we describe our new method. In Section 3, we report a simulation study on finite-sample performance of the proposed method in comparison with the existing synthesis method. In Section 4, we illustrate the use of the proposed method in a real-life example from the 1999–2000 National Health and Nutritional Examination Survey. Finally, in Section 5, we conclude our paper with a discussion on some extensions.

2. NEW METHOD FOR SYNTHESIS ANALYSIS

2.1. Estimation of synthesized parameters

Suppose that we know the individual relationships between an outcome Y and each of p risk factors, X1, X2, …, and Xp, which are given as follows:

$$E[Y \mid X_i] = \gamma_0^{(i)} + \gamma_1^{(i)} X_i \qquad (1)$$

where i = 1, 2, …, p. In addition, we assume that we know the mean relationship between any pair of the p risk factors:

$$E[X_j \mid X_i] = \alpha_0^{(ij)} + \alpha_1^{(ij)} X_i \qquad (2)$$

where i, j = 1, 2, …, p and i ≠ j.

The goal of synthesis analysis is to determine the multivariate linear regression model between Y and the p risk factors:

$$E(Y \mid X_1, \ldots, X_p) = \beta_0 + \sum_{i=1}^{p} \beta_i X_i \qquad (3)$$

Note that the linear regression assumption (1) automatically holds under assumptions (2) and (3).

Taking the conditional expectation of both sides of (3) given Xi, we obtain the following equation:

$$E(Y \mid X_i = x) = \beta_0 + \beta_1 E(X_1 \mid X_i = x) + \cdots + \beta_{i-1} E(X_{i-1} \mid X_i = x) + \beta_i x + \beta_{i+1} E(X_{i+1} \mid X_i = x) + \cdots + \beta_p E(X_p \mid X_i = x) \qquad (4)$$

for i = 1, …, p. Combining (1), (2), and (4), we obtain the following result:

$$\gamma_0^{(i)} + \gamma_1^{(i)} x = \beta_0 + \big(\beta_1 \alpha_0^{(i1)} + \cdots + \beta_{i-1} \alpha_0^{(i,i-1)} + \beta_{i+1} \alpha_0^{(i,i+1)} + \cdots + \beta_p \alpha_0^{(ip)}\big) + \big(\beta_1 \alpha_1^{(i1)} + \cdots + \beta_{i-1} \alpha_1^{(i,i-1)} + \beta_i + \beta_{i+1} \alpha_1^{(i,i+1)} + \cdots + \beta_p \alpha_1^{(ip)}\big) x$$

for all x, where i = 1, …, p. Therefore, we obtain the following two sets of equations:

$$\begin{aligned} \gamma_0^{(1)} &= \beta_0 + \big(\beta_2 \alpha_0^{(12)} + \cdots + \beta_p \alpha_0^{(1p)}\big) \\ \gamma_0^{(i)} &= \beta_0 + \big(\beta_1 \alpha_0^{(i1)} + \cdots + \beta_{i-1} \alpha_0^{(i,i-1)} + \beta_{i+1} \alpha_0^{(i,i+1)} + \cdots + \beta_p \alpha_0^{(ip)}\big) \end{aligned} \qquad (5)$$

for i = 2, …, p; and

$$\begin{aligned} \gamma_1^{(1)} &= \beta_1 + \beta_2 \alpha_1^{(12)} + \cdots + \beta_p \alpha_1^{(1p)} \\ \gamma_1^{(i)} &= \beta_1 \alpha_1^{(i1)} + \cdots + \beta_{i-1} \alpha_1^{(i,i-1)} + \beta_i + \beta_{i+1} \alpha_1^{(i,i+1)} + \cdots + \beta_p \alpha_1^{(ip)} \end{aligned} \qquad (6)$$

for i = 2, …, p.

Let M be the p × p matrix with diagonal elements 1 and (i, j) element $\alpha_1^{(ij)}$ when $i \neq j$; let $\beta = (\beta_k,\ k = 1, \ldots, p)^T$ and $\gamma_1 = (\gamma_1^{(k)},\ k = 1, \ldots, p)^T$. From (6), we obtain the following p equations for the p unknown slope parameters, β1, …, βp:

$$M \beta = \gamma_1 \qquad (7)$$

Using Cramer’s rule, we can easily solve the above p simultaneous linear equations. Let us define the following determinants:

$$D = \begin{vmatrix} 1 & \alpha_1^{(12)} & \alpha_1^{(13)} & \cdots & \alpha_1^{(1p)} \\ \alpha_1^{(21)} & 1 & \alpha_1^{(23)} & \cdots & \alpha_1^{(2p)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \alpha_1^{(p1)} & \alpha_1^{(p2)} & \alpha_1^{(p3)} & \cdots & 1 \end{vmatrix}, \qquad D_1 = \begin{vmatrix} \gamma_1^{(1)} & \alpha_1^{(12)} & \alpha_1^{(13)} & \cdots & \alpha_1^{(1p)} \\ \gamma_1^{(2)} & 1 & \alpha_1^{(23)} & \cdots & \alpha_1^{(2p)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \gamma_1^{(p)} & \alpha_1^{(p2)} & \alpha_1^{(p3)} & \cdots & 1 \end{vmatrix}, \quad \ldots, \quad D_p = \begin{vmatrix} 1 & \alpha_1^{(12)} & \cdots & \gamma_1^{(1)} \\ \alpha_1^{(21)} & 1 & \cdots & \gamma_1^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_1^{(p1)} & \alpha_1^{(p2)} & \cdots & \gamma_1^{(p)} \end{vmatrix}$$

Cramer’s rule gives us the following unique solution to the system of equations (7):

$$\beta_k = D_k / D \qquad (8)$$

where k = 1, …, p.
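In practice one need not expand the determinants by hand; the system (7) can be solved directly. The sketch below assumes NumPy and uses our own (hypothetical) argument layout, with `alpha1[i, j]` holding $\alpha_1^{(ij)}$.

```python
import numpy as np

def synthesized_slopes(alpha1, gamma1):
    """Solve M beta = gamma_1 (equation (7)) for the synthesized slopes.

    alpha1 : p x p array with alpha1[i, j] = alpha_1^(ij), the slope from
             regressing X_j on X_i (diagonal entries are overwritten)
    gamma1 : length-p vector of univariate slopes gamma_1^(i)
    """
    M = np.array(alpha1, dtype=float)
    np.fill_diagonal(M, 1.0)  # diagonal elements of M are 1
    # equivalent to beta_k = D_k / D from Cramer's rule when D != 0,
    # but numerically more stable
    return np.linalg.solve(M, np.asarray(gamma1, float))
```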

After obtaining estimates of the vector of slope parameters, β, we can derive an estimate for the intercept parameter, β0, using any one of the p equations given in (5). Hence, we have the following p equations for the unknown intercept parameter β0:

$$\begin{aligned} \beta_0 + 0 + \alpha_0^{(12)} \beta_2 + \alpha_0^{(13)} \beta_3 + \cdots + \alpha_0^{(1,p-1)} \beta_{p-1} + \alpha_0^{(1p)} \beta_p &= \gamma_0^{(1)} \\ \beta_0 + \alpha_0^{(21)} \beta_1 + 0 + \alpha_0^{(23)} \beta_3 + \cdots + \alpha_0^{(2,p-1)} \beta_{p-1} + \alpha_0^{(2p)} \beta_p &= \gamma_0^{(2)} \\ &\;\;\vdots \\ \beta_0 + \alpha_0^{(p1)} \beta_1 + \alpha_0^{(p2)} \beta_2 + \alpha_0^{(p3)} \beta_3 + \cdots + \alpha_0^{(p,p-1)} \beta_{p-1} + 0 &= \gamma_0^{(p)} \end{aligned}$$

Although there are p equations for the parameter β0, we show in Appendix A that the solution for β0 is unique. We give a detailed description of our solution for the two-covariate case in Appendix B, and in Appendix C we give explicit formulas for the synthesized parameters in the three- and four-covariate cases.
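Continuing the sketch above (same assumed argument layout, with `alpha0[i, j]` holding $\alpha_0^{(ij)}$), the intercept can be read off the first equation of the system; by the uniqueness result of Appendix A, any of the p equations gives the same value.

```python
def synthesized_intercept(alpha0, gamma0, beta):
    """Recover beta_0 from the first equation of the intercept system (5).

    alpha0 : p x p array with alpha0[i, j] = alpha_0^(ij)
    gamma0 : length-p vector of univariate intercepts gamma_0^(i)
    beta   : synthesized slopes from synthesized_slopes()
    """
    p = len(beta)
    # beta_0 = gamma_0^(1) - sum_{j >= 2} beta_j * alpha_0^(1j)
    return gamma0[0] - sum(beta[j] * alpha0[0][j] for j in range(1, p))
```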

2.2. Variance estimation

The variance can be estimated using the delta method, assuming that the univariate parameter estimates $\hat\gamma_0^{(i)}$ and $\hat\gamma_1^{(i)}$ $(i = 1, \ldots, p)$ from the individual univariate linear regression models in (1) are independent of one another [4]. Let $\alpha = (\alpha_0^{(ij)}, \alpha_1^{(ij)},\ i, j = 1, \ldots, p)$ and $\gamma = (\gamma_0^{(k)}, \gamma_1^{(k)},\ k = 1, \ldots, p)$.

By the well-known result from simple linear regression, we know:

$$n^{1/2}\big[(\hat\alpha, \hat\gamma)^T - (\alpha_0, \gamma_0)^T\big] \xrightarrow{d} N(0, \Sigma)$$

where $\alpha_0$ and $\gamma_0$ are the true values of $\hat\alpha$ and $\hat\gamma$, and

$$\Sigma = \begin{pmatrix} \Sigma_\alpha & 0 \\ 0 & \Sigma_\gamma \end{pmatrix}$$

Here

$$\Sigma_\alpha = \big(\sigma_{\alpha_i^{(kl)} \alpha_j^{(k'l')}},\ i, j = 0, 1;\ k, l, k', l' = 1, 2, \ldots, p\big)$$

where $\sigma_{\alpha_i^{(kl)} \alpha_j^{(k'l')}}$ $(i, j = 0, 1;\ k, l, k', l' = 1, 2, \ldots, p)$ is the covariance between $\hat\alpha_i^{(kl)}$ and $\hat\alpha_j^{(k'l')}$, and

$$\Sigma_\gamma = \begin{pmatrix} \sigma_{\gamma_0^{(1)}\gamma_0^{(1)}} & 0 & \cdots & 0 \\ 0 & \sigma_{\gamma_1^{(1)}\gamma_1^{(1)}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{\gamma_1^{(p)}\gamma_1^{(p)}} \end{pmatrix}$$

is the covariance matrix of the estimated parameters γ̂.

The synthesized parameter estimates $\hat\beta = (\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p)^T$ are functions of the $\alpha$'s and $\gamma$'s, which can be expressed as

$$\hat\beta = g(\hat\alpha, \hat\gamma)$$

If the function g is differentiable, then the delta method gives the asymptotic variance of β as follows:

$$\Sigma_\beta = \nabla g(\alpha, \gamma)^T \, \Sigma \, \nabla g(\alpha, \gamma) \qquad (9)$$

where ∇g(α, γ) is the matrix of partial derivatives of g with respect to the elements of (α, γ). We give an explicit formula for ∇g(α, γ) when p = 2 in Appendix B. Many programs, such as Mathematica, can compute derivatives symbolically, which makes the variance calculation much easier because the exact form of ∇g need not be derived by hand before the calculation.
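When symbolic derivatives are inconvenient, the gradient in (9) can also be approximated numerically. The sketch below is our own illustration, not the paper's implementation: it treats g as a black-box function of the stacked (α, γ) vector and uses central differences.

```python
import numpy as np

def delta_method_cov(g, theta, sigma, eps=1e-6):
    """Approximate delta-method covariance of beta = g(theta), where
    theta stacks the (alpha, gamma) estimates and sigma is their
    covariance matrix (the Sigma of Section 2.2)."""
    theta = np.asarray(theta, float)
    n_beta = np.asarray(g(theta), float).size
    jac = np.empty((n_beta, theta.size))  # rows: beta, columns: theta
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = eps
        # central difference for d beta / d theta_k
        jac[:, k] = (np.asarray(g(theta + step), float)
                     - np.asarray(g(theta - step), float)) / (2 * eps)
    # with jac the Jacobian d beta / d theta, this is equation (9)
    return jac @ np.asarray(sigma, float) @ jac.T
```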

2.3. Variance of predicted value

Once the estimates of parameters and their variances have been derived, we can calculate the covariance matrix of predicted values as follows:

$$\operatorname{Cov}(\hat Y \mid X) = \operatorname{Cov}(X^T \hat\beta \mid X) = X^T \Sigma_\beta X$$

where $X^T$ is the transpose of $X$, and $\Sigma_\beta$ is the covariance matrix of $\hat\beta$ given by (9).
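For a single covariate vector, the quadratic form above reduces to one line; this small helper (our naming) assumes x carries a leading 1 for the intercept.

```python
import numpy as np

def predicted_variance(x, cov_beta):
    """Variance of the predicted value x^T beta-hat (Section 2.3)."""
    x = np.asarray(x, float)
    return float(x @ np.asarray(cov_beta, float) @ x)
```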

2.4. Mean-squared error of the predicted value and correlation between predicted and observed values

The mean-squared error (MSE) of the predicted value is given by

$$\mathrm{MSE}_{\hat Y} = \frac{1}{n} \sum_{i=1}^{n} (\hat Y_i - Y_i)^2$$

where Ŷi and Yi are the predicted and observed value of subject i, respectively. The correlation coefficient between Ŷi and Yi, ρ, can be calculated by

$$\rho = \frac{\operatorname{Cov}(\hat Y_i, Y_i)}{\sqrt{\operatorname{Var}(\hat Y_i)\operatorname{Var}(Y_i)}}$$

where Cov(Ŷi, Yi) is the covariance between predicted and observed values.
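Both summaries are direct to compute once predictions are in hand; a minimal sketch (assuming NumPy, with our own function name) is:

```python
import numpy as np

def prediction_summary(y_hat, y):
    """MSE of predictions and correlation between predicted and
    observed values (Section 2.4)."""
    y_hat = np.asarray(y_hat, float)
    y = np.asarray(y, float)
    mse = np.mean((y_hat - y) ** 2)
    rho = np.corrcoef(y_hat, y)[0, 1]
    return mse, rho
```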

3. SIMULATION STUDY

We conducted a simulation study to assess the performance of the proposed method in comparison with our previous method [5], denoted by SHR. We simulated data with two, three, and four predictor variables. For simplicity of presentation, we only report the results for the two-predictor case here, because the results for the three-predictor and four-predictor cases are similar.

In each of these cases, we simulated independent variables from (1) a multivariate normal distribution, (2) a multivariate log-normal distribution, (3) a multivariate exponential distribution, and (4) a multivariate gamma distribution. We chose the variances of all the independent variables to be 1 and correlations for pairs of the independent variables to be 0.5. After simulating the independent variables X, we generated the dependent variable Y by adding random normal errors to the mean model:

$$Y = \beta_0 + \sum_{i=1}^{p} \beta_i X_i + \varepsilon, \qquad p = 2, 3, 4 \qquad (10)$$

where ε is a random error following the standard normal distribution.

We set the true regression parameters as follows: (β0, β1, β2) = (−5, 5, 3) for the two-variable setting, (β0, β1, β2, β3) = (−5, 1, 3, 5) for the three-variable setting, and (β0, β1, β2, β3, β4) = (−5, 5, 4, 3, 1) for the four-variable setting. We divided each data set into $\binom{p+1}{2}$ (p = 2, 3, 4) subsets of equal size, where $\binom{p+1}{2}$ denotes the number of ways of choosing 2 items from p + 1 items. Each subset contained only one pair of variables chosen from Y, X1, …, Xp. The total sample sizes used in the simulation were 300 and 3000 (divided equally among the subsets). For each of the above settings, we simulated 1000 data sets. As the results for the other skewed distributions were similar to those for the log-normal distribution, we only report the results for the normal and log-normal distributions. The mean bias and MSE for the estimated parameters are reported in Tables I and II.
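For readers who wish to reproduce a run, the following sketch mirrors the two-predictor normal design described above (unit variances, pairwise correlation 0.5, equal subset sizes). The splitting scheme and names are our reading of the design, not the authors' code.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

def simulate_split(beta, n_total):
    """One simulated data set, split into C(p+1, 2) pairwise subsets."""
    p = len(beta) - 1
    cov = np.full((p, p), 0.5)
    np.fill_diagonal(cov, 1.0)  # unit variances, correlation 0.5
    X = rng.multivariate_normal(np.zeros(p), cov, size=n_total)
    Y = beta[0] + X @ np.asarray(beta[1:], float) + rng.standard_normal(n_total)
    data = np.column_stack([Y, X])  # columns: Y, X1, ..., Xp
    pairs = list(combinations(range(p + 1), 2))
    m = n_total // len(pairs)  # equal sample size per subset
    return {pair: data[i * m:(i + 1) * m][:, list(pair)]
            for i, pair in enumerate(pairs)}

subsets = simulate_split(beta=(-5.0, 5.0, 3.0), n_total=300)  # two-predictor case
```

Each subset exposes only one pair of variables, so the univariate regressions of Section 2 can be fitted subset by subset.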

Table I.

Mean bias and MSE of estimated regression parameters with two independent variables following a normal distribution.

Sample size, method               Mean bias                        MSE
                                  β0       β1       β2             β0       β1       β2
n=300 (m1=m2=m3=100), New        −0.190   −0.016    0.041         14.808    1.708    2.763
n=300 (m1=m2=m3=100), SHR         0.486   −0.033   −0.090         26.897    0.939    1.527
n=3000 (m1=m2=m3=1000), New       0.031    0.000   −0.007          1.346    0.033    0.067
n=3000 (m1=m2=m3=1000), SHR       0.050   −0.004   −0.009          2.628    0.079    0.139

Note: m1 is the sample size for the subset containing only the outcome Y and predictor X1; m2 for the subset containing only Y and X2; m3 for the subset containing only X1 and X2.

Table II.

Mean bias and MSE of estimated regression parameters with two independent variables following a log-normal distribution.

Sample size, method               Mean bias                        MSE
                                  β0       β1       β2             β0       β1       β2
n=300 (m1=m2=m3=100), New         0.146   −0.081   −0.042         42.032    3.676    4.799
n=300 (m1=m2=m3=100), SHR        10.377   −1.104   −1.412        933.764   82.249   80.029
n=3000 (m1=m2=m3=1000), New      −0.051   −0.004    0.010          1.259    0.033    0.063
n=3000 (m1=m2=m3=1000), SHR      −0.015   −0.013    0.006          2.349    0.080    0.126

Note: m1 is the sample size for the subset containing only the outcome Y and predictor X1; m2 for the subset containing only Y and X2; m3 for the subset containing only X1 and X2.

In order to evaluate the accuracy of predicted values using the new model, we simulated two data sets with equal sample sizes. One was used as the training set for model derivation, while the other was used as the validation data set. To evaluate prediction performance, we reported mean bias, MSE, and the mean of standard error estimates (SEEs) for predicted values in Tables III and IV. The SEEs were derived using the method developed in Sections 2.2 and 2.3. The correlations between predicted and observed values were also reported in the two tables.

Table III.

Mean bias, MSE, correlation and SEE for predicted values with two independent variables following a normal distribution.

Sample size, method               Mean bias    MSE         Correlation   SEE
n=300 (m1=m2=m3=100), New          0.0108       0.8046     0.9949        6.0496
n=300 (m1=m2=m3=100), SHR         14.1519     221.1321     0.9900        —
n=3000 (m1=m2=m3=1000), New       −0.0092       0.0723     0.9996        1.8656
n=3000 (m1=m2=m3=1000), SHR       14.0304     209.9250     0.9954        —

Note: Correlation is the mean correlation between observed and predicted values across simulations. SEE is the mean of the standard error estimates for predicted values; it cannot be calculated for the SHR method. m1 is the sample size for the subset containing only the outcome Y and predictor X1; m2 for the subset containing only Y and X2; m3 for the subset containing only X1 and X2.

Table IV.

Mean bias, MSE, correlation and SEE for predicted values with two independent variables following a log-normal distribution.

Sample size, method               Mean bias    MSE             Correlation   SEE
n=300 (m1=m2=m3=100), New        −10.2079     199764.1000      0.9376        254.6255
n=300 (m1=m2=m3=100), SHR         85.9998      47835.6600      0.9335        —
n=3000 (m1=m2=m3=1000), New        1.0546      17442.6700      0.9918         71.3051
n=3000 (m1=m2=m3=1000), SHR       66.5488      12226.2700      0.9328        —

Note: Correlation is the mean correlation between observed and predicted values across simulations. SEE is the mean of the standard error estimates for predicted values; it cannot be calculated for the SHR method. m1 is the sample size for the subset containing only the outcome Y and predictor X1; m2 for the subset containing only Y and X2; m3 for the subset containing only X1 and X2.

Simulation results for the regression parameters showed that the mean bias and MSE of the estimated regression parameters using our new method were, in general, better than those using the SHR method, across all of the distributions and sample sizes considered here. The results also indicated that when the distributions of the independent variables X were heavily skewed (log-normal distribution), the bias and MSE of the estimated regression parameters were large for both methods, especially when sample sizes were small. Nonetheless, even in this situation the results from our new method were much better than those from the SHR method.

The results for predicted values indicated that both the new method and the SHR method had similar correlations between observed and predicted values across all sample sizes and distributions. However, mean bias and MSE for predicted values derived from our new method were much smaller than those from the SHR method.

4. EXAMPLE

In this section, we analyze a real-world example and compare the results using our new synthesis method and the SHR method. The data came from the 1999–2000 National Health and Nutritional Examination Survey [6]. There were five variables in this data set: one outcome Y, systolic blood pressure, and four predictors, X1, X2, X3, and X4, representing age, body mass index (BMI), serum total cholesterol level, and the natural log of serum triglycerides, respectively. First, we fitted a multivariate regression model to the full data set, which served as the gold standard for this analysis. Next, we randomly divided the data set into five mutually exclusive subsets with approximately equal sample sizes. Each of the first four subsets included the outcome Y and one of the four covariates X1, X2, X3, and X4, respectively. The last subset contained all four covariates and was used to derive the pairwise correlations among the covariates. We applied the two synthesis methods to these five subsets to obtain estimated parameters for the multivariate regression model and report the results in Table V. For comparison purposes, Table V also includes the estimated parameters obtained from the gold standard model.

Table V.

Parameter estimates (SE) for the NHANES blood pressure example.

Variables     Gold standard β̃      New method β̂NEW      SHR method β̂SHR*
Intercept     76.207 (2.556)        73.482 (4.531)        83.401
AGE            0.601 (0.017)         0.634 (0.050)         0.681
BMI            0.379 (0.045)         0.403 (0.128)         0.337
TCHOL          0.024 (0.007)         0.029 (0.018)         0.006
LOGTRIG        1.374 (0.529)         1.506 (0.931)         0.160

*SEs cannot be calculated with the SHR method.

The estimated parameters and their standard errors (SEs) from the gold standard, our new method, and the SHR method are listed in Table V (SEs are not available for the SHR method). From these results, we observed that the new method produced coefficient estimates comparable to those derived using the gold standard. However, the SHR estimates for the intercept and LOGTRIG differed somewhat from the gold-standard values. As an illustration, the predicted systolic blood pressure for a 65-year-old subject with a BMI of 19, a serum total cholesterol level of 190, and serum triglycerides of 160 would be 134, 135, and 136 using the gold standard method, the new method, and the SHR method, respectively.
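The illustrative prediction can be verified by plugging the Table V coefficients into the fitted equation; the snippet below uses the gold-standard column (note that LOGTRIG enters on the natural-log scale).

```python
import math

# gold-standard coefficients from Table V
coef = {"Intercept": 76.207, "AGE": 0.601, "BMI": 0.379,
        "TCHOL": 0.024, "LOGTRIG": 1.374}
subject = {"AGE": 65, "BMI": 19, "TCHOL": 190,
           "LOGTRIG": math.log(160)}  # triglycerides of 160
pred = coef["Intercept"] + sum(coef[k] * subject[k] for k in subject)
print(round(pred))  # 134, the value reported in the text
```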

5. DISCUSSION

In this paper, we provided several enhancements to the existing SHR synthesis analysis methodology. These improvements allow for more robust estimates of the regression parameters and predicted values when covariates are not normally distributed. Additionally, the new method allows for estimation of the variance of the resulting parameters and predicted outcomes.

Both the previously reported SHR method and our improved method allow for the building of multivariate regression models using univariate regression coefficients and pairwise correlation data derived from different data sources. The underlying assumption is that each individual study is representative of the same target population. However, the validity of the previously reported SHR methodology relies on a normality assumption for the data. Although synthesis analysis is related to both meta-analysis and the analysis of missing data, it differs from these two traditional analyses in two important ways. First, while the goal of traditional meta-analysis is to combine multivariate regression models with the same covariates from different studies, the goal of synthesis analysis is to create a multivariate linear regression model from univariate linear regression models on different covariates. Second, although the statistical problem that synthesis analysis addresses may be considered a particular type of missing-data problem, synthesis analysis, unlike a traditional missing-data analysis, does not require individual-level data; it only requires coefficient estimates from univariate linear regressions between the outcome and each covariate and between any two covariates.

Although the proposed method was developed to synthesize different univariate linear regression models with different covariates into a multivariate linear regression model, it can easily be extended to the setting in which several studies are available for some (or all) of the univariate regression models. In this case, there would be variation among the parameter estimates. For example, if there are five studies available for the linear model E(Y | X1) and six studies for the linear model E(X1 | X2), then we would have five sets of estimates for the intercept and slope of the linear model of Y on X1, denoted by $\gamma_{0j}^{(1)}$ and $\gamma_{1j}^{(1)}$ for j = 1, …, 5, and six sets of estimates for the intercept and slope of the linear model of X1 on X2, denoted by $\alpha_{0k}^{(21)}$ and $\alpha_{1k}^{(21)}$ for k = 1, …, 6.

In this case, we propose first combining the results on the same univariate regression model from different studies into one univariate regression model using the weighted mean of the $\alpha$'s and $\gamma$'s, with the weight for each study proportional to its sample size; that is,

$$\gamma_0^{(1)} = \sum_{j=1}^{5} \frac{N_j}{N} \gamma_{0j}^{(1)}, \qquad \gamma_1^{(1)} = \sum_{j=1}^{5} \frac{N_j}{N} \gamma_{1j}^{(1)}$$

where $N_j$ is the sample size for the jth study of the univariate model between Y and X1, and $N = \sum_{j=1}^{5} N_j$. Then, we apply the proposed synthesis method of Section 2 to obtain the multivariate regression model.
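The combining step itself is a one-liner; the sketch below (our naming, with illustrative numbers) weights each study's estimate by its sample size.

```python
import numpy as np

def combine_estimates(estimates, sample_sizes):
    """Sample-size-weighted mean of one univariate coefficient
    reported by several studies."""
    return float(np.average(np.asarray(estimates, float),
                            weights=np.asarray(sample_sizes, float)))

# e.g. five hypothetical studies reporting the slope of Y on X1
gamma_1_1 = combine_estimates([5.1, 4.8, 5.3, 4.9, 5.0],
                              [100, 200, 500, 1200, 3000])
```

The combined intercepts and slopes then feed into the synthesis method of Section 2 unchanged.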

We performed a simulation study to assess the performance of this modified method in the two-independent-variable case, with the covariates following either a bivariate normal or a bivariate log-normal distribution. We also compared the modified method with other combining methods, namely the mean, median, minimum, and maximum of the multiple estimates of the same regression parameter. From these simulation results, we concluded that the parameter estimates based on the weighted mean had the smallest bias and MSE, which were very close to those obtained using the gold standard. In addition, the predicted values based on the weighted mean had the smallest bias, MSE, and SEE. We give a detailed description of the simulation study and its results in Appendix D. Computer software implementing the proposed method is available at http://faculty.washington.edu/azhou.

Table DII.

Mean bias, MSE, correlation and SEE for predicted values with equal sample sizes.

Method                  Mean bias   MSE       Correlation   SEE
Total sample size N = 1000 × 3 × 5 (equal sample sizes) = 15 000
Weighted mean (Mean)     0.0019     0.0301    0.9998        0.9109
Total sample size N = 100 × 3 × 5 (equal sample sizes) = 1500
Weighted mean (Mean)     0.0126     0.3741    0.9956        3.0272

Table DIII.

Bias and MSE for estimated parameters with unequal sample sizes.

Method           Bias                               MSE
                 β0        β1        β2             β0        β1        β2
Total sample size N = (100+200+500+1200+3000) × 3 = 15 000
Weighted mean    0.0196    0.0049   −0.0056         0.5540    0.0251    0.0496
Mean            −0.0231    0.0067   −0.0076         0.8445    0.0567    0.0875
Median           0.0208    0.0073   −0.0082         0.6676    0.0680    0.0329
Minimum         −0.0538    0.0211   −0.0103         3.0387    0.0733    0.1526
Maximum         −0.0236    0.0040   −0.0123         5.8060    0.1549    0.2748
Total sample size N = (10+20+50+120+300) × 3 = 1500
Weighted mean    0.1147    0.0268   −0.0283         3.0217    0.3488    0.3621
Mean             0.2007    0.0234    0.0322         4.4266    0.3396    0.4212
Median           0.1583    0.0283   −0.0379         7.2861    0.4095    0.3714
Minimum         −2.8130   −0.4905    0.6229        73.6571    2.0423    3.8998
Maximum         −0.5346    0.1130    0.0830       529.7432   96.6978   61.0214

Acknowledgments

We would like to thank Vicki Ding and Hua Chen for their help in preparing this manuscript. Xiao-Hua Zhou, PhD, is presently a Core Investigator and Biostatistics Unit Director at the Northwest HSR&D Center of Excellence, Department of Veterans Affairs Medical Center, Seattle, WA. The views expressed in this article are those of the authors and do not necessarily represent the views of the Department of Veterans Affairs. This study has been partially supported by NSFC 30728019.

Contract/grant sponsor: NSFC; contract/grant number: 30728019

APPENDIX A: SKETCH PROOF FOR UNIQUENESS OF INTERCEPT COEFFICIENT

Here we show that there is a unique solution for the intercept term β0 from the p equations in (5); that is, we need to show that the following p solutions are equivalent:

$$\begin{aligned} \beta_0^{(1)} &= \gamma_0^{(1)} - \big(\alpha_0^{(12)} \beta_2 + \alpha_0^{(13)} \beta_3 + \cdots + \alpha_0^{(1,p-1)} \beta_{p-1} + \alpha_0^{(1p)} \beta_p\big) \\ \beta_0^{(2)} &= \gamma_0^{(2)} - \big(\alpha_0^{(21)} \beta_1 + \alpha_0^{(23)} \beta_3 + \cdots + \alpha_0^{(2,p-1)} \beta_{p-1} + \alpha_0^{(2p)} \beta_p\big) \\ &\;\;\vdots \\ \beta_0^{(p)} &= \gamma_0^{(p)} - \big(\alpha_0^{(p1)} \beta_1 + \alpha_0^{(p2)} \beta_2 + \alpha_0^{(p3)} \beta_3 + \cdots + \alpha_0^{(p,p-1)} \beta_{p-1}\big) \end{aligned}$$

Without loss of generality, we only show that the solutions of the first two equations are equal, that is, $\beta_0^{(1)} = \beta_0^{(2)}$. The proof for the other solutions is similar.

In order to show

$$\gamma_0^{(1)} - \alpha_0^{(12)} \beta_2 - \alpha_0^{(13)} \beta_3 - \cdots - \alpha_0^{(1,p-1)} \beta_{p-1} - \alpha_0^{(1p)} \beta_p = \gamma_0^{(2)} - \alpha_0^{(21)} \beta_1 - \alpha_0^{(23)} \beta_3 - \cdots - \alpha_0^{(2,p-1)} \beta_{p-1} - \alpha_0^{(2p)} \beta_p \qquad (A1)$$

we add E(X1)β1 + E(X2)β2 + ··· + E(Xp)βp to both sides of (A1), and then the left side of (A1) becomes

$$\gamma_0^{(1)} + E(X_1) \beta_1 + \big(E(X_2) - \alpha_0^{(12)}\big) \beta_2 + \cdots + \big(E(X_{p-1}) - \alpha_0^{(1,p-1)}\big) \beta_{p-1} + \big(E(X_p) - \alpha_0^{(1p)}\big) \beta_p \qquad (A2)$$

Because $E(X_j \mid X_i) = \alpha_0^{(ij)} + \alpha_1^{(ij)} X_i$, we can get the following result:

$$E(X_j) = E\big(E(X_j \mid X_i)\big) = \alpha_0^{(ij)} + \alpha_1^{(ij)} E(X_i) \qquad (A3)$$

Hence, we can replace $E(X_j) - \alpha_0^{(1j)}$ with $\alpha_1^{(1j)} E(X_1)$ in (A2) and obtain the following result:

$$\gamma_0^{(1)} + E(X_1) \beta_1 + \alpha_1^{(12)} \beta_2 E(X_1) + \cdots + \alpha_1^{(1,p-1)} \beta_{p-1} E(X_1) + \alpha_1^{(1p)} \beta_p E(X_1) = \gamma_0^{(1)} + \big(\beta_1 + \alpha_1^{(12)} \beta_2 + \cdots + \alpha_1^{(1p)} \beta_p\big) E(X_1) \qquad (A4)$$

Because β1, …, and βp are the solutions of Mβ = γ1, we can obtain the following result:

$$\beta_1 + \alpha_1^{(12)} \beta_2 + \cdots + \alpha_1^{(1p)} \beta_p = \gamma_1^{(1)} \qquad (A5)$$

Hence, the right side of (A4) becomes $\gamma_0^{(1)} + \gamma_1^{(1)} E(X_1)$, which equals $E(Y)$ because $E(Y) = E\big(E(Y \mid X_1)\big) = E\big(\gamma_0^{(1)} + \gamma_1^{(1)} X_1\big) = \gamma_0^{(1)} + \gamma_1^{(1)} E(X_1)$.

Similarly, we can show that the right side of (A1) plus $E(X_1)\beta_1 + E(X_2)\beta_2 + \cdots + E(X_p)\beta_p$ also equals $E(Y)$. This completes the proof.

APPENDIX B: SOLUTION FOR TWO PREDICTORS CASE

When p = 2, we have an explicit formula for the derivative of β = g(α, γ) with respect to α and γ, ∇g(α, γ). Here, ∇g(α, γ) is used to calculate the variance of β̂ and of the predicted values.

With $\Delta = 1 - \alpha_1^{(12)}\alpha_1^{(21)}$, equations (5)–(8) give the explicit two-predictor solution

$$\hat\beta_1 = \frac{\gamma_1^{(1)} - \alpha_1^{(12)}\gamma_1^{(2)}}{\Delta}, \qquad \hat\beta_2 = \frac{\gamma_1^{(2)} - \alpha_1^{(21)}\gamma_1^{(1)}}{\Delta}, \qquad \hat\beta_0 = \gamma_0^{(1)} - \alpha_0^{(12)}\hat\beta_2$$

Differentiating $(\hat\beta_0, \hat\beta_1, \hat\beta_2)$, column by column, with respect to $(\alpha_0^{(12)}, \alpha_1^{(12)}, \alpha_0^{(21)}, \alpha_1^{(21)}, \gamma_0^{(1)}, \gamma_1^{(1)}, \gamma_0^{(2)}, \gamma_1^{(2)})$, row by row, yields

$$\nabla g(\alpha, \gamma) = \begin{pmatrix} \dfrac{\partial\hat\beta_0}{\partial\alpha_0^{(12)}} & \dfrac{\partial\hat\beta_1}{\partial\alpha_0^{(12)}} & \dfrac{\partial\hat\beta_2}{\partial\alpha_0^{(12)}} \\ \vdots & \vdots & \vdots \\ \dfrac{\partial\hat\beta_0}{\partial\gamma_1^{(2)}} & \dfrac{\partial\hat\beta_1}{\partial\gamma_1^{(2)}} & \dfrac{\partial\hat\beta_2}{\partial\gamma_1^{(2)}} \end{pmatrix} = \begin{pmatrix} -\dfrac{\gamma_1^{(2)} - \alpha_1^{(21)}\gamma_1^{(1)}}{\Delta} & 0 & 0 \\[4pt] -\dfrac{\alpha_0^{(12)}\alpha_1^{(21)}\big(\gamma_1^{(2)} - \alpha_1^{(21)}\gamma_1^{(1)}\big)}{\Delta^2} & \dfrac{\alpha_1^{(21)}\gamma_1^{(1)} - \gamma_1^{(2)}}{\Delta^2} & \dfrac{\alpha_1^{(21)}\big(\gamma_1^{(2)} - \alpha_1^{(21)}\gamma_1^{(1)}\big)}{\Delta^2} \\[4pt] 0 & 0 & 0 \\[4pt] \dfrac{\alpha_0^{(12)}\big(\gamma_1^{(1)} - \alpha_1^{(12)}\gamma_1^{(2)}\big)}{\Delta^2} & \dfrac{\alpha_1^{(12)}\big(\gamma_1^{(1)} - \alpha_1^{(12)}\gamma_1^{(2)}\big)}{\Delta^2} & \dfrac{\alpha_1^{(12)}\gamma_1^{(2)} - \gamma_1^{(1)}}{\Delta^2} \\[4pt] 1 & 0 & 0 \\[4pt] \dfrac{\alpha_0^{(12)}\alpha_1^{(21)}}{\Delta} & \dfrac{1}{\Delta} & -\dfrac{\alpha_1^{(21)}}{\Delta} \\[4pt] 0 & 0 & 0 \\[4pt] -\dfrac{\alpha_0^{(12)}}{\Delta} & -\dfrac{\alpha_1^{(12)}}{\Delta} & \dfrac{1}{\Delta} \end{pmatrix}$$

APPENDIX C: SOLUTION FOR THREE AND FOUR PREDICTORS

When there are three predictors in the model, D and Di (i = 1, 2, 3) are given as follows:

$$D = \begin{vmatrix} 1 & \alpha_1^{(12)} & \alpha_1^{(13)} \\ \alpha_1^{(21)} & 1 & \alpha_1^{(23)} \\ \alpha_1^{(31)} & \alpha_1^{(32)} & 1 \end{vmatrix} = \big(1 + \alpha_1^{(12)}\alpha_1^{(23)}\alpha_1^{(31)} + \alpha_1^{(13)}\alpha_1^{(21)}\alpha_1^{(32)}\big) - \big(\alpha_1^{(12)}\alpha_1^{(21)} + \alpha_1^{(13)}\alpha_1^{(31)} + \alpha_1^{(23)}\alpha_1^{(32)}\big)$$

$$D_1 = \begin{vmatrix} \gamma_1^{(1)} & \alpha_1^{(12)} & \alpha_1^{(13)} \\ \gamma_1^{(2)} & 1 & \alpha_1^{(23)} \\ \gamma_1^{(3)} & \alpha_1^{(32)} & 1 \end{vmatrix} = \big(\gamma_1^{(1)} + \alpha_1^{(12)}\alpha_1^{(23)}\gamma_1^{(3)} + \alpha_1^{(13)}\gamma_1^{(2)}\alpha_1^{(32)}\big) - \big(\alpha_1^{(13)}\gamma_1^{(3)} + \alpha_1^{(12)}\gamma_1^{(2)} + \gamma_1^{(1)}\alpha_1^{(23)}\alpha_1^{(32)}\big)$$

$$D_2 = \begin{vmatrix} 1 & \gamma_1^{(1)} & \alpha_1^{(13)} \\ \alpha_1^{(21)} & \gamma_1^{(2)} & \alpha_1^{(23)} \\ \alpha_1^{(31)} & \gamma_1^{(3)} & 1 \end{vmatrix} = \big(\gamma_1^{(2)} + \gamma_1^{(1)}\alpha_1^{(23)}\alpha_1^{(31)} + \alpha_1^{(13)}\alpha_1^{(21)}\gamma_1^{(3)}\big) - \big(\alpha_1^{(13)}\gamma_1^{(2)}\alpha_1^{(31)} + \gamma_1^{(1)}\alpha_1^{(21)} + \alpha_1^{(23)}\gamma_1^{(3)}\big)$$

and

$$D_3 = \begin{vmatrix} 1 & \alpha_1^{(12)} & \gamma_1^{(1)} \\ \alpha_1^{(21)} & 1 & \gamma_1^{(2)} \\ \alpha_1^{(31)} & \alpha_1^{(32)} & \gamma_1^{(3)} \end{vmatrix} = \big(\gamma_1^{(3)} + \alpha_1^{(12)}\gamma_1^{(2)}\alpha_1^{(31)} + \gamma_1^{(1)}\alpha_1^{(21)}\alpha_1^{(32)}\big) - \big(\gamma_1^{(1)}\alpha_1^{(31)} + \alpha_1^{(12)}\alpha_1^{(21)}\gamma_1^{(3)} + \gamma_1^{(2)}\alpha_1^{(32)}\big)$$

If there are four predictors in the regression model, D and Di (i = 1, 2, 3, 4) are as follows:

$$D = \begin{vmatrix} 1 & \alpha_1^{(12)} & \alpha_1^{(13)} & \alpha_1^{(14)} \\ \alpha_1^{(21)} & 1 & \alpha_1^{(23)} & \alpha_1^{(24)} \\ \alpha_1^{(31)} & \alpha_1^{(32)} & 1 & \alpha_1^{(34)} \\ \alpha_1^{(41)} & \alpha_1^{(42)} & \alpha_1^{(43)} & 1 \end{vmatrix} = \big[\big(1 + \alpha_1^{(23)}\alpha_1^{(34)}\alpha_1^{(42)} + \alpha_1^{(24)}\alpha_1^{(32)}\alpha_1^{(43)}\big) - \big(\alpha_1^{(23)}\alpha_1^{(32)} + \alpha_1^{(24)}\alpha_1^{(42)} + \alpha_1^{(34)}\alpha_1^{(43)}\big)\big] - \alpha_1^{(12)}\big[\big(\alpha_1^{(21)} + \alpha_1^{(23)}\alpha_1^{(34)}\alpha_1^{(41)} + \alpha_1^{(24)}\alpha_1^{(31)}\alpha_1^{(43)}\big) - \big(\alpha_1^{(24)}\alpha_1^{(41)} + \alpha_1^{(23)}\alpha_1^{(31)} + \alpha_1^{(21)}\alpha_1^{(34)}\alpha_1^{(43)}\big)\big] + \alpha_1^{(13)}\big[\big(\alpha_1^{(21)}\alpha_1^{(32)} + \alpha_1^{(34)}\alpha_1^{(41)} + \alpha_1^{(24)}\alpha_1^{(31)}\alpha_1^{(42)}\big) - \big(\alpha_1^{(24)}\alpha_1^{(32)}\alpha_1^{(41)} + \alpha_1^{(21)}\alpha_1^{(34)}\alpha_1^{(42)} + \alpha_1^{(31)}\big)\big] - \alpha_1^{(14)}\big[\big(\alpha_1^{(21)}\alpha_1^{(32)}\alpha_1^{(43)} + \alpha_1^{(41)} + \alpha_1^{(23)}\alpha_1^{(31)}\alpha_1^{(42)}\big) - \big(\alpha_1^{(23)}\alpha_1^{(32)}\alpha_1^{(41)} + \alpha_1^{(31)}\alpha_1^{(43)} + \alpha_1^{(21)}\alpha_1^{(42)}\big)\big]$$

$$D_1 = \begin{vmatrix} \gamma_1^{(1)} & \alpha_1^{(12)} & \alpha_1^{(13)} & \alpha_1^{(14)} \\ \gamma_1^{(2)} & 1 & \alpha_1^{(23)} & \alpha_1^{(24)} \\ \gamma_1^{(3)} & \alpha_1^{(32)} & 1 & \alpha_1^{(34)} \\ \gamma_1^{(4)} & \alpha_1^{(42)} & \alpha_1^{(43)} & 1 \end{vmatrix} = \gamma_1^{(1)}\big[\big(1 + \alpha_1^{(23)}\alpha_1^{(34)}\alpha_1^{(42)} + \alpha_1^{(24)}\alpha_1^{(32)}\alpha_1^{(43)}\big) - \big(\alpha_1^{(23)}\alpha_1^{(32)} + \alpha_1^{(24)}\alpha_1^{(42)} + \alpha_1^{(34)}\alpha_1^{(43)}\big)\big] - \alpha_1^{(12)}\big[\big(\gamma_1^{(2)} + \alpha_1^{(23)}\alpha_1^{(34)}\gamma_1^{(4)} + \alpha_1^{(24)}\gamma_1^{(3)}\alpha_1^{(43)}\big) - \big(\alpha_1^{(24)}\gamma_1^{(4)} + \alpha_1^{(23)}\gamma_1^{(3)} + \alpha_1^{(34)}\alpha_1^{(43)}\gamma_1^{(2)}\big)\big] + \alpha_1^{(13)}\big[\big(\gamma_1^{(2)}\alpha_1^{(32)} + \alpha_1^{(34)}\gamma_1^{(4)} + \alpha_1^{(24)}\gamma_1^{(3)}\alpha_1^{(42)}\big) - \big(\alpha_1^{(24)}\alpha_1^{(32)}\gamma_1^{(4)} + \gamma_1^{(3)} + \alpha_1^{(34)}\alpha_1^{(42)}\gamma_1^{(2)}\big)\big] - \alpha_1^{(14)}\big[\big(\gamma_1^{(2)}\alpha_1^{(32)}\alpha_1^{(43)} + \gamma_1^{(4)} + \alpha_1^{(23)}\gamma_1^{(3)}\alpha_1^{(42)}\big) - \big(\alpha_1^{(23)}\alpha_1^{(32)}\gamma_1^{(4)} + \alpha_1^{(43)}\gamma_1^{(3)} + \alpha_1^{(42)}\gamma_1^{(2)}\big)\big]$$

$$D_2 = \begin{vmatrix} 1 & \gamma_1^{(1)} & \alpha_1^{(13)} & \alpha_1^{(14)} \\ \alpha_1^{(21)} & \gamma_1^{(2)} & \alpha_1^{(23)} & \alpha_1^{(24)} \\ \alpha_1^{(31)} & \gamma_1^{(3)} & 1 & \alpha_1^{(34)} \\ \alpha_1^{(41)} & \gamma_1^{(4)} & \alpha_1^{(43)} & 1 \end{vmatrix} = \big[\big(\gamma_1^{(2)} + \alpha_1^{(23)}\alpha_1^{(34)}\gamma_1^{(4)} + \alpha_1^{(24)}\gamma_1^{(3)}\alpha_1^{(43)}\big) - \big(\alpha_1^{(24)}\gamma_1^{(4)} + \alpha_1^{(23)}\gamma_1^{(3)} + \alpha_1^{(34)}\alpha_1^{(43)}\gamma_1^{(2)}\big)\big] - \gamma_1^{(1)}\big[\big(\alpha_1^{(21)} + \alpha_1^{(23)}\alpha_1^{(34)}\alpha_1^{(41)} + \alpha_1^{(24)}\alpha_1^{(31)}\alpha_1^{(43)}\big) - \big(\alpha_1^{(24)}\alpha_1^{(41)} + \alpha_1^{(23)}\alpha_1^{(31)} + \alpha_1^{(21)}\alpha_1^{(34)}\alpha_1^{(43)}\big)\big] + \alpha_1^{(13)}\big[\big(\alpha_1^{(21)}\gamma_1^{(3)} + \gamma_1^{(2)}\alpha_1^{(34)}\alpha_1^{(41)} + \alpha_1^{(24)}\alpha_1^{(31)}\gamma_1^{(4)}\big) - \big(\alpha_1^{(24)}\gamma_1^{(3)}\alpha_1^{(41)} + \gamma_1^{(2)}\alpha_1^{(31)} + \alpha_1^{(21)}\alpha_1^{(34)}\gamma_1^{(4)}\big)\big] - \alpha_1^{(14)}\big[\big(\alpha_1^{(21)}\gamma_1^{(3)}\alpha_1^{(43)} + \gamma_1^{(2)}\alpha_1^{(41)} + \alpha_1^{(23)}\alpha_1^{(31)}\gamma_1^{(4)}\big) - \big(\alpha_1^{(23)}\gamma_1^{(3)}\alpha_1^{(41)} + \gamma_1^{(2)}\alpha_1^{(31)}\alpha_1^{(43)} + \alpha_1^{(21)}\gamma_1^{(4)}\big)\big]$$

$$D_3 = \begin{vmatrix} 1 & \alpha_1^{(12)} & \gamma_1^{(1)} & \alpha_1^{(14)} \\ \alpha_1^{(21)} & 1 & \gamma_1^{(2)} & \alpha_1^{(24)} \\ \alpha_1^{(31)} & \alpha_1^{(32)} & \gamma_1^{(3)} & \alpha_1^{(34)} \\ \alpha_1^{(41)} & \alpha_1^{(42)} & \gamma_1^{(4)} & 1 \end{vmatrix} = \big[\big(\gamma_1^{(3)} + \gamma_1^{(2)}\alpha_1^{(34)}\alpha_1^{(42)} + \alpha_1^{(24)}\alpha_1^{(32)}\gamma_1^{(4)}\big) - \big(\alpha_1^{(24)}\alpha_1^{(42)}\gamma_1^{(3)} + \gamma_1^{(2)}\alpha_1^{(32)} + \alpha_1^{(34)}\gamma_1^{(4)}\big)\big] - \alpha_1^{(12)}\big[\big(\alpha_1^{(21)}\gamma_1^{(3)} + \gamma_1^{(2)}\alpha_1^{(34)}\alpha_1^{(41)} + \alpha_1^{(24)}\alpha_1^{(31)}\gamma_1^{(4)}\big) - \big(\alpha_1^{(24)}\gamma_1^{(3)}\alpha_1^{(41)} + \gamma_1^{(2)}\alpha_1^{(31)} + \alpha_1^{(21)}\alpha_1^{(34)}\gamma_1^{(4)}\big)\big] + \gamma_1^{(1)}\big[\big(\alpha_1^{(21)}\alpha_1^{(32)} + \alpha_1^{(34)}\alpha_1^{(41)} + \alpha_1^{(24)}\alpha_1^{(31)}\alpha_1^{(42)}\big) - \big(\alpha_1^{(24)}\alpha_1^{(32)}\alpha_1^{(41)} + \alpha_1^{(31)} + \alpha_1^{(21)}\alpha_1^{(34)}\alpha_1^{(42)}\big)\big] - \alpha_1^{(14)}\big[\big(\alpha_1^{(21)}\alpha_1^{(32)}\gamma_1^{(4)} + \gamma_1^{(3)}\alpha_1^{(41)} + \gamma_1^{(2)}\alpha_1^{(31)}\alpha_1^{(42)}\big) - \big(\gamma_1^{(2)}\alpha_1^{(32)}\alpha_1^{(41)} + \alpha_1^{(31)}\gamma_1^{(4)} + \alpha_1^{(21)}\gamma_1^{(3)}\alpha_1^{(42)}\big)\big]$$

and

$$D_4 = \begin{vmatrix} 1 & \alpha_1^{(12)} & \alpha_1^{(13)} & \gamma_1^{(1)} \\ \alpha_1^{(21)} & 1 & \alpha_1^{(23)} & \gamma_1^{(2)} \\ \alpha_1^{(31)} & \alpha_1^{(32)} & 1 & \gamma_1^{(3)} \\ \alpha_1^{(41)} & \alpha_1^{(42)} & \alpha_1^{(43)} & \gamma_1^{(4)} \end{vmatrix} = \big[\big(\gamma_1^{(4)} + \alpha_1^{(23)}\gamma_1^{(3)}\alpha_1^{(42)} + \gamma_1^{(2)}\alpha_1^{(32)}\alpha_1^{(43)}\big) - \big(\gamma_1^{(2)}\alpha_1^{(42)} + \alpha_1^{(23)}\alpha_1^{(32)}\gamma_1^{(4)} + \gamma_1^{(3)}\alpha_1^{(43)}\big)\big] - \alpha_1^{(12)}\big[\big(\alpha_1^{(21)}\gamma_1^{(4)} + \alpha_1^{(23)}\gamma_1^{(3)}\alpha_1^{(41)} + \gamma_1^{(2)}\alpha_1^{(31)}\alpha_1^{(43)}\big) - \big(\gamma_1^{(2)}\alpha_1^{(41)} + \alpha_1^{(23)}\alpha_1^{(31)}\gamma_1^{(4)} + \alpha_1^{(21)}\gamma_1^{(3)}\alpha_1^{(43)}\big)\big] + \alpha_1^{(13)}\big[\big(\alpha_1^{(21)}\alpha_1^{(32)}\gamma_1^{(4)} + \gamma_1^{(3)}\alpha_1^{(41)} + \gamma_1^{(2)}\alpha_1^{(31)}\alpha_1^{(42)}\big) - \big(\gamma_1^{(2)}\alpha_1^{(32)}\alpha_1^{(41)} + \alpha_1^{(31)}\gamma_1^{(4)} + \alpha_1^{(21)}\gamma_1^{(3)}\alpha_1^{(42)}\big)\big] - \gamma_1^{(1)}\big[\big(\alpha_1^{(21)}\alpha_1^{(32)}\alpha_1^{(43)} + \alpha_1^{(41)} + \alpha_1^{(23)}\alpha_1^{(31)}\alpha_1^{(42)}\big) - \big(\alpha_1^{(23)}\alpha_1^{(32)}\alpha_1^{(41)} + \alpha_1^{(31)}\alpha_1^{(43)} + \alpha_1^{(21)}\alpha_1^{(42)}\big)\big]$$

APPENDIX D: SIMULATION STUDY ON THE MODIFIED SYNTHESIS

We performed a simulation study to assess the performance of the modified method, as described in the Discussion section, for the two-independent-variable case when the vector of two covariates follows a bivariate normal or a bivariate log-normal distribution. We also compared this modified method with the other combining methods, namely the mean, median, minimum, and maximum of the multiple estimates of the same regression parameter. For each of the three univariate linear models, E(Y | X1), E(Y | X2), and E(X1 | X2), there were estimates from five different studies. We selected the sample sizes of the five studies for each univariate model to be equal (1000 or 100 per study) or unequal ((100, 200, 500, 1200, 3000) or (10, 20, 50, 120, 300)). We assessed the performance of the modified synthesis method using the weighted mean, mean, median, minimum, and maximum for combining the results from the five studies.

Since the results for simulated data from the bivariate normal distribution were similar to those from the bivariate log-normal distribution, we only report the results for the bivariate normal case. Tables DI–DIV show the bias and MSE of each regression parameter β0, β1, β2, as well as the mean bias, MSE, correlation, and SEE (mean of the SE estimates) for the predicted values.

Table DI.

Bias and MSE for estimated parameters with equal sample sizes.

Method                  Bias                               MSE
                        β0        β1        β2             β0        β1        β2
Total sample size N = 1000 × 3 × 5 (equal sample sizes) = 15 000
Weighted mean (Mean)     0.0023    0.0005   −0.0005         0.2126    0.0026    0.0068
Median                  −0.0055   −0.0016    0.0007         0.3792    0.0099    0.0183
Minimum                  0.0219    0.0075   −0.0036         0.5250    0.0140    0.0266
Maximum                 −0.0428   −0.0084    0.0083         0.8344    0.0214    0.0399
Total sample size N = 100 × 3 × 5 (equal sample sizes) = 1500
Weighted mean (Mean)     0.1066    0.0107   −0.0272         2.8586    0.0708    0.1509
Median                   0.1781    0.0286   −0.0433         4.2857    0.1156    0.2228
Minimum                 −0.2240   −0.0181    0.0502         5.4686    0.1158    0.2820
Maximum                 −0.1285   −0.0037   −0.0373        11.4781    0.3338    0.5221

Table DIV.

Mean bias, MSE, correlation and SEE for predicted values with unequal sample sizes.

Method           Mean bias    MSE       Correlation   SEE
Total sample size N = (100+200+500+1200+3000) × 3 = 15 000
Weighted mean     0.0201      0.0994    0.9886        1.1105
Mean             −0.0219      0.1134    0.9825        1.2773
Total sample size N = (10+20+50+120+300) × 3 = 1500
Weighted mean    −0.0158      0.3394    0.9900        4.1135
Mean              0.1993      0.3550    0.9789        4.3768

References

1. Hackam DG, Anand SS. Emerging risk factors for atherosclerotic vascular disease: a critical review of the evidence. Journal of the American Medical Association 2003; 290:932–940. doi:10.1001/jama.290.7.932

2. Fruchart-Najib J, Bauge E, Niculescu LS, Pham T, Thomas B, Rommens C, Majd Z, Brewer B, Pennacchio LA, Fruchart JC. Mechanism of triglyceride lowering in mice expressing human apolipoprotein. Biochemical and Biophysical Research Communications 2004; 319:397–404. doi:10.1016/j.bbrc.2004.05.003

3. Vasan RS. Biomarkers of cardiovascular disease: molecular basis and practical considerations. Circulation 2006; 113:2335–2362. doi:10.1161/CIRCULATIONAHA.104.482570

4. Casella G, Berger RL. Statistical Inference (2nd edn). Thomson Learning: Pacific Grove, CA, 2002.

5. Samsa G, Hu G, Root M. Combining information from multiple data sources to create multivariable risk models: illustration and preliminary assessment of a new method. Journal of Biomedicine and Biotechnology 2005; 2:113–123. doi:10.1155/JBB.2005.113

6. National Center for Health Statistics. National Health and Nutrition Examination Survey (NHANES) 1999–2000. Available from: http://www.cdc.gov/nchs/about/major/nhanes/
