SUMMARY
Measuring a biomarker in pooled samples from multiple cases or controls can lead to cost-effective estimation of a covariate-adjusted odds ratio, particularly for expensive assays. But pooled measurements may be affected by assay-related measurement error (ME) and/or pooling-related processing error (PE), which can induce bias if ignored. Building on recently developed methods for a normal biomarker subject to additive errors, we present two related estimators for a right-skewed biomarker subject to multiplicative errors: one based on logistic regression and the other based on a Gamma discriminant function model. Applied to a reproductive health dataset with a right-skewed cytokine measured in pools of size 1 and 2, both methods suggest no association with spontaneous abortion. The fitted models indicate little ME but fairly severe PE, the latter of which is much too large to ignore. Simulations mimicking these data with a non-unity odds ratio confirm validity of the estimators and illustrate how PE can detract from pooling-related gains in statistical efficiency. These methods address a key issue associated with the homogeneous pools study design and should facilitate valid odds ratio estimation at a lower cost in a wide range of scenarios.
Keywords: Biomarkers, Discriminant function, Gamma, Maximum likelihood, Measurement error, Pooling
1. Introduction
In the regression setting where measuring a continuous exposure requires an expensive assay, a pooling study design can be extremely cost-effective (Weinberg and Umbach, 1999; Mitchell and others, 2014; Lyles and others, 2016). We consider a design in which the assay is applied to pooled rather than individual biospecimen samples, with each pooled sample comprised of an equal volume from some number of like participants with respect to case status (all cases or all controls). Assuming the assay returns the mean biomarker level for members of a given pool, the logistic regression model provided by Weinberg and Umbach (1999) can be used to estimate the log-odds ratios (log-OR’s) of interest with poolwise data.
However, two types of error may affect pooled biomarker measurements and induce bias if ignored. Measurement error (ME) is extra variability due to assay imprecision, and processing error (PE) is extra variability due to physically combining biospecimens into pools (Schisterman and others, 2010). In a hybrid design that includes some single-specimen pools (“singles” or “pools of size 1”), ME would affect all assay measurements while PE would only affect multi-specimen pools. ME could be assumed non-existent or negligible in scenarios where the assay is known to be highly accurate, but it seems generally dubious to assume no PE. That would require precise formation of exactly equal-volume pools, complete mixing, and no changes in the pooled biomarker concentration due to cross-reactions from mixing biospecimen samples from different subjects.
Following the framework of Schisterman and others (2010), Lyles and others (2015) used maximum likelihood (ML) to estimate the covariate-adjusted log-OR for a pooled exposure subject to ME and PE. They used a discriminant function approach in which the exposure log-OR is estimated not from a logistic regression, but from a linear regression of the exposure on case status and covariates. The primary assumptions were as follows: (i) exposure level given case status and covariates is normally distributed with homoscedastic errors; (ii) MEs and PEs are additive, independent, and normally distributed with mean 0 and variances $\sigma_m^2$ and $\sigma_p^2$ (independent of pool size); and (iii) MEs affect all measurements, while PEs only affect pools of size 2 or larger. Advantages of this approach include its computational simplicity, its applicability to designs with homogeneous or heterogeneous pools with respect to case status, the availability of a small-sample bias correction, and the ability to correct for both ME and PE without replicate assay measurements, provided there are at least three different pool sizes including pools of size 1. A notable disadvantage is that it produces a log-OR estimate for the pooled exposure, but not for covariates.
Van Domelen and others (2018) relied on similar error assumptions as Schisterman and others (2010) and Lyles and others (2015) to correct for assay errors in fitting the Weinberg and Umbach (1999) poolwise logistic regression model. Taking a classical ME modeling approach, they wrote the likelihood contribution for the $i$th pool as the product of three densities—case status given true biomarker level and covariates, imprecise biomarker level given true biomarker level, and true biomarker level given covariates—with the unobserved true biomarker level integrated out. This approach allows estimation of all coefficients in the logistic regression of interest.
An important limitation of the Lyles and others (2015) and Van Domelen and others (2018) methods is that they assume normally distributed biomarkers (conditional on covariates), whereas biomarkers are often skewed. In this work, we address this limitation by providing Gamma model-based analogues to both of these approaches.
In considering how to adapt the Van Domelen and others (2018) logistic regression to account for skewness, a natural idea is to assume a linear model for the log-biomarker level given covariates. This would imply a linear model for the sum of the log-biomarker levels for members of a given pool vs. summed covariates. But the summed log-biomarker level cannot be recovered from the poolwise mean, which is what the assay is assumed to measure (Mitchell and others, 2014).
We propose an alternative approach with a more convenient poolwise-sum result: a Gamma regression model for biomarker level given covariates. Following the "alternative gamma" model of Mitchell and others (2015), we assume a constant scale parameter and covariates linearly related to the log of the shape parameter. This implies that each covariate is linearly related to the log of the expected value of the biomarker, and that the variance of the biomarker is directly proportional to the expected value. Assuming independence among members of a pool, the summed biomarker level is also Gamma distributed. Unlike the log-transformed linear regression, this result is compatible with the observed data, provided individual-level covariates are available.
As for the errors, strictly positive poolwise measurements are more compatible with mean-1 multiplicative errors than mean-0 additive errors. The latter would be incongruent with observed measurements having considerable density around 0, but no 0’s or negative values, as is the case in our motivating example. So we assume the PE and ME are mean-1 lognormal and act multiplicatively on the true poolwise means, using a generally similar set of assumptions as in previous work (Schisterman and others, 2010; Lyles and others, 2015). We accommodate replicate assay measurements, which are not strictly required for identifiability but may help stabilize ML estimation at feasible sample sizes. An interesting feature of the Gamma setup is that correcting for errors is theoretically possible even with no replicates and a single pool size.
To adapt the Lyles and others (2015) discriminant function approach to account for right-skewed biomarkers, we generalize the framework of Whitcomb and others (2012) to include covariates and multiplicative errors. The Gamma discriminant function model is similar to the Gamma model just described, but with a different scale parameter for cases and controls and a case status coefficient incorporated into the shape parameter. Applying Bayes rule produces an expression for the adjusted log-OR for $X$, which is constant (and the same expression as in Whitcomb and others (2012)) only when the case status coefficient is 0. This is analogous to the normal discriminant function model, where different residual error variances for cases and controls imply a logistic regression model with a quadratic term for $X$ and thus a non-constant odds ratio (Cornfield, 1962). This approach extends to the pooling scenario and permits incorporating multiplicative lognormal errors as in the logistic regression approach.
Our motivating example is estimation of the covariate-adjusted log-OR relating serum levels of a positive, right-skewed cytokine to odds of miscarriage, with the cytokine measured in pools of size 1 (some with replicates) and 2. We use Akaike information criterion (AIC) to confirm better model fit for the Gamma analogues of the previously developed logistic regression and discriminant function approaches and perform simulations to confirm validity, gauge whether estimation is reasonably stable in scenarios where the Gamma models are identifiable but the normal models are not, and examine how errors affect the relative efficiency of pooling vs. traditional designs.
2. Methods
2.1. Homogeneous pools logistic regression
Consider a design in which a continuous biomarker $X$ is measured in pooled samples, with the $i$th sample ($i = 1, \ldots, n$) comprised of equal-volume aliquots from $g_i$ participants ($j = 1, \ldots, g_i$) that are either all cases ($Y_i = 1$) or all non-cases ($Y_i = 0$), but assumed independent of each other given case status.

The assay is assumed to produce the arithmetic mean biomarker level for members of the $i$th pool, $\bar{X}_i$, from which the summed biomarker level can be calculated as $X_i^* = g_i \bar{X}_i$. Weinberg and Umbach (1999, 2014) showed that if a logistic regression model relates individual-level case status to exposure and covariates $\mathbf{C}$, i.e. $\mathrm{logit}\, P(Y = 1 \mid X, \mathbf{C}) = \beta_0 + \beta_x X + \boldsymbol{\beta}_c' \mathbf{C}$, then the corresponding model relating $Y_i$ to $X_i^*$ and the summed covariates $\mathbf{C}_i^* = \sum_{j=1}^{g_i} \mathbf{C}_{ij}$ is:

$$\mathrm{logit}\, P(Y_i = 1 \mid X_i^*, \mathbf{C}_i^*) = q_i + g_i \beta_0 + \beta_x X_i^* + \boldsymbol{\beta}_c' \mathbf{C}_i^* \qquad (2.1)$$

with the offset $q_i$ defined as:

$$q_i = g_i \ln(\pi_1/\pi_0) + g_i \ln(n_0/n_1) + \ln\!\big(n_{1(g_i)}/n_{0(g_i)}\big) \qquad (2.2)$$

where $\pi_1$ and $\pi_0$ are accrual probabilities for individual cases and controls, $n_1$ and $n_0$ are the total number of cases and controls sampled, and $n_{1(g_i)}$ and $n_{0(g_i)}$ are the numbers of case pools and control pools of size $g_i$ (Weinberg and Umbach, 1999, 2014). The first term is 0 for prospective or cross-sectional sampling, where $\pi_1 = \pi_0$. For case–control sampling it is nonzero, but omitting it (e.g. if accrual probabilities are unknown) only compromises validity of the estimated intercept $\beta_0$. Thus with poolwise observations and no errors in $\bar{X}_i$, one can fit 2.1 to estimate $(\beta_0, \beta_x, \boldsymbol{\beta}_c)$, the same parameters that would be targeted via a traditional study without biospecimen pooling.
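To make the poolwise model concrete, the following sketch (in Python, with illustrative parameter values invented for this example, not fitted values) verifies the identity underlying 2.1 for a homogeneous pool of size 2: under the individual-level logistic model, the log-odds that both members are cases rather than both controls is exactly $2\beta_0 + \beta_x X_i^* + \boldsymbol{\beta}_c' \mathbf{C}_i^*$, with the offset terms omitted.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative (made-up) individual-level logistic parameters
b0, bx, bc = -1.0, 0.5, 0.3

# Two pool members with exposures x1, x2 and a scalar covariate c1, c2
x1, x2, c1, c2 = 0.8, 1.4, 1.0, 0.0
p1 = expit(b0 + bx * x1 + bc * c1)  # P(member 1 is a case)
p2 = expit(b0 + bx * x2 + bc * c2)  # P(member 2 is a case)

# Given independence, the odds of "all cases" vs. "all controls" is the
# product of the individual odds, so the log-odds is the sum of the logits:
lhs = math.log((p1 * p2) / ((1 - p1) * (1 - p2)))

# Poolwise model: g * beta_0 + beta_x * (summed exposure) + beta_c * (summed covariate)
rhs = 2 * b0 + bx * (x1 + x2) + bc * (c1 + c2)

print(abs(lhs - rhs))  # agreement up to floating-point error
```

The summed exposure and summed covariates are exactly what the pooled assay design provides, which is why the poolwise fit targets the same coefficients as an individual-level analysis.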
2.2. Normal $X$ logistic regression (NLR)
Here, we describe the Van Domelen and others (2018) normal-$X$ logistic regression ("NLR") approach for handling errors. Suppose the assay does not return the true poolwise mean biomarker level $\bar{X}_i$, but rather an imprecise version, $\tilde{\bar{X}}_i$, from which the imprecise poolwise sum can be calculated as $\tilde{X}_i^* = g_i \tilde{\bar{X}}_i$. To immediately accommodate replicate measurements on the same pool, suppose there are $k_i$ such measurements for the $i$th pool, such that $\tilde{\bar{\mathbf{X}}}_i = (\tilde{\bar{X}}_{i1}, \ldots, \tilde{\bar{X}}_{ik_i})'$ are observed and $\tilde{\mathbf{X}}_i^* = g_i \tilde{\bar{\mathbf{X}}}_i$ can be calculated. The likelihood contribution is $f(Y_i, \tilde{\mathbf{X}}_i^* \mid \mathbf{C}_i)$, which can be factored:

$$
\begin{aligned}
f(Y_i, \tilde{\mathbf{X}}_i^* \mid \mathbf{C}_i) &= \int f(Y_i \mid \tilde{\mathbf{X}}_i^*, X_i^*, \mathbf{C}_i)\, f(\tilde{\mathbf{X}}_i^* \mid X_i^*, \mathbf{C}_i)\, f(X_i^* \mid \mathbf{C}_i)\, dX_i^* \\
&= \int f(Y_i \mid X_i^*, \mathbf{C}_i)\, f(\tilde{\mathbf{X}}_i^* \mid X_i^*)\, f(X_i^* \mid \mathbf{C}_i)\, dX_i^*
\end{aligned} \qquad (2.3)
$$

where $X_i^*$ is the unobserved true poolwise sum. Van Domelen and others (2018) make two assumptions to arrive at the second line of 2.3: (i) the imprecise $\tilde{\mathbf{X}}_i^*$ does not inform $Y_i$ given the true $X_i^*$ (and covariates), and (ii) errors in $\tilde{\mathbf{X}}_i^*$ are unrelated to covariates. These are fairly standard assumptions for ME problems, although as a reviewer noted it is conceivable that the second assumption might be violated, say if some other biomarker in the blood cross-reacts with the biomarker of interest and is itself correlated with covariates. For the purposes of this article, we follow the framework of Lyles and others (2015) and Van Domelen and others (2018) and leverage these two assumptions.
The first term under the integral is specified by 2.1. For the second term, suppose there is additive mean-0 ME and PE acting on the poolwise mean $\bar{X}_i$ such that $\tilde{\bar{\mathbf{X}}}_i$ can be written:

$$\tilde{\bar{\mathbf{X}}}_i = \left[\bar{X}_i + e_i\, I(g_i > 1)\right] \mathbf{1}_{k_i} + \mathbf{m}_i \qquad (2.4)$$

The errors are assumed to be independent of each other, independent of the true $\bar{X}_i$ and $\mathbf{C}_i$, and independent of pool size. Errors are also assumed to be normally distributed: $e_i \sim N(0, \sigma_p^2)$ and $\mathbf{m}_i \sim N(\mathbf{0}, \sigma_m^2 \mathbf{I}_{k_i})$, where $\mathbf{I}_{k_i}$ is a $k_i$-dimensional identity matrix. This says each of the $k_i$ assay measurements is the sum of the true poolwise mean, a common PE if $g_i > 1$, and a unique ME, with all errors independent of each other. Multivariate normal theory leads to:

$$\tilde{\mathbf{X}}_i^* \mid X_i^* \sim \mathrm{MVN}\!\left(X_i^* \mathbf{1}_{k_i},\; g_i^2\left[\sigma_p^2\, I(g_i > 1)\, \mathbf{J}_{k_i} + \sigma_m^2\, \mathbf{I}_{k_i}\right]\right) \qquad (2.5)$$

where $\mathbf{J}_{k_i}$ is a $k_i \times k_i$ matrix of 1's.
For the third term in 2.3, an individual-level normal linear regression $X \mid \mathbf{C} \sim N(\gamma_0 + \boldsymbol{\gamma}_c' \mathbf{C}, \sigma^2)$ leads to the poolwise result for $X_i^* \mid \mathbf{C}_i$:

$$X_i^* \mid \mathbf{C}_i \sim N\!\left(g_i \gamma_0 + \boldsymbol{\gamma}_c' \mathbf{C}_i^*,\; g_i \sigma^2\right) \qquad (2.6)$$

Optimization routines can be used to obtain ML estimates (MLEs) for $(\beta_0, \beta_x, \boldsymbol{\beta}_c, \gamma_0, \boldsymbol{\gamma}_c, \sigma^2, \sigma_p^2, \sigma_m^2)$, with standard errors estimated from the inverse of the numerically approximated Hessian matrix at the MLEs. While a closed-form approximation for the integral is available (Carroll and others, 1984, 2006; Lyles and Kupper, 2013), we consider full ML, which requires integrating out the unobserved $X_i^*$'s numerically for each pool at each iteration.
2.3. Gamma $X$ logistic regression (GLR)
We propose a Gamma-$X$ logistic regression approach ("GLR") to accommodate a right-skewed biomarker. The likelihood in 2.3 carries over, but we specify different models for the second and third densities. For the typical case where an assay produces strictly positive measurements, we assume mean-1 lognormal errors acting multiplicatively on the poolwise mean. The analogue of 2.4 becomes:

$$\tilde{\bar{\mathbf{X}}}_i = \bar{X}_i\, e_i^{I(g_i > 1)}\, \mathbf{m}_i \qquad (2.7)$$

with $\ln e_i \sim N(-\sigma_p^2/2, \sigma_p^2)$ and $\ln m_{ik} \sim N(-\sigma_m^2/2, \sigma_m^2)$ for $k = 1, \ldots, k_i$, so that $E(e_i) = E(m_{ik}) = 1$ (multiplication by the vector $\mathbf{m}_i$ is elementwise).

To determine the form of $f(\tilde{\mathbf{X}}_i^* \mid X_i^*)$, note that $\tilde{\mathbf{X}}_i^* = X_i^*\, e_i^{I(g_i > 1)}\, \mathbf{m}_i$. The product of the independent lognormal error terms, $e_i^{I(g_i > 1)} \mathbf{m}_i$, is multivariate lognormal, and thus $\tilde{\mathbf{X}}_i^*$ is also multivariate lognormal:

$$\ln \tilde{\mathbf{X}}_i^* \mid X_i^* \sim \mathrm{MVN}\!\left(\left[\ln X_i^* - \frac{\sigma_p^2}{2} I(g_i > 1) - \frac{\sigma_m^2}{2}\right] \mathbf{1}_{k_i},\; \sigma_p^2\, I(g_i > 1)\, \mathbf{J}_{k_i} + \sigma_m^2\, \mathbf{I}_{k_i}\right) \qquad (2.8)$$
Note that 2.8 can be viewed as a multiplicative lognormal analogue of 2.5.
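The mean-1 lognormal parameterization can be checked directly: if $\ln m \sim N(-\sigma^2/2, \sigma^2)$, then $E(m) = \exp(-\sigma^2/2 + \sigma^2/2) = 1$, so multiplicative errors inflate variability without systematically biasing the measurement. A quick Monte Carlo sketch in Python (the value $\sigma^2 = 0.62$ is used purely for illustration, roughly the magnitude of the PE variance fitted later):

```python
import numpy as np

sigma2 = 0.62  # illustrative error variance on the log scale
rng = np.random.default_rng(1)

# Mean-1 lognormal: ln(e) ~ N(-sigma2/2, sigma2)
e = rng.lognormal(mean=-sigma2 / 2, sigma=np.sqrt(sigma2), size=1_000_000)

# Closed-form lognormal moments: E = exp(mu + s2/2), V = (exp(s2) - 1) exp(2*mu + s2)
expected_mean = 1.0
expected_var = np.exp(sigma2) - 1.0  # after substituting mu = -sigma2/2

print(e.mean(), expected_mean)  # sample mean close to 1
print(e.var(), expected_var)    # sample variance close to exp(sigma2) - 1
```

Note how quickly the variance of a mean-1 multiplicative error grows with $\sigma^2$; this is one way severe PE erodes the efficiency gains of pooling even though the errors remain unbiased.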
For $f(X_i^* \mid \mathbf{C}_i)$, we assume a constant-scale Gamma model for individual biomarker levels: $X \mid \mathbf{C} \sim \mathrm{Gamma}\!\left(\exp(\alpha_0 + \boldsymbol{\alpha}_c' \mathbf{C}),\, b\right)$, in the shape–scale parameterization. This implies $E(X \mid \mathbf{C}) = b \exp(\alpha_0 + \boldsymbol{\alpha}_c' \mathbf{C})$, such that there is a monotone, non-linear relationship between each covariate and the expected value of the biomarker. It also implies $V(X \mid \mathbf{C}) = b^2 \exp(\alpha_0 + \boldsymbol{\alpha}_c' \mathbf{C})$, which means $V(X \mid \mathbf{C}) = b\, E(X \mid \mathbf{C})$, i.e. the variance is directly proportional to the mean.
The sum of independent Gamma random variables with shape parameters $a_j$ and common scale parameter $b$ is $\mathrm{Gamma}(\sum_j a_j, b)$, so the poolwise sum biomarker level is distributed as follows:

$$X_i^* \mid \mathbf{C}_i \sim \mathrm{Gamma}\!\left(\sum_{j=1}^{g_i} \exp(\alpha_0 + \boldsymbol{\alpha}_c' \mathbf{C}_{ij}),\; b\right) \qquad (2.9)$$
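The closure of the Gamma family under summation with a common scale, which makes 2.9 possible, is easy to verify empirically. A short Monte Carlo sketch (with arbitrary illustrative shapes standing in for $\exp(\alpha_0 + \boldsymbol{\alpha}_c' \mathbf{C}_{ij})$):

```python
import numpy as np

rng = np.random.default_rng(2)
b = 0.7                        # common scale parameter
shapes = np.array([1.2, 2.5])  # member-specific shapes for a pool of size 2

# Simulate pool sums X* = X_1 + X_2 for many hypothetical pools
n = 500_000
x_sum = sum(rng.gamma(shape=a, scale=b, size=n) for a in shapes)

# Theory: X* ~ Gamma(sum of shapes, b), so mean = b * sum(a), var = b^2 * sum(a)
theory_mean = shapes.sum() * b
theory_var = shapes.sum() * b**2

print(x_sum.mean(), theory_mean)
print(x_sum.var(), theory_var)
```

A log-transformed linear model has no analogous closure: the distribution of a sum of lognormals is not lognormal, which is exactly why the Gamma specification is the convenient one for poolwise sums.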
The likelihood is now fully specified for GLR, with the three densities in the 2.3 likelihood given by 2.1, 2.8, and 2.9. Similar computational procedures as in Van Domelen and others (2018) can be used to obtain MLEs and Hessian-based standard errors; the unobserved $X_i^*$'s have to be integrated out numerically.
2.4. Normal discriminant function approach (NDFA)
Lyles and others (2015) proposed a normal-$X$ discriminant function approach ("NDFA") to estimate the log-OR for a biomarker measured in pools while correcting for additive normal errors. The assumed model for individual biomarker level given case status and covariates is $X \mid Y, \mathbf{C} \sim N(\gamma_0 + \gamma_y Y + \boldsymbol{\gamma}_c' \mathbf{C}, \sigma^2)$. If this model holds, the quantity $\beta_x = \gamma_y / \sigma^2$ represents the covariate-adjusted log-OR relating $X$ and $Y$. The corresponding poolwise sum model is:

$$X_i^* \mid Y_i, \mathbf{C}_i \sim N\!\left(g_i \gamma_0 + \gamma_y \sum_{j=1}^{g_i} Y_{ij} + \boldsymbol{\gamma}_c' \mathbf{C}_i^*,\; g_i \sigma^2\right) \qquad (2.10)$$

where $\sum_{j} Y_{ij}$ is the number of cases in the pool, which is $g_i Y_i$ under homogeneous pooling. The likelihood contribution for the observed $\tilde{\mathbf{X}}_i^*$ (allowing for replicates) is $f(\tilde{\mathbf{X}}_i^* \mid Y_i, \mathbf{C}_i)$. The same additive normal error assumptions as for NLR lead to:

$$\tilde{\mathbf{X}}_i^* \mid Y_i, \mathbf{C}_i \sim \mathrm{MVN}\!\left(\left(g_i \gamma_0 + \gamma_y g_i Y_i + \boldsymbol{\gamma}_c' \mathbf{C}_i^*\right) \mathbf{1}_{k_i},\; \left[g_i \sigma^2 + g_i^2 \sigma_p^2\, I(g_i > 1)\right] \mathbf{J}_{k_i} + g_i^2 \sigma_m^2\, \mathbf{I}_{k_i}\right) \qquad (2.11)$$

This closed-form likelihood can be maximized to obtain $\hat{\gamma}_y$, $\hat{\sigma}^2$, and $\hat{\beta}_x = \hat{\gamma}_y / \hat{\sigma}^2$. A delta method-based variance estimator is:

$$\hat{V}(\hat{\beta}_x) = \frac{\hat{V}(\hat{\gamma}_y)}{\hat{\sigma}^4} + \frac{\hat{\gamma}_y^2\, \hat{V}(\hat{\sigma}^2)}{\hat{\sigma}^8} - \frac{2 \hat{\gamma}_y\, \widehat{\mathrm{Cov}}(\hat{\gamma}_y, \hat{\sigma}^2)}{\hat{\sigma}^6}$$

with the variances and covariance taken from the inverse of the estimated information matrix.
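The delta-method step can be sketched in a few lines of Python. The gradient of $\beta_x = \gamma_y / \sigma^2$ with respect to $(\gamma_y, \sigma^2)$ is $(1/\sigma^2, -\gamma_y/\sigma^4)$, and the variance estimate is the usual quadratic form in the estimated covariance block. The covariance entries below are placeholders for illustration, not values from the CPP fit; the sketch also checks the analytic gradient against finite differences:

```python
import numpy as np

# Placeholder estimates and a hypothetical covariance block for (gamma_y, sigma^2)
gamma_y, sigma2 = 0.08, 1.58
cov = np.array([[0.0169, 0.0020],
                [0.0020, 0.0441]])

# Analytic gradient of beta_x = gamma_y / sigma^2 w.r.t. (gamma_y, sigma^2)
grad = np.array([1 / sigma2, -gamma_y / sigma2**2])
var_delta = grad @ cov @ grad  # delta-method variance estimate

# Finite-difference check of the gradient
def f(g, s):
    return g / s

eps = 1e-6
grad_fd = np.array([
    (f(gamma_y + eps, sigma2) - f(gamma_y - eps, sigma2)) / (2 * eps),
    (f(gamma_y, sigma2 + eps) - f(gamma_y, sigma2 - eps)) / (2 * eps),
])
print(var_delta, np.sqrt(var_delta))  # variance and standard error
```

The same pattern generalizes to any smooth function of MLEs once the joint covariance matrix is available from the inverse Hessian.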
2.5. Gamma discriminant function approach (GDFA)
Towards a Gamma-$X$ discriminant function approach ("GDFA"), we assume the following individual-level Gamma model:

$$X \mid Y, \mathbf{C} \sim \mathrm{Gamma}\!\left(\exp(\alpha_0 + \alpha_y Y + \boldsymbol{\alpha}_c' \mathbf{C}),\; b_1^{Y} b_0^{(1-Y)}\right) \qquad (2.12)$$

The scale parameter is $b_1$ for cases and $b_0$ for controls. Applying Bayes rule and taking the logit gives:

$$\mathrm{logit}\, P(Y = 1 \mid X, \mathbf{C}) = c(\mathbf{C}) + \exp(\alpha_0 + \boldsymbol{\alpha}_c' \mathbf{C})\left(e^{\alpha_y} - 1\right) \ln X + \left(\frac{1}{b_0} - \frac{1}{b_1}\right) X \qquad (2.13)$$

where $c(\mathbf{C})$ is a constant term which is a function of $\mathbf{C}$ but not $X$. In general, the log-OR for a 1-unit increase in $X$ is given by:

$$\log \mathrm{OR} = \exp(\alpha_0 + \boldsymbol{\alpha}_c' \mathbf{C})\left(e^{\alpha_y} - 1\right) \ln\!\left(\frac{X + 1}{X}\right) + \frac{1}{b_0} - \frac{1}{b_1} \qquad (2.14)$$

The log-OR depends on $X$ (and $\mathbf{C}$) unless $\alpha_y = 0$, in which case it simplifies to $1/b_0 - 1/b_1$. This is very similar to the Whitcomb and others (2012) scenario without covariates, where the log-OR is constant if the shape parameter is the same for cases and controls.
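The constant-log-OR special case can be validated numerically against the Gamma density ratio. The sketch below uses scale values of 0.72 and 0.67 (chosen here for illustration) with $\alpha_y = 0$ and an arbitrary shape, and confirms that the log-OR for a 1-unit increase reduces to $1/b_0 - 1/b_1 \approx 0.104$ regardless of where the increase occurs:

```python
from scipy.stats import gamma

b1, b0 = 0.72, 0.67  # case and control scale parameters (illustrative)
k = 2.0              # common shape exp(alpha_0 + alpha_c'C); arbitrary when alpha_y = 0
x = 1.5

def logit_part(x):
    # log f(x | case) - log f(x | control); the constant c(C) cancels in differences
    return gamma.logpdf(x, k, scale=b1) - gamma.logpdf(x, k, scale=b0)

log_or = logit_part(x + 1) - logit_part(x)
closed_form = 1 / b0 - 1 / b1
print(log_or, closed_form)  # both ~0.104
```

With $\alpha_y \neq 0$ the shape parameters differ between cases and controls, the $\ln X$ term in 2.13 no longer cancels, and the same difference-of-logpdfs computation reproduces the $X$-dependent expression in 2.14.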
For poolwise data, if the $i$th pool is comprised of $g_i$ cases or controls with $X$ distributed as in 2.12, then the poolwise sum biomarker level $X_i^*$ is distributed:

$$X_i^* \mid Y_i, \mathbf{C}_i \sim \mathrm{Gamma}\!\left(\sum_{j=1}^{g_i} \exp(\alpha_0 + \alpha_y Y_i + \boldsymbol{\alpha}_c' \mathbf{C}_{ij}),\; b_1^{Y_i} b_0^{(1-Y_i)}\right) \qquad (2.15)$$

To incorporate errors, we make the same multiplicative lognormal error assumptions as for GLR. The likelihood for the $i$th pool with $k_i$ replicates is $f(\tilde{\mathbf{X}}_i^* \mid Y_i, \mathbf{C}_i) = \int f(\tilde{\mathbf{X}}_i^* \mid X_i^*)\, f(X_i^* \mid Y_i, \mathbf{C}_i)\, dX_i^*$ (utilizing the two simplifying assumptions from Section 2.2), with the terms under the integral given by 2.8 and 2.15. The $X_i^*$'s are integrated out numerically.
2.6. Implementation
We previously developed the R (R Core Team, 2019) package pooling (Van Domelen, 2019) with functions for fitting poolwise regression models, including p_ndfa (originally named p_dfa_xerrors) for implementing the Lyles and others (2015) NDFA approach and p_logreg_xerrors for the Van Domelen and others (2018) NLR approach. We have added two functions for the Gamma methods introduced in this article: p_gdfa for GDFA and p_logreg_xerrors2 for GLR. Function inputs include the data, an indicator for which error types to model, and various options for likelihood maximization; outputs include the MLEs, a variance–covariance matrix, and the AIC.
Numerical integration, which is necessary for NLR, GLR, and GDFA, is performed via the hcubature function from the cubature package v. 2.0.3 (Narasimhan and others, 2018). For log-likelihood maximization, we use the nlminb function in base R, which implements a quasi-Newton method that accommodates bounds. Starting values of 0.01 and 1 are used for regression coefficients and variance terms, respectively, and lower bounds of 0.0001 are used for variance terms (note: this may not be appropriate for biomarkers on a different scale). Hessian matrices are approximated numerically via hessian from numDeriv v. 2016.8-1 (Gilbert and Varadhan, 2016).
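The same optimization pattern, bounded quasi-Newton maximization of a log-likelihood with a small positive lower bound on variance-type parameters, can be mimicked outside R. Here is a sketch using scipy's L-BFGS-B in place of nlminb; the toy target is a plain Gamma log-likelihood rather than the full poolwise likelihood, and the data are simulated for illustration:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(3)
data = rng.gamma(shape=2.0, scale=0.7, size=5000)  # toy "biomarker" sample

def negloglik(theta):
    # negative Gamma log-likelihood in the (shape, scale) parameterization
    a, b = theta
    return -np.sum(stats.gamma.logpdf(data, a, scale=b))

# Start at (1, 1) and bound both parameters below by 1e-4, mirroring the
# starting-value / lower-bound conventions described above
fit = minimize(negloglik, x0=[1.0, 1.0], method="L-BFGS-B",
               bounds=[(1e-4, None), (1e-4, None)])
a_hat, b_hat = fit.x
print(a_hat, b_hat)  # close to the generating values (2.0, 0.7)
```

When an estimate lands on the 1e-4 boundary in practice, that is a signal the corresponding error component may be negligible, which motivates the PE-only/ME-only refits described in the simulations below.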
Some additional details on optimization and how occasional numerical issues were handled are included as Appendix B of the supplementary material available at Biostatistics online.
2.7. Reproducibility
A figshare repository is available at https://figshare.com/s/0f73bf55d944e7158f31 with the following items: (i) a snapshot build of the pooling package (pooling.tar.gz and source_files.zip), which includes a simulated dataset intended to mimic our motivating example; (ii) R code for reproducing our analyses on that dataset (biostatistics_analysis.R); and (iii) R code for running one trial of each simulation (run_simulations.R).
3. Results
3.1. Motivating example: MCP-1 and odds of miscarriage
We use data from a nested case–control follow-up study to the Collaborative Perinatal Project (CPP) (Hardy, 2003; Whitcomb and others, 2007) to estimate the association between serum levels of the cytokine monocyte chemotactic protein (MCP-1) and odds of miscarriage controlling for mother’s age, race, and current smoking. Our dataset consists of 126 single-specimen pools, 30 of which have replicate MCP-1 measurements, and 280 pools of size 2. The pools of size 2 are all homogeneous with respect to case status, i.e. contain samples from women whose pregnancies both did or did not result in miscarriages.
Given that the 126 single-specimen pools are not subject to PE, and the 30 replicates suggest only a small amount of ME (Figure 1a of the supplementary material available at Biostatistics online), a histogram of these values should give a reasonable indication of the marginal MCP-1 distribution (Figure 1b of the supplementary material available at Biostatistics online). The data are more compatible with lognormal and Gamma distributions than normal.
Table 1 summarizes model fits for the two logistic regression methods, using all available data including replicates and modeling both error types. Covariates $C_1$–$C_3$ represent mother's age, non-white race, and current smoking, respectively, and $X$ represents the pooled exposure MCP-1. The $\beta$'s are logistic regression coefficients in 2.1; the $\gamma$'s and $\sigma^2$ (NLR) and the $\alpha$'s and $b$ (GLR) are parameters in the $X \mid \mathbf{C}$ models (2.6 for NLR, 2.9 for GLR); and $\sigma_p^2$ and $\sigma_m^2$ are the PE and ME variances.
Table 1.
Logistic regression fits for odds of miscarriage. Values are point estimates (standard errors). $\beta_x$ represents the covariate-adjusted log-OR relating MCP-1 and miscarriage

| | Naive | NLR (AIC = 2340.8) | GLR (AIC = 1787.5) |
|---|---|---|---|
| $\beta_0$ | −1.57 (0.37) | −1.58 (0.37) | −1.60 (0.39) |
| $\beta_x$ (MCP-1) | 0.01 (0.02) | 0.05 (0.08) | 0.05 (0.12) |
| $\beta_{c1}$ (age) | 0.04 (0.01) | 0.04 (0.01) | 0.04 (0.01) |
| $\beta_{c2}$ (race) | 0.56 (0.18) | 0.57 (0.18) | 0.57 (0.18) |
| $\beta_{c3}$ (smoking) | 0.34 (0.16) | 0.34 (0.16) | 0.34 (0.16) |
| $\gamma_0$ / $\alpha_0$ | — | 0.50 (0.38) | 0.38 (0.26) |
| $\gamma_{c1}$ / $\alpha_{c1}$ (age) | — | 0.03 (0.01) | 0.01 (0.01) |
| $\gamma_{c2}$ / $\alpha_{c2}$ (race) | — | −0.17 (0.17) | −0.33 (0.11) |
| $\gamma_{c3}$ / $\alpha_{c3}$ (smoking) | — | 0.02 (0.16) | −0.01 (0.09) |
| $b$ | — | — | 0.69 (0.08) |
| $\sigma^2$ | — | 1.58 (0.21) | — |
| $\sigma_p^2$ | — | 0.73 (0.18) | 0.62 (0.09) |
| $\sigma_m^2$ | — | 0.11 (0.03) | 0.02 (0.01) |
AIC favored GLR over NLR. The estimated log-OR was higher for NLR and GLR than for the naive poolwise logistic regression ignoring MCP-1 errors but still not significantly different from 0. The other logistic regression coefficients were virtually identical for the three models. Both NLR and GLR suggested much larger PE than ME.
Notably, if the replicate MCP-1 measurements had not been included, the NLR model could not be fit with both PE and ME, while the GLR model could. For identifiability, NLR would require a third pool size in addition to 1 and 2 (further details on identifiability requirements are provided in Appendix A of the supplementary material available at Biostatistics online). GLR fit without replicates gave a somewhat larger log-OR and very different variance estimates compared with the fit with replicates. It is unclear whether GLR's identifiability is practical in this scenario, given the variance estimates here and the instability of NLR without replicates reported by Van Domelen and others (2018) in what seemed to be an easier identifiability scenario. It may be that GLR's identifiability is not practical when the pooled biomarker is close to being normally distributed (Carroll and others, 2006). We explore this issue later via simulations.
Table 2 summarizes fits for the two discriminant function methods, under the simplifying assumptions such that the log-OR is constant with $X$. For NDFA, the parameters $(\gamma_0, \gamma_y, \boldsymbol{\gamma}_c, \sigma^2)$ are from the $X_i^* \mid Y_i, \mathbf{C}_i$ model in 2.10; for GDFA, $(\alpha_0, \boldsymbol{\alpha}_c, b_1, b_0)$ are from the $X_i^* \mid Y_i, \mathbf{C}_i$ model in 2.15 with $\alpha_y$ set to 0.
Table 2.
Discriminant function approach estimates for odds of miscarriage. Values are point estimates (standard errors)

| | NDFA (AIC = 1796.5) | GDFA (AIC = 1242.9) |
|---|---|---|
| $\gamma_0$ / $\alpha_0$ | 0.50 (0.38) | 0.41 (0.26) |
| $\gamma_y$ | 0.08 (0.13) | — |
| $\gamma_{c1}$ / $\alpha_{c1}$ (age) | 0.02 (0.01) | 0.01 (0.01) |
| $\gamma_{c2}$ / $\alpha_{c2}$ (race) | −0.19 (0.17) | −0.34 (0.11) |
| $\gamma_{c3}$ / $\alpha_{c3}$ (smoking) | −0.01 (0.16) | −0.02 (0.09) |
| $b_1$ | — | 0.72 (0.09) |
| $b_0$ | — | 0.67 (0.09) |
| $\sigma^2$ | 1.58 (0.21) | — |
| $\sigma_p^2$ | 0.73 (0.18) | 0.62 (0.09) |
| $\sigma_m^2$ | 0.11 (0.03) | 0.02 (0.01) |
| log-OR | 0.05 (0.08) | 0.10 (0.13) |
Results were similar to Table 1 in that AIC favored the Gamma approach over the normal one, and neither suggested a significant association between MCP-1 and odds of miscarriage. Also mirroring the logistic regression results, GDFA parameters were identifiable without replicates while NDFA parameters were not.
The constant log-OR models reported in Table 2 are the result of restrictions corresponding to testable hypotheses. For NDFA, the log-OR is constant if the residual error variance in the $X \mid Y, \mathbf{C}$ model is the same for cases and controls, i.e. if a version of 2.10 with case status-specific variances $\sigma_1^2$ and $\sigma_0^2$ satisfies $\sigma_1^2 = \sigma_0^2$. For GDFA, it is constant under $\alpha_y = 0$ (see 2.15). Likelihood ratio tests rejected neither null hypothesis.
Despite not rejecting the null hypothesis $\alpha_y = 0$, one could visualize the non-constant log-OR implied by the fitted GDFA model with $\alpha_y$ unrestricted by plotting the estimated log-OR vs. MCP-1. This is somewhat complicated by the fact that the log-OR also depends on covariates. With two binary and one continuous covariate, we plotted the association for the four combinations of race and smoking, each with age held fixed at its median of 26 years (Figure 2 of the supplementary material available at Biostatistics online). The curves suggest higher MCP-1 levels are associated with higher odds of miscarriage at the lower end of the MCP-1 range and lower odds of miscarriage at the upper end; however, confidence bands are compatible with no association over the entire range.
In summary, the Gamma models fit the CPP data better than the corresponding normal models and were noteworthy in that they could be fit without replicates. Substantive results were similar for all four methods: the estimated log-OR is small, there is little evidence of an association between MCP-1 and odds of miscarriage, and poolwise MCP-1 measurements seem to be more severely impacted by PE than by ME.
3.2. Simulation study
The purpose of the first simulation study is to confirm validity of the proposed Gamma methods, while also gauging robustness of the four methods to model misspecification. For each of the four methods, data were generated under the corresponding models described in Sections 2.2, 2.3, 2.4, or 2.5, mimicking the CPP data and using the CPP point estimates for parameter values (see Tables 1 and 2). In each case, log-OR’s were estimated by fitting the data-generating model as well as the three others.
For each trial under GLR, individual-level covariates ($C_1$ = mother's age, $C_2$ = non-white race, $C_3$ = current smoking) were generated independently for 686 participants, with the binary covariates drawn with sampling probabilities equal to the CPP proportions and age drawn to mimic the CPP age distribution. Individual-level $X$ (MCP-1) values were then generated from the individual version of 2.9 with $(\alpha_0, \boldsymbol{\alpha}_c, b)$ set to the GLR estimates in Table 1, and $Y$ (miscarriage) generated from the individual version of 2.1 with $\beta_0$ and $\boldsymbol{\beta}_c$ set to the GLR estimates in Table 1 and $\beta_x = 0.15$ (increased from the fitted 0.05). Observations were then split into $n_1$ cases and $n_0$ controls. The $n_1$ cases were randomly formed into $n_1/3$ (rounded up) pools of size 2 and the rest left as singles, and similarly for the $n_0$ controls, to produce an approximately equal number of pools of size 1 and 2.
Poolwise means $\bar{X}_i$ were then calculated, multiplied by mean-1 lognormal PEs with $\sigma_p^2 = 0.62$ (if $g_i = 2$) and mean-1 lognormal MEs with $\sigma_m^2 = 0.02$, and multiplied by $g_i$ to produce imprecise poolwise sums $\tilde{X}_i^*$. For 30 randomly selected single-specimen pools, $\tilde{\mathbf{X}}_i^*$ was generated from the same process but with two independent MEs rather than one.
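The error-generation step just described can be sketched as follows (Python; the PE and ME variances are the fitted GLR values, while pool membership and true biomarker values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
sig2_p, sig2_m = 0.62, 0.02  # PE and ME variances (GLR fit)

# True individual biomarker values for three pools: sizes 2, 2, 1
pools = [np.array([1.1, 2.3]), np.array([0.7, 0.9]), np.array([1.8])]

xtilde_sums = []
for members in pools:
    g = len(members)
    xbar = members.mean()                    # true poolwise mean
    if g > 1:                                # PE only for multi-specimen pools
        xbar *= rng.lognormal(-sig2_p / 2, np.sqrt(sig2_p))
    xbar *= rng.lognormal(-sig2_m / 2, np.sqrt(sig2_m))  # ME on every measurement
    xtilde_sums.append(g * xbar)             # imprecise poolwise sum

print(xtilde_sums)  # strictly positive, as a real assay would report
```

A replicate measurement on a single-specimen pool would repeat only the ME draw, which is what lets the replicates separate $\sigma_m^2$ from $\sigma_p^2$.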
For GDFA, $\mathbf{C}$ was generated via the same process, $Y$ based on a logistic regression of case status on covariates with intercept and coefficients chosen to mimic the CPP data, and $X$ based on 2.12 with parameters set to the GDFA estimates in Table 2, adjusted to induce a constant log-OR of 0.15 (increased from the fitted 0.10). Poolwise data were generated via the same process as above, again with $\sigma_p^2 = 0.62$ and $\sigma_m^2 = 0.02$.
For NLR, individual-level $X$ were generated from the individual version of 2.6 with $\boldsymbol{\gamma}_c$ set to the NLR estimates in Table 1 and $\sigma^2 = 1.58$. To avoid negative $X$'s, which would preclude fitting GLR and GDFA, $\gamma_0$ was set to 6.5 (increased from the fitted 0.50). $Y$ was generated from the individual version of 2.1 with $\beta_0$ and $\boldsymbol{\beta}_c$ set to the NLR estimates in Table 1 and $\beta_x = 0.15$ (increased from the fitted 0.05). Error variances were set to $\sigma_p^2 = 0.73$ and $\sigma_m^2 = 0.11$.
For NDFA, $\mathbf{C}$ was generated as for GDFA, and $X$ based on the individual version of 2.10 with $\boldsymbol{\gamma}_c$ and $\sigma^2$ set to the NDFA estimates in Table 2, $\gamma_0$ increased from the fitted 0.50 to avoid negative $X$'s, and $\gamma_y$ increased from the fitted 0.08 to induce a log-OR of 0.15. Error variances were set to $\sigma_p^2 = 0.73$ and $\sigma_m^2 = 0.11$.
Results are summarized in Table 3. For data generated under GLR, the naive poolwise logistic regression (i.e. ignoring PE and ME) underestimated the true log-OR and had poor confidence interval (CI) coverage. The correctly specified GLR estimator had a slight upward bias and nominal coverage; GDFA performed about the same as GLR. NLR and NDFA performed surprisingly well despite assuming additive normal rather than multiplicative lognormal errors; they were nearly unbiased, only slightly less efficient than the Gamma methods, and had approximately nominal coverage. For data generated under GDFA, all four methods were approximately unbiased and had good coverage, while the correctly specified GDFA estimator was slightly more efficient than GLR (SD = 0.123 vs. 0.125) and moderately more efficient than NLR (SD = 0.133) and NDFA (SD = 0.134). For data generated under NLR and NDFA, all four methods performed well; surprisingly, GDFA had a slight efficiency advantage.
Table 3.
Simulation results for estimating log-odds ratios with an approximately equal number of pools of size 1 (30 with replicates) and 2 (1000 trials; true log-OR = 0.15). Bias, SD, SE, and MSE pertain to the log-OR estimate

| | Mean bias | SD | Mean SE | MSE | 95% CI coverage, log-OR | 95% CI coverage, $\beta_{c1}$ | 95% CI coverage, $\beta_{c2}$ | 95% CI coverage, $\beta_{c3}$ |
|---|---|---|---|---|---|---|---|---|
| Generated GLR | | | | | | | | |
| Naive | −0.100 | 0.056 | 0.051 | 0.013 | 0.470 | 0.947 | 0.943 | 0.957 |
| GLR | 0.007 | 0.129 | 0.127 | 0.017 | 0.948 | 0.951 | 0.945 | 0.957 |
| NLR | 0.005 | 0.140 | 0.135 | 0.020 | 0.948 | 0.956 | 0.940 | 0.956 |
| GDFA | 0.005 | 0.126 | 0.125 | 0.016 | 0.942 | — | — | — |
| NDFA | 0.002 | 0.137 | 0.133 | 0.019 | 0.940 | — | — | — |
| Generated GDFA | | | | | | | | |
| Naive | −0.106 | 0.055 | 0.049 | 0.014 | 0.392 | — | — | — |
| GLR | 0.000 | 0.125 | 0.123 | 0.016 | 0.958 | — | — | — |
| NLR | 0.000 | 0.133 | 0.131 | 0.018 | 0.959 | — | — | — |
| GDFA | −0.001 | 0.123 | 0.121 | 0.015 | 0.953 | — | — | — |
| NDFA | 0.000 | 0.134 | 0.130 | 0.018 | 0.951 | — | — | — |
| Generated NLR | | | | | | | | |
| Naive | −0.061 | 0.052 | 0.054 | 0.006 | 0.777 | 0.950 | 0.957 | 0.945 |
| GLR | 0.001 | 0.087 | 0.089 | 0.008 | 0.958 | 0.946 | 0.958 | 0.940 |
| NLR | 0.006 | 0.088 | 0.091 | 0.008 | 0.958 | 0.950 | 0.963 | 0.944 |
| GDFA | −0.002 | 0.085 | 0.087 | 0.007 | 0.957 | — | — | — |
| NDFA | 0.004 | 0.087 | 0.090 | 0.008 | 0.956 | — | — | — |
| Generated NDFA | | | | | | | | |
| Naive | −0.060 | 0.049 | 0.049 | 0.006 | 0.752 | — | — | — |
| GLR | 0.001 | 0.082 | 0.081 | 0.007 | 0.955 | — | — | — |
| NLR | 0.004 | 0.084 | 0.083 | 0.007 | 0.959 | — | — | — |
| GDFA | −0.006 | 0.077 | 0.078 | 0.006 | 0.961 | — | — | — |
| NDFA | 0.004 | 0.081 | 0.082 | 0.007 | 0.961 | — | — | — |
AIC favored GLR over NLR in 100% of trials generated under GLR, GDFA over NDFA in 100% of trials under GDFA, NLR over GLR in 96.8% of trials under NLR, and NDFA over GDFA in 98.3% of trials under NDFA.
A reviewer asked about performance when $\sigma_m^2$ is similar to or larger than $\sigma_p^2$, so we re-ran Table 3 with the error variances flipped (see Table 3 of the supplementary material available at Biostatistics online). The correctly specified models performed fairly well, although all four had some upward mean bias. GLR and GDFA performed well under misspecification, while NLR and NDFA performed somewhat poorly for data generated under GLR and GDFA. Perhaps the robustness of NLR and NDFA in Table 3 stemmed from having very small ME and thus many observations that were nearly error-free (the singles).
The next set of simulations is aimed at assessing whether the Gamma models' identifiability absent replicates is practically useful. We consider the CPP scenario: pools of size 1 and 2 and biomarker measurements subject to multiplicative lognormal PE and ME. Data were generated under GLR and GDFA in the same manner as in previous simulations, but for various sample sizes, with and without the 30 replicates. After initially observing good performance without replicates despite $\hat{\sigma}_m^2$ frequently hitting the lower bound of 0.0001, simply because the ME was nearly small enough to ignore, we increased $\sigma_m^2$ from 0.02 to 0.2 and decreased $\sigma_p^2$ from 0.62 to 0.42. In trials where $\hat{\sigma}_p^2$ or $\hat{\sigma}_m^2$ hit the 0.0001 boundary, PE-only and ME-only models were fit and the one with lower AIC selected. Results are summarized in Table 4.
Table 4.
Simulation results for estimating the log-odds ratio with an approximately equal number of pools of size 1 and 2, with and without replicates (1000 trials, true value = 0.15)

| | GLR | | | | GDFA | | | |
|---|---|---|---|---|---|---|---|---|
| | Mean bias | Median bias | 95% CI coverage | Median CI width | Mean bias | Median bias | 95% CI coverage | Median CI width |
| n = 686 | | | | | | | | |
| No replicates | 0.023 | 0.011 | 0.978 | 0.560 | 0.141 | 0.010 | 0.970 | 0.543 |
| 30 replicates | 0.010 | 0.006 | 0.961 | 0.546 | 0.009 | 0.006 | 0.964 | 0.518 |
| n = 2000 | | | | | | | | |
| No replicates | 0.005 | 0.003 | 0.951 | 0.319 | 0.011 | 0.005 | 0.950 | 0.308 |
| 30 replicates | 0.004 | 0.005 | 0.951 | 0.316 | 0.009 | 0.003 | 0.946 | 0.303 |
Overall performance was surprisingly good for the no-replicates estimators, although there was upward mean bias at n = 686 due to occasional extreme log-OR estimates (3 trials for GLR, 3 trials for GDFA). CIs were wider without replicates, but not by much, especially for n = 2000. The ME variance estimate $\hat{\sigma}_m^2$ occasionally hit 0.0001 in the no-replicates scenarios (3.8% of trials for GLR and 2.8% for GDFA at n = 686; 0.1% for GLR and 0.3% for GDFA at n = 2000), while the PE variance estimate $\hat{\sigma}_p^2$ never did.
Next, we compare efficiency of a pooling vs. traditional design for the same number of total assays in a no-ME (PE only) scenario, where the pooling design is perhaps most attractive. For each trial we generate 686 observations via the same procedure as previously under GDFA. For the pooling design, we form $2n_1/9$ (rounded up) case pools of size 4 and leave the remaining cases as singles, and similarly for controls, to produce approximately twice as many pools of size 4 as there are singles. For the traditional design, we randomly sample the same number of cases and controls as there are case pools and control pools in the same trial and obtain individual-level $X$ values, which are precise because singles are not impacted by PE. Figure 1 shows that the pooling design is more efficient than the traditional design for small $\sigma_p^2$, but that efficiency advantage erodes and eventually reverses as $\sigma_p^2$ gets larger. This agrees with two-sample t-test efficiency arguments of Van Domelen and others (2018).
Fig. 1.
Boxplots of log-odds ratio estimates for pooling and traditional designs (1000 trials each, true value = 0.15).
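The pool-allocation rule described above can be sketched as follows. The helper is hypothetical, and the ceiling formula is just one way to satisfy the stated target of roughly twice as many size-4 pools as singles.

```python
import math

def pool_allocation(n):
    """Split n like-status subjects into k pools of size 4 plus s singles,
    aiming for roughly twice as many size-4 pools as singles:
    k ~= 2s with s = n - 4k  =>  k = ceil(2n / 9)."""
    k = math.ceil(2 * n / 9)   # number of pools of size 4
    s = n - 4 * k              # remaining subjects left as singles
    return k, s

# With 343 cases this gives 77 pools of size 4 and 35 singles, i.e. 112
# assays in total versus 343 assays to measure every case individually.
k, s = pool_allocation(343)
```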
We performed additional simulations to address issues raised by reviewers; results are included in the supplementary material available at Biostatistics online. In particular, we re-ran the Table 3 simulations with a negative log-OR (supplementary Table 4) and a null log-OR (supplementary Table 5: type I error rates were roughly nominal), and assessed performance of the Gamma methods under misspecified error structures (supplementary Tables 6 and 7: estimation was robust to mean-1 Gamma and uniform errors).
4. Discussion
We have presented two Gamma-based methods for estimating the adjusted log-OR relating a binary outcome to a continuous exposure measured in pools and subject to errors. This work integrates the poolwise logistic regression approach of Weinberg and Umbach (1999) with the error modeling assumptions of Schisterman and others (2010) and the discriminant function ideas of Lyles and others (2015) and Whitcomb and others (2012). Accommodating skewed biomarkers should broaden the scope of scenarios where a highly cost-effective homogeneous pools study design can be utilized.
The homogeneous pools design is compelling because it offers potentially large gains in statistical power over a traditional design (e.g. Figure 1 when the PE variance is near zero). Absent errors, there would be no need to worry about the distribution of the biomarker; one could simply fit the Weinberg and Umbach (1999) logistic regression model. However, while assay ME may be negligible in certain scenarios, we believe negligible PE is a strong and seldom justifiable assumption. In our motivating example, the estimated PE variance was much larger than the estimated ME variance according to all four corrective methods, and together these errors were much too large to ignore (e.g. the naive logistic regression estimator was badly biased in the Table 3 simulations). Thus, performing valid inference with poolwise data will typically require error modeling. Our Gamma-based methods extend prior approaches to accommodate skewed biomarkers, which tend to be much more common than normally distributed ones.
Our methods are not limited to the pooling scenario for which they were developed. A special case of all four methods is a design in which every pool has size 1, i.e. a traditional design with no pooling. Our R functions apply to a wide range of scenarios for estimating exposure–disease associations: they can handle pooling or traditional designs with or without covariates, for a normal or skewed exposure measured precisely or with errors (additive or multiplicative), incorporating replicates if available, and either assuming a constant log-OR or allowing it to vary with exposure level and covariates.
One potential problem with the logistic regression methods is that they are based on a likelihood function that assumes prospective sampling. The outcome model (disease given exposure and covariates) is not problematic given the Prentice and Pyke (1979) results, but the exposure model (exposure given covariates) could be affected by case-oversampling. That is, even if the individual-level exposure models (linear regression for NLR, constant-scale Gamma for GLR) are correctly specified for the population, that relationship may not hold within cases and controls, and thus may not hold in a case–control study where the proportion of cases is far higher than in the population. Guolo (2008) suggests that using the prospective likelihood is valid if the specified distribution for the error-prone covariate (the exposure, in our framework) is correct under the case–control sampling scheme, which is intuitive. Specifying and assessing a model for an imperfectly measured exposure is typically one of the hardest parts of a ME correction (Carroll and others, 2006). But a unique feature of the pooling context is that if there is PE only, the singles are actually precisely measured, so the exposure model can be assessed directly with those data. Our logistic regression methods should therefore be valid in case–control studies, provided the exposure model is supported by the data on hand, which can be directly assessed in certain cases. The discriminant function methods are based on models for the exposure given case status and covariates, and are therefore unaffected by the sampling rates of cases and controls.
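With PE-only errors, the exposure-model check described above can be carried out directly on the singles. A minimal moment-based sketch follows; the data array is a stand-in and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
singles = rng.gamma(shape=4.0, scale=0.5, size=500)  # stand-in for precisely
                                                     # measured pools of size 1

# Method-of-moments Gamma fit: shape = mean^2 / var, scale = var / mean.
# Comparing the implied Gamma density (or its quantiles) against the
# empirical distribution of the singles is a direct check of the
# assumed exposure model.
m, v = singles.mean(), singles.var(ddof=1)
shape_hat, scale_hat = m * m / v, v / m
```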
For the same reason, the discriminant function methods can immediately accommodate covariate-dependent or “(y,c)-pooling” (Lyles and others, 2016), where pools are formed on similar covariate values in addition to like case status. This was noted by Lyles and others (2015), and the same result holds for our Gamma extension. The rationale is that members of a pool formed on the basis of their outcomes and covariates remain mutually independent given those outcomes and covariates, and that conditional independence is sufficient to justify the poolwise-sum results for the exposures. Adapting the logistic regression methods to accommodate (y,c)-pooling should be possible, although we leave this as future work. Alternatively, covariate-dependent pooling may fit nicely into the conditional logistic regression framework of Saha-Chaudhuri and others (2011), for which we are examining extensions to handle errors in a similar manner as our NLR and GLR approaches.
A natural problem for investigators considering pooling is how to choose design parameters, especially the pool size(s), whether to include replicates, and how to form pools. These are difficult questions. Even in the two-sample t-test setting, the optimal pool size (and number of pools needed for target power) is highly sensitive to the magnitudes of MEs and PEs and the relative cost to run each assay and to recruit each subject. We do not wish to make broad recommendations but will share that we currently favor a design with pools of size 1 (with replicates, if the assay has ME) and one other pool size. The larger pools drive efficiency gains; a single non-unity pool size avoids having to specify the relationship between pool size and PEs; and replicate singles isolate MEs, which stabilizes estimation of parameters including the log-OR of primary interest. Some of the more nuanced scenarios that permit identifiability may be useful for analyzing existing data, but we would not recommend leveraging them in the design of new studies. For example, Van Domelen and others (2018) reported that NLR and NDFA were somewhat unstable under a “two pool sizes, neither of which is 1” design with both error types and no replicates. As for deciding which method to use, AIC may be helpful for choosing normal vs. Gamma models. While the discriminant function methods are somewhat obscure compared with logistic regression, they may offer better precision when the relevant distributional assumptions are met (Lyles and others, 2009).
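AIC-based selection between the normal and Gamma variants can be sketched as follows; the log-likelihood values are illustrative only, and in practice would come from the fitted models.

```python
def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2*logL, smaller is better."""
    return 2 * n_params - 2 * loglik

# Hypothetical maximized log-likelihoods for competing exposure models
# with the same number of fitted parameters.
candidates = {"normal": aic(-1052.3, 5), "gamma": aic(-1041.8, 5)}
best = min(candidates, key=candidates.get)   # model with the lowest AIC
```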
As a reviewer noted, the idea of replicates is somewhat paradoxical, as a pooling design might be chosen for the very purpose of reducing the number of assays that are required. This brings to mind issues of optimal design, which may warrant future work. So far, our impression from simulation studies is that a small number of replicate singles can greatly improve stability, likely justifying the additional costs (e.g. about 7% more assays in our motivating example).
While both ME and PE reduce the efficiency advantage of a pooling design over a traditional design, PE is particularly worrisome because it can render the pooling design counterproductive. In fact, this may have occurred in our motivating example: the GDFA model gave a large estimated PE variance, and in simulations mimicking the CPP data the pooling design was less efficient than the traditional design at that level of PE (Figure 1). Absent PE and ME, pooling designs offer gains in statistical efficiency limited only by the number of samples that can feasibly be combined in the lab. With PE, if the PE variance is large enough, the pooling design may be less efficient than the traditional design for the same number of assays, regardless of how large the pools are. Adaptive study designs could be considered, whereby a pooling study is initiated but a stopping rule is in place to transition to pools of size 1 if it becomes clear that the PE variance is prohibitively large. We note that pooling may be warranted regardless of PE in cases where it is needed to reach minimum assay volumes.
In future applied work, it will be valuable to search for ways to minimize PE and to determine whether certain types of biospecimens (blood, saliva, etc.) are more or less susceptible to it. On the statistical side, the assumption that the PE variance is constant across pool sizes needs to be vetted and perhaps modified, as it seems likely that larger pools would have larger errors; this is a key assumption that directly affects identifiability requirements and efficiency results. Additionally, it would be useful to develop less parametric approaches for improved robustness, ideally relaxing the distributional assumptions on the errors and avoiding the need to specify the distribution of the exposure given covariates.
In summary, the two Gamma approaches presented here, in conjunction with the normal versions previously developed (Van Domelen and others, 2018), permit valid odds ratio estimation with a normal or skewed biomarker measured in pools and subject to errors. These methods help to quell an important concern associated with the Weinberg and Umbach (1999) homogeneous pools design, making this very cost-effective design more feasible to deploy.
Supplementary Material
Acknowledgments
Conflict of Interest: None declared.
Funding
This research was supported by the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-0940903. The views expressed in this article are those of the authors, and no official endorsement by the Department of Health and Human Services, or the Agency for Healthcare Research and Quality, or the National Science Foundation, is intended or should be inferred.
References
- Carroll, R. J., Ruppert, D., Stefanski, L. A. and Crainiceanu, C. M. (2006). Measurement Error in Nonlinear Models: A Modern Perspective. CRC Press, Boca Raton, FL. [Google Scholar]
- Carroll, R. J., Spiegelman, C. H., Lan, K. K. G., Bailey, K. T. and Abbott, R. D. (1984). On errors-in-variables for binary regression models. Biometrika 71, 19–25. [Google Scholar]
- Cornfield, J. (1962). Joint dependence of risk of coronary heart disease on serum cholesterol and systolic blood pressure: a discriminant function analysis. Federation proceedings 21, 58–61. [PubMed] [Google Scholar]
- Gilbert, P. and Varadhan, R. (2016). numDeriv: Accurate Numerical Derivatives. R package version 2016.8-1. https://CRAN.R-project.org/package=numDeriv. [Google Scholar]
- Guolo, A. (2008). A flexible approach to measurement error correction in case–control studies. Biometrics 64, 1207–1214. [DOI] [PubMed] [Google Scholar]
- Hardy, J. B. (2003). The collaborative perinatal project: lessons and legacy. Annals of Epidemiology 13, 303–311. [DOI] [PubMed] [Google Scholar]
- Lyles, R. H., Guo, Y. and Hill, A. N. (2009). A fresh look at the discriminant function approach for estimating crude or adjusted odds ratios. The American Statistician 63, 320–327. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lyles, R. H. and Kupper, L. L. (2013). Approximate and pseudo-likelihood analysis for logistic regression using external validation data to model log exposure. Journal of Agricultural, Biological, and Environmental Statistics 18, 22–38. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lyles, R. H., Mitchell, E. M., Weinberg, C. R., Umbach, D. M. and Schisterman, E. F. (2016). An efficient design strategy for logistic regression using outcome-and covariate-dependent pooling of biospecimens prior to assay. Biometrics 72, 965–975. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lyles, R. H., Van Domelen, D., Mitchell, E. M. and Schisterman, E. F. (2015). A discriminant function approach to adjust for processing and measurement error when a biomarker is assayed in pooled samples. International Journal of Environmental Research and Public Health 12, 14723–14740. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mitchell, E. M., Lyles, R. H., Manatunga, A. K., Danaher, M., Perkins, N. J. and Schisterman, E. F. (2014). Regression for skewed biomarker outcomes subject to pooling. Biometrics 70, 202–211. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mitchell, E. M., Lyles, R. H. and Schisterman, E. F. (2015). Positing, fitting, and selecting regression models for pooled biomarker data. Statistics in Medicine 34, 2544–2558. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Narasimhan, B., Johnson, S. G., Hahn, T., Bouvier, A., and Kiêu, K. (2018). cubature: Adaptive Multivariate Integration over Hypercubes. R package version 2.0.3. https://CRAN.R-project.org/package=cubature. [Google Scholar]
- Prentice, R. L. and Pyke, R. (1979). Logistic disease incidence models and case-control studies. Biometrika 66, 403–411. [Google Scholar]
- R Core Team. (2019). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. [Google Scholar]
- Saha-Chaudhuri, P., Umbach, D. M. and Weinberg, C. R. (2011). Pooled exposure assessment for matched case-control studies. Epidemiology (Cambridge, Mass.) 22, 704. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schisterman, E. F., Vexler, A., Mumford, S. L. and Perkins, N. J. (2010). Hybrid pooled–unpooled design for cost-efficient measurement of biomarkers. Statistics in Medicine 29, 597–613. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Van Domelen, D. R. (2019). pooling: Fit Poolwise Regression Models. R package version 1.1.2. https://github.com/vandomed/pooling. [Google Scholar]
- Van Domelen, D. R., Mitchell, E. M., Perkins, N. J., Schisterman, E. F., Manatunga, A. K., Huang, Y. and Lyles, R. H. (2018). Logistic regression with a continuous exposure measured in pools and subject to errors. Statistics in Medicine 37, 4007–4021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Weinberg, C. R. and Umbach, D. M. (1999). Using pooled exposure assessment to improve efficiency in case-control studies. Biometrics 55, 718–726. [DOI] [PubMed] [Google Scholar]
- Weinberg, C. R. and Umbach, D. M. (2014). Correction to “Using pooled exposure assessment to improve efficiency in case-control studies,” by Clarice R. Weinberg and David M. Umbach; 55, 718–726, September 1999. Biometrics 70, 1061. [DOI] [PubMed] [Google Scholar]
- Whitcomb, B. W., Perkins, N. J., Zhang, Z., Ye, A. and Lyles, R. H. (2012). Assessment of skewed exposure in case-control studies with pooling. Statistics in Medicine 31, 2461–2472. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Whitcomb, B. W., Schisterman, E. F., Klebanoff, M. A., Baumgarten, M., Rhoton-Vlasak, A., Luo, X. and Chegini, N. (2007). Circulating chemokine levels and miscarriage. American Journal of Epidemiology 166, 323–331. [DOI] [PubMed] [Google Scholar]