Biostatistics (Oxford, England). 2014 Jun 6;16(1):169–178. doi: 10.1093/biostatistics/kxu024

Logistic analysis of epidemiologic studies with augmentation sampling involving re-stratification and population expansion

Yan Li 1,*, Mahboobeh Safaeian 2, Hilary A Robbins 2, Barry I Graubard 3
PMCID: PMC4263221  PMID: 24907707

Abstract

Epidemiologic cross-sectional, case-cohort, or case–control studies often select augmentation samples to supplement an existing (baseline) sample, primarily for two reasons: (1) to increase the sample sizes from certain subdomains of interest that were not originally considered in the design of the baseline study and (2) to obtain samples from an extension of the target population. To address these two objectives, two-stage stratified sample designs are considered, where the stratification based on the expanded population at the second stage is not nested in the first-stage strata. The sample weighting and Taylor linearization variance estimation for the two-stage stratified sample designs, involving re-stratification and population expansion, are provided for estimating population totals and logistic regression coefficients. Results from limited simulation studies and a logistic regression analysis of a study of human papillomavirus serology are provided.

Keywords: Pseudo-likelihood function, Sample weighting, Taylor linearization variance estimation, Two-stage stratified sampling

1. Introduction

Epidemiologic cross-sectional, case-cohort, or case–control studies are sometimes found to be inadequate for future investigations that utilize the original (baseline) study sample. The original study may fail to have adequate sample sizes to obtain precise estimates within population subdomains of interest to future studies. Or, the target population of the original study may no longer be considered adequate for future studies. To address these limitations, augmentation samples are often collected primarily: (1) to increase the sample sizes from certain subdomains of interest and (2) to obtain samples from an extension of the target population. In this paper, we consider a two-stage stratified sample design. In the first stage, a baseline sample is selected using stratified simple random sampling (SSRS). Upon evaluation of the baseline sample with regard to the sample sizes of the subdomains of interest and the extension of the target population, a second stage stratified sample using a different set of stratification variables (that identify the subdomains) is selected from the remaining original target population (i.e. the target population without the baseline sample) plus an extension of the target population. Similar types of two-stage stratified sample designs, also called re-stratification sampling, but without additional sampling from an extension of the target population, have been described in the ecology literature (Stehman and others, 2012). Re-stratification sample designs differ from two-phase designs used in epidemiology studies, in which a subsample from the first phase sample is selected at the second phase (Breslow and Cain, 1988); whereas in augmentation sampling, a sample of additional individuals is selected.

This paper is motivated by a cross-sectional study on human papillomavirus (HPV) serology. There are over 100 HPV types; approximately 40 infect the genital tract, of which 12 are considered carcinogenic because infection with them is a necessary cause of cervical cancer (Vaccarella and others, 2010; Schiffman and others, 2009). The genital HPV types are sexually transmitted, and therefore infections with more than one genotype are common, especially among young women (Vaccarella and others, 2010; Schiffman and others, 2009). Two of the 12 carcinogenic types cause 70% of cervical cancer worldwide, with HPV16 causing 50% and HPV18 causing 20% of cancers. For a natural history study, a baseline sample based on HPV16 infection status and enrollment serology was selected first and then augmented by a sample based on HPV18 infection status and enrollment serology.

For the two-stage augmentation sample design described above, this paper develops (1) a sample weighting method that accounts for differential sampling rates across the strata in both stages of sampling and (2) a variance estimation method for estimated finite population totals under this design. Because nearly all estimators, including adjusted odds ratios from logistic regression, can be expressed as explicit or implicit functions (through estimating equations) of estimated finite population totals, our variance estimators apply to many estimators of interest in epidemiology.

2. Methods

2.1. Motivating study

The HPV serological study that motivated our methods is nested within the control (unvaccinated) arm of an HPV vaccine trial in Guanacaste, Costa Rica (Herrero and others, 2008). Using a sample of size Inline graphic collected by a two-stage re-stratification sample design, we estimated the associations of age and lifetime number of sex partners with seropositivity by the glutathione S-transferase (GST) multiplex Luminex assay (which can be used as a measure of current or past HPV exposure) based on laboratory cutoffs (Inline graphic1 if seropositive by GST; 0 otherwise) (Coseo and others, 2010; 2011). In addition, the associations of GST with incident cervical HPV infection, with and without adjusting for the number of sex partners, were evaluated. All analyses were performed separately for HPV16 and HPV18 infections.

At the first stage of sampling, 388 unvaccinated women were selected from the initial study population of 2814 women who were HPV16 DNA negative at enrollment, using SSRS with strata defined by “baseline HPV16 ELISA levels (Inline graphic8, 8–86, or Inline graphic86 EU/ml)” and “HPV16 incident infection status over follow-up (Inline graphic1 if HPV16 infected; 0 otherwise)”. This baseline sample, however, resulted in insufficient sample sizes for estimation of the association of anti-HPV18 antibodies (seropositivity) with HPV18 infection status. For example, there were only 9 women (out of an available 42) who were HPV18 seropositive with HPV18 infection in the baseline sample. Therefore, to improve the precision for estimating the association between seropositivity and incident infection for HPV18, an augmentation sample of 112 women was selected in the second stage using a re-stratified sample design from the 2582 women who were HPV18 DNA negative at enrollment and not selected in the baseline sample (Inline graphic). The strata in the augmented sample are formed by a new set of stratification variables of “HPV18 ELISA levels” (Inline graphic7, 7–40, or Inline graphic40 EU/ml) and “HPV18 incident infection status” (Inline graphic1 if HPV18 infected; 0 otherwise). After assays were performed for the entire sample (Inline graphic), 12 women for whom an assay failed were dropped from the analysis to give the final sample of size Inline graphic. This study differs from typical studies with augmentation sampling, in which the augmentation sample is selected from the same target population as the baseline sample. Here, the study population was expanded from control women who were HPV16 DNA negative at enrollment in the HPV trial to include women who were HPV16 and/or HPV18 DNA negative at enrollment. Figure 1 shows the two-stage sample design involving re-stratification and the population expansion described above.

Fig. 1.


In Stage 1, 388 women were selected using SSRS from 2814 women who are HPV16 DNA negative (HPV16Inline graphic) at baseline. In Stage 2, 122 women were augmented using SSRS design with a different set of stratification variables from 2582 women who are HPV18 DNA negative (HPV18Inline graphic) at baseline, excluding the first-stage sample. The overlapped population was the women who are both HPV16Inline graphic and HPV18Inline graphic at baseline, denoted by Inline graphic; Inline graphic denotes the first-stage population but excluding Inline graphic; and Inline graphic denotes the second-stage population but excluding Inline graphic. After assays were performed for the entire sample (Inline graphic), 12 women for whom an assay failed were dropped from the analysis to give the final sample of size Inline graphic.

2.2. Sample weighting

This section describes the computation of the sample weights for the re-stratification sample design with population expansion. The population represented by the baseline sample is different from but overlapped with the population represented by the augmented sample. Therefore, we partition the combined population into three domains—Inline graphic and Inline graphic—corresponding to the overlapped population, the population represented by the baseline sample (Inline graphic) excluding Inline graphic, and the population represented by the augmented sample (Inline graphic) excluding Inline graphic. During the augmentation sampling, the baseline sample is excluded from the population to avoid duplicate selections.

Suppose that the baseline population (i.e. Inline graphic) is stratified into Inline graphic strata, the Inline graphicth stratum consisting of Inline graphic subjects (Inline graphic), from which Inline graphic subjects are selected by simple random sampling (SRS). The sample weights under stratified sampling are calculated as the population size divided by the sample size in each stratum, i.e. Inline graphic for subject Inline graphic sampled in stratum Inline graphic in the baseline sample, and the sampling fractions are the inverse of the sample weights, i.e. Inline graphic for stratum Inline graphic. During augmentation sampling, the population (i.e. Inline graphic) is re-stratified into Inline graphic strata with stratum Inline graphic consisting of Inline graphic subjects (Inline graphic), from which, excluding the baseline sample selected in Inline graphic, Inline graphic subjects are selected by SRS. Define

2.2.

The sample weights for the augmentation sample are Inline graphic for the subject Inline graphic selected in stratum Inline graphic and the sampling fractions are Inline graphic, where Inline graphic denotes the collection of subjects sampled in stratum Inline graphic in the baseline (augmented) sample. Note that the sample weights in the augmented sample depend on the baseline sample via Inline graphic, i.e. the number of subjects in the baseline sample falling in stratum Inline graphic, which is random.

To make the augmented sample fully represent the population in Inline graphic, we adjust the sample weights by adding the value of one to the sample weights for subjects selected in Inline graphic in the baseline sample. The populations in Inline graphic and Inline graphic are well represented by the baseline and augmentation sample weights, respectively. The population in Inline graphic, however, is overrepresented (represented twice) because the weighted baseline sample and the weighted augmentation sample in Inline graphic are representing the same population in Inline graphic. To account for this over-representation, we multiply both the adjusted baseline sample weights and the augmentation sample weights by 0.5 for subjects selected in Inline graphic.

In summary, the final weights for a subject Inline graphic selected in Inline graphic, Inline graphic or Inline graphic are, respectively,

2.2.

and

2.2.
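The weighting rules described verbally in this section can be illustrated with a minimal Python sketch. It follows one reading of the text: baseline weights are stratum population size over stratum sample size; augmentation weights use the stratum population size minus the number of baseline subjects falling in that stratum; baseline subjects in the overlapped domain get one added to their weight; and all weights in the overlapped domain are then multiplied by 0.5. The function name, argument layout, stratum labels, and toy numbers are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def final_weights(baseline, augment, N_base, N_aug):
    """Final weights under one reading of the verbal rules in Section 2.2.

    baseline: list of (h, g) pairs; h is the baseline stratum and g the
              augmentation stratum, or None if the subject lies outside
              the second-stage population.
    augment:  list of (g, in_base_pop) pairs; in_base_pop is True when the
              subject also belongs to the baseline population (overlap).
    N_base:   baseline stratum population sizes, keyed by h.
    N_aug:    augmentation stratum population sizes, keyed by g, counted
              before the baseline sample is excluded.
    """
    n_h = Counter(h for h, _ in baseline)           # baseline sample sizes
    m_g = Counter(g for g, _ in augment)            # augmentation sample sizes
    base_in_g = Counter(g for _, g in baseline if g is not None)  # random counts

    w_base = []
    for h, g in baseline:
        w = N_base[h] / n_h[h]                      # stratified SRS weight
        if g is not None:                           # overlapped domain
            w = 0.5 * (w + 1.0)                     # add one, then halve
        w_base.append(w)

    w_aug = []
    for g, in_base_pop in augment:
        w = (N_aug[g] - base_in_g[g]) / m_g[g]      # baseline sample excluded
        if in_base_pop:                             # overlapped domain
            w *= 0.5
        w_aug.append(w)
    return w_base, w_aug


# Toy data (hypothetical): two baseline strata A, B; two augmentation strata X, Y.
baseline = [("A", "X"), ("A", None), ("B", "X"), ("B", "Y"), ("B", None), ("B", "Y")]
augment = [("X", True), ("X", False), ("Y", True), ("Y", True), ("Y", False), ("Y", False)]
w_base, w_aug = final_weights(baseline, augment, {"A": 10, "B": 20}, {"X": 12, "Y": 22})
```

In this toy example the unhalved augmentation weights sum to 30, which together with the 4 baseline subjects in strata X and Y recovers the second-stage population size of 34; this is the reason for adding one to the baseline weights in the overlapped domain.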

2.3. Estimation of population totals

The parameters of interest, such as proportions, odds ratios, or logistic regression coefficients, can be expressed explicitly or implicitly as functions of totals. Therefore, in this section we describe how to estimate population totals using the sample weights described in the preceding section. A population total Inline graphic for the three domains (Inline graphic, Inline graphic, and Inline graphic) is Inline graphic, where Inline graphic is the variable of interest for the subject Inline graphic. We define the indicator functions Inline graphic if subject Inline graphic is in domain Inline graphic corresponding to Inline graphic, Inline graphic, Inline graphic, respectively, and Inline graphic otherwise, and then Inline graphic can be estimated by Inline graphic, where Inline graphic Inline graphic and Inline graphic can be re-expressed as Inline graphic where Inline graphic and Inline graphic. Note that Inline graphic is calculated using data selected from only the baseline sample, whereas Inline graphic depends on the augmented sample and the baseline sample via Inline graphic, which has the random term of Inline graphic.

The variance of Inline graphic is estimated using standard stratified variance estimation for totals (Cochran, 1977). In order to account for the random variability in Inline graphic in the estimation of Inline graphic we decompose the variance into the sum of the variance of the conditional expectation of Inline graphic and the expectation of the conditional variance of Inline graphic, where we condition on the baseline sample. In addition, the variance of Inline graphic includes a covariance between Inline graphic and Inline graphic because the re-stratification depends on the (random) number of sampled subjects in the baseline sample that are in the stratum Inline graphic. See supplementary material available at Biostatistics online for further details.
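For the baseline component, the standard stratified variance estimator for a total can be sketched as follows. This is the textbook stratified simple random sampling formula with a finite population correction; it covers only the part of the estimator that depends on the baseline sample alone. The conditional-variance decomposition and the covariance term needed for the augmentation component are given in the supplementary material and are not reproduced here. The function name and the toy data are illustrative assumptions.

```python
from statistics import variance


def stratified_total(y_by_stratum, N_by_stratum):
    """Estimated total and its variance under stratified SRS.

    total    = sum_h N_h * ybar_h
    variance = sum_h N_h^2 * (1 - n_h / N_h) * s_h^2 / n_h
    where s_h^2 is the within-stratum sample variance (divisor n_h - 1).
    """
    total, var = 0.0, 0.0
    for h, y in y_by_stratum.items():
        n, N = len(y), N_by_stratum[h]
        total += N * sum(y) / n
        var += N ** 2 * (1 - n / N) * variance(y) / n
    return total, var


# Toy data (hypothetical): a binary variable observed in two strata.
y = {"A": [1, 0, 1, 1], "B": [0, 0, 1, 0, 0]}
N = {"A": 40, "B": 100}
t_hat, v_hat = stratified_total(y, N)
```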

2.4. Estimation of logistic regression coefficients

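Let $y$ be a binary variable ($y = 0$ or $1$) and $\mathbf{x}$ be a $p$-vector of covariates or risk factors. Consider a logistic regression model

$$\operatorname{logit}\{P(y = 1 \mid \mathbf{x})\} = \beta_0 + \mathbf{x}^{\top}\boldsymbol{\beta},$$

where $\beta_0$ is the intercept and $\boldsymbol{\beta}$ a $p$-vector of unknown model parameters. The corresponding score equations can be derived by taking the derivatives of the (weighted) pseudo-loglikelihood

$$\ell(\beta_0, \boldsymbol{\beta}) = \sum_{i=1}^{n} w_i \{ y_i \log p_i + (1 - y_i) \log(1 - p_i) \}$$

with respect to the model parameters, given by

$$U(\beta_0, \boldsymbol{\beta}) = \sum_{i=1}^{n} w_i (y_i - p_i) (1, \mathbf{x}_i^{\top})^{\top} = \mathbf{0}, \qquad (2.1)$$

with $w_i$ being the final weight described in Section 2.2 “Sample Weighting” and $p_i = \exp(\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}) / \{1 + \exp(\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta})\}$. The parameters $\beta_0$ and $\boldsymbol{\beta}$ can be estimated by solving $U(\beta_0, \boldsymbol{\beta}) = \mathbf{0}$ for the unknown parameters $(\beta_0, \boldsymbol{\beta})$ using an iterative method such as Newton–Raphson. We denote the resulting estimates by $\hat{\beta}_0$ and $\hat{\boldsymbol{\beta}}$.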
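As an illustration of solving weighted logistic score equations by Newton–Raphson, the following is a minimal sketch using NumPy. It is not the authors' implementation; the function name, convergence settings, and toy data are assumptions. The toy data use a single binary covariate so that the weighted solution has a closed form (weighted log odds and log odds ratio) against which the iteration can be checked.

```python
import numpy as np


def weighted_logistic(X, y, w, n_iter=25, tol=1e-10):
    """Solve sum_i w_i (y_i - p_i) (1, x_i) = 0 by Newton-Raphson.

    X: (n, p) covariate matrix without intercept column
    y: (n,) binary outcomes; w: (n,) sample weights
    Returns the coefficient vector (intercept first).
    """
    X1 = np.column_stack([np.ones(len(y)), X])      # add intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))        # fitted probabilities
        score = X1.T @ (w * (y - p))                # weighted score vector
        info = (X1 * (w * p * (1 - p))[:, None]).T @ X1  # weighted information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Toy data (hypothetical). Weighted odds of y=1 are 1/3 when x=0 and 3 when x=1.
x = np.array([0, 0, 0, 1, 1, 1], dtype=float)
y = np.array([0, 0, 1, 0, 1, 1], dtype=float)
w = np.array([1, 2, 1, 1, 1, 2], dtype=float)
beta_hat = weighted_logistic(x[:, None], y, w)
```

For this saturated toy model the estimates should match the intercept log(1/3) and the slope log(9).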

A convenient way of approximating the variance for non-linear estimators, such as Inline graphic, involves calculating the Taylor deviate for each observation. Taylor deviates for an estimator are observation-level terms whose sample weighted sum is a way to write the first-order Taylor approximation of the estimator. Expressing estimators as a sample weighted sum of the Taylor deviates is convenient for complex samples because variance estimators of weighted sums are readily available from standard sample survey method theory. These variance estimators are called Taylor linearization variance estimators. Shah (2004) describes how the Taylor deviate can be obtained by differentiating the weighted-estimator with respect to its weights. For further applications of Taylor deviates to obtain Taylor linearization estimators for logistic and linear regression coefficients, see Graubard and others (2005).

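Evaluating the score equations (2.1) at $\hat{\beta}_0$ and $\hat{\boldsymbol{\beta}}$, we obtain

$$\sum_{i=1}^{n} w_i (y_i - \hat{p}_i) \tilde{\mathbf{x}}_i = \mathbf{0},$$

where $w_i$ is the final sample weight, $\tilde{\mathbf{x}}_i = (1, \mathbf{x}_i^{\top})^{\top}$ is the covariate vector including the intercept term, and $\hat{p}_i$ is $p_i$ evaluated at $\hat{\beta}_0$ and $\hat{\boldsymbol{\beta}}$. Write $\hat{\boldsymbol{\theta}} = (\hat{\beta}_0, \hat{\boldsymbol{\beta}}^{\top})^{\top}$. To obtain the Taylor deviate associated with the $j$th subject, we first take the derivative of the equation above with respect to its final sample weight $w_j$, leading to

$$(y_j - \hat{p}_j) \tilde{\mathbf{x}}_j - \Big\{ \sum_{i=1}^{n} w_i \hat{p}_i (1 - \hat{p}_i) \tilde{\mathbf{x}}_i \tilde{\mathbf{x}}_i^{\top} \Big\} \frac{\partial \hat{\boldsymbol{\theta}}}{\partial w_j} = \mathbf{0}.$$

Solving this equation for $\partial \hat{\boldsymbol{\theta}} / \partial w_j$, we have

$$\frac{\partial \hat{\boldsymbol{\theta}}}{\partial w_j} = \Big\{ \sum_{i=1}^{n} w_i \hat{p}_i (1 - \hat{p}_i) \tilde{\mathbf{x}}_i \tilde{\mathbf{x}}_i^{\top} \Big\}^{-1} (y_j - \hat{p}_j) \tilde{\mathbf{x}}_j,$$

since $\partial \hat{p}_i / \partial \hat{\boldsymbol{\theta}} = \hat{p}_i (1 - \hat{p}_i) \tilde{\mathbf{x}}_i$.

The Taylor deviate for the $j$th sampled subject is $\mathbf{z}_j = \partial \hat{\boldsymbol{\theta}} / \partial w_j$, and $\operatorname{Var}(\hat{\boldsymbol{\theta}})$ can be approximated by the variance of the weighted total $\sum_{j} w_j \mathbf{z}_j$. The variance estimator of this weighted total depends on the sample design of the stratified baseline sample and the augmentation sample, as discussed in Section 2.3.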
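The Taylor deviates for logistic regression coefficients can be computed directly once the coefficients are fitted. The sketch below, using NumPy, computes each deviate as the inverse weighted information matrix times the subject's score contribution, then applies a generic single-stage stratified (with-replacement) variance formula to the weighted deviates. That variance formula is a deliberate simplification: it ignores finite population corrections and the covariance between the baseline and augmentation stages that the design in this paper requires. All function and variable names, and the toy data, are illustrative assumptions.

```python
import numpy as np


def taylor_deviates(X1, y, w, beta):
    """Rows are z_j = info^{-1} (y_j - p_j) x_j, with
    info = sum_i w_i p_i (1 - p_i) x_i x_i^T and X1 including the intercept."""
    p = 1.0 / (1.0 + np.exp(-X1 @ beta))
    info = (X1 * (w * p * (1 - p))[:, None]).T @ X1
    scores = X1 * (y - p)[:, None]                  # per-subject score terms
    return np.linalg.solve(info, scores.T).T


def stratified_linearized_variance(Z, w, strata):
    """Simplified variance of sum_j w_j z_j for a single-stage stratified
    sample, treating sampling as with replacement within strata."""
    U = Z * w[:, None]
    V = np.zeros((Z.shape[1], Z.shape[1]))
    for h in np.unique(strata):
        Uh = U[strata == h]
        n_h = Uh.shape[0]
        D = Uh - Uh.mean(axis=0)
        V += n_h / (n_h - 1) * D.T @ D
    return V


# Toy data (hypothetical): one binary covariate, two strata.
x = np.array([0, 0, 0, 1, 1, 1], dtype=float)
y = np.array([0, 0, 1, 0, 1, 1], dtype=float)
w = np.array([1, 2, 1, 1, 1, 2], dtype=float)
strata = np.array([0, 0, 0, 1, 1, 1])
X1 = np.column_stack([np.ones(6), x])
# Closed-form weighted solution for this saturated toy model.
beta_hat = np.array([np.log(1 / 3), np.log(9)])

Z = taylor_deviates(X1, y, w, beta_hat)
V = stratified_linearized_variance(Z, w, strata)
```

Because the score equations hold at the fitted coefficients, the weighted sum of the deviates is zero, which provides a quick sanity check on an implementation.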

3. Results

3.1. Simulation study

We conducted a limited simulation study to evaluate the finite-sample performance of the proposed estimators of the population total and the logistic regression coefficients under a stratified baseline and augmentation sample design with re-stratification and population expansion. To reflect the design of the HPV study in Costa Rica, we partition our population into three domains Inline graphic, Inline graphic, and Inline graphic, with population size of Inline graphic, respectively. Five equal-sized strata are randomly formed in Inline graphic and Inline graphic, and four equal-sized strata are randomly formed in Inline graphic and Inline graphic. We generated a binary outcome variable Inline graphic based on the logistic regression model

3.1.

where the covariate Inline graphic is generated as a surrogate of Inline graphic as Inline graphic if Inline graphic; Inline graphic otherwise, and Inline graphic is assumed to follow an independent normal distribution with differing means Inline graphic, corresponding to each of the five equal-sized strata that form our baseline population of size Inline graphic. The Inline graphic in the expanded population of Inline graphic size Inline graphic follows an independent normal distribution with mean zero. A common standard deviation Inline graphic is specified for all of the independent normal distributions. We set the value of Inline graphic and Inline graphic to be Inline graphic1 and log(1.5), respectively, so that the marginal probability of the outcome, Inline graphic, is approximately 28.5%.

In each simulation, we randomly sample 50, 100, 30, 80, and 40 subjects from the five strata in the baseline population Inline graphic and Inline graphic. Excluding the sampled subjects in Inline graphic, an augmented re-stratified sample of size 30, 130, 50, and 30 is randomly selected from the four reformed strata in Inline graphic and Inline graphic. Note that although the stratum sample sizes in both the baseline sample of size Inline graphic and the augmented sample of size Inline graphic are fixed, the distribution of the 300 selected baseline subjects across the four strata formed for the selection of the augmentation sample in the populations Inline graphic and Inline graphic is random. This variability is accounted for in the variance of the estimates of the population total of Inline graphic and the regression coefficients Inline graphic and Inline graphic.
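The sampling step of one simulation replicate can be sketched as follows. The stratum sample sizes (50, 100, 30, 80, 40 and 30, 130, 50, 30) come from the text; the population sizes, the identifier ranges, the seed, and the function names are hypothetical choices made only so that the example runs.

```python
import random


def split_equal(ids, k, rng):
    """Randomly partition ids into k equal-sized strata."""
    ids = list(ids)
    rng.shuffle(ids)
    size = len(ids) // k
    return [ids[j * size:(j + 1) * size] for j in range(k)]


def two_stage_sample(base_strata, aug_strata, n_base, n_aug, rng):
    """Stratified SRS at stage 1, then re-stratified SRS at stage 2 from
    units not already selected in the baseline sample."""
    baseline = []
    for units, n in zip(base_strata, n_base):
        baseline += rng.sample(units, n)
    taken = set(baseline)
    augment = []
    for units, m in zip(aug_strata, n_aug):
        remaining = [u for u in units if u not in taken]
        augment += rng.sample(remaining, m)
    return baseline, augment


rng = random.Random(2014)
# Hypothetical populations: baseline ids 0-4999, second-stage ids 2000-7999,
# so ids 2000-4999 form the overlapped domain.
base_strata = split_equal(range(0, 5000), 5, rng)
aug_strata = split_equal(range(2000, 8000), 4, rng)
baseline, augment = two_stage_sample(
    base_strata, aug_strata, [50, 100, 30, 80, 40], [30, 130, 50, 30], rng
)
# Number of baseline subjects falling in each augmentation stratum (random).
base_per_aug_stratum = [len(set(s) & set(baseline)) for s in aug_strata]
```

The counts in `base_per_aug_stratum` change from replicate to replicate even though the stratum sample sizes are fixed, which is the source of extra variability noted in the text.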

To evaluate the performance of the proposed methods, we calculate (1) the relative bias, i.e. RelBias = (average of the estimates minus the true value) divided by the true value, and (2) the variance ratio, i.e. VarRatio = the ratio of the mean of the estimated Taylor linearization variances to the empirical variance of the estimates.
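The two performance measures can be computed from replicate-level output as in the following minimal sketch. The function names and the four-replicate toy input are illustrative assumptions; the empirical variance is taken here as the sample variance of the estimates across replicates.

```python
from statistics import mean, variance


def rel_bias(estimates, truth):
    """(average of the estimates - true value) / true value."""
    return (mean(estimates) - truth) / truth


def var_ratio(var_estimates, estimates):
    """Mean of the estimated variances over the empirical variance
    of the estimates across replicates."""
    return mean(var_estimates) / variance(estimates)


# Toy replicate output (hypothetical).
estimates = [9.0, 10.0, 11.0, 12.0]
var_estimates = [1.5, 1.7, 1.6, 1.8]
rb = rel_bias(estimates, truth=10.0)
vr = var_ratio(var_estimates, estimates)
```

A variance ratio near one indicates that the variance estimator tracks the true sampling variability, while values below one indicate underestimation.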

Based on 2500 replications, we observe that the proposed estimators are approximately unbiased and the ratios of the two estimated variances are close to one for both the total of Inline graphic (with Inline graphic and Inline graphic) and regression coefficient Inline graphic (with Inline graphic and Inline graphic), implying that the Taylor linearization variance estimators perform well in approximating the true variances.

3.2. Analyses of HPV16/18 serological study in Costa Rica

We apply our proposed methods using data from the HPV16/18 study described in Section 2.1. Sampling weights for the baseline and augmentation samples were calculated as described in Section 2.2 “Sample Weighting”.

Using the weighted sample of size Inline graphic collected from the baseline sample and the augmented sample, we estimated the associations of age and lifetime number of sex partners with GST seropositivity (Coseo and others, 2010; 2011). In addition, the associations of GST with incident cervical HPV infection, with and without adjusting for the number of sex partners, were evaluated. All analyses were performed separately for HPV16 and HPV18.

Our analysis showed that HPV16 seropositivity by GST increased with age (Inline graphic for a 1-year increase, 95% CI 1.03–1.40) and the number of sexual partners (Inline graphic for a 1-partner increase, 95% CI 1.49–2.83). Incident HPV16 infection was more likely among women who were HPV16 seropositive by GST (Inline graphic, 95% CI 1.36–3.37). After adjustment for number of sexual partners in the logistic regression model, the association of HPV16 incidence with HPV16 seropositivity by GST was no longer statistically significant (Inline graphic, 95% CI 0.95–3.13). Logistic regression analyses showed that HPV18 seropositivity by GST increased with age (Inline graphic for a 1-year increase, 95% CI 1.04–1.38) and number of sexual partners (Inline graphic for a 1-partner increase, 95% CI 1.18–2.15). In contrast to HPV16, seropositivity by GST for HPV18 showed a statistically non-significant inverse association with incident infection (Inline graphic, 95% CI 0.12–1.46). This result was not altered by adjustment for number of sexual partners (Inline graphic, 95% CI 0.12–1.23). For more details about the study on which this example is based, see Robbins and others (2014), who applied the proposed methods to the same data and evaluated GST as a measure of cumulative HPV infection and future immune protection among HPV-unvaccinated women.

4. Discussion

In this paper, we have developed innovative statistical methods for estimating population totals and logistic regression coefficients under a two-stage stratified sample design. Specifically, a stratified baseline sample is selected at the first stage and a re-stratified augmentation sample from an expanded population is selected at the second stage. Under the two-stage stratified sample design, the sample sizes from certain subdomains of interest that were not originally considered in the design of the baseline study are increased, and the original target population is expanded so that an additional random sample of individuals is included in the augmentation sample. Our simulation results showed that, in finite samples, the proposed estimators are approximately unbiased and the Taylor linearization variance estimators approximate the true variances well. The proposed methods are applied to an HPV16/18 serological study.

The two-stage augmentation sample design should not be confused with two-phase designs used in epidemiologic studies to reduce the costs by limiting ascertainment of expensive covariates to a small subsample selected from the baseline sample (Breslow and Cain, 1988; White, 1982). Outcomes and some covariate information, which may be cheaply gathered for all subjects in the baseline sample, can be used for informative sampling of subjects at phase II. In such a setting, a subsample from the baseline sample is selected based on sample information gathered in the first phase. In contrast, the augmentation sample designs considered in this paper are used to increase the baseline sample size to improve estimation of subdomains that may have insufficient sample sizes in the baseline sample and/or to make the sample representative of an expanded target population from the target population represented by the baseline sample.

Somewhat related to our work on augmentation sampling is dual frame or multiple frame sampling (Hartley, 1962), which is used in survey methodology. In dual frame sampling, two sample frames, A and B, each consisting of sampling units (e.g. a list of telephone numbers and a list of household addresses), are used to sample the target population of individuals. The combination of the two frames covers the entire (target) population, and the frames A and B often overlap; the shared part is called the overlap domain. Information from samples selected from frames A and B is combined to estimate the quantities of interest. In our example, the population of individuals who are HPV16Inline graphic, from which the first-stage sample is drawn, and the population of individuals who are HPV18Inline graphic, from which the second-stage sample is drawn, might be viewed as two frames. However, unlike our augmentation sampling, dual frame sampling selects samples independently from frame A and frame B, and therefore units in the overlap of frame A and frame B may be selected twice. Our augmentation sampling instead excludes the baseline sample selected in the first stage and therefore depends on the first-stage sampling, which introduces a random component and a covariance between the estimates from each stage that needs to be accommodated in the variance estimation.

In dual frame sampling, various approaches have been discussed for estimating the population total in the overlap domain (Lohr and Rao, 2000). The population total in the overlap domain is usually estimated by a linear combination of the estimated total using the sample from frame A only and the estimated total using the sample from frame B only, where a factor Inline graphic, Inline graphic is used to linearly combine these two estimated totals. Different methods have been proposed to choose or estimate the Inline graphic (Hartley, 1962; Fuller and Burmeister, 1972; Skinner and Rao, 1992). The reader is referred to Lohr and Rao (2000) for a thorough comparison of different methods. In Section 2.3, we chose the value of Inline graphic. Obtaining an optimal value of Inline graphic to achieve the minimal asymptotic variance is a topic for future research.
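The linear combination of two overlap-domain estimates can be written out as a short sketch. The name `theta` for the combining factor, the function names, and the numeric inputs are assumptions for illustration. The variance function includes an explicit covariance argument because, as noted above, the two stage-specific estimates are not independent under augmentation sampling; setting it to zero corresponds to the independent-sampling case of classical dual frame designs.

```python
def combine_overlap(total_a, total_b, theta=0.5):
    """theta * (estimate from sample A) + (1 - theta) * (estimate from sample B)."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return theta * total_a + (1.0 - theta) * total_b


def combined_variance(var_a, var_b, theta=0.5, cov_ab=0.0):
    """Variance of the linear combination, allowing a covariance term."""
    return (theta ** 2 * var_a
            + (1.0 - theta) ** 2 * var_b
            + 2.0 * theta * (1.0 - theta) * cov_ab)


# Hypothetical overlap-domain estimates and variances.
t_overlap = combine_overlap(480.0, 520.0, theta=0.5)
v_indep = combined_variance(100.0, 300.0, theta=0.5)
v_dep = combined_variance(100.0, 300.0, theta=0.5, cov_ab=40.0)
```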

During the augmentation sampling for the HPV16/18 serological study in Costa Rica, no observations were selected from the stratum of women who were HPV18 seronegative at enrollment and had no incident HPV18 infection, because it was expected, based on HPV natural history studies (Coseo and others, 2010), that this stratum would resemble the stratum of women who were HPV16 seronegative at enrollment and had no incident HPV16 infection (lower risk, with low likelihood of exposure). To account for the non-representation in this stratum, we randomly sampled 10 individuals from the baseline sample of 25 women who were HPV16 seronegative and had no incident HPV16 infection, and then treated this sample as a pseudo-sample that had been selected during augmentation sampling from the stratum of HPV18 seronegative women without an incident HPV18 infection. Stratified sampling weights for these women were calculated as if they had been sampled from that stratum. We verified that the 10 selected women were HPV18 DNA negative at enrollment, HPV18 seronegative at enrollment, and did not develop an incident HPV18 infection. This is consistent with their assignment to the stratum of HPV18 seronegative women without an incident HPV18 infection.

Re-stratification sample designs addressed in this paper can be useful to augment subcohort samples of case-cohort studies. For instance, the subcohort sample that was obtained for an original case series may not be appropriate for a newly considered case series that has quite different risk factor and/or confounder distributions than the original cases. As an example, consider a case-cohort study of primarily Caucasian cancers such as breast cancer and skin cancer where the subcohort was sampled based on age and gender. Suppose that the investigators want to study kidney cancer, which has a higher incidence rate among African Americans. The subcohort sample could be augmented using re-stratification sampling based on race. The methods of Li and others (2012) for estimating variances for regression coefficients using Poisson regression analyses of case-cohort studies with sample weighting of the subcohort can be extended to re-stratification sampling of the subcohort. Similar augmentation sampling could also be considered to enhance a control sample from an existing case–control study to be used for a new study of a different case series type.

Supplementary material

Supplementary Material is available at http://biostatistics.oxfordjournals.org.

Funding

This work was supported by the Intramural Research Program of the National Cancer Institute at the National Institutes of Health.


Acknowledgements

The authors are grateful to the anonymous associate editor and referees for their constructive comments, and to the investigators and staff of the Costa Rica Vaccine Trial at the National Cancer Institute, USA, and Proyecto Epidemiologico Guanacaste, Costa Rica. We thank Brian Befano and Greg Rydzak of Information Management Services, Inc. for their assistance with data management. Conflict of Interest: None declared.

References

  1. Breslow N. E., Cain K. C. Logistic regression for two-stage case-control data. Biometrika. 1988;75:11–20.
  2. Cochran W. G. Sampling Techniques. New York: John Wiley & Sons; 1977.
  3. Coseo S., Porras C., Hildesheim A., Rodriguez A. C., Schiffman M., Herrero R., Wacholder S., Gonzalez P., Wang S. S., Sherman M. E. and others; Costa Rica HPV Vaccine Trial (CVT) Group. Seroprevalence and correlates of human papillomavirus 16/18 seropositivity among young women in Costa Rica. Sexually Transmitted Diseases. 2010;37:706–714. doi: 10.1097/OLQ.0b013e3181e1a2c5.
  4. Coseo S. E., Porras C., Dodd L. E., Hildesheim A., Rodriguez A. C., Schiffman M., Herrero R., Wacholder S., Gonzalez P., Sherman M. E. and others; Costa Rica HPV Vaccine Trial (CVT) Group. Evaluation of the polyclonal ELISA HPV serology assay as a biomarker for human papillomavirus exposure. Sexually Transmitted Diseases. 2011;38:976–982. doi: 10.1097/OLQ.0b013e31822545c0.
  5. Fuller W. A., Burmeister L. F. Estimators for samples selected from two overlapping frames. ASA Proceedings of the Social Statistics Section. 1972:245–249.
  6. Graubard B. I., Rao R. S., Gastwirth J. L. Using the Peters–Belson method to measure health care disparities from complex survey data. Statistics in Medicine. 2005;24:2659–2668. doi: 10.1002/sim.2135.
  7. Hartley H. O. Multiple frame surveys. Proceedings of the Social Statistics Section, American Statistical Association. 1962:203–206.
  8. Herrero R., Hildesheim A., Rodriguez A. C., Wacholder S., Bratti C., Solomon D., Gonzalez P., Porras C., Jimenez S., Guillen D. and others; Costa Rica Vaccine Trial (CVT) Group. Rationale and design of a community-based double-blind randomized clinical trial of an HPV 16 and 18 vaccine in Guanacaste, Costa Rica. Vaccine. 2008;26:4795–4808. doi: 10.1016/j.vaccine.2008.07.002.
  9. Li Y., Gail M. H., Preston D. L., Graubard B. I., Lubin J. H. Piecewise exponential survival times and analysis of case-cohort data. Statistics in Medicine. 2012;31:1361–1368. doi: 10.1002/sim.4441.
  10. Lohr S. L., Rao J. N. K. Inference from dual frame surveys. Journal of the American Statistical Association. 2000;95:271–280.
  11. Robbins H. A., Li Y., Porras C., Pawlita M., Ghosh A., Rodriguez A. C., Schiffman M., Wacholder S., Kemp T. J., Gonzalez P. and others. Glutathione S-transferase L1 multiplex serology as a measure of cumulative infection with human papillomavirus. BMC Infectious Diseases. 2014;14:120. doi: 10.1186/1471-2334-14-120.
  12. Schiffman M., Clifford G., Buonaguro F. M. Classification of weakly carcinogenic human papillomavirus types: addressing the limits of epidemiology at the borderline. Infectious Agents and Cancer. 2009;4:8. doi: 10.1186/1750-9378-4-8.
  13. Shah B. V. Comment on linearization variance estimators for survey data by A Demnati and JNK Rao. Survey Methodology. 2004;30:29.
  14. Skinner C. J., Rao J. N. K. Estimation in dual frame surveys with complex designs. Journal of the American Statistical Association. 1992;91:349–356.
  15. Stehman S. V., Olofsson P., Woodcock C. E., Herold M., Friedl M. A. A global land-cover validation data set, II: augmenting a stratified sampling design to estimate accuracy by region and land-cover class. International Journal of Remote Sensing. 2012;33:6975–6993.
  16. Vaccarella S., Franceschi S., Snijders P. J., Herrero R., Meijer C. J., Plummer M.; IARC HPV Prevalence Surveys Study Group. Concurrent infection with multiple human papillomavirus types: pooled analysis of the IARC HPV Prevalence Surveys. Cancer Epidemiology, Biomarkers & Prevention. 2010;19:503–510. doi: 10.1158/1055-9965.EPI-09-0983.
  17. White J. E. A two stage design for the study of the relationship between a rare exposure and a rare disease. American Journal of Epidemiology. 1982;115:119–128. doi: 10.1093/oxfordjournals.aje.a113266.


Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press
