American Journal of Epidemiology. 2022 Aug 2;191(11):1970–1974. doi: 10.1093/aje/kwac138

Cost-Efficient Multiply Matched Case-Control Study Designs

Grecio J Sandoval, Ionut Bebu, John M Lachin
PMCID: PMC10144705  PMID: 35916344

Abstract

In multiply matched case-control studies, a number of cases and controls may be included in each matched set. However, when per-participant costs between cases and controls differ, investigators should be aware of how the numbers of cases and controls per matched set affect the overall total study cost. Traditional statistical approaches to designing case-control studies do not account for study costs. Given an effect size, the power to detect differences is typically a function of the numbers of cases and controls within each matched set. Therefore, the same level of statistical power will be achieved based on various combinations of the numbers of cases and controls. Typical matched case-control studies match a case to a number of controls by levels of 1 or more known factors. Several authors have shown that for study designs with 1 case per matched set, the optimal number of controls within each matched set that minimizes the total study cost is the square root of the ratio of the cost of a case to the cost of a control. Herein, we extend this result to the setting of a multiply matched case-control study design, when 1 or more cases are matched to controls within each matched set. A Shiny web application implementation of the proposed methods is presented.

Keywords: case-control studies, matched case-control studies, observational studies, research costs

Abbreviations

HCV, hepatitis C virus; OR, odds ratio.

Matched case-control studies are observational studies in which cases and controls are matched with respect to 1 or more known factors in order to potentially increase efficiency and reduce the effect of confounding (1), although that may not always be the case (2). In a typical matched case-control study, cases are participants who have a particular disease or condition present, and controls are participants who are free of that disease or condition. A variation of a matched case-control study is a multiply matched case-control study. This study design involves a number of controls matched to 1 or more cases within the matched set. The numbers of cases and controls in each matched set can be selected to provide the desired level of statistical power. For the same level of statistical power to detect a given effect size, if the costs of a case and a control are similar, then the optimal total study cost is achieved when the numbers of cases and controls are equal (3, 4). However, for studies where cases are defined by a rare disease, the fixed costs of enrolling and following up a case are usually higher than the costs for a control. In designing such studies, investigators should be aware of how study design parameters contribute to the overall study cost. Parameter values that contribute to the overall study cost include the numbers of cases and controls within each matched set, the number of matched sets, and the study cost for each case and control. However, traditional statistical approaches to designing matched case-control studies do not take costs into account.

For study designs with 1 case per matched set, Miettinen (5) and Walter (6) provided sample-size calculations and derived the optimal number of controls minimizing the total study cost. Schlesselman (7) provided similar sample-size calculations for cohort and case-control studies. Using Schlesselman’s (7) sample-size calculations, Meydrech and Kupper (8) and Pike and Casagrande (9) considered study costs and reaffirmed the optimal values obtained by Miettinen (5) (see also Moussa (10) and Nam and Fears (11)). For matched case-control studies in which 1 case is matched with multiple controls, the optimal number of controls for each matched set is the square root of the ratio of the cost of a case to the cost of a control.

Lachin (12, 13) evaluated the sample size for case-control studies when 1 or more cases and controls are considered in each matched set (multiply matched case-control studies). These sample-size calculations were focused on a single qualitative or quantitative covariate and were derived from a score test based on a conditional logistic regression model, which is equivalent to a stratified Cox proportional hazards model with adjustment for ties (14). Following Lachin’s approach, Tang (15) simplified the score equation and Fisher information and provided useful expressions for calculating the power and sample size. These calculations are generalizations of the tests used for the 1-case-per-matched-set designs provided by Miettinen (5) and Walter (6).

Herein, we build on these previous results and describe the assessment of sample size in terms of total study cost rather than power alone. We utilize the sample-size calculations provided by Lachin (12, 13) and Tang (15) to determine the total cost for the study and to obtain the optimal numbers of cases and controls for each matched set that minimize the total study cost. The approach is then illustrated using examples.

METHODS

Consider a case-control study with N matched sets, and assume a fixed number of cases and controls for all matched sets. In each matched set, the number of participants is $d + m$, with $d$ denoting the number of cases and $m$ denoting the number of controls. The total number of cases for the study is $Nd$, and the total number of controls for the study is $Nm$, for a total of $N(d + m)$ participants in the study.

Sample size

Lachin (12, 13) (see also Tang (15)) obtained the number of matched sets N that provides $1 - \beta$ power in detecting an effect size (e.g., the log odds ratio (OR)) at significance level $\alpha$,

$$N = K\,\frac{d + m}{d\,m}, \tag{1}$$

where K is a multiplicative factor independent of d and m.

The value of K depends on the research objective and comparison for the study. For a quantitative factor X with a common variance Inline graphic among the matched sets and Inline graphic (OR), the log OR for a case versus a control per unit change in X, then Inline graphic. For the average mean difference between cases and controls Inline graphic with variance Inline graphic, then Inline graphic. For a binary factor X with a common Bernoulli variance Inline graphic among the sets and Inline graphic, the log OR for the positive value of the factor (e.g., Inline graphic), then Inline graphic. For the difference in probabilities between cases and controls Inline graphic and common Bernoulli variance Inline graphic, then Inline graphic. (See Lachin (12) for details.)

The number of matched sets N is a decreasing function with respect to the number of cases d for a fixed number of controls. Likewise, the number of matched sets N decreases as a function of the number of controls m for a fixed number of cases.
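The monotonicity described above can be checked numerically. The following is a minimal sketch, assuming equation 1 has the form N = K(d + m)/(dm), which is consistent with the surrounding text. The function name `matched_sets`, the value K = 100, and the choice to round up to whole matched sets are illustrative assumptions, not taken from the article.

```python
import math


def matched_sets(K: float, d: int, m: int) -> int:
    """Number of matched sets, assuming N = K * (d + m) / (d * m).

    K is the multiplicative factor that does not depend on d or m.
    Rounding up to a whole number of sets is a convention assumed here.
    """
    if d < 1 or m < 1:
        raise ValueError("d and m must be positive integers")
    return math.ceil(K * (d + m) / (d * m))


# K = 100 is an arbitrary illustrative value.
for d, m in [(1, 1), (1, 2), (1, 4), (2, 4)]:
    print(f"d={d}, m={m}: N={matched_sets(100.0, d, m)}")
```

Under these assumptions, adding controls for a fixed number of cases (or cases for a fixed number of controls) reduces the number of matched sets required, with diminishing returns.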

Total cost for the study

The total cost for the study is the cost for each case multiplied by the number of cases plus the cost for each control multiplied by the number of controls. Let $C_d$ and $C_m$ be the costs per case and control, respectively. These are aggregate costs that consist of per-participant costs related to enrollment and follow-up of a case or control.

The expected total cost for the study is

$$C = N\,(d\,C_d + m\,C_m). \tag{2}$$

This equation for expected total study cost is consistent with Miettinen (5) for d = 1. Note that equation 2 only accounts for participant costs. For completeness, any additional nonparticipant costs can simply be added to equation 2.
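As a small illustration of the cost calculation described above, the sketch below multiplies the number of matched sets by the per-set participant cost and optionally adds nonparticipant costs. The function name, the `other_costs` argument, and the dollar figures are hypothetical.

```python
def expected_total_cost(n_sets, d, m, cost_case, cost_control, other_costs=0.0):
    """Expected total study cost for n_sets matched sets.

    Each matched set has d cases and m controls; cost_case and cost_control
    are aggregate per-participant costs. other_costs holds any additional
    nonparticipant costs (zero by default).
    """
    participant_costs = n_sets * (d * cost_case + m * cost_control)
    return participant_costs + other_costs


# Hypothetical figures: 75 sets of 2 cases and 4 controls,
# 400 per case and 100 per control.
print(expected_total_cost(75, 2, 4, cost_case=400.0, cost_control=100.0))
```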

Cost-efficient study design

Any combination of the number of matched sets N, cases d, and controls m satisfying equation 1 provides $1 - \beta$ power to detect an effect size (i.e., Inline graphic or Inline graphic) at significance level $\alpha$. Our goal is to select the combination of d and m that minimizes the expected total cost of the study. Using equations 1 and 2, one obtains

$$C(d, m) = K\,\frac{d + m}{d\,m}\,(d\,C_d + m\,C_m), \tag{3}$$

where $K$ does not depend on d and m.

The optimal values of d and m in each matched set minimizing the expected total study cost (equation 2) are given by

$$(d^*, m^*) = \underset{d,\,m}{\arg\min}\; C(d, m \mid \Theta), \tag{4}$$

where the minimization is over $d_{\min} \le d \le d_{\max}$ and $m_{\min} \le m \le m_{\max}$, with $d_{\min}$, $d_{\max}$, $m_{\min}$, and $m_{\max}$ denoting the smallest and largest numbers of cases and controls the investigator may consider, respectively, and $\Theta$ denoting the fixed study parameter values. For example, Inline graphic are the fixed study parameter values for the study design that considers a mean difference of the quantitative factor.
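The minimization over a bounded range of cases and controls can be sketched as an exhaustive grid search. This is an illustration only: it assumes the expected total cost has the form K(d + m)/(dm) multiplied by the per-set participant cost, and the function names and default bounds of 1 to 15 are hypothetical.

```python
from itertools import product


def total_cost(K, d, m, cost_case, cost_control):
    """Expected total cost, assuming N = K * (d + m) / (d * m)."""
    return K * (d + m) / (d * m) * (d * cost_case + m * cost_control)


def optimal_design(K, cost_case, cost_control, d_range=(1, 15), m_range=(1, 15)):
    """Return the (d, m) pair with the lowest expected total cost on the grid.

    When several pairs tie, the first one in iteration order is returned.
    """
    grid = product(
        range(d_range[0], d_range[1] + 1),
        range(m_range[0], m_range[1] + 1),
    )
    return min(grid, key=lambda dm: total_cost(K, dm[0], dm[1], cost_case, cost_control))


# Cases cost 5 times as much as controls (illustrative units).
print(optimal_design(K=1.0, cost_case=5.0, cost_control=1.0))  # (4, 9)
```

Because K scales every candidate design equally, its value does not affect which pair is selected.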

Optimal total cost for the study

Using the Cauchy-Schwarz inequality,

$$C(d, m) = K\,(d\,C_d + m\,C_m)\left(\frac{1}{d} + \frac{1}{m}\right) \ge K\left(\sqrt{C_d} + \sqrt{C_m}\right)^2, \tag{5}$$

with equality for $d\sqrt{C_d}$ = $m\sqrt{C_m}$.

As a function of d and m, this means that there is a lower bound for the expected total cost of the study and therefore an optimal integer solution $(d^*, m^*)$ exists. Note that any set $(k d^*, k m^*)$ that is proportional to the optimal set $(d^*, m^*)$ yields the same expected total study cost, $C(k d^*, k m^*) = C(d^*, m^*)$, for any $k > 0$. It follows from equation 5 that given a fixed number of cases d per matched set, the optimal number of controls is

$$m^* = d\,\sqrt{C_d / C_m}. \tag{6}$$

For 1 case per matched set ($d = 1$), the optimal number of controls per set is $m^* = \sqrt{C_d / C_m}$, which is the same optimal value as that obtained by Miettinen (5). This result is expected because the sample size for multiply matched case-control studies provided by Lachin is a generalization of Miettinen’s sample-size calculations for 1 case per matched set ($d = 1$).
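One way to see where the lower bound comes from is to expand the product and apply the arithmetic-geometric mean inequality, which is equivalent to the Cauchy-Schwarz step here. The sketch below writes $C_d$ and $C_m$ for the per-case and per-control costs (notation assumed for this illustration) and assumes the number of matched sets is proportional to $1/d + 1/m$.

```latex
\begin{aligned}
(d\,C_d + m\,C_m)\left(\frac{1}{d} + \frac{1}{m}\right)
  &= C_d + C_m + \frac{d}{m}\,C_d + \frac{m}{d}\,C_m \\
  &\ge C_d + C_m + 2\sqrt{C_d\,C_m} \\
  &= \left(\sqrt{C_d} + \sqrt{C_m}\right)^2 .
\end{aligned}
```

Equality holds when $(d/m)\,C_d = (m/d)\,C_m$, that is, when $m/d = \sqrt{C_d/C_m}$. For a cost ratio of 4, for instance, the bound is $(2 + 1)^2 = 9$ in units of the control cost, attained whenever $m = 2d$.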

Illustration

Consider a scenario where an investigator is designing a study that considers enrollment of up to 15 cases and 15 controls per matched set, so that $1 \le d \le 15$ and $1 \le m \le 15$ in equation 4. Figure 1 illustrates the expected total study cost (equation 3) for the cost ratios $C_d/C_m$ = 2.0, 3.0, 4.0, 5.0. The asterisks (*) within the figure represent the optimal numbers of cases and controls $(d^*, m^*)$ for each cost ratio. This example illustrates that any optimal set of values is proportional to another optimal set of values. For example, if $C_d/C_m$ = 4.0 (Figure 1C), then the optimal numbers of cases and controls that minimize the expected total study cost are $(d^*, m^*)$ = (1, 2), (2, 4), (3, 6), (4, 8), (5, 10), (6, 12), and (7, 14).

Figure 1. Expected total study cost (equation 3) for various cost ratios $C_d/C_m$. The asterisk (*) indicates the optimal set (d*, m*). The total cost is Inline graphic. The cost ratios shown are $C_d/C_m$ = 2.0 (A), $C_d/C_m$ = 3.0 (B), $C_d/C_m$ = 4.0 (C), and $C_d/C_m$ = 5.0 (D).

Note that the optimal values $d^*$ and $m^*$ are symmetric for a specific cost ratio and its corresponding reciprocal. For example, $C_d/C_m$ = 0.25 (reciprocal of $C_d/C_m$ = 4.0) yields $(d^*, m^*)$ = (2, 1), (4, 2), (6, 3), (8, 4), (10, 5), (12, 6), and (14, 7).
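The proportionality and reciprocal symmetry of the optimal sets can be reproduced with exact rational arithmetic, which avoids floating-point ties. This is a sketch under the assumption that the expected total cost is proportional to (d + m)/(dm) × (d × ratio + m), where `ratio` is the cost of a case divided by the cost of a control. The function names are hypothetical.

```python
from fractions import Fraction


def cost_factor(d, m, ratio):
    """Expected total cost in units of K times the cost per control (assumed form)."""
    return Fraction(d + m, d * m) * (d * ratio + m)


def optimal_sets(ratio, max_d=15, max_m=15):
    """All (d, m) pairs on the grid that attain the minimum cost exactly."""
    ratio = Fraction(ratio)
    costs = {
        (d, m): cost_factor(d, m, ratio)
        for d in range(1, max_d + 1)
        for m in range(1, max_m + 1)
    }
    best = min(costs.values())
    return sorted(dm for dm, cost in costs.items() if cost == best)


print(optimal_sets(4))
# [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10), (6, 12), (7, 14)]
print(optimal_sets(Fraction(1, 4)))
# [(2, 1), (4, 2), (6, 3), (8, 4), (10, 5), (12, 6), (14, 7)]
```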

Setting d = m yields the design with the smallest total sample size $N(d + m)$. However, the costs of designs such as $d = m$ relative to the cost of the optimal design $(d^*, m^*)$ are greater. For any design $(d, m)$ compared with the optimal design $(d^*, m^*)$, the relative cost $RC(d, m)$ is

$$RC(d, m) = \frac{C(d, m)}{C(d^*, m^*)} = \frac{d^*\,m^*\,(d + m)\,(d\,C_d + m\,C_m)}{d\,m\,(d^* + m^*)\,(d^*\,C_d + m^*\,C_m)}. \tag{7}$$

For example, for cost ratio $C_d/C_m$ = 5.0 (Figure 1D), the design that minimizes the total sample size $N(d + m)$ (e.g., $(d, m)$ = (1, 1)) is approximately 15% more costly than the optimal design $(d^*, m^*)$ = (4, 9). In this example, this means that the expected total study cost for a study design with 1 case and 1 control in each matched set is 15% greater than the total study cost for a study design with 4 cases and 9 controls per matched set.
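The 15% figure can be reproduced with a few lines of arithmetic. The sketch below assumes the expected total cost is proportional to (d + m)/(dm) × (d × ratio + m), with `ratio` the cost of a case divided by the cost of a control. The function names are hypothetical.

```python
def cost_factor(d, m, ratio):
    """Expected total cost in units of K times the cost per control (assumed form)."""
    return (d + m) / (d * m) * (d * ratio + m)


def relative_cost(design, reference, ratio):
    """Cost of `design` divided by the cost of `reference`, both given as (d, m)."""
    return cost_factor(*design, ratio) / cost_factor(*reference, ratio)


# 1 case and 1 control per set versus 4 cases and 9 controls, cost ratio 5.0.
rc = relative_cost((1, 1), (4, 9), ratio=5.0)
print(f"{rc:.3f}")  # 1.146, i.e., about 15% more costly
```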

DISCUSSION

Herein, we provided a method for minimizing the cost of a multiply matched case-control study. We described how to calculate the expected total study cost and how to obtain the optimal numbers of cases and controls for each matched set that yield a cost-efficient study design.

In designing multiply matched case-control studies, the costs to enroll and follow up cases and controls may not be the same. For example, when cases are defined as participants with a rare disease, recruitment costs for enrolling each case may be higher than those for controls who are disease-free (i.e., $C_d > C_m$). For participant follow-up, the cost to obtain the same follow-up data may be greater in cases than in controls (e.g., unique biospecimen collection procedures for cases). For example, in the Maternal-Fetal Medicine Units (MFMU) Network Hepatitis C Virus (HCV) in Pregnancy Study (16), postpartum follow-up costs are higher among the HCV-positive cases than among the HCV-negative controls because of the increased time and effort required to contact cases (as opposed to controls), who are additionally known to be more difficult to follow. Conversely, the cost of following a case may be lower than the cost of following a control (i.e., $C_d < C_m$). For example, in a case-control study ancillary to a randomized controlled trial that requires newly enrolled controls, enrollment costs are lower for a case than for a new control because the cases have already been recruited. If all other participant study costs are the same (i.e., the same procedures and assessments are performed on both cases and controls), then the cost of following a case will be lower than the cost of following a control. Importantly, we showed that these cost differences (i.e., cost ratios) influence the optimal number of cases and controls per matched set. Our examples illustrate that minimizing the number of matched sets or the overall number of participants may not always yield a study design with the lowest expected total study cost.

Our results are consistent with and extend previous research on cost-efficient designs for case-control studies (5, 8–11). The optimal number of controls, given a fixed number of cases per matched set, is simply the number of cases times the square root of the cost ratio (i.e., $m^* = d\sqrt{C_d / C_m}$).

The results presented here also apply to a nested case-control design where controls are selected from at-risk individuals who are without an event at the time of a case’s event (12). Case-cohort designs may also be used in such applications (17). Future research is needed to compare the 2 study designs from a cost perspective.

An online calculator (Shiny app) implementing these results is available at https://gjsandoval.shinyapps.io/cost-casecontrol. Briefly, parameter values for the sample-size calculation Inline graphic and cost values ($C_d$, $C_m$) are elicited as input, and then the number of matched sets and expected total study cost are calculated. Additionally, the online calculator provides a range of values for d and m that yields the expected total study cost within a predefined percentage of the minimum total study cost (e.g., 5%).
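The idea of listing designs whose cost falls within a given percentage of the minimum can be sketched as follows. This is not the Shiny application's code, which is not shown in the article. It is an independent illustration that assumes the expected total cost is proportional to (d + m)/(dm) × (d × ratio + m), with hypothetical function names and a default tolerance of 5%.

```python
def cost_factor(d, m, ratio):
    """Expected total cost in units of K times the cost per control (assumed form)."""
    return (d + m) / (d * m) * (d * ratio + m)


def near_optimal_designs(ratio, tolerance=0.05, max_d=15, max_m=15):
    """(d, m) pairs whose cost is within `tolerance` of the grid minimum, cheapest first."""
    costs = {
        (d, m): cost_factor(d, m, ratio)
        for d in range(1, max_d + 1)
        for m in range(1, max_m + 1)
    }
    best = min(costs.values())
    within = [dm for dm, cost in costs.items() if cost <= best * (1 + tolerance)]
    return sorted(within, key=costs.get)


designs = near_optimal_designs(ratio=4.0)
print(len(designs), "designs within 5% of the minimum; cheapest:", designs[0])
```

A list of this kind lets an investigator trade a small increase in cost for a design that is easier to recruit, such as one with fewer cases per matched set.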

ACKNOWLEDGMENTS

Author affiliations: Biostatistics Center, Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Rockville, Maryland, United States (Grecio J. Sandoval, Ionut Bebu, John M. Lachin).

This work was partially supported by grants awarded by the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health (grants U01-DK-094176 and U01-DK-094157) for the Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications Study.

We thank the members of the Innovations in Design, Education, and Analysis (IDEA) Committee of the Biostatistics Center, George Washington University Milken Institute School of Public Health, for helpful discussions.

Conflict of interest: none declared.

REFERENCES

1. Breslow NE, Day NE. Statistical methods in cancer research. Volume I—The analysis of case-control studies. IARC Sci Publ. 1980;(32):5–338.
2. Rose S, van der Laan MJ. Why match? Investigating matched case-control study designs with causal effect estimation. Int J Biostat. 2009;5(1):Article 1.
3. Cole P. The evolving case-control study. J Chronic Dis. 1979;32(1-2):15–27.
4. Breslow N. Design and analysis of case-control studies. Annu Rev Public Health. 1982;3(1):29–54.
5. Miettinen OS. Individual matching with multiple controls in the case of all-or-none responses. Biometrics. 1969;25(2):339–355.
6. Walter S. Matched case-control studies with a variable number of controls per case. J R Stat Soc Ser C Appl Stat. 1980;29(2):172–179.
7. Schlesselman JJ. Sample size requirements in cohort and case-control studies of disease. Am J Epidemiol. 1974;99(6):381–384.
8. Meydrech EF, Kupper LL. Cost considerations and sample size requirements in cohort and case-control studies. Am J Epidemiol. 1978;107(3):201–205.
9. Pike MC, Casagrande JT. Re: “Cost considerations and sample size requirements in cohort and case-control studies” [letter]. Am J Epidemiol. 1979;110(1):100–102.
10. Moussa MA. Allocation designs in cohort and case-control studies. Stat Med. 1986;5(4):319–326.
11. Nam JM, Fears TR. Optimum allocation of samples in strata-matching case-control studies when cost per sample differs from stratum to stratum. Stat Med. 1990;9(12):1475–1483.
12. Lachin JM. Sample size evaluation for a multiply matched case-control study using the score test from a conditional logistic (discrete Cox PH) regression model. Stat Med. 2008;27(14):2509–2523.
13. Lachin JM. Sample size evaluation for a multiply matched case-control study using the score test from a conditional logistic (discrete Cox PH) regression model [erratum]. Stat Med. 2018;37(10):1765–1766.
14. Cox DR. Regression models and life-tables. J R Stat Soc Ser B Stat Methodol. 1972;34(2):187–202.
15. Tang Y. Comments on ‘Sample size evaluation for a multiply matched case-control study using the score test from a conditional logistic (discrete Cox PH) regression model’ by J. M. Lachin, Statistics in Medicine 2008;27(14):2509–2523. Stat Med. 2009;28(1):175–177.
16. Prasad M, Saade GR, Sandoval G, et al. Hepatitis C virus antibody screening in a cohort of pregnant women: identifying seroprevalence and risk factors. Obstet Gynecol. 2020;135(4):778–788.
17. Prentice RL. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika. 1986;73(1):1–11.

Articles from American Journal of Epidemiology are provided here courtesy of Oxford University Press
