Abstract
In multiply matched case-control studies, a number of cases and controls may be included in each matched set. However, when per-participant costs between cases and controls differ, investigators should be aware of how the numbers of cases and controls per matched set affect the overall total study cost. Traditional statistical approaches to designing case-control studies do not account for study costs. Given an effect size, the power to detect differences is typically a function of the numbers of cases and controls within each matched set. Therefore, the same level of statistical power will be achieved based on various combinations of the numbers of cases and controls. Typical matched case-control studies match a case to a number of controls by levels of 1 or more known factors. Several authors have shown that for study designs with 1 case per matched set, the optimal number of controls within each matched set that minimizes the total study cost is the square root of the ratio of the cost of a case to the cost of a control. Herein, we extend this result to the setting of a multiply matched case-control study design, when 1 or more cases are matched to controls within each matched set. A Shiny web application implementation of the proposed methods is presented.
Keywords: case-control studies, matched case-control studies, observational studies, research costs
Abbreviations
- HCV: hepatitis C virus
- OR: odds ratio
Matched case-control studies are observational studies in which cases and controls are matched with respect to 1 or more known factors in order to potentially increase efficiency and reduce the effect of confounding (1), although that may not always be the case (2). In a typical matched case-control study, cases are participants who have a particular disease or condition present, and controls are participants who are free of that disease or condition. A variation of a matched case-control study is a multiply matched case-control study. This study design involves a number of controls matched to 1 or more cases within the matched set. The numbers of cases and controls in each matched set can be selected to provide the desired level of statistical power. For the same level of statistical power to detect a given effect size, if the costs of a case and a control are similar, then the optimal total study cost is achieved when the numbers of cases and controls are equal (3, 4). However, for studies where cases are defined by a rare disease, the fixed costs of enrolling and following up a case are usually higher than the costs for a control. In designing such studies, investigators should be aware of how study design parameters contribute to the overall study cost. Parameter values that contribute to the overall study cost include the numbers of cases and controls within each matched set, the number of matched sets, and the study cost for each case and control. However, traditional statistical approaches to designing matched case-control studies do not take costs into account.
For study designs with 1 case per matched set, Miettinen (5) and Walter (6) provided sample-size calculations and derived the optimal number of controls minimizing the total study cost. Schlesselman (7) provided similar sample-size calculations for cohort and case-control studies. Using Schlesselman’s (7) sample-size calculations, Meydrech and Kupper (8) and Pike and Casagrande (9) considered study costs and reaffirmed the optimal values obtained by Miettinen (5) (see also Moussa (10) and Nam and Fears (11)). For matched case-control studies in which 1 case is matched with multiple controls, the optimal number of controls for each matched set is the square root of the ratio of the cost of a case to the cost of a control.
Lachin (12, 13) evaluated the sample size for case-control studies when 1 or more cases and controls are considered in each matched set (multiply matched case-control studies). These sample-size calculations were focused on a single qualitative or quantitative covariate and were derived from a score test based on a conditional logistic regression model, which is equivalent to a stratified Cox proportional hazards model with adjustment for ties (14). Following Lachin’s approach, Tang (15) simplified the score equation and Fisher information and provided useful expressions for calculating the power and sample size. These calculations are generalizations of the tests used for the 1-case-per-matched-set designs provided by Miettinen (5) and Walter (6).
Herein, we build on these previous results and describe the assessment of sample size in terms of total study cost rather than power alone. We utilize the sample-size calculations provided by Lachin (12, 13) and Tang (15) to determine the total cost for the study and to obtain the optimal numbers of cases and controls for each matched set that minimize the total study cost. The approach is then illustrated using examples.
METHODS
Consider a case-control study with N matched sets, and assume a fixed number of cases and controls for all matched sets. In each matched set, the number of participants is $n = d + m$, with $d$ denoting the number of cases and $m$ denoting the number of controls. The total number of cases for the study is $Nd$, and the total number of controls for the study is $Nm$, for a total of $N(d + m)$ participants in the study.
Sample size
Lachin (12, 13) (see also Tang (15)) obtained the number of matched sets N that provides $1 - \beta$ power in detecting an effect size (e.g., the log odds ratio (OR)) at significance level $\alpha$,

$$N = K\,\frac{d + m}{dm}, \tag{1}$$

where K is a multiplicative factor independent of d and m.
The value of K depends on the research objective and comparison for the study, and Lachin (12) gives an expression for K in each of the following settings. For a quantitative factor X with a common variance among the matched sets, the effect size may be the log OR for a case versus a control per unit change in X, or the average mean difference between cases and controls. For a binary factor X with a common Bernoulli variance among the sets, the effect size may be the log OR for the positive value of the factor (e.g., $X = 1$), or the difference in probabilities between cases and controls. (See Lachin (12) for details.)
The number of matched sets N is a decreasing function with respect to the number of cases d for a fixed number of controls. Likewise, the number of matched sets N decreases as a function of the number of controls m for a fixed number of cases.
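The monotonic behavior can be checked numerically. The following is a minimal Python sketch, assuming that equation 1 has the form N = K(d + m)/(dm), which is consistent with the behavior described above. The function name `matched_sets` and the value K = 10 are hypothetical and chosen only for illustration.

```python
import math


def matched_sets(K: float, d: int, m: int) -> float:
    """Number of matched sets, assuming equation 1 is N = K * (d + m) / (d * m)."""
    if d < 1 or m < 1:
        raise ValueError("d and m must be positive integers")
    return K * (d + m) / (d * m)


K = 10.0  # hypothetical multiplicative factor, not taken from the paper

# Adding controls to a 1-case matched set lowers N, with diminishing returns.
for m in (1, 2, 4, 8):
    n_sets = matched_sets(K, d=1, m=m)
    print(f"d=1, m={m}: N={n_sets:.2f}, rounded up to {math.ceil(n_sets)}")
```

Under this assumed form, N approaches K from above as m grows with d = 1, so each additional control per set buys a smaller reduction in the number of matched sets.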
Total cost for the study
The total cost for the study is the cost for each case multiplied by the number of cases plus the cost for each control multiplied by the number of controls. Let $c_{\text{case}}$ and $c_{\text{control}}$ be the costs per case and control, respectively. These are aggregate costs that consist of per-participant costs related to enrollment and follow-up of a case or control.
The expected total cost for the study is

$$C = N\,(d\,c_{\text{case}} + m\,c_{\text{control}}). \tag{2}$$

This equation for expected total study cost is consistent with Miettinen (5) for d = 1. Note that equation 2 only accounts for participant costs. For completeness, any additional nonparticipant costs can simply be added to equation 2.
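As a small illustration of the cost calculation described above, the sketch below computes participant costs and optionally adds nonparticipant costs. The function name `total_cost`, the `other_costs` argument, and all numeric values are hypothetical.

```python
def total_cost(n_sets, d, m, cost_case, cost_control, other_costs=0.0):
    """Participant costs for n_sets matched sets of d cases and m controls,
    plus any nonparticipant costs supplied by the caller."""
    participant_costs = n_sets * (d * cost_case + m * cost_control)
    return participant_costs + other_costs


# Hypothetical design: 100 matched sets with 1 case and 2 controls each.
print(total_cost(100, d=1, m=2, cost_case=400.0, cost_control=100.0))
# Same design with a hypothetical fixed overhead added.
print(total_cost(100, d=1, m=2, cost_case=400.0, cost_control=100.0,
                 other_costs=5000.0))
```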
Cost-efficient study design
Any combination of the number of matched sets N, cases d, and controls m satisfying equation 1 provides $1 - \beta$ power to detect an effect size (i.e., a log OR or a difference between cases and controls) at significance level $\alpha$. Our goal is to select the combination of d and m that minimizes the expected total cost of the study. Using equations 1 and 2, one obtains

$$C = K\,\frac{d + m}{dm}\,(d\,c_{\text{case}} + m\,c_{\text{control}}), \tag{3}$$

where K does not depend on d and m.
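The optimal values of d and m in each matched set minimizing the expected total study cost (equation 2) are given by

$$(d^*, m^*) = \underset{d,\,m}{\arg\min}\; C(d, m;\, \Theta), \tag{4}$$

where $C(d, m;\, \Theta)$ is the expected total study cost and the minimization is over $d_{\min} \le d \le d_{\max}$ and $m_{\min} \le m \le m_{\max}$, with $d_{\min}$, $d_{\max}$, $m_{\min}$, and $m_{\max}$ denoting the smallest and largest numbers of cases and controls the investigator may consider, respectively, and $\Theta$ denoting the fixed study parameter values. For example, the significance level, the power, the mean difference, and the variance are the fixed study parameter values for the study design that considers a mean difference of the quantitative factor.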
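The minimization over a bounded grid can be sketched as an exhaustive search. The Python example below is illustrative only and assumes that the expected total cost is K(d + m)/(dm) times the per-set participant cost. Because K does not depend on d and m, the search works with the cost divided by K. The function names and the default bounds of 1 to 15 are hypothetical, and costs are passed as integers so that exact rational arithmetic can detect ties.

```python
from fractions import Fraction
from itertools import product


def scaled_cost(d, m, cost_case, cost_control):
    """Expected total cost divided by K, assuming N = K * (d + m) / (d * m)."""
    return Fraction(d + m, d * m) * (d * cost_case + m * cost_control)


def optimal_designs(cost_case, cost_control, d_range=(1, 15), m_range=(1, 15)):
    """Return every (d, m) on the grid that attains the minimum scaled cost."""
    grid = product(range(d_range[0], d_range[1] + 1),
                   range(m_range[0], m_range[1] + 1))
    costs = {(d, m): scaled_cost(d, m, cost_case, cost_control) for d, m in grid}
    best = min(costs.values())
    return sorted(dm for dm, cost in costs.items() if cost == best), best


# Hypothetical costs with a case costing 4 times as much as a control.
designs, best = optimal_designs(cost_case=4, cost_control=1)
print(designs, best)
```

Exact fractions are used instead of floating point so that designs with identical cost are all reported rather than one being picked by rounding noise.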
Optimal total cost for the study
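Using the Cauchy-Schwarz inequality,

$$K\left(\frac{1}{d} + \frac{1}{m}\right)\left(d\,c_{\text{case}} + m\,c_{\text{control}}\right) \ge K\left(\sqrt{c_{\text{case}}} + \sqrt{c_{\text{control}}}\right)^2, \tag{5}$$

with equality for $m/d = \sqrt{c_{\text{case}}/c_{\text{control}}}$.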
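The bound can also be verified by expanding the product directly. The derivation below is a sketch under the assumption that the number of matched sets is proportional to 1/d + 1/m, and it uses the arithmetic-geometric mean inequality as an elementary alternative to Cauchy-Schwarz. The symbols for the per-case and per-control costs are notation adopted here for illustration.

```latex
\begin{align*}
\left(\frac{1}{d} + \frac{1}{m}\right)
\left(d\,c_{\text{case}} + m\,c_{\text{control}}\right)
  &= c_{\text{case}} + c_{\text{control}}
     + \frac{m}{d}\,c_{\text{control}} + \frac{d}{m}\,c_{\text{case}} \\
  &\ge c_{\text{case}} + c_{\text{control}}
     + 2\sqrt{c_{\text{case}}\,c_{\text{control}}} \\
  &= \left(\sqrt{c_{\text{case}}} + \sqrt{c_{\text{control}}}\right)^2 .
\end{align*}
```

The inequality in the second line holds with equality only when the two cross terms are equal, that is, when $m^2 c_{\text{control}} = d^2 c_{\text{case}}$, which gives the ratio of controls to cases equal to the square root of the cost ratio.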
As a function of d and m, this means that there is a lower bound for the expected total cost of the study and therefore an optimal integer solution $(d^*, m^*)$ exists. Note that any set $(kd^*, km^*)$ that is proportional to the optimal set $(d^*, m^*)$ yields the same expected total study cost, $C(kd^*, km^*) = C(d^*, m^*)$, for any $k > 0$. It follows from equation 5 that given a fixed number of cases d per matched set, the optimal number of controls is

$$m^* = d\,\sqrt{\frac{c_{\text{case}}}{c_{\text{control}}}}. \tag{6}$$
For 1 case per matched set ($d = 1$), the optimal number of controls per set is $m^* = \sqrt{c_{\text{case}}/c_{\text{control}}}$, which is the same optimal value as that obtained by Miettinen (5). This result is expected because the sample size for multiply matched case-control studies provided by Lachin is a generalization of Miettinen’s sample-size calculations for 1 case per matched set.
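In practice the number of controls must be an integer, whereas the square-root rule generally gives a fractional value. One way to handle this, sketched below in Python, is to compare the two integers on either side of the continuous optimum. The sketch assumes the expected cost is proportional to (1/d + 1/m)(dR + m), where R is the cost of a case divided by the cost of a control. For fixed d this expression is convex in m, so checking the floor and the ceiling is sufficient. The function names are hypothetical.

```python
import math


def scaled_cost(d, m, cost_ratio):
    """Expected total cost in units of K times the cost of one control."""
    return (1 / d + 1 / m) * (d * cost_ratio + m)


def optimal_controls(d, cost_ratio):
    """Integer number of controls per set with the lowest cost for a fixed d."""
    target = d * math.sqrt(cost_ratio)  # continuous optimum from the square-root rule
    candidates = {max(1, math.floor(target)), max(1, math.ceil(target))}
    return min(sorted(candidates), key=lambda m: scaled_cost(d, m, cost_ratio))


print(optimal_controls(d=1, cost_ratio=4.0))
print(optimal_controls(d=4, cost_ratio=5.0))
```

With 4 cases per set and a cost ratio of 5, the continuous optimum is about 8.94 controls, and the comparison selects 9 rather than 8.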
Illustration
Consider a scenario where an investigator is designing a study that considers enrollment of up to 15 cases and 15 controls per matched set, so that $d_{\min} = m_{\min} = 1$ and $d_{\max} = m_{\max} = 15$ in equation 4. Figure 1 illustrates the expected total study cost (equation 3) for the cost ratios $c_{\text{case}}/c_{\text{control}}$ = 2.0, 3.0, 4.0, 5.0. The asterisks (*) within the figure represent the optimal numbers of cases and controls $(d^*, m^*)$ for each cost ratio. This example illustrates that any optimal set of values is proportional to another optimal set of values. For example, if $c_{\text{case}}/c_{\text{control}}$ = 4.0 (Figure 1C), then the optimal numbers of cases and controls that minimize the expected total study cost are $(d^*, m^*) = (1, 2)$, $(2, 4)$, $(3, 6)$, $(4, 8)$, $(5, 10)$, $(6, 12)$, and $(7, 14)$.
Figure 1.

Expected total study cost (equation 3) for various cost ratios $c_{\text{case}}/c_{\text{control}}$. The asterisk (*) indicates the optimal set (d*, m*). The cost ratios shown are $c_{\text{case}}/c_{\text{control}}$ = 2.0 (A), 3.0 (B), 4.0 (C), and 5.0 (D).
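Note that the optimal values $d^*$ and $m^*$ are symmetric for a specific cost ratio and its corresponding reciprocal. For example, $c_{\text{case}}/c_{\text{control}}$ = 0.25 (reciprocal of $c_{\text{case}}/c_{\text{control}}$ = 4.0) yields $(d^*, m^*) = (2, 1)$, $(4, 2)$, $(6, 3)$, $(8, 4)$, $(10, 5)$, $(12, 6)$, and $(14, 7)$.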
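The symmetry can be seen in one line. The derivation below is a sketch that writes $R$ for the ratio of the cost of a case to the cost of a control and $f(d, m; R)$ for the expected total cost in units of K times the cost of a control, assuming the number of matched sets is proportional to 1/d + 1/m. Both symbols are notation introduced here for illustration.

```latex
\begin{align*}
f(d, m; R) &= \left(\frac{1}{d} + \frac{1}{m}\right)(dR + m), \\
f(m, d; 1/R) &= \left(\frac{1}{m} + \frac{1}{d}\right)\left(\frac{m}{R} + d\right)
             = \frac{1}{R}\left(\frac{1}{d} + \frac{1}{m}\right)(m + dR)
             = \frac{f(d, m; R)}{R}.
\end{align*}
```

Because $1/R$ is a positive constant, a pair $(d, m)$ minimizes the cost at ratio $R$ exactly when the swapped pair $(m, d)$ minimizes it at ratio $1/R$.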
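Setting d = m yields the design with the smallest total sample size $N(d + m)$. However, the costs of designs such as $d$ = $m$ relative to the cost of the optimal design $(d^*, m^*)$ are greater. For any design $(d, m)$ compared with the optimal design $(d^*, m^*)$, the relative cost is

$$\frac{C(d, m)}{C(d^*, m^*)} = \frac{\left(\frac{1}{d} + \frac{1}{m}\right)\left(d\,c_{\text{case}} + m\,c_{\text{control}}\right)}{\left(\frac{1}{d^*} + \frac{1}{m^*}\right)\left(d^*\,c_{\text{case}} + m^*\,c_{\text{control}}\right)}. \tag{7}$$

For example, for cost ratio $c_{\text{case}}/c_{\text{control}}$ = 5.0 (Figure 1D), the design that minimizes the total sample size $N(d + m)$ (e.g., $d = m = 1$) is 15% more costly than the optimal design $(d^*, m^*) = (4, 9)$. In this example, this means that the expected total study cost for a study design with 1 case and 1 control in each matched set is 15% greater than the total study cost for a study design with 4 cases and 9 controls per matched set.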
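The 15% figure can be reproduced with a short calculation. The Python sketch below assumes the expected total cost is proportional to (1/d + 1/m)(dR + m), where R is the cost of a case divided by the cost of a control. The function names are hypothetical.

```python
def scaled_cost(d, m, cost_ratio):
    """Expected total cost in units of K times the cost of one control."""
    return (1 / d + 1 / m) * (d * cost_ratio + m)


def relative_cost(design, reference, cost_ratio):
    """Cost of `design` divided by the cost of `reference`, both as (d, m)."""
    return scaled_cost(*design, cost_ratio) / scaled_cost(*reference, cost_ratio)


# 1 case and 1 control per set versus 4 cases and 9 controls, cost ratio 5.
rc = relative_cost((1, 1), (4, 9), cost_ratio=5.0)
print(f"relative cost: {rc:.3f}")
```

The scaled costs are 12 for the 1:1 design and about 10.47 for the 4:9 design, giving a relative cost of about 1.146, which rounds to the 15% excess quoted above.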
DISCUSSION
Herein, we provided a method for minimizing the cost of a multiply matched case-control study. We described how to calculate the expected total study cost and how to obtain the optimal numbers of cases and controls for each matched set that yield a cost-efficient study design.
In designing multiply matched case-control studies, the costs to enroll and follow up cases and controls may not be the same. For example, when cases are defined as participants with a rare disease, recruitment costs for enrolling each case may be higher than those for controls who are disease-free (i.e., $c_{\text{case}} > c_{\text{control}}$). For participant follow-up, the cost to obtain the same follow-up data may be greater in cases than in controls (e.g., unique biospecimen collection procedures for cases). For example, in the Maternal-Fetal Medicine Units (MFMU) Network Hepatitis C Virus (HCV) in Pregnancy Study (16), postpartum follow-up costs are higher among the HCV-positive cases than among the HCV-negative controls because of the increased time and effort required to contact cases (as opposed to controls), who are additionally known to be more difficult to follow. Conversely, the cost of following a case may be lower than the cost of following a control (i.e., $c_{\text{case}} < c_{\text{control}}$). For example, in a case-control study ancillary to a randomized controlled trial that requires newly enrolled controls, enrollment costs are lower for a case than for a new control because the cases have already been recruited. If all other participant study costs are the same (i.e., the same procedures and assessments are performed on both cases and controls), then the cost of following a case will be lower than the cost of following a control. Importantly, we showed that these cost differences (i.e., cost ratios) influence the optimal number of cases and controls per matched set. Our examples illustrate that minimizing the number of matched sets or the overall number of participants may not always yield a study design with the lowest expected total study cost.
Our results are consistent with and extend previous research on cost-efficient designs for case-control studies (5, 8–11). The optimal number of controls, given a fixed number of cases per matched set, is simply the number of cases times the square root of the cost ratio (i.e., $m^* = d\sqrt{c_{\text{case}}/c_{\text{control}}}$).
The results presented here also apply to a nested case-control design where controls are selected from at-risk individuals who are without an event at the time of a case’s event (12). Case-cohort designs may also be used in such applications (17). Future research is needed to compare the 2 study designs from a cost perspective.
An online calculator (Shiny app) implementing these results is available at https://gjsandoval.shinyapps.io/cost-casecontrol. Briefly, parameter values for the sample-size calculation and cost values per case and per control are elicited as input, and then the number of matched sets and expected total study cost are calculated. Additionally, the online calculator provides a range of values for d and m that yields the expected total study cost within a predefined percentage of the minimum total study cost (e.g., 5%).
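The idea of listing designs whose cost falls within a given percentage of the minimum can be sketched as follows. This Python example is not the Shiny app's code. It is an independent illustration that assumes the expected total cost is proportional to (1/d + 1/m)(dR + m), where R is the cost of a case divided by the cost of a control. The function name, the 5% default, and the grid bounds are hypothetical.

```python
def near_optimal_designs(cost_ratio, tolerance=0.05, d_max=15, m_max=15):
    """Designs (d, m) whose scaled cost is within `tolerance` of the grid minimum."""
    costs = {
        (d, m): (1 / d + 1 / m) * (d * cost_ratio + m)
        for d in range(1, d_max + 1)
        for m in range(1, m_max + 1)
    }
    best = min(costs.values())
    limit = (1 + tolerance) * best
    return sorted(dm for dm, cost in costs.items() if cost <= limit)


# Hypothetical example: a case costs 4 times as much as a control.
candidates = near_optimal_designs(cost_ratio=4.0)
print(len(candidates), candidates[:5])
```

A list of near-optimal designs gives investigators room to choose a set size that is logistically feasible while staying close to the minimum cost.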
ACKNOWLEDGMENTS
Author affiliations: Biostatistics Center, Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Rockville, Maryland, United States (Grecio J. Sandoval, Ionut Bebu, John M. Lachin).
This work was partially supported by grants awarded by the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health (grants U01-DK-094176 and U01-DK-094157) for the Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications Study.
We thank the members of the Innovations in Design, Education, and Analysis (IDEA) Committee of the Biostatistics Center, George Washington University Milken Institute School of Public Health, for helpful discussions.
Conflict of interest: none declared.
REFERENCES
- 1. Breslow NE, Day NE. Statistical methods in cancer research. Volume I: The analysis of case-control studies. IARC Sci Publ. 1980;(32):5–338.
- 2. Rose S, van der Laan MJ. Why match? Investigating matched case-control study designs with causal effect estimation. Int J Biostat. 2009;5(1):Article 1.
- 3. Cole P. The evolving case-control study. J Chronic Dis. 1979;32(1-2):15–27.
- 4. Breslow N. Design and analysis of case-control studies. Annu Rev Public Health. 1982;3(1):29–54.
- 5. Miettinen OS. Individual matching with multiple controls in the case of all-or-none responses. Biometrics. 1969;25(2):339–355.
- 6. Walter S. Matched case-control studies with a variable number of controls per case. J R Stat Soc Ser C Appl Stat. 1980;29(2):172–179.
- 7. Schlesselman JJ. Sample size requirements in cohort and case-control studies of disease. Am J Epidemiol. 1974;99(6):381–384.
- 8. Meydrech EF, Kupper LL. Cost considerations and sample size requirements in cohort and case-control studies. Am J Epidemiol. 1978;107(3):201–205.
- 9. Pike MC, Casagrande JT. Re: “Cost considerations and sample size requirements in cohort and case-control studies” [letter]. Am J Epidemiol. 1979;110(1):100–102.
- 10. Moussa MA. Allocation designs in cohort and case-control studies. Stat Med. 1986;5(4):319–326.
- 11. Nam JM, Fears TR. Optimum allocation of samples in strata-matching case-control studies when cost per sample differs from stratum to stratum. Stat Med. 1990;9(12):1475–1483.
- 12. Lachin JM. Sample size evaluation for a multiply matched case-control study using the score test from a conditional logistic (discrete Cox PH) regression model. Stat Med. 2008;27(14):2509–2523.
- 13. Lachin JM. Sample size evaluation for a multiply matched case-control study using the score test from a conditional logistic (discrete Cox PH) regression model [erratum]. Stat Med. 2018;37(10):1765–1766.
- 14. Cox DR. Regression models and life-tables. J R Stat Soc Ser B Stat Methodol. 1972;34(2):187–202.
- 15. Tang Y. Comments on ‘Sample size evaluation for a multiply matched case-control study using the score test from a conditional logistic (discrete Cox PH) regression model’ by J. M. Lachin, Statistics in Medicine 2008;27(14):2509–2523. Stat Med. 2009;28(1):175–177.
- 16. Prasad M, Saade GR, Sandoval G, et al. Hepatitis C virus antibody screening in a cohort of pregnant women: identifying seroprevalence and risk factors. Obstet Gynecol. 2020;135(4):778–788.
- 17. Prentice RL. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika. 1986;73(1):1–11.







