A Brief Guide to Statistical Analysis of Grouped Data in Preclinical Research

Colby J Vorland; Lilian Golzarri-Arroyo; David B Allison

doi:10.1038/s42255-025-01323-9

Author manuscript; available in PMC: 2026 Feb 12.

Published in final edited form as: Nat Metab. 2025 Jul;7(7):1301–1304. doi: 10.1038/s42255-025-01323-9

PMCID: PMC12893352 NIHMSID: NIHMS2135479 PMID: 40542297

Abstract

Clustering and nesting (C&N) arise in many preclinical studies, such as when animals are group-housed or share litters, and in cell culture. Ignoring C&N undermines the validity of analyses. We explain how C&N arise and describe valid designs and analyses that account for them.

Introduction

Many published preclinical research analyses are not valid because they fail to consider the non-independence of data points that share common conditions or influences (technically, the non-independence of model residuals; for a thorough treatment of the concept, see (1)). Preclinical research often involves interventions delivered to groups, such as feeding a specific diet to a cage of mice or applying a treatment to a plate of cells. Even when interventions are not delivered to a group, clustering may still need to be accounted for (2).

Here we discuss why recognizing clustering and nesting (C&N) is essential for valid analyses and substantiated results, describe how C&N arise in common experimental models, and provide design and analysis guidance. Attention to these issues will improve the rigor and reproducibility of preclinical findings.

What are clustering and nesting?

Clustering refers to statistical dependence among observations within a group. This dependence is often due to shared environmental or genetic influences and can be positive (observations tend to vary together) or negative (they tend to vary inversely) (3). Nesting is a related concept describing a hierarchical structure: lower-level units (e.g., individual animals or cells) are nested within higher-level units (e.g., cages, litters, or plates) (3). Nesting must be accounted for when the unit of allocation is a group (e.g., a cage of mice or a plate of cells), whereas clustering must be accounted for even without group allocation.
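To make the idea of within-group similarity concrete, the intraclass correlation coefficient (ICC; see Box 1) can be estimated from a one-way analysis of variance. The sketch below is illustrative only: it assumes a balanced design (every cluster has the same number of observations m), uses the classical ANOVA estimator ICC = (MSB − MSW) / (MSB + (m − 1)·MSW), and the function name and body weights are hypothetical. The estimate can be negative, which corresponds to the negative dependence described above.

```python
from statistics import mean


def anova_icc(clusters: list[list[float]]) -> float:
    """Estimate the ICC from balanced clustered data with the one-way ANOVA estimator.

    clusters: one inner list per cluster (e.g., per cage), all of the same length m.
    """
    k = len(clusters)
    m = len(clusters[0])
    if k < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need at least 2 clusters, each with the same size m >= 2")
    grand_mean = mean(x for c in clusters for x in c)
    cluster_means = [mean(c) for c in clusters]
    # Between-cluster mean square: spread of cluster means around the grand mean.
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    # Within-cluster mean square: spread of observations around their own cluster mean.
    msw = sum((x - cm) ** 2 for c, cm in zip(clusters, cluster_means) for x in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


# Hypothetical body weights (g) for three cages of three mice each.
cages = [[20.1, 20.4, 19.9], [23.0, 22.6, 23.3], [21.2, 21.5, 21.0]]
print(f"{anova_icc(cages):.2f}")  # 0.96: mice in the same cage are very similar
```

An ICC near 1 means most of the variation lies between cages rather than within them; identical cluster profiles (e.g., `[[1, 2, 3], [1, 2, 3]]`) give the most negative value possible for m = 3, namely −0.5.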

Consequences of failing to account for C&N

In statistical terms, clustered or nested data violate the assumption of independence required by standard tests such as the t-test or ANOVA, and therefore require other approaches (3). Applying standard tests anyway can produce standard errors that are artificially small, and in turn p-values that are too small and confidence intervals that are too narrow, inflating the type I error rate. This can result in spurious statistical significance (i.e., false positives), undermining preclinical findings (4, 5).
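A small simulation can illustrate this inflation. The sketch below assumes a hypothetical design with 2 cages per arm, 5 mice per cage, equal cage-level and mouse-level standard deviations (so an ICC of 0.5), and no true treatment effect. It compares a naive pooled t-test that counts all 10 mice per arm as independent with a t-test on cage means. The critical values are hard-coded from t tables for this default design only, so changing the design would require updating them.

```python
import math
import random
from statistics import mean, variance

# Two-sided 5% critical values of Student's t, taken from standard tables for the
# default design below (18 df for 20 mice, 2 df for 4 cages).
T_CRIT = {18: 2.101, 2: 4.303}


def pooled_t(a: list[float], b: list[float]) -> float:
    """Classical two-sample t statistic with a pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def simulate(reps=2000, cages_per_arm=2, mice_per_cage=5, sd_cage=1.0, sd_mouse=1.0, seed=1):
    """Rejection rates (naive, cage-level) at alpha = 0.05 with NO true treatment effect."""
    rng = random.Random(seed)
    naive_hits = cage_hits = 0
    for _ in range(reps):
        arms = []
        for _arm in range(2):
            cages = []
            for _cage in range(cages_per_arm):
                shared = rng.gauss(0.0, sd_cage)  # cage effect shared by all its mice
                cages.append([shared + rng.gauss(0.0, sd_mouse) for _ in range(mice_per_cage)])
            arms.append(cages)
        # Naive analysis: every mouse counted as an independent observation.
        mice = [[y for cage in arm for y in cage] for arm in arms]
        naive_hits += abs(pooled_t(*mice)) > T_CRIT[2 * cages_per_arm * mice_per_cage - 2]
        # Cage-level analysis: one mean per cage, so df is based on the number of cages.
        cage_means = [[mean(cage) for cage in arm] for arm in arms]
        cage_hits += abs(pooled_t(*cage_means)) > T_CRIT[2 * cages_per_arm - 2]
    return naive_hits / reps, cage_hits / reps


naive_rate, cage_rate = simulate()
print(f"naive: {naive_rate:.3f}  cage-level: {cage_rate:.3f}  (nominal 0.050)")
```

Under these assumptions the naive test rejects the true null hypothesis far more often than 5% of the time, while the cage-level test stays close to the nominal rate.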

Examples of C&N in preclinical research

C&N are pervasive in preclinical research (Figure 1). Common scenarios include:

Group-housed animals (cage effects)

When animals are group-housed, such as in a cage, pen, or chamber, they form a cluster. All animals in one cluster experience the same environment (e.g., bedding, temperature, light cycle, social interactions, microbiome exposure, infectious diseases (6)) and, consequently, their outcomes (body weight, glucose levels, etc.) may be dependent (Fig. 1a). If animals already housed together within a cage are allocated to the same intervention (e.g., diet, drug in water), then the cage, not the individual animal, is the experimental unit, as animals are nested within cages. In this scenario, if each group consists of 2 cages of 5 mice, one does not have 10 independent mice per group; one has 2 independent experimental units (cages) per group, each containing 5 mice. At the extreme, if a treatment is assigned to only one group (e.g., a single cage), the effect of the treatment cannot be separated from that of the group, and valid statistical analyses cannot be performed (7). This can occur when entire communities of animals are allocated to treatments, as in some research with naturalistic enclosures (e.g., (8)). Clustering is a consideration regardless of whether the treatment is applied to the cage (such as a high-fat diet provided to the whole cage) or to individual animals (such as an injection in group-housed mice; akin to an individually randomized group treatment design (2)). In the latter case, whether the cage should be considered the experimental unit depends on how time and grouping factors are reflected in the analysis, and more work is needed to develop guidance for such scenarios.
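One standard way to quantify how much information the cage example above loses is Kish's design effect, DEFF = 1 + (m − 1)·ICC, and the corresponding effective sample size N / DEFF. This approximation is not something the authors prescribe; the sketch below assumes equal cage sizes and a known ICC, and the function names are hypothetical.

```python
def design_effect(mice_per_cage: int, icc: float) -> float:
    """Approximate variance inflation due to clustering: 1 + (m - 1) * ICC."""
    return 1 + (mice_per_cage - 1) * icc


def effective_n(total_mice: int, mice_per_cage: int, icc: float) -> float:
    """Number of independent animals that would give roughly the same precision."""
    return total_mice / design_effect(mice_per_cage, icc)


# The example from the text: 2 cages x 5 mice = 10 mice in one treatment arm.
for icc in (0.0, 0.2, 0.5, 1.0):
    print(f"ICC={icc:.1f}: effective n = {effective_n(10, 5, icc):.2f}")
```

With an ICC of 0 the 10 mice count fully, whereas with an ICC of 1 the effective sample size collapses to 2, the number of cages.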

Invertebrates and small animals

Studies in invertebrates (such as Drosophila or C. elegans) often use group-based assays. For example, in a feeding or preference test, dozens of flies may be tested together in a single vial or chamber, and the readout might be a single metric for that group (e.g., the proportion of flies responding to a stimulus). Here, as with mice in cages, the group or vial, not each individual fly, is the experimental unit. In one analysis of a fly T-maze assay, using a t-test on n ≈ 100 flies per group (when in reality there was one group of ~100 flies per treatment) produced “extremely optimistic P values” for tiny differences in behavior (9). Similarly, if fish are allocated to tanks, the tank, not the individual fish, is often the experimental unit. Hurlbert showed that nearly half of a sample of ecological field experiments that applied inferential statistics failed to account for C&N, an error often called pseudoreplication (10).
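A simple safeguard is to collapse individual flies into one summary per vial before any test is run. The sketch below assumes hypothetical fly-level records of the form (treatment, vial ID, responded) and uses the proportion responding as the per-vial outcome; with a single vial per treatment it returns one value per arm, which makes plain that there is no between-vial replication to test against.

```python
from collections import defaultdict


def vial_proportions(records):
    """Collapse fly-level records into one proportion responding per vial.

    records: iterable of (treatment, vial_id, responded) tuples, one per fly.
    Returns {treatment: [proportion for each vial]}, i.e., one value per experimental unit.
    """
    tallies = defaultdict(lambda: [0, 0])  # (treatment, vial_id) -> [responders, flies]
    for treatment, vial_id, responded in records:
        tally = tallies[(treatment, vial_id)]
        tally[0] += int(responded)
        tally[1] += 1
    per_treatment = defaultdict(list)
    for (treatment, _vial_id), (responders, flies) in tallies.items():
        per_treatment[treatment].append(responders / flies)
    return dict(per_treatment)


# Hypothetical assay: one vial of 100 flies per treatment.
records = [("control", "v1", i < 40) for i in range(100)] + [("drug", "v2", i < 55) for i in range(100)]
print(vial_proportions(records))  # {'control': [0.4], 'drug': [0.55]}
```

Each treatment ends up with n = 1 vial, so a valid comparison would require additional independent vials per treatment rather than more flies per vial.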

Litter, sire, and donor effects

In studies of offspring, including multiple siblings from the same litter introduces the litter as a clustering factor (Fig. 1b): all pups in a litter share a mother and early-life environment. If a treatment is administered to a pregnant dam (e.g., a maternal diet or drug), then the litter is the experimental unit, as pups are nested within the litter. Lazic et al. found that nearly half of a random sample of animal experiments with interventions applied to parents and outcomes measured in offspring did not appropriately consider litter effects (11). In addition, in multigeneration studies, particularly with livestock, when S sires are mated to D dams with 1 < S < D and offspring are studied, the offspring may be clustered by sire because of genetic or other transmissible factors (12) (Fig. 1c). Similarly, transplant studies (including fecal transplant studies) in which N recipient mice receive transplants from m donor mice, with 1 < m < N, can induce clustering by donor (Fig. 1d).
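Before choosing an analysis, it can help to check whether the treatment was in fact applied at the cluster level. The hypothetical helper below takes records as dictionaries (the keys `litter` and `treatment` are assumptions) and, when treatment is constant within every cluster, reports how many clusters received each treatment, which is the number of experimental units per arm. If treatment varies within a cluster (e.g., pups of one litter assigned to different post-weaning treatments), it returns None, and the cluster should instead enter the model as a random or blocking factor.

```python
from collections import defaultdict


def units_per_treatment(records, cluster_key, treatment_key="treatment"):
    """If treatment is constant within every cluster, return {treatment: number of clusters}
    (the number of experimental units per arm); otherwise return None."""
    treatments_seen = defaultdict(set)
    for row in records:
        treatments_seen[row[cluster_key]].add(row[treatment_key])
    if any(len(ts) > 1 for ts in treatments_seen.values()):
        return None  # treatment varies within clusters: model the cluster as a random/blocking factor
    counts = defaultdict(int)
    for ts in treatments_seen.values():
        (treatment,) = ts  # exactly one treatment per cluster
        counts[treatment] += 1
    return dict(counts)


# Hypothetical maternal-diet study: 3 litters, diet assigned to the dam.
pups = (
    [{"litter": "A", "treatment": "control"}] * 3
    + [{"litter": "B", "treatment": "control"}] * 4
    + [{"litter": "C", "treatment": "high_fat"}] * 5
)
print(units_per_treatment(pups, "litter"))  # {'control': 2, 'high_fat': 1}
```

Although 12 pups were studied, there are only 3 experimental units, and the high-fat arm has a single litter, so its treatment effect cannot be separated from that litter's effect.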

Multiple measurements per animal

When multiple data points come from the same animal, those observations are nested within that animal (Fig. 1e). Examples include longitudinal repeated measures (e.g., body weight measured weekly on each animal), multiple tissue samples or organs analyzed from one animal, and measurements on multiple cells or sections from one specimen. None of these within-animal observations are independent replicates; the animal is the experimental unit. For example, if each mouse contributes two measurements from the same tumor, those two measurements are nested within the mouse.
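For longitudinal data, one simple option is a summary-measures approach (named here as an illustration, not as the authors' recommendation): each animal's series is reduced to a single number, such as its least-squares growth slope, so that each animal contributes exactly one value to the between-group comparison. The sketch assumes equally spaced weekly measurements indexed 0, 1, 2, and so on; the animal IDs and weights are hypothetical.

```python
from statistics import mean


def ols_slope(times, values):
    """Least-squares slope of values regressed on times (one animal)."""
    t_bar, y_bar = mean(times), mean(values)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx


def per_animal_slopes(weights_by_animal):
    """Reduce each animal's weekly weights to one growth rate (g/week)."""
    return {animal: ols_slope(list(range(len(w))), w) for animal, w in weights_by_animal.items()}


# Hypothetical weekly body weights (g) for two mice.
slopes = per_animal_slopes({"m1": [20.0, 21.0, 22.0, 23.0], "m2": [19.0, 19.5, 20.0, 20.5]})
print(slopes)
```

If the animals also share cages, these per-animal slopes are themselves still clustered by cage and need the same handling as any other animal-level outcome.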

In vitro experiments

In vitro experiments (e.g., measuring oxygen consumption or glucose uptake in cultured cells) are also prone to clustering (11) (Fig. 1f). A common design uses multiple wells per treatment on multi-well plates, all derived from the same biological sample (i.e., nested technical replicates). Shared plate conditions (temperature variations, edge effects, or reagent batches) can induce dependence among wells. If all treatment wells are on one plate and all control wells on another, plate-to-plate variability is confounded with the treatment effect. Even when treatments are mixed on a plate, measurements from the same plate may be dependent.
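A quick tabulation of the plate layout can reveal the confounded design described above. The hypothetical helper below takes one (plate, treatment) pair per well, counts wells per plate and treatment, and lists the plates that hold only one treatment; if every plate appears on that list, treatment and plate are completely confounded.

```python
from collections import Counter


def plate_layout(wells):
    """Tabulate wells per (plate, treatment) and list plates holding only one treatment.

    wells: iterable of (plate_id, treatment) pairs, one per well.
    """
    table = Counter(wells)
    plates = sorted({plate for plate, _ in table})
    single = [p for p in plates if len({t for (q, t) in table if q == p}) == 1]
    return table, single


# Hypothetical layout: all drug wells on P1, all control wells on P2.
confounded = [("P1", "drug")] * 6 + [("P2", "control")] * 6
table, single = plate_layout(confounded)
print(single)  # ['P1', 'P2']: every plate holds one treatment, so plate and treatment are confounded
```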

Site, batch, and researcher effects

Data can also be nested by laboratory site (in multi-site studies), experimental batch (e.g., different days or assay runs), or by the researcher performing experiments or collecting data (13).

Best practices when planning and analyzing for C&N

Careful planning may involve recognizing and reducing C&N at the design stage and choosing appropriate statistical models to account for these effects. Housing animals individually can eliminate cage-level clustering but is often not possible for ethical, social, or treatment-related reasons. When group housing is required and treatments are allocated by group, the group must be treated as the experimental unit in both power calculations and analyses. Even if treatments are administered individually within a group, it is always important to account for sources of clustering, for instance by including the group as a random effect or by aggregating outcomes at the group level for valid analyses (14). If animals are singly housed but were originally group-housed, we recommend including the original group in statistical models, given that clustering is plausible. Furthermore, stratified randomization at the design stage, using the origin group as a blocking factor, makes group membership and treatment condition orthogonal, which may reduce the variance of the estimated treatment effect and increase power. Additional considerations apply to statistical power when the group is the experimental unit. While increasing either the number of animals per group or the number of groups can increase statistical power, as a general rule of thumb, for a fixed total number of animals, increasing the number of groups increases power much more than increasing the number of animals per group (15). This is especially true when the number of animals is large and the number of groups is relatively small.
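The rule of thumb about groups versus animals per group follows from the variance of the estimated treatment effect when cages are nested in treatment arms. With k cages per arm and m animals per cage, a standard result (not derived in the source) is Var = 2(σ²_cage + σ²_animal/m)/k. The sketch below evaluates this for a hypothetical budget of 40 animals per arm and an assumed ICC of 0.3 (total variance scaled to 1). It ignores degrees of freedom, so it understates the additional gain from using more cages.

```python
def var_treatment_effect(cages_per_arm: int, mice_per_cage: int, icc: float, total_var: float = 1.0) -> float:
    """Variance of the difference between two arm means when cages are nested in arms.

    Each arm mean averages k cage means; each cage mean has variance
    sigma2_cage + sigma2_mouse / m, where ICC = sigma2_cage / total_var.
    """
    sigma2_cage = icc * total_var
    sigma2_mouse = (1 - icc) * total_var
    var_cage_mean = sigma2_cage + sigma2_mouse / mice_per_cage
    return 2 * var_cage_mean / cages_per_arm


# Hypothetical budget: 40 mice per arm, ICC = 0.3.
for k, m in [(4, 10), (8, 5), (20, 2)]:
    print(f"{k:2d} cages x {m:2d} mice: Var = {var_treatment_effect(k, m, 0.3):.4f}")
```

Spreading the same 40 animals over more, smaller cages steadily shrinks the variance (from 0.185 with 4 cages of 10 to 0.065 with 20 cages of 2), because the cage-level component σ²_cage is divided only by the number of cages.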

If litter effects are possible, even if treatments are given post-weaning to individual offspring, modeling the litter as a random effect can address non-independence (14). Sire and dam should also be included as random effects (12). When feasible, use only one animal per litter per group. If that is not practical, balancing litter representation across groups using stratified randomization helps control litter-to-litter variability.
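When more than one animal per litter must be used, the balancing step can be sketched as randomization within strata: animals are shuffled separately within each litter and dealt out across a randomly ordered list of treatments. The function and identifiers below are hypothetical, and this is one simple scheme rather than a prescribed procedure; the same idea applies to blocking on the origin cage.

```python
import random


def stratified_allocation(animals_by_litter, treatments, seed=None):
    """Randomize animals to treatments separately within each litter (stratum),
    dealing treatments out in turn so each litter is spread as evenly as possible."""
    rng = random.Random(seed)
    allocation = {}
    for litter, animals in animals_by_litter.items():
        shuffled = list(animals)
        rng.shuffle(shuffled)
        # Shuffle the treatment order too, so leftover animals in litters whose size
        # is not a multiple of len(treatments) do not always go to the same arm.
        order = list(treatments)
        rng.shuffle(order)
        for i, animal in enumerate(shuffled):
            allocation[animal] = order[i % len(order)]
    return allocation


litters = {"L1": ["p1", "p2", "p3", "p4"], "L2": ["p5", "p6", "p7", "p8"]}
print(stratified_allocation(litters, ["control", "diet"], seed=42))
```

Each litter contributes equally to both arms, so litter membership is balanced against treatment; the litter should still be included in the analysis model.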

Similar considerations apply to cell culture experiments and plate effects. The plate (or experimental run/day) is a blocking factor or random effect, and multiple measurements from the same donor do not represent independent observations. Thus, the sample size for statistical tests is the number of independent experiments (e.g., animals or culture preparations), rather than the number of wells or cells. Plate effects can be addressed by randomizing treatment allocation across plates, including plate as a random factor in analysis, or averaging results per plate or per donor before comparison.
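When treatments are mixed within plates, the plate can be handled as a block by averaging technical-replicate wells within each plate and arm, then analyzing one treated-minus-control difference per plate. This is a paired analysis whose degrees of freedom come from the number of plates, not wells. The sketch assumes that every plate carries both arms, labeled with the hypothetical names "treated" and "control", and that each plate corresponds to an independent experiment (e.g., a separate culture preparation); the well values are invented for illustration.

```python
import math
from statistics import mean, stdev


def per_plate_differences(wells):
    """wells: (plate, arm, value) tuples with arms 'treated' and 'control'.
    Averages technical-replicate wells within each plate and arm, then returns
    one treated-minus-control difference per plate."""
    values = {}
    for plate, arm, value in wells:
        values.setdefault((plate, arm), []).append(value)
    plates = sorted({plate for plate, _ in values})
    return [mean(values[(p, "treated")]) - mean(values[(p, "control")]) for p in plates]


def paired_t(diffs):
    """One-sample t statistic for the mean plate-level difference, with df = plates - 1."""
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


wells = [
    ("P1", "treated", 1.2), ("P1", "treated", 1.4), ("P1", "control", 1.0), ("P1", "control", 1.0),
    ("P2", "treated", 2.0), ("P2", "treated", 2.2), ("P2", "control", 1.5), ("P2", "control", 1.7),
    ("P3", "treated", 0.9), ("P3", "treated", 1.1), ("P3", "control", 0.6), ("P3", "control", 0.8),
]
t_stat, df = paired_t(per_plate_differences(wells))
print(f"t = {t_stat:.2f} on {df} df")  # 12 wells, but only 3 plates, so 2 df
```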

Other nesting factors (e.g., site, batch, researcher effects) can be addressed during experimental design. Blocking designs offer a strategy to account for these (“control what you can, block what you cannot, and randomize the rest”) (13).

When analyzing clustered or nested designs, two common approaches are (1) aggregate, then analyze, or (2) analyze with mixed or hierarchical models. The first approach reduces the data from each cluster to a single summary (e.g., the mean per cage) and then uses those summaries as independent observations in a traditional model (e.g., t-test, ANOVA). However, this approach loses information, such as the within-cluster variability, which can be of interest. The second approach uses all the data but includes the cluster as a random effect or as a blocking factor in a multilevel model. For example, a linear mixed-effects model can include a random intercept for each cage or litter. This correctly partitions the variance into within- and between-cluster components and, when appropriately specified, adjusts the degrees of freedom to account for the nesting of clusters within treatment, because degrees of freedom for analyses must be based on the independent experimental unit (3). Another option is to use generalized estimating equations or cluster-robust standard errors, provided the number of experimental units is large enough; however, the required numbers are typically much larger than is common in preclinical research (3). The key point is that the analysis must reflect the data hierarchy (measurements nested within animals, animals within cages, and so on) and any dependence among observations arising from interactions between experimental units.
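For a balanced design with animals nested in cages and cages nested in treatments, the degrees-of-freedom adjustment can be seen directly in a classical nested ANOVA: the treatment mean square is tested against the cage-within-treatment mean square, so the denominator degrees of freedom equal the number of cages minus the number of treatments. The stdlib-only sketch below assumes balance and normally distributed errors, and the data are hypothetical. In this balanced case its F statistic equals the square of a two-sample t statistic computed on cage means, linking the aggregate-then-analyze and modeling approaches; for unbalanced data or covariates, a mixed model fitted with dedicated software would be the more general tool, with attention to how that software computes degrees of freedom.

```python
from statistics import mean


def nested_anova_f(data):
    """Balanced nested ANOVA: animals within cages within treatments.

    data: {treatment: [[y, y, ...] for each cage]}, with k cages per treatment and
    m animals per cage. Treatment is tested against the cage-within-treatment mean
    square, so the denominator df counts cages, not animals.
    """
    a = len(data)
    first_arm = next(iter(data.values()))
    k, m = len(first_arm), len(first_arm[0])
    cage_means = {t: [mean(c) for c in cages] for t, cages in data.items()}
    trt_means = {t: mean(cm) for t, cm in cage_means.items()}
    grand = mean(trt_means.values())  # equals the overall mean in a balanced design
    ss_trt = k * m * sum((tm - grand) ** 2 for tm in trt_means.values())
    ss_cage = m * sum((c - trt_means[t]) ** 2 for t, cms in cage_means.items() for c in cms)
    df_trt, df_cage = a - 1, a * (k - 1)
    return (ss_trt / df_trt) / (ss_cage / df_cage), (df_trt, df_cage)


# Hypothetical data: 2 diets x 2 cages x 3 mice.
data = {
    "control": [[20.0, 21.0, 22.0], [23.0, 24.0, 22.0]],
    "diet": [[25.0, 26.0, 24.0], [27.0, 26.0, 28.0]],
}
print(nested_anova_f(data))  # (8.0, (1, 2)): 12 mice, but only 2 denominator df
```

Here the cage means are 21 and 23 (control) and 25 and 27 (diet); a pooled t-test on those four values gives t² = 8, matching F, whereas treating the 12 mice as independent would wrongly claim 10 degrees of freedom.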

Investigators are encouraged to follow reporting guidelines specific to their research approach, such as ARRIVE for in vivo animal research (14), and report statistical model details (e.g., random effects, experimental unit, degrees of freedom adjustments) and grouping factor ICCs. Good communication between non-statistician and statistician researchers is essential so each understands the research question, experimental design, and ethical considerations from planning to analysis.

Summary and Conclusions

We recommend: 1) avoid designing studies with grouping effects when possible, although they are often unavoidable; 2) identify and report grouping effects and ICCs transparently, following relevant reporting guidelines; and 3) regardless of the ICC, always include grouping factors in analyses using suitable methods, and adjust degrees of freedom when necessary. C&N are common in preclinical designs, and accounting for them in both the design and analysis phases is essential. It is an ethical imperative that research be designed and analyzed so as not to waste animals, research funding, and effort.

Acknowledgements

We thank Dr. Sherri Pals for her thoughtful feedback and suggestions. This work was supported in part by the National Institutes of Health under award numbers R25DK099080, P30AG050886, U24AG056053, and R25HL124208. The assertions expressed are those of the authors and not necessarily those of the NIH or any other organization.

Box 1: Glossary

Unit of allocation: The entity (e.g., cage, litter, culture plate) assigned to treatment.
Experimental unit: The smallest distinct entity to which a treatment can be applied (e.g., an individual animal, or a cage).
Pseudoreplication: An error resulting from treating multiple measurements from a single experimental unit (e.g., cells from the same animal) as independent replicates.
Random effect: A component in a statistical model that captures variation due to clustering.
Blocking factor: A known source of variability that cannot be eliminated but is accounted for during experimental design (such as during randomization) and statistical analysis to control its influence on the results.
ICC (Intraclass Correlation Coefficient): A statistic describing the similarity among observations within a group.

Footnotes

Competing Interests

CJV and LGA declare no competing interests. DBA and his institution have received consulting fees, grants, contracts, and donations from multiple for-profit entities with interests in obesity, nutrition, statistics, and clinical trials, but none supported or are directly related to this article.

References

1. Kenny DA, Judd CM. Consequences of violating the independence assumption in analysis of variance. Psychological Bulletin. 1986;99(3):422.
2. Chusyd DE, Austad SN, Dickinson SL, Ejima K, Gadbury GL, Golzarri-Arroyo L, Holden RJ, Jamshidi-Naeini Y, Landsittel D, Mehta T. Randomization, design and analysis for interdependency in aging research: no person or mouse is an island. Nature Aging. 2022;2(12):1101–11.
3. Murray DM. Design and analysis of group-randomized trials: Monographs in Epidemiology and, 1998.
4. Luciano A, Churchill GA. The impact of co-housing on murine aging studies. GeroScience. 2025:1–16.
5. Parker ES, Golzarri-Arroyo L, Dickinson S, Henschel B, Becerra-Garcia L-E, Mokalla TR, Robertson OC, Thapa DK, Vorland CJ, Allison DB. Improving Statistical Rigor in Animal Aging Research by Addressing Clustering and Nesting Effects: Illustration with the National Institute on Aging’s Intervention Testing Program Data. bioRxiv. 2025:2025.03.14.642436.
6. Faes C, Hens N, Aerts M, Shkedy Z, Geys H, Mintiens K, Laevens H, Boelaert F. Estimating herd-specific force of infection by using random-effects models for clustered binary data and monotone fractional polynomials. Journal of the Royal Statistical Society Series C: Applied Statistics. 2006;55(5):595–613.
7. Varnell SP, Murray DM, Baker WL. An evaluation of analysis options for the one-group-per-condition design: can any of the alternatives overcome the problems inherent in this design? Evaluation Review. 2001;25(4):440–53.
8. Ruff JS, Hugentobler SA, Suchy AK, Sosa MM, Tanner RE, Hite ME, Morrison LC, Gieng SH, Shigenaga MK, Potts WK. Compared to sucrose, previous consumption of fructose and glucose monosaccharides reduces survival and fitness of female mice. The Journal of Nutrition. 2015;145(3):434–41.
9. Winklhofer M. Pseudoreplication and inappropriate statistical tests in the analysis of preference index data from the T-maze group assay for Drosophila behaviour. bioRxiv. 2023:2023.12.15.571933. doi: 10.1101/2023.12.15.571933.
10. Hurlbert SH. Pseudoreplication and the design of ecological field experiments. Ecological Monographs. 1984;54(2):187–211.
11. Lazic SE, Clarke-Williams CJ, Munafò MR. What exactly is ‘N’ in cell culture and animal experiments? PLoS Biology. 2018;16(4):e2005282.
12. Barkhouse K, Van Vleck LD, Cundiff LV. Effect of ignoring random sire and dam effects on estimates and standard errors of breed comparisons. Journal of Animal Science. 1998;76(9):2279–86.
13. Sorzano COS. Statistical experiment design for animal research [Internet]. 2025. Available from: https://i2pc.es/coss/Articulos/Sorzano2023.pdf.
14. Du Sert NP, Ahluwalia A, Alam S, Avey MT, Baker M, Browne WJ, Clark A, Cuthill IC, Dirnagl U, Emerson M. Reporting animal research: Explanation and elaboration for the ARRIVE guidelines 2.0. PLoS Biology. 2020;18(7):e3000411.
15. Brown AW, Li P, Bohan Brown MM, Kaiser KA, Keith SW, Oakes JM, Allison DB. Best (but oft-forgotten) practices: designing, analyzing, and reporting cluster randomized controlled trials. Am J Clin Nutr. 2015;102(2):241–8. doi: 10.3945/ajcn.114.105072.

Figure 1. Common Clustering and Nesting Scenarios in Preclinical Research.
