Abstract
The link between environmental chemical exposures and neurodevelopmental disorders such as autism and attention-deficit/hyperactivity disorder underscores the need to develop efficient developmental neurotoxicity (DNT) assays for chemical evaluation. The zebrafish Light-Dark Transition Test (LDTT) assesses changes in zebrafish larval behavioral responses to chemical exposure by recording their distance moved under alternating light and dark conditions. To gain confidence in classifying a chemical as having a DNT effect in the LDTT assay, it is important to determine the minimum sample size needed to obtain a robust behavioral response. We calculated statistical power under common models based on LDTT data collected from four laboratories using standard protocol parameters, where each 96-well plate contained 5–7 test concentrations and 12–16 vehicle control wells (1 larva/well). Power calculations were conducted to identify concentration effects using t-tests, analysis of variance (ANOVA), and repeated measures ANOVA (RMANOVA), with data from four endpoints: Total Distance, Movement Similarity, Distance Change, and Distance Shift. The tests showed the highest power for the Movement Similarity and Distance Change endpoints, which had the lowest intra- and inter-laboratory variability, resulting in a smaller necessary sample size to estimate dose effects. Using these endpoints more than doubled the power of the statistical tests relative to the Total Distance endpoints at the same sample size, and typically required 8–32 samples to achieve 80% power at a 20% effect size. This work demonstrates that the LDTT can be improved for detecting DNT effects by careful consideration of endpoint selection, data transformation, and type of statistical test.
Keywords: Zebrafish, Power Analysis and Calculations, Reproducibility, Light-Dark Transition Test (LDTT), Developmental Neurotoxicity (DNT), New Approach Methodologies (NAMs)
1. Introduction
Much attention has been given to neurodevelopmental disorders among young children, such as autism and attention-deficit/hyperactivity disorder (ADHD), and these disorders have been linked to environmental chemical exposure (Grandjean and Landrigan 2014; Bellinger 2024). These disorders are but one example of the chronic (and/or neurotoxic) effects of exposure. Rodent models have traditionally been employed in developmental neurotoxicity (DNT) assessments, in which animals are exposed prenatally and the offspring are then evaluated. However, these assays are not suitable for evaluating large numbers of chemicals due to their high time, cost, and resource demands. These limitations have resulted in only a small percentage of chemicals on the market being evaluated for DNT potential (Smirnova et al. 2024; Bal-Price et al. 2018). There is, therefore, an urgent need for efficient and high-throughput testing methods to characterize and predict chemicals posing DNT hazards to humans.
In response to this urgent testing need, research has focused on the development and validation of cell-based new approach methodologies (NAMs) for faster, less expensive, and human-relevant information (Fritsche et al. 2018; Crofton et al. 2011). One notable success is the development of the DNT In Vitro Battery (DNT-IVB), a set of assays developed by a consortium of DNT expert laboratories (OECD 2023). Released in November 2023 by the Organization for Economic Co-operation and Development (OECD), the DNT-IVB is proposed to serve as a Tier 1 screening tool for chemicals with potential DNT properties (Smirnova et al. 2024). However, due to the complexity of the brain, additional assays are required to address biological gaps such as the assessment of neurobehavior. One promising model providing whole-organism readouts is the larval zebrafish (Danio rerio).
Zebrafish have emerged as a promising model for DNT studies due to their low cost, ease of maintenance, transparency, and high genetic similarity to humans (70% of human genes have zebrafish counterparts) (Howe et al. 2013). Furthermore, the embryonic zebrafish develops quickly from a single cell to a larva (i.e., 5 days post-fertilization) with measurable sensorimotor brain activity in a 96-well plate format. Research over the past decade has demonstrated a high homology of neural network structure and function between humans, zebrafish, and other species (de Esch et al. 2012; Fitzgerald et al. 2021; Nishimura et al. 2015). Like rodents, zebrafish offer valuable insights into the neurotoxic and endocrine-disrupting effects of chemical exposures, yet they differ significantly in methodologies, physiology, and behavioral contexts (Vorhees et al. 2021; Levin 2011; Bailey, Oliveri, and Levin 2013). Zebrafish, while neurologically less complex, are diurnal, whereas rodents are nocturnal – a fundamental circadian difference that profoundly influences baseline activity levels, arousal states, and responses in behavioral assays performed during the day (Chiu and Prober 2013; Cassar et al. 2020). The diurnal nature of zebrafish aligns more closely with the human sleep-wake cycle, potentially making them a more suitable model for evaluating neurobehavioral endpoints that are modulated by circadian rhythms, such as anxiety, alertness, and locomotor activity.
The Light-Dark Transition Test (LDTT) is one of the most widely used approaches to assess the impact of compounds on zebrafish locomotor behavior (Haigis et al. 2022). This assay measures the behavioral response of the hatched embryo (i.e., larva), typically as early as 4 days post fertilization (dpf), to a sudden illumination transition within a predetermined light-darkness pattern (Irons et al. 2010; MacPhail et al. 2009). During the LDTT, zebrafish larvae display basal locomotor activity during the light phase, which rapidly increases after the light is suddenly switched off. Ample literature demonstrates how chemical exposures can alter the prototypic behaviors observed in the LDTT, which can be used to predict the potential for DNT or acute neurotoxicity, depending on the study design (Ellis et al. 2018; Kopp, Legler, and Legradi 2018; Quevedo et al. 2019; Vorhees et al. 2021).
The value of the LDTT assay is diminished by the lack of harmonization in protocols, data analyses, and reporting standards. Another critical aspect often overlooked in alternative animal assays, but routinely examined in in vivo assays, is determining the sample size needed to detect a true effect (i.e., through power calculations) (Serdar et al. 2020; Cohen 2009; Ricci et al. 2020). However, sample size determination via power calculations for a given experiment depends on the analyses used and the endpoints investigated. In this manuscript, we conducted power calculations using three common statistical tests: t-tests, analysis of variance (ANOVA), and repeated measures ANOVA (RMANOVA). Each test sequentially increases in complexity of calculation and thus provides a comprehensive understanding of statistical power for the LDTT assay.
The power calculations are based on four types of locomotor endpoints derived from the recorded distance moved data: Total Distance, Movement Similarity, Distance Change, and Distance Shift. Reporting Total Distance (i.e., measuring how much distance is moved during a time frame) is common practice in the discipline (Yang et al. 2021; Venkatachalam et al. 2023; Jarema et al. 2022). However, the vehicle control (VC) baseline responses are often variable, which can hinder the detection of a true effect. To address this issue, the latter three endpoint types involve plate-wise normalizations to vehicle controls. Movement Similarity was first developed for benchmark concentration (BMC) analysis due to its simplified endpoint (decreased movement similarity), which appears to reduce the scale of the benchmark response (BMR, often related to background noise) (Hsieh et al. 2019). In this study, we introduce the Distance Change and Distance Shift endpoints, which standardize the distance travelled by each larva to its plate’s VCs at each light or dark phase. We hypothesize that the variation in baseline responses can be reduced with these standardizations, potentially increasing the sensitivity of these two types of endpoints to capture effects.
To further understand the variability in these endpoints, and to inform the sample sizes of future studies, this manuscript seeks to rigorously evaluate the statistical power of the LDTT assay across four participating laboratories that followed a standard protocol design. The methods and endpoints proposed are easily transferable to other LDTT protocols.
2. Methods
The overall data pipeline from data collection to power calculation is presented in Figure 1. The workflow includes seven main steps: 1) Protocol Summary, 2) Data Preprocessing, 3) Calculate Endpoints, 4) Select Vehicle Controls (VC), 5) Determine Power Calculation Parameters and Statistical Tests, 6) Run Power Calculations, and 7) Laboratory to Laboratory Variability. Details are provided for each main step in the following sections, and the code and data are provided in Supplement 1.
Figure 1.

Power Calculations Methods Graphic. Dark grey boxes show the main steps of the data processing and analysis process. Light grey boxes give details about the steps.
2.1. Protocol Summary
The study, conducted by four laboratories (coded as a, b, c, and d), utilized AB wild-type adult zebrafish housed in well-monitored aquaria to provide fertilized embryos for toxicity testing (Figure 2). The following elements were harmonized across the study. The zebrafish were maintained under the following conditions: 28±1°C, pH 7–7.8, 500–800 μS, and oxygen saturation levels of 80–100%, under a 14-hour light and 10-hour dark photoperiod. Zebrafish embryos were collected from synchronized egg batches and raised in E3 media (0.16 mM MgSO4, 0.4 mM CaCl2, 0.17 mM KCl, 5 mM NaCl) until chemical addition at 6–8 hours post-fertilization (hpf), with 10 mM HEPES (pH 7.2–7.6).
Figure 2.

Experimental Setup Graphic. Zebrafish embryos at 6 to 8 hours post fertilization (hpf) are exposed to static compound concentrations or 0.4% (exceptionally 0.5%) DMSO vehicle control media on sealed 96-well plates with a volume of 500 μL. Plate layouts included 7 test concentrations and 1 VC concentration with 12 embryos per group or 5 test concentrations and 1 VC concentration with 16 embryos per group. The Malformation Evaluation and the Behavioral Assay occurred at 120 hpf on the zebrafish larvae. The locomotor activity of the larvae was recorded for 50 minutes after an initial acclimation period of 10 minutes (5 minutes light (ON) and 5 minutes dark (OFF)). The testing light protocol was 10 minutes ON + 10 minutes OFF + 10 minutes ON + 20 minutes OFF. Temperature was 28°C, activity was binned per 1 minute, and the light switched from 100% to 0% and 0% to 100% with no slope change. Records were obtained with live and video tracking via DanioVision or ZebraBox.
Each zebrafish embryo was placed individually in a well of a 96-well plate, with 500 μL of medium per well. The wells were treated with 0.4% DMSO (0.5% DMSO was used in certain cases where chemical solubility was an issue) as the VC, or with one of 5 (or 7) different concentrations of test chemical, which were also dissolved in matching concentrations of DMSO. The chemical-treated E3 medium was not changed or refreshed between the initial addition at 6–8 hpf and behavioral testing at 120 hpf in order to have a static exposure with a known starting concentration. In plates with five concentrations, each concentration, as well as the DMSO control, included 16 embryos; in plates with seven concentrations, each concentration included 12 embryos. Plates were maintained at 28±1°C. At 120 hpf, the Light-Dark Transition Test (LDTT) was performed using commercially available behavioral video tracking instruments: DanioVision (Noldus, Netherlands) or ZebraBox (Viewpoint Lifesciences, France). Assays were initiated 4–8 hours after the incubator light cycle began and ran for 1 hour. The locomotor activity of the larvae was recorded for 50 minutes (tracking time) after an initial acclimation period of 10 minutes (5 minutes light (ON) and 5 minutes dark (OFF)), where the ON light intensity was between 850 and 1000 lux. The testing light protocol during the tracking time was 10 minutes ON + 10 minutes OFF + 10 minutes ON + 20 minutes OFF with a light switch from 100% to 0% and 0% to 100% (no slope change). These intervals will be referred to as Light 1, Dark 1, Light 2, and Dark 2, respectively. The primary DNT-specific endpoint measured was the total distance (millimeters) moved during each minute, using the automated tracking systems listed above. After the LDTT, larval morphological alterations (including uninflated swim bladder) and death were evaluated under a stereoscope and recorded.
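The light schedule above maps directly onto the four phases used throughout the analysis. As a minimal sketch, assuming each larva's tracking data is available as a list of 50 one-minute distance bins in millimeters (the `PHASES` table, the function name, and the example trace are hypothetical and not part of the study code), the per-phase totals could be obtained as follows:

```python
# Phase boundaries in minutes of tracking time, taken from the protocol text:
# 10 min ON + 10 min OFF + 10 min ON + 20 min OFF (acclimation excluded).
PHASES = {
    "L1": (0, 10),
    "D1": (10, 20),
    "L2": (20, 30),
    "D2": (30, 50),
}


def total_distance_per_phase(per_minute_mm):
    """Sum a 50-element per-minute distance trace (mm) within each phase."""
    if len(per_minute_mm) != 50:
        raise ValueError("expected 50 one-minute bins of tracking time")
    return {
        phase: sum(per_minute_mm[start:end])
        for phase, (start, end) in PHASES.items()
    }


# Hypothetical trace: 5 mm/min in the light, 40 mm/min in the dark.
trace = [5.0] * 10 + [40.0] * 10 + [5.0] * 10 + [40.0] * 20
totals = total_distance_per_phase(trace)
print(totals)
```

Because Dark 2 spans 20 minutes while the other phases span 10, its totals are expected to be larger for the same per-minute activity.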
2.2. Data Preprocessing
Data from 62 plates, screened between March and December 2023 by four laboratories (coded as a, b, c, and d), were collected for analysis. Eight different testing chemicals were screened using the 62 plates. In this step, the full plate data, including both the VCs and the dosed larvae that were present on the plates, was used to investigate plate and edge effects, but in the other steps, only the VC larvae were used in the power calculations. Data from larvae with malformations and swim bladder issues were not used in the calculations.
2.2.1. Plate and Edge Effects
Edge and plate effects cause undue variability in experiments conducted on multi-well plates (Mansoury et al. 2021). In zebrafish experiments, edge effects are often due to variations in evaporation rates or light exposure across a plate (Ali, van Mil, and Richardson 2011). The Total Distance data (see 2.3 for descriptions of Total Distance) were screened for edge effects and plate effects, which include other sources of variability or bias that may occur across or within plates from the same laboratory (e.g., impact of plate placement or specific analyst within the laboratory). The assessment of edge and plate effects was conducted by generating heat maps and visually examining for any abnormal trends, as well as by discussion with laboratory scientists about potential sources of variability. A total of three plates were removed from the analysis due to potential plate and edge effects (see Supplement 2 for further discussion).
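The screening described above relied on heat maps and visual inspection. As a numeric complement, and only as an illustrative sketch rather than the procedure used in this study, the outer ring of a 96-well plate can be separated from the interior wells and their means compared. The well naming scheme (row letter plus column number), the function names, and the plate values below are assumptions made for the example.

```python
from statistics import mean

ROWS = "ABCDEFGH"      # 8 rows of a 96-well plate
COLS = range(1, 13)    # 12 columns


def is_edge_well(well):
    """True for wells in the outer ring (rows A/H or columns 1/12)."""
    row, col = well[0], int(well[1:])
    return row in ("A", "H") or col in (1, 12)


def edge_vs_interior(values_by_well):
    """Return (mean of edge wells, mean of interior wells)."""
    edge = [v for w, v in values_by_well.items() if is_edge_well(w)]
    interior = [v for w, v in values_by_well.items() if not is_edge_well(w)]
    return mean(edge), mean(interior)


# Hypothetical plate: interior wells move 100 mm, edge wells 80 mm.
plate = {
    f"{r}{c}": (80.0 if is_edge_well(f"{r}{c}") else 100.0)
    for r in ROWS
    for c in COLS
}
edge_mean, interior_mean = edge_vs_interior(plate)
n_edge = sum(is_edge_well(w) for w in plate)
print(n_edge, edge_mean, interior_mean)
```

A large gap between the two means would only flag a plate for closer inspection; it does not by itself establish an edge effect.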
2.3. Calculate Endpoints
Responses were generated based on four types of endpoints: Total Distance, Movement Similarity, Distance Change, and Distance Shift (Table 1). The Total Distance endpoint represents the larva’s cumulative distance moved in millimeters across the time range within a phase (the 10- or 20-minute session totals per larva per phase). The Total Distance endpoint is often used as the sole endpoint to present the chemical effect in zebrafish research (di Domenico et al. 2024). The Movement Similarity endpoint is a Spearman correlation between the reference points (i.e., median of VC movements) across all four phases (50 minutes) and the individual VC larva. The Movement Similarity endpoint was first introduced in Hsieh et al. (2019), where it was used in benchmark concentration analysis. Here, two additional endpoints are introduced: the Distance Change endpoint and the Distance Shift endpoint. The Distance Change endpoint is strictly positive and provides a measure of the magnitude of difference between the plate-wise VC reference level and the individual VC larva level. In contrast, the Distance Shift is centered at 0 and indicates both the magnitude and direction of differences between the individual larvae and the reference; positive numbers indicate hyperactivity while negative numbers indicate hypoactivity. The Distance Change and Distance Shift endpoints are unique in that they are relative endpoints resulting from the comparison between the plate-wise VC reference trace and each individual larva’s trace. The formulas for all endpoints are provided in Supplement 3.
Table 1.
Endpoints and definitions.
| Endpoint | Phase | Definition | Normalized per plate? |
|---|---|---|---|
| Total Distance | Light 1 (L1), Dark 1 (D1), Light 2 (L2), Dark 2 (D2) | The total distance (mm) travelled by each larva per phase. | No |
| Movement Similarity | Across all phases | The Spearman correlation between each larva and the VC median. Median calculated per plate. | Yes |
| Distance Change | Light 1 (L1), Dark 1 (D1), Light 2 (L2), Dark 2 (D2) | Measure of the magnitude of difference between each larva and the median of its plate’s VCs. The area between the larva and VC median curve (Figure 3). Median calculated per plate. | Yes |
| Distance Shift | Light 1 (L1), Dark 1 (D1), Light 2 (L2), Dark 2 (D2) | Measure of the magnitude and direction of difference between each larva and the VC median. Median calculated per plate. | Yes |
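To make the Movement Similarity definition concrete, the sketch below builds a per-minute VC median reference for one plate and computes a Spearman correlation (a Pearson correlation on average ranks) between each larva's trace and that reference. It is an illustration under stated assumptions, not the formula from Supplement 3: all function names are hypothetical, and the three short traces are invented and truncated to six minutes for readability.

```python
from math import sqrt
from statistics import mean, median


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN when either input has no variation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # e.g., a larva that never moves
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def vc_reference(vc_traces):
    """Per-minute median across the vehicle-control traces of one plate."""
    return [median(minute) for minute in zip(*vc_traces)]


def movement_similarity(trace, reference):
    return spearman(trace, reference)


# Hypothetical VC traces from one plate (shortened to 6 minutes).
vc_traces = [
    [1, 2, 8, 9, 2, 1],
    [2, 3, 9, 7, 1, 2],
    [1, 1, 7, 8, 3, 2],
]
reference = vc_reference(vc_traces)
similarities = [movement_similarity(t, reference) for t in vc_traces]
print(reference, similarities)
```

Because the correlation is computed on ranks, it reflects whether a larva's activity rises and falls at the same times as the reference, not whether the absolute distances match.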
Figure 3 provides an illustrative example of how relative distance is calculated for the Distance Change and Distance Shift endpoints. The top panel shows traces of the distance moved for VC larvae from one plate. The median VC value per minute, called the reference (in black), is then calculated and used to standardize the values of the individual larvae. Over a phase, the area between the reference curve and the individual larva’s curve is calculated in two ways, yielding the Distance Change and Distance Shift endpoints. The Distance Change endpoint is calculated per plate by summing the area of the region between each larva trace and the reference trace. In contrast, the Distance Shift endpoint is calculated as the difference between the area where the larva moved more than the reference and the area where it moved less than the reference.
Figure 3.

Creation of Relative Endpoints. Top: Distance travelled over 50 minutes of Vehicle Control (VC) larvae from one plate (n=16). The reference, or VC median, is shown in black. Lights are on for minutes 0–10 and 20–30 (white) and off for minutes 10–20 and 30–50 (grey). Middle: Two VC larvae compared to the reference. Bottom: The area of the region between each larva trace and the reference trace is used to calculate the Distance Change and Distance Shift endpoints. Red shading indicates where the larva trace is greater than the reference trace. Blue shading indicates where the larva trace is less than the reference trace. Red area plus Blue area equals the Distance Change. Red area minus Blue area equals the Distance Shift.
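The red-plus-blue and red-minus-blue construction in Figure 3 can be sketched in a few lines. The example below assumes that the area between the curves is approximated by summing the per-minute differences between a larva's trace and the reference; this is one plausible discretization chosen for illustration, and the exact formulas are those given in Supplement 3. The function name and the four-minute traces are hypothetical.

```python
def distance_change_and_shift(trace, reference):
    """Return (Distance Change, Distance Shift) for one phase.

    'above' plays the role of the red area (larva above the reference) and
    'below' the blue area (larva below the reference). Assumption: areas are
    approximated by summing per-minute differences.
    """
    above = sum(max(t - r, 0.0) for t, r in zip(trace, reference))
    below = sum(max(r - t, 0.0) for t, r in zip(trace, reference))
    return above + below, above - below


# Hypothetical four-minute phase with a flat reference of 10 mm per minute.
reference = [10.0, 10.0, 10.0, 10.0]
larva_below = [8.0, 7.0, 9.0, 6.0]     # always under the reference
larva_mixed = [12.0, 9.0, 14.0, 10.0]  # crosses the reference

print(distance_change_and_shift(larva_below, reference))
print(distance_change_and_shift(larva_mixed, reference))
```

When a trace stays entirely on one side of the reference, the two endpoints have the same magnitude and differ only in sign; when the trace crosses the reference, the Distance Shift is smaller in magnitude than the Distance Change.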
Values of all endpoints for the example larvae in Figure 3 are presented in Table 2. Larva A and Larva B have similar values for L1 Total Distance (369 mm vs. 365 mm), but Larva A moves less than Larva B in D1 (750 mm vs. 1245 mm). Their Distance Change values, or the area in red plus the area in blue, for L1 are also similar (325 vs. 331), but in D1, the Distance Change of Larva A is again less than that of Larva B (164 vs. 335). This can be visually confirmed by D1 in Figure 3; although Larva A’s curve is below the reference while Larva B’s curve is above, the shaded area for Larva B appears bigger. Notice that in D1, the values for Larva A and Larva B are each either strictly above or strictly below the reference. The Distance Shift values, which are the area in red minus the area in blue, then become simple. Larva A remains strictly below the reference curve in D1, so its Distance Shift value is −164, the negative of its Distance Change. Larva B remains strictly above the reference curve in D1, so its Distance Shift value is 335, the same as its Distance Change. In this way, we see that the Distance Shift value contains information about the direction of the difference beyond just a magnitude. Looking to D2, we notice that the curves of both larvae go above and below the reference curve. Larva A’s Distance Change value is 485, and its Distance Shift value is −401, incorporating where it is greater than the reference around Minute 30. Larva B’s Distance Change value is 395 while its Distance Shift is slightly less at 370, incorporating where it drops below the reference around Minute 40. Finally, the Movement Similarity for Larva A is less than that for Larva B, at 0.51 vs. 0.78. Larva A’s movement pattern differs more from the reference, which is particularly noticeable in L2, where the reference stays flat but Larva A’s distance is increasing.
Table 2.
Endpoint values of Example Larvae A and B. This table accompanies Figure 3 to showcase the endpoint values for each endpoint type and phase.
| Endpoint | Phase | Larva A | Larva B |
|---|---|---|---|
| Total Distance | L1 | 369 | 365 |
| Total Distance | D1 | 750 | 1245 |
| Total Distance | L2 | 307 | 79 |
| Total Distance | D2 | 985 | 1695 |
| Distance Change | L1 | 325 | 331 |
| Distance Change | D1 | 164 | 335 |
| Distance Change | L2 | 244 | 28 |
| Distance Change | D2 | 485 | 395 |
| Distance Shift | L1 | 325 | 321 |
| Distance Shift | D1 | −164 | 335 |
| Distance Shift | L2 | 240 | 8 |
| Distance Shift | D2 | −401 | 370 |
| Movement Similarity | Overall | 0.51 | 0.78 |
2.4. Utilization of Vehicle Control (VC) Data
Power calculations are most effective when experimenters have previously collected data comparable to the data to be collected in their next experiment from which to calculate means and standard deviations. For this purpose, only the VC data were used within the power calculations presented here. The VC data for all endpoint types were pooled within a laboratory. All other data collected to assess chemical effects in the LDTT assay were not used for this assessment. After removal of edge/plate effects and malformations there were 214 VC larvae from laboratory a, 313 from laboratory b, 119 from laboratory c, and 195 from laboratory d.
2.5. Determine Power Calculation Parameters and Statistical Tests
Power is a property of a given statistical test, representing the probability the selected test will detect an effect (e.g., dose) provided such an effect exists in the data (Rousseaux, Shockley, and Gad 2022; Jeffers et al. 2024). Equivalent to a true positive rate, a test’s power depends on four parameters: (1) the probability of accepting a false positive, α; (2) the sample size of the experiment (i.e., number of larvae needed per dose group); (3) the variability in the data; and (4) the effect size, a biologically meaningful difference expected between groups. Table 3 contains the parameter values selected for this work. If a test is underpowered, true effects are likely to be missed, resulting in a waste of samples. If a test is overpowered, then the true effect could have been detected with a smaller experiment, also leading to unnecessarily tested samples. Running a power calculation is a crucial step in experimental design that allows for the sample size to be carefully selected to avoid either type of waste.
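The dependence of power on these four parameters can be illustrated with a short calculation. The sketch below uses a normal (z) approximation to a two-sided, two-sample comparison of means instead of the exact t-based calculation, so it is slightly optimistic at small sample sizes and is not the computation used to produce the results in this paper. The function names, the control mean of 100, and the standard deviation of 25 are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power_two_sample(n_per_group, sd, effect, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Assumption: normal approximation with a common, known standard deviation,
    used here in place of the noncentral t distribution.
    """
    se = sd * sqrt(2.0 / n_per_group)            # SE of the mean difference
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(effect) / se
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def smallest_n_reaching(target, sd, effect, grid, alpha=0.05):
    """First sample size on the grid whose approximate power meets target."""
    for n in grid:
        if approx_power_two_sample(n, sd, effect, alpha) >= target:
            return n
    return None


# Hypothetical endpoint: control mean 100, SD 25, 20% effect (20 units).
control_mean, control_sd = 100.0, 25.0
effect = 0.20 * control_mean
for n in (8, 16, 32):
    print(n, round(approx_power_two_sample(n, control_sd, effect), 3))
print(smallest_n_reaching(0.80, control_sd, effect, range(2, 82, 2)))
```

With the effect size fixed as a percentage of the control mean, halving the standard deviation has the same influence on power as quadrupling the sample size, which is why endpoint variability matters so much in the comparisons that follow.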
Table 3.
Statistical Tests, Change Direction, and Power calculation parameters. The parameters below were used for the two-sided t-test, ANOVA, and RMANOVA power calculations. N/A: not applicable.
| Statistical Tests used in Power Calculations | | |
|---|---|---|
| Test | Hypotheses (μi indicates the mean of VC and treatments i = 1, …, 5) | |
| Two-sided t-test | H0: μVC = μ5; Ha: μVC ≠ μ5 | |
| Analysis of Variance | If testing for group differences: H0: μVC = μ1 = μ2 = ⋯ = μ5; Ha: At least one mean is not equal to the others | |
| Repeated Measures Analysis of Variance | If testing for group differences while accounting for repeated measurements: H0: μVC = μ1 = μ2 = ⋯ = μ5; Ha: At least one mean is not equal to the others | |
| Endpoint Change Direction | | |
| Endpoint | Change Direction from Control | Change Meaning |
| Total Distance: Light | Increase | Hyperactivity |
| Total Distance: Dark | Decrease | Hypoactivity |
| Distance Shift: Light | Increase | Hyperactivity |
| Distance Shift: Dark | Decrease | Hypoactivity |
| Distance Change: Light | Increase | Abnormal behavior |
| Distance Change: Dark | Decrease | Abnormal behavior |
| Movement Similarity | Decrease | Abnormal behavior |
| Power Calculation Parameters | | |
| Parameter | Value | Details |
| α | 0.05 | N/A |
| Sample Size | Two-sided t-test: n = 2, 4, 6, …, 80; Two-sided t-test until 80% power: n = 4, 8, 12, …, 468; ANOVA: n = 16, 32, 48, 64, 80; RMANOVA: n = 16, 32, 48 | N/A |
| Effect Size | 20% Change from Control Mean at highest dose (Total Distance, Distance Change, Movement Similarity); 20% Interquartile Range from Control Mean at highest dose (Distance Shift) | A difference intended to reflect a biologically relevant change from control (Padilla et al. 2011). Direction of change indicated in the Endpoint Change Direction rows. |
| VC Mean | Pooled per laboratory | VC sample sizes: Lab a: n = 214; Lab b: n = 313; Lab c: n = 119; Lab d: n = 195 |
| VC Standard Deviation | Pooled per laboratory | N/A |
The three statistical tests presented in this work are commonly used in laboratory data analysis: a two-sided t-test, an analysis of variance (ANOVA), and a repeated measures ANOVA (RMANOVA). All statistical tests used here assume that the data are normally distributed, and more information about the rationale for their selection is presented below. When an endpoint is not normally distributed, it is common statistical practice to transform it with a mathematical function. The Total Distance and Distance Shift endpoints were normally distributed and did not require transformation. The Distance Change endpoint was transformed with a base 10 logarithm. The Movement Similarity endpoint was transformed with a Fisher transformation, which maps a correlation, bounded in general between −1 and 1, onto an unbounded, approximately normal scale.
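Both transformations are one-line functions. The sketch below applies them to the Larva A values from Table 2 (Distance Change of 325 in L1 and Movement Similarity of 0.51); the function names are hypothetical, and the input checks are an added convenience, not part of the study code.

```python
from math import atanh, log10, tanh


def transform_distance_change(value):
    """Base-10 logarithm; requires a strictly positive Distance Change."""
    if value <= 0:
        raise ValueError("Distance Change must be strictly positive")
    return log10(value)


def fisher_z(r):
    """Fisher transformation of a correlation, defined for -1 < r < 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("correlation must lie strictly between -1 and 1")
    return atanh(r)


def inverse_fisher_z(z):
    """Map a Fisher-transformed value back to the correlation scale."""
    return tanh(z)


# Larva A values taken from Table 2.
print(transform_distance_change(325.0))
print(fisher_z(0.51))
```

Means and effect sizes computed on a transformed scale can be mapped back with the inverse functions (a power of 10 or `tanh`) when results need to be reported in the original units.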
2.6. Power Calculations
The power for each endpoint under a two-sided t-test was calculated first. The two-sided t-test checks for a difference between two groups, the direction of which is not known a priori. Power under the two-sided t-test was calculated with a closed-form equation. Next, power under an ANOVA model was assessed. An ANOVA model can include all the doses studied and assess whether there is a difference between any experimental groups. For the results shown here, a VC group and 5 simulated dose groups were tested, mirroring a typical 96-well plate configuration with 16 larvae per dose group. The simulated mean of the highest dose group was 20% larger (or smaller) than the VC mean, with the direction as specified in Table 3, and the other four dose group means were linearly interpolated between the VC mean and the highest dose group mean. To find the power under an ANOVA model, the approach described in Jeffers et al. (2024) was followed, and simulation studies with n = 1,000 simulations were conducted per endpoint and per laboratory. Finally, the power of an RMANOVA was calculated. The RMANOVA extends the ANOVA to testing across all four phases within one model. Because each larva is measured in each of the four phases, the data are said to be longitudinal or repeated measures data. Power was calculated from simulations that used an unstructured 4 by 4 correlation matrix across phases, in which each between-phase correlation was independently derived from the data, with n = 1,000 simulations per endpoint and per laboratory. Power is defined as the number of times the test correctly identified an effect across the 1,000 simulations divided by the total number of simulations. Power calculations were run in R version 4.4.1.
Additionally, the power of a mixed model with dose and phase as fixed effects (effects that are constant across studies) and laboratory as a random effect (an effect that could vary across studies) is presented in Supplement 4. This analysis indicates how well the results generalize across laboratories, which may interest some readers.
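The simulation logic for the ANOVA case can be sketched as follows. This is an illustration of the general idea and not a reproduction of the study's R code or of the Jeffers et al. approach: the data are drawn from normal distributions with a common standard deviation, and, because the Python standard library has no F-distribution quantile, the critical value is estimated by simulating the null hypothesis. All function names and the control mean and standard deviation are hypothetical.

```python
import random
from math import ceil


def f_statistic(groups):
    """One-way ANOVA F statistic (between- over within-group mean square)."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def simulate_group_means(vc_mean, effect_fraction, n_doses=5):
    """VC mean followed by dose means interpolated linearly to the top dose."""
    top = vc_mean * (1.0 + effect_fraction)
    return [vc_mean + (top - vc_mean) * i / n_doses for i in range(n_doses + 1)]


def simulate_f(rng, means, sd, n_per_group):
    """Draw one simulated plate and return its F statistic."""
    groups = [[rng.gauss(m, sd) for _ in range(n_per_group)] for m in means]
    return f_statistic(groups)


def anova_power(vc_mean, sd, effect_fraction, n_per_group,
                n_sims=1000, alpha=0.05, seed=1):
    """Share of simulated experiments in which the ANOVA rejects H0."""
    rng = random.Random(seed)
    alt_means = simulate_group_means(vc_mean, effect_fraction)
    null_means = [vc_mean] * len(alt_means)
    # Assumption: the critical value is the empirical (1 - alpha) quantile of
    # F under the null, standing in for the exact F-distribution quantile.
    null_f = sorted(simulate_f(rng, null_means, sd, n_per_group)
                    for _ in range(n_sims))
    index = min(ceil((1.0 - alpha) * n_sims), n_sims) - 1
    critical = null_f[index]
    hits = sum(simulate_f(rng, alt_means, sd, n_per_group) > critical
               for _ in range(n_sims))
    return hits / n_sims


# Hypothetical endpoint: VC mean 100, SD 20, 20% increase at the top dose.
print(anova_power(vc_mean=100.0, sd=20.0, effect_fraction=0.20, n_per_group=16))
```

A negative `effect_fraction` gives the decreasing case. Extending the sketch to the RMANOVA would additionally require drawing correlated values across the four phases for each larva.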
2.7. Laboratory to Laboratory variability
Although the data come from four laboratories following a harmonized protocol, power in the body of this paper is calculated separately for each laboratory. Results are presented as the mean (min, max) power across the four laboratories to show the range of power that an experimenter within a similar laboratory could expect to see. A mixed model with laboratory as a random effect and no other predictors was generated to quantify the variability in the endpoints due to laboratory.
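The quantity of interest in such a model is the share of total variance attributable to laboratory. As a rough illustration, the sketch below estimates that share with the classical one-way random-effects ANOVA (method-of-moments) estimator for unbalanced groups. This is a simpler stand-in for the mixed model fitted in the study and may give different numbers from a likelihood-based fit; the function name and the example data are hypothetical.

```python
def lab_variance_share(values_by_lab):
    """Fraction of total variance attributable to laboratory.

    Assumption: one-way random-effects ANOVA estimator, with n0 as the
    effective group size for unbalanced data. Negative between-laboratory
    variance estimates are truncated at zero.
    """
    labs = list(values_by_lab.values())
    k = len(labs)
    n_total = sum(len(v) for v in labs)
    grand = sum(sum(v) for v in labs) / n_total
    lab_means = [sum(v) / len(v) for v in labs]
    ms_between = sum(
        len(v) * (m - grand) ** 2 for v, m in zip(labs, lab_means)
    ) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for v, m in zip(labs, lab_means) for x in v
    ) / (n_total - k)
    n0 = (n_total - sum(len(v) ** 2 for v in labs) / n_total) / (k - 1)
    var_lab = max((ms_between - ms_within) / n0, 0.0)
    return var_lab / (var_lab + ms_within)


# Hypothetical endpoint values for four laboratories.
example = {
    "a": [10.0, 12.0, 14.0],
    "b": [20.0, 22.0, 24.0],
    "c": [30.0, 32.0, 34.0],
    "d": [40.0, 42.0, 44.0],
}
share = lab_variance_share(example)
print(round(share, 3))
```

A share near 0 indicates that laboratories are effectively interchangeable for that endpoint, while a share near 1 indicates that most of the variability comes from differences between laboratories.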
3. Results
3.1. Variations in Endpoint Values
The patterns present in the VC endpoint values align with biological expectations (Figure 4): the Total Distance endpoint shows higher levels of activity within the Dark phases than the Light phases, particularly for Dark phase 2 (D2), which was 20 minutes long (all other phases were 10 minutes long). Reduced activity was seen in the Light phases. Laboratory differences can be seen within the Total Distance endpoints across phases, with Laboratory b (in black) tending to have higher measures and Laboratory c (in green) tending to have the lowest measures. These laboratory differences are somewhat lessened in the Distance Change endpoint and noticeably lessened in the Distance Shift endpoint. Values for the Distance Change endpoint are strictly positive, and the D2 measures are greater than the other phases. Values for the Distance Shift are centered around 0, with negative values indicating hypoactivity and positive values indicating hyperactivity as compared to each larva’s per-plate reference. The Movement Similarity endpoint is between 0 and 1, with most values between 0.5 and 0.75, indicating a moderate level of similarity between each larva and its per-plate reference in this endpoint.
Figure 4.

Vehicle Control data (untransformed) from all plates. Boxplots shown per laboratory. Left. Total Distance endpoint per phase. Middle Left. Distance Change endpoint per phase. Middle Right. Distance Shift endpoint per phase. Right. Movement Similarity Endpoint. Color represents different laboratories. Laboratory differences are most apparent in the Total Distance endpoints.
In Figure 4, we use boxplots to visually compare the variability (spread) of the endpoint data across different endpoints and phases, where applicable. We quantified the relationship between mean and variability in each endpoint using the coefficient of variation (CV), which is 100 times the ratio of the standard deviation (the square root of an endpoint’s variance) to the same endpoint’s mean. Note that the mean is not pictured in Figure 4 and is sensitive to outliers. The mean, standard deviation, and CV for each endpoint and laboratory are provided in Supplement 5. While the CV is one of many factors influencing a power analysis, it is highly predictive of a statistical test’s power. Generally speaking, endpoints with a lower CV (<30) will result in a higher-powered test while endpoints with a higher CV (>100) will result in underpowered tests for the same sample size. The CVs of the endpoints per laboratory are presented in Table 4. We observed that larger CVs are present in the Light phase Total Distance endpoints and the Distance Shift endpoints. For Distance Shift endpoints, CVs are large in part because the means of the Distance Shift endpoints are near 0, and dividing by a value close to 0 inflates the ratio. Because the Distance Shift is, by its nature and definition, centered at 0, the large CVs are unavoidable.
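The effect of a near-zero mean on the CV is easy to see numerically. In the sketch below, three hypothetical samples share the same standard deviation but have different means; the use of the sample (n − 1) standard deviation is an assumption, and the function name is illustrative.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: 100 * sample standard deviation / mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical samples, each with a standard deviation of 10.
far_from_zero = [90.0, 100.0, 110.0]   # mean 100
near_zero = [-9.0, 1.0, 11.0]          # mean 1
below_zero = [-11.0, -1.0, 9.0]        # mean -1

for sample in (far_from_zero, near_zero, below_zero):
    print(coefficient_of_variation(sample))
```

The same spread produces a CV that is 100 times larger when the mean is near 0, and the sign of the CV follows the sign of the mean, which is how a negative CV can arise for an endpoint centered at 0.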
Table 4.
Coefficients of Variation by Laboratory (a, b, c, d) for VC data of endpoints. The coefficient of variation (CV) is 100 times the ratio of the standard deviation to the mean. It is used within the power calculation. Endpoints with lower CVs (<30) tend to yield higher-powered tests, while endpoints with higher CVs (>100) tend to yield noticeably underpowered tests.
| Endpoint | a | b | c | d |
|---|---|---|---|---|
| Total Distance L1 | 86.9 | 36.3 | 122.4 | 96.9 |
| Total Distance D1 | 22.8 | 20.6 | 46.3 | 34.0 |
| Total Distance L2 | 73.9 | 31.5 | 80.9 | 87.8 |
| Total Distance D2 | 29.2 | 21.7 | 52.2 | 45.3 |
| Distance Change L1 | 18.9 | 11.6 | 26.1 | 22.2 |
| Distance Change D1 | 11.6 | 12.1 | 14.9 | 13.7 |
| Distance Change L2 | 20.9 | 13.7 | 25.7 | 25.5 |
| Distance Change D2 | 8.4 | 9.3 | 10.3 | 10.8 |
| Distance Shift L1 | 206.2 | 606.4 | 208.7 | 296.8 |
| Distance Shift D1 | 7,041.5 | 2,070.3 | 1,406.1 | 738.3 |
| Distance Shift L2 | 206.2 | 496.2 | 191.6 | 283.3 |
| Distance Shift D2 | −3,979.1 | 23,249.1 | 1,074.9 | 533.2 |
| Movement Similarity | 21.2 | 24.2 | 31.5 | 25.6 |
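
The CV definition above, and the instability it shows for endpoints centered near 0, can be illustrated with a minimal sketch. The data values below are hypothetical and are not taken from the study, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the CV in percent: 100 * sample standard deviation / mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is exactly zero")
    return 100.0 * stdev(values) / m

# Hypothetical values for illustration only (not study data).
positive_endpoint = [80.0, 100.0, 120.0, 90.0, 110.0]  # mean far from 0
centered_endpoint = [-20.0, 0.0, 20.0, -10.0, 15.0]    # similar spread, mean near 0

cv_positive = coefficient_of_variation(positive_endpoint)
cv_centered = coefficient_of_variation(centered_endpoint)

print(f"CV with mean far from 0: {cv_positive:.1f}")
print(f"CV with mean near 0:     {cv_centered:.1f}")
```

Both series have a standard deviation of roughly 16, yet the second CV is about two orders of magnitude larger because its mean is close to 0. A slightly negative mean would also flip the sign of the CV, as seen for one Distance Shift entry in Table 4.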
3.2. Power calculation
3.2.1. Two-sided t-test
All power results shown here are for a 20% effect size. Figure 5 shows the results of the t-test power calculation, averaged over laboratory, and the power for each laboratory can be found in Supplement 5. For both Light phases (L1 and L2), the t-test reaches 80% power with 18 samples (i.e., larvae) per dose group for the Distance Change endpoints, but it requires more than 80 samples per dose group for the L1 and L2 phases of Total Distance and Distance Shift endpoints. The Distance Change endpoints have smaller confidence intervals compared to the Total Distance and Distance Shift endpoints, indicating less laboratory-to-laboratory variability. The Distance Change endpoints perform well in the power analyses in both Light phases.
Figure 5.

Power curves for a two-sided t-test on a 20% effect size as a function of the number of samples per dose, averaged over lab, with distinct colors per endpoint. Solid lines represent the first Light or Dark phase. Dashed lines represent the second Light or Dark phase. Shaded regions range from the laboratory with the lowest power for a given endpoint at the minimum, and laboratory with the highest power for a given endpoint at the maximum. Horizontal reference line at 80% power. Left panel: Light endpoints. Center panel: Dark endpoints. Right panel: Movement Similarity endpoint.
The power curves for the Dark phases (D1 and D2) show a pattern similar to those for the Light phases (L1 and L2). Although the differences are less pronounced, the power of the t-test still differs noticeably among the Total Distance, Distance Shift, and Distance Change endpoints. The t-test on both Dark phases of the Distance Change endpoints reaches 80% power within 10 samples per dose group. The Dark phases of the Total Distance endpoints reach 80% power within 62 samples per dose group, but the Dark phases of the Distance Shift endpoints do not reach this mark within 80 samples. The t-test is higher powered for the Dark phases than for the Light phases, suggesting that there is less variability relative to the mean in the Dark phases. As in the Light phases, the Distance Change endpoints perform well in the Dark phase power analyses.
The Movement Similarity endpoint requires 30 samples per dose group to reach 80% power. The intervals indicate that some inter-laboratory variability is present in these curves, but less than for the Total Distance endpoints.
For the endpoints that did not reach 80% power within 80 samples, it was of interest to determine just how many samples would be required to meet this mark for a 20% effect size. When conducted separately for each laboratory, the results show a very wide range (Figure 6).
Figure 6.

Sample Size per dose group range for a two-sided t-test to achieve 80% power on a 20% effect size change, per laboratory. The Distance Change endpoints require the fewest number of samples per dose group, with the endpoints in the two Dark phases requiring between 4 and 8 samples, and the endpoints in the two Light phases requiring between 6 and 22 samples. The Movement Similarity endpoint required between 18 and 34 samples. The Total Distance endpoints in all four phases required more samples for the t-test to reach 80% power, with endpoints in the Dark phases requiring between 14 and 86 samples per dose, and endpoints in the Light phases requiring between 34 and 466 samples. The Distance Shift endpoints required the most samples for the t-test to reach 80% power, requiring between 182 and 650 samples per dose group.
Across all laboratories and phases, the Distance Change and Movement Similarity endpoints achieve 80% power with 42 or fewer samples. While the Total Distance endpoints in the Dark phases require fewer samples than those in the Light phases, it is suggested that between 18 and 110 samples per dose group are needed. For the Total Distance endpoints in the Light phases, between 42 and 590 samples per dose group are indicated. The Distance Shift endpoint has even more samples indicated, between 230 and 822 samples across the four phases. This wide range reflects the variability between laboratories for these endpoints. As before, the Distance Change and Movement Similarity endpoints perform well in the power analyses, requiring the lowest number of samples to reach 80% power.
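To show how a CV translates into a sample size for a two-group comparison, the following sketch uses the textbook normal approximation n = 2(z₁₋α/₂ + z_power)² / d², with the standardized effect d taken as the percentage change divided by the CV. This is not the calculation behind Figures 5 and 6 (the code companion in Supplement 1 documents that); α = 0.05 is an assumption, and the approximation tends to slightly understate the n required by an exact t-test at small sample sizes. It is also not appropriate for the Distance Shift endpoints, whose effect size was defined differently.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(cv_percent, effect_fraction=0.20, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d = effect_fraction / (cv_percent / 100) is the standardized effect.
    """
    z = NormalDist()
    d = effect_fraction / (abs(cv_percent) / 100.0)
    n = 2.0 * (z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(power)) ** 2 / d ** 2
    return ceil(n)

# CVs for laboratory b, copied from Table 4.
table4_lab_b = {
    "Total Distance L1": 36.3,
    "Total Distance D1": 20.6,
    "Distance Change L1": 11.6,
    "Distance Change D1": 12.1,
    "Movement Similarity": 24.2,
}

sizes = {name: n_per_group(cv) for name, cv in table4_lab_b.items()}
for name, n in sizes.items():
    print(f"{name}: about {n} samples per group")
```

Because n grows with the square of the CV, roughly tripling the CV (Distance Change L1 versus Total Distance L1 for this laboratory) raises the approximate requirement by about a factor of nine.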
3.2.2. ANOVA
The ANOVA, which compares multiple dose groups within a phase, generally has lower power than the t-test. The only Light phase endpoints to reach 80% power within 80 samples per dose group are the L1 and L2 Distance Change endpoints, which require between 16 and 32 samples (Figure 7). In the Dark phases, the ANOVA for the D1 and D2 Distance Change endpoints is well powered at 8 samples per dose group. The ANOVA for the Total Distance endpoint exceeds 80% power at 48 samples per dose group in D1, while D2 requires 80 samples to reach the 80% threshold. None of the Distance Shift endpoints reach 80% power within 80 samples per dose group. The Movement Similarity endpoint requires 32 samples for the ANOVA to reach 80% power. Despite having lower power than the t-test, the ANOVA reaches ≥80% power for the Distance Change endpoints in both the Light and Dark phases within 32 samples per dose group.
Figure 7.

Power curves for an ANOVA on a 20% effect size as a function of the number of samples per dose, averaged over lab, with distinct colors per endpoint. Solid lines represent the first Light or Dark phase. Dashed lines represent the second Light or Dark phase. Shaded regions range from the laboratory with the lowest power for a given endpoint at the minimum, and laboratory with the highest power for a given endpoint at the maximum. Horizontal reference line at 80% power. Left panel: Light endpoints. Center panel: Dark endpoints. Right panel: Movement Similarity endpoint.
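A one-way ANOVA power estimate of the kind summarized in Figure 7 can also be approximated by simulation. The sketch below is illustrative only and rests on several assumptions that are not taken from the study: normally distributed data, six groups (one control plus five doses) with only the highest dose shifted by 20%, α = 0.05, and a critical value estimated empirically from null simulations rather than from the F distribution, so that only the Python standard library is needed.

```python
import random

def f_statistic(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def simulate_f(group_means, sd, n_per_group, n_sim, rng):
    """Draw n_sim normal data sets and return their F statistics."""
    return [
        f_statistic([[rng.gauss(mu, sd) for _ in range(n_per_group)]
                     for mu in group_means])
        for _ in range(n_sim)
    ]

def anova_power(cv_percent, n_per_group, n_groups=6, effect_fraction=0.20,
                alpha=0.05, n_sim=2000, seed=1):
    """Monte Carlo power of a one-way ANOVA when one group is shifted."""
    rng = random.Random(seed)
    sd = cv_percent / 100.0  # control mean scaled to 1.0, so sd equals CV / 100
    null_means = [1.0] * n_groups
    alt_means = [1.0] * (n_groups - 1) + [1.0 + effect_fraction]
    # Empirical (1 - alpha) quantile of F under the null hypothesis.
    null_stats = sorted(simulate_f(null_means, sd, n_per_group, n_sim, rng))
    critical = null_stats[min(n_sim - 1, int((1.0 - alpha) * n_sim))]
    alt_stats = simulate_f(alt_means, sd, n_per_group, n_sim, rng)
    return sum(s > critical for s in alt_stats) / n_sim

# Hypothetical CVs chosen to resemble a low-CV and a higher-CV endpoint.
power_low_cv = anova_power(cv_percent=12.0, n_per_group=8)
power_high_cv = anova_power(cv_percent=36.0, n_per_group=8)
print(f"CV 12, n = 8 per group: power ~ {power_low_cv:.2f}")
print(f"CV 36, n = 8 per group: power ~ {power_high_cv:.2f}")
```

The estimates carry Monte Carlo error and depend on the assumed pattern of group means; a different alternative (for example, a monotone dose response) would give different power at the same CV and sample size.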
3.2.3. RMANOVA
RMANOVA, which looks at multiple dose groups over the four Light and Dark phases, was only conducted using the Total Distance, Distance Change, and Distance Shift endpoints, as the Movement Similarity endpoint accounts for all Light/Dark phases in one value. The RMANOVA for the Distance Change endpoints is fully powered well within 8 samples per dose group (Figure 8). The Distance Shift endpoints stay under 35% power through 48 samples per dose group. At 80 samples, the test on the Total Distance endpoints achieves 80% power, but with large inter-laboratory variability. This test accentuates the patterns seen above: tests on the Distance Change endpoints can detect differences with a reasonable number of samples, while tests on the Distance Shift endpoints remain underpowered.
Figure 8.

Power curves for an RMANOVA on a 20% effect size as a function of the number of samples per dose, averaged over lab, with distinct colors per endpoint type. Shaded regions range from the laboratory with the lowest power for a given endpoint at the minimum, and laboratory with the highest power for a given endpoint at the maximum. Horizontal reference line at 80% power. Left panel: Total Distance. Center panel: Distance Change. Right panel: Distance Shift.
3.4. Variability due to Laboratory
To understand the variability due to laboratory in this study, we fit a linear mixed model to each endpoint with a random effect for laboratory and no other parameters. This approach permits the estimation of variance components due to laboratory and random error. The percentages of variability attributable to laboratory show that the normalized endpoints have less inter-laboratory variability than the non-normalized endpoint (i.e., Total Distance) (Figure 9). For all four phases of the Total Distance endpoint, at least half of the total variability is due to the laboratories. When normalizing to the VCs per plate, as is done for the Distance Change, Distance Shift, and Movement Similarity endpoints, the percentage of variability explained by the laboratory drops noticeably. Minimal laboratory variability is observed in the Distance Change and Distance Shift endpoints in the Dark phases. Overall, the endpoints that incorporate per-plate normalization exhibit reduced variability across laboratories.
Figure 9.

Percentage of Variability Attributable to Laboratory. The length of the bar represents how much variability in the endpoint is due to differences between the laboratories. The colors represent four types of endpoints. This figure suggests that by standardizing each plate to the median of its vehicle controls, the laboratory effects are mitigated.
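The percentage of variability attributable to laboratory can be illustrated with a simpler stand-in for the linear mixed model: the method-of-moments (one-way random-effects ANOVA) estimator for a balanced layout. The data below are hypothetical, and dividing each laboratory's values by its own median is only a generic illustration of control-based normalization; it is not one of the endpoint formulas in Supplement 3.

```python
from statistics import median

def percent_variance_from_lab(data_by_lab):
    """Share of total variance (in percent) attributed to the laboratory effect.

    Method-of-moments estimate for a balanced one-way random-effects layout:
    var_lab = (MS_between - MS_within) / n, truncated at 0.
    """
    labs = list(data_by_lab.values())
    k = len(labs)
    n = len(labs[0])
    if any(len(v) != n for v in labs):
        raise ValueError("this sketch assumes equal observations per laboratory")
    lab_means = [sum(v) / n for v in labs]
    grand = sum(lab_means) / k
    ms_between = n * sum((m - grand) ** 2 for m in lab_means) / (k - 1)
    ms_within = sum((x - m) ** 2
                    for v, m in zip(labs, lab_means) for x in v) / (k * (n - 1))
    var_lab = max(0.0, (ms_between - ms_within) / n)
    return 100.0 * var_lab / (var_lab + ms_within)

# Hypothetical vehicle-control distances for four laboratories (not study data).
raw = {
    "a": [100.0, 110.0, 90.0, 105.0],
    "b": [200.0, 210.0, 190.0, 205.0],
    "c": [150.0, 160.0, 140.0, 155.0],
    "d": [300.0, 310.0, 290.0, 305.0],
}
# Generic normalization: divide each laboratory's values by its own median.
normalized = {lab: [x / median(v) for x in v] for lab, v in raw.items()}

raw_pct = percent_variance_from_lab(raw)
norm_pct = percent_variance_from_lab(normalized)
print(f"Raw values:        {raw_pct:.1f}% of variance due to laboratory")
print(f"Normalized values: {norm_pct:.1f}% of variance due to laboratory")
```

In this toy example nearly all of the raw variance comes from differences in laboratory means, and the normalization removes it. A mixed model fitted by restricted maximum likelihood is the more general tool, particularly for unbalanced data such as unequal numbers of plates per laboratory.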
4. Discussion
In this study, four laboratories tested chemicals using a harmonized protocol designed to detect chemical-induced developmental neurotoxicity (DNT) effects on zebrafish larvae, as indicated by changes in larval behavior. A total of 62 plates, each containing one of eight test chemicals, were screened. For each plate, VC data on the distance moved per minute were collected. To capture various types of movement, four types of endpoints were generated: Total Distance, Movement Similarity, Distance Shift, and Distance Change. Intra- and inter-laboratory variations among these endpoints were investigated, and power calculations were conducted to determine the sample size needed for this protocol. Our analyses revealed the importance of data normalization when conducting cross-laboratory studies in the zebrafish LDTT assay and how the selection of endpoints can affect the usability and reliability of the protocol.
The Total Distance endpoint is commonly used in the field (Quevedo et al. 2019; Alzualde et al. 2018; Cassar et al. 2020; Haigis et al. 2022), while the authors developed the other three endpoints to reduce response variations and/or to increase assay sensitivity. A major distinction between the Total Distance endpoint and the other three endpoints is that the generation of the Total Distance endpoint does not involve normalizing data using plate-wise VCs. At least partially due to the lack of normalization of the data, we observed noticeable inter-laboratory variations in the Total Distance endpoint, whereas the Movement Similarity, Distance Shift, and Distance Change endpoints exhibited limited inter-laboratory variability. Mixed models were used to quantify the inter-laboratory variability, which accounted for over 50% of the variation in the Total Distance endpoint, despite the use of a harmonized protocol. Consequently, for experiments conducted across multiple laboratories, the Total Distance endpoint could hinder efforts to compare results, even when a harmonized protocol was applied, as in this study. In contrast, the Movement Similarity, Distance Shift, and Distance Change endpoints provided more consistency across laboratories, with the mixed models showing significantly reduced inter-laboratory variability.
Using the VC data from the four endpoints, we conducted power calculations employing three common statistical approaches: a two-sided t-test, an ANOVA, and an RMANOVA. While the choice of statistical test for a given experiment depends on the experimental design and the research question, all three approaches compare the responses between groups, but under slightly different settings. A two-sided t-test compares two groups, such as a vehicle control and a dosed group. An ANOVA extends the t-test to compare multiple groups, including multiple dosed groups against the control. An RMANOVA further extends the ANOVA to analyze data across different phases (e.g., L1) in the experiment. We believe that these three statistical approaches cover the most common statistical tests experimenters would use on these data and serve as a starting point for power calculations to determine the sample sizes needed in the protocol.
Under three different statistical approaches for power calculations, the statistical power changed; however, we observed a few consistent result patterns:
The Distance Change and Movement Similarity endpoints required the fewest larvae to reach 80% power, the Total Distance endpoint required more larvae, and the Distance Shift endpoint required the most. These results align with the observation that the coefficient of variation (CV) is largest in the Distance Shift endpoint (Table 4). While the Distance Shift endpoint has many appealing statistical and interpretive properties (such as symmetry, approximate normality, and the ability to show hyper- or hypo-activity), it is centered at 0. This presents a numerical difficulty within power calculations, which typically express the effect size as a percentage change from the mean. Although this issue was partially addressed by using the interquartile range in the effect size (which is a positive, non-zero number), the power calculations still indicated a large sample size requirement for achieving 80% power. Using a weighted effect size could improve power calculations, but this is not certain (Vesterinen et al. 2014). More analyses are needed to investigate the best use of the Distance Shift endpoint.
More samples are required for endpoints in the Light phases to attain the same power as endpoints in the Dark phases. Larvae generally travel a lower average distance in the Light phases than in the Dark phases. Thus, although the variation itself is similar across the Light and Dark phases, the variation relative to the mean (the CV) is larger in the Light phases. Light phases in the LDTT are known to be more variable than Dark phases (Fitzgerald et al. 2019; Hsieh et al. 2019). Because the power calculations presented here were applied consistently across endpoints using the same statistical test, α level, and effect size, the inverse relationship between CV and power, and therefore the larger sample size required at higher CVs, can be observed (Serdar et al. 2020).
The variation between laboratories in the sample size needed to reach 80% power is smaller for the Distance Change and Movement Similarity endpoints than for the Total Distance and Distance Shift endpoints. These results align with the Distance Change and Movement Similarity endpoints having much lower variability across laboratories. The Distance Shift endpoints, as discussed above, were numerically sensitive due to being centered at 0. While they had low inter-laboratory variability, small changes in their means and standard deviations resulted in large changes in their power.
This consistency across endpoints suggests that the Distance Change and Movement Similarity endpoints can provide higher power with fewer samples, even for the Light phases when applicable, and with reduced inter-laboratory variability, making them preferable for cross-laboratory studies. To quantify the increase in power at the same sample size, we averaged the ratio of the power with a Distance Change endpoint to the power with a Total Distance endpoint across sample sizes up to 80. For the Light phases, the power of the t-test for the Distance Change endpoints was, on average, 200% greater than for the Total Distance endpoints; for the Dark phases, it was 70% greater. Under an ANOVA model, the power using the Light phase Distance Change endpoints was 275% greater than for the Light phase Total Distance endpoints, and using the Dark phase endpoints, 150% greater. Under an RMANOVA model, the power using the Distance Change endpoints was 125% greater than using the Total Distance endpoints. Thus, with the same number of samples, statistical testing was higher powered using the normalized endpoints.
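The averaging of power ratios described above can be sketched as follows. The power function is a normal approximation for a two-sided, two-group comparison rather than the calculation used in this study, the grid of sample sizes (2 to 80 in steps of 2) and α = 0.05 are assumptions, and "X% greater" is read here as 100 × (ratio − 1), which is one possible interpretation. The two CVs are the laboratory a values for the L1 phase in Table 4.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(cv_percent, n_per_group, effect_fraction=0.20, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample comparison."""
    z = NormalDist()
    d = effect_fraction / (cv_percent / 100.0)  # standardized effect size
    shift = d * sqrt(n_per_group / 2.0)         # mean of the test statistic under H1
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

def mean_power_ratio(cv_numerator, cv_denominator, sample_sizes):
    """Average, over sample sizes, of power(numerator) / power(denominator)."""
    ratios = [approx_power(cv_numerator, n) / approx_power(cv_denominator, n)
              for n in sample_sizes]
    return sum(ratios) / len(ratios)

sample_sizes = range(2, 81, 2)  # assumed grid of samples per dose group

# Laboratory a, Table 4: Distance Change L1 (18.9) versus Total Distance L1 (86.9).
ratio = mean_power_ratio(18.9, 86.9, sample_sizes)
percent_greater = 100.0 * (ratio - 1.0)
print(f"Mean power ratio: {ratio:.2f} ({percent_greater:.0f}% greater)")
```

Because power is bounded at 1, the ratio at each sample size is capped by the reciprocal of the lower-powered endpoint's power, so the average depends on the range of sample sizes included.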
Overall, using the Distance Change and Movement Similarity endpoints for power calculations with three statistical approaches, a 20% effect size generally required between 8 and 32 samples per group to achieve the standard benchmark of 80% power, depending on the phase. However, it is important to note that the Distance Change and Movement Similarity endpoints do not encode the directionality of effects (i.e., hyperactivity or hypoactivity). More plates are needed if the directionality of the effects is crucial for the study, as this necessitates a larger sample size to detect specific directional changes in behavior. Additionally, all of the endpoints presented here represent 10-minute or 20-minute aggregations. Examination of the minute-by-minute distance travelled data could reveal a chemical’s mode of action or impact on the nervous system in a way that the summary endpoints cannot.
A 20% effect size indicates a relevant biological change from the VC and was considered moderate for this assay. While small, medium, and large benchmarks exist for Cohen’s d, one measure of effect size, the scientist should put more weight on whether an outcome is biologically relevant (Reynolds 2023). The organism, testing paradigm, and experimental question will all influence whether an effect would be considered biologically relevant. For example, in a review of effect sizes within rodent fear conditioning, a 37% effect size was considered intermediate (Carneiro et al. 2018), while in a review of zebrafish and other species’ responses to cardiovascular drugs, the relevant effect sizes were between 10% and 60%, depending on the endpoint (Margiotta-Casaluci et al. 2019). The authors felt that a 20% effect size would reflect what is commonly chosen in other biologically complex studies, namely rodent DNT studies, while also highlighting the differences between the novel endpoints presented here. If the true effect size in a larval zebrafish experiment were smaller, such as 10%, more samples would be needed to detect an effect under the same statistical tests and the same level of power. Assay developers can conduct their own power calculations, guided by the code companion provided in Supplement 1, using an effect size threshold they find biologically relevant.
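How strongly the required sample size depends on the chosen effect size can be seen in a short sketch using the normal approximation for a two-sided, two-group comparison. The CV of 20 is hypothetical, α = 0.05 and 80% power are assumptions, and this is not the calculation in the code companion.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(cv_percent, effect_fraction, alpha=0.05, power=0.80):
    """Approximate samples per group: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2."""
    z = NormalDist()
    d = effect_fraction / (cv_percent / 100.0)  # standardized effect size
    return ceil(2.0 * (z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(power)) ** 2 / d ** 2)

cv = 20.0  # hypothetical CV
by_effect = {effect: n_per_group(cv, effect) for effect in (0.10, 0.20, 0.30)}
for effect, n in by_effect.items():
    print(f"{effect:.0%} effect size: about {n} samples per group")
```

Under this approximation the sample size scales with the inverse square of the effect size, so halving the effect size from 20% to 10% roughly quadruples the number of larvae needed per group.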
Lastly, in addition to the chosen endpoints and the expected effect size, statistical power also depended on the statistical approaches used. While the three statistical tests discussed in this publication—the two-sided t-test, ANOVA, and RMANOVA—are well-known and easy to use, they may not address all sophisticated experimental questions. For instance, when multiple cohorts of larvae are used, the experimenter may wish to model the cohort with a random effect (Stroup 2016). The power of mixed models, which can effectively capture multiple sources of inter- and intra-cohort or laboratory variability, is briefly discussed in Supplement 4. The mixed model approach showed similar power to the average power of the RMANOVA and ANOVA results.
In summary, the analyses presented here inform sample sizes for the LDTT protocol, along with the developed endpoints. A deviation from the protocol presented here, including such factors as VC media, testing procedures, and experimental design, could yield different results and sample sizes. Endpoints involving normalizing with the plate VCs reduced both inter- and intra-laboratory variation, though it was an additional processing step. Consequently, the LDTT protocol with the endpoints presented here could allow for the screening of more chemicals with limited resources, facilitate comparison across labs, and promote the transfer of protocols across the research community.
5. Conclusion
In this manuscript, we use zebrafish Light-Dark Transition Test VC data from four laboratories to investigate statistical power and sample size requirements. We present novel endpoints, notably the Distance Change and Distance Shift endpoints, as alternatives to the Total Distance endpoint for detecting a behavioral difference. The normalized endpoints substantially reduce variability due to laboratory differences, allowing results to be more comparable across laboratories. This reduced variability contributes to the increase in power for these endpoints. The use of these endpoints provides an approach to understanding CNS-related toxicity and promotes the identification of potential neurotoxicants via high-throughput screening.
Supplementary Material
Supplement 1: Code Companion. This supplement includes links to the data and the code companion.
Supplement 2: Edge and Plate Effects. This supplement shows the three plates that had potential edge and plate effects.
Supplement 3: Endpoint Formulas. This supplement shows the formulas used to calculate the Distance Change, Distance Shift, and Movement Similarity endpoints.
Supplement 4: Mixed Model Power. This supplement describes the modeling and results for using a mixed model with a random effect for laboratory.
Supplement 5: Summary Statistics and t-test Power. This Excel file contains tables with summary statistics and power calculations for a t-test to detect a 20% change at 80% power.
Highlights.
New light-dark transition endpoints streamline zebrafish neurodevelopment research
Statistical power differs across zebrafish endpoints and light-dark phases
Data normalization by vehicle control per plate minimizes inter-lab variations
Normalized data doubled statistical power in the study protocol
8–32 samples needed per dose group for 80% power at 20% effect size
Acknowledgements
This work was supported by the National Institute of Environmental Health Sciences under contract GS-00F-173CA / 75N96022F00055 to Social & Scientific Systems, a DLH Holdings Corp. Company.
This research was supported in part by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences. We also acknowledge the laboratory staff who contributed to the generation of the data utilized in this publication, including teams from BBD BioPhenix - Biobide, National Research Council of Canada, Sinnhuber Aquatic Research Laboratory at Oregon State University, and ZeClinics.
Funding
This research was supported by the Division of Translational Toxicology, National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services ZIA ES103387-01. Additionally, the work is supported by contract GS-00F-173CA / 75N96022F00055 to Social & Scientific Systems, a DLH Holdings Corp. Company.
Declaration of generative AI and AI-assisted technologies in the writing process
During the preparation of this work, the authors did not use generative AI or AI-assisted technologies. The authors reviewed and edited the content as needed and take full responsibility for the content of the publication.
Declaration of interests
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:
Arantza Muriana reports financial support was provided by BBD BioPhenix SL. Beatriz Molina Martinez reports financial support was provided by BBD BioPhenix SL. Ana del Pozo reports financial support was provided by BBD BioPhenix SL. Oihane Jaka reports financial support was provided by BBD BioPhenix SL. Valentina Schiavone reports financial support was provided by Zeclinics SL. Vincenzo Di Donato reports financial support was provided by Zeclinics SL. Claudia Miguel Sanz reports financial support was provided by Zeclinics SL. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Bibliography
- Ali Shaukat, van Mil Harald G. J., and Richardson Michael K.. 2011. “Large-Scale Assessment of the Zebrafish Embryo as a Possible Predictive Model in Toxicity Testing.” PloS One 6 (6): e21076. 10.1371/journal.pone.0021076. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Alzualde Ainhoa, Behl Mamta, Sipes Nisha S., Hsieh Jui-Hua, Alday Aintzane, Tice Raymond R., Paules Richard S., Muriana Arantza, and Quevedo Celia. 2018. “Toxicity Profiling of Flame Retardants in Zebrafish Embryos Using a Battery of Assays for Developmental Toxicity, Neurotoxicity, Cardiotoxicity and Hepatotoxicity toward Human Relevance.” Neurotoxicology and Teratology 70:40–50. 10.1016/j.ntt.2018.10.002. [DOI] [PubMed] [Google Scholar]
- Bailey Jordan, Oliveri Anthony, and Levin Edward D.. 2013. “Zebrafish Model Systems for Developmental Neurobehavioral Toxicology.” Birth Defects Research. Part C, Embryo Today : Reviews 99 (1): 14–23. 10.1002/bdrc.21027. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bal-Price Anna, Hogberg Helena T., Crofton Kevin M., Daneshian Mardas, FitzGerald Rex E., Fritsche Ellen, Heinonen Tuula, et al. 2018. “Recommendation on Test Readiness Criteria for New Approach Methods (NAM) in Toxicology: Exemplified for Developmental Neurotoxicity (DNT).” ALTEX 35 (3): 306–52. 10.14573/altex.1712081. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bellinger David C. 2024. “Environmental Chemicals and Neurodevelopmental Disorders in Children.” In Textbook of Children’s Environmental Health, edited by Etzel Ruth A. and Landrigan Philip J., 0. Oxford University Press. 10.1093/oso/9780197662526.003.0049. [DOI] [Google Scholar]
- Carneiro Clarissa F. D., Moulin Thiago C., Macleod Malcolm R., and Amaral Olavo B.. 2018. “Effect Size and Statistical Power in the Rodent Fear Conditioning Literature – A Systematic Review.” PLoS ONE 13 (4): e0196258. 10.1371/journal.pone.0196258. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cassar Steven, Adatto Isaac, Freeman Jennifer L., Gamse Joshua T., Iturria Iñaki, Lawrence Christian, Muriana Arantza, Peterson Randall T., Van Cruchten Steven, and Zon Leonard I.. 2020. “Use of Zebrafish in Drug Discovery Toxicology.” Chemical Research in Toxicology 33 (1): 95–118. 10.1021/acs.chemrestox.9b00335. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chiu Cindy N., and Prober David A.. 2013. “Regulation of Zebrafish Sleep and Arousal States: Current and Prospective Approaches.” Frontiers in Neural Circuits 7 (April):58. 10.3389/fncir.2013.00058. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cohen Jacob. 2009. Statistical Power Analysis for the Behavioral Sciences. 2. ed., Reprint. New York, NY: Psychology Press. [Google Scholar]
- Crofton Kevin M., Mundy William R., Lein Pamela J., Bal-Price Anna, Coecke Sandra, Seiler Andrea E. M., Knaut Holger, Buzanska Leonora, and Goldberg Alan. 2011. “Developmental Neurotoxicity Testing: Recommendations for Developing Alternative Methods for the Screening and Prioritization of Chemicals.” ALTEX 28 (1): 9–15. [PubMed] [Google Scholar]
- di Domenico Kevin, Lacchetti Ines, Cafiero Giulia, Mancini Aurora, Carere Mario, and Mancini Laura. 2024. “Reviewing the Use of Zebrafish for the Detection of Neurotoxicity Induced by Chemical Mixtures through the Analysis of Behaviour.” Chemosphere 359 (July):142246. 10.1016/j.chemosphere.2024.142246. [DOI] [PubMed] [Google Scholar]
- Ellis LD, Berrue F, Morash M, Achenbach JC, Hill J, and McDougall JJ. 2018. “Comparison of Cannabinoids with Known Analgesics Using a Novel High Throughput Zebrafish Larval Model of Nociception.” Behavioural Brain Research 337 (January):151–59. 10.1016/j.bbr.2017.09.028. [DOI] [PubMed] [Google Scholar]
- de Esch Celine, Slieker Roderick, Wolterbeek André, Woutersen Ruud, and de Groot Didima. 2012. “Zebrafish as Potential Model for Developmental Neurotoxicity Testing. A Mini Review.” Neurotoxicology and Teratology 34 (6): 545–53. 10.1016/j.ntt.2012.08.006. [DOI] [PubMed] [Google Scholar]
- Fitzgerald Jennifer A., Kirla Krishna Tulasi, Zinner Carl P., and vom Berg Colette M.. 2019. “Emergence of Consistent Intra-Individual Locomotor Patterns during Zebrafish Development.” Scientific Reports 9 (September):13647. 10.1038/s41598-019-49614-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fitzgerald Jennifer A., Könemann Sarah, Krümpelmann Laura, Županič Anže, and vom Berg Colette. 2021. “Approaches to Test the Neurotoxicity of Environmental Contaminants in the Zebrafish Model: From Behavior to Molecular Mechanisms.” Environmental Toxicology and Chemistry 40 (4): 989–1006. 10.1002/etc.4951. [DOI] [PubMed] [Google Scholar]
- Fritsche Ellen, Grandjean Philippe, Crofton Kevin M., Aschner Michael, Goldberg Alan, Heinonen Tuula, Hessel Ellen V.S., et al. 2018. “Consensus Statement on the Need for Innovation, Transition and Implementation of Developmental Neurotoxicity (DNT) Testing for Regulatory Purposes.” Toxicology and Applied Pharmacology 354 (September):3–6. 10.1016/j.taap.2018.02.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Grandjean Philippe, and Landrigan Philip J.. 2014. “Neurobehavioural Effects of Developmental Toxicity.” The Lancet. Neurology 13 (3): 330–38. 10.1016/S1474-4422(13)70278-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Haigis Ann-Cathrin, Ottermanns Richard, Schiwy Andreas, Hollert Henner, and Legradi Jessica. 2022. “Getting More out of the Zebrafish Light Dark Transition Test.” Chemosphere 295 (May):133863. 10.1016/j.chemosphere.2022.133863. [DOI] [PubMed] [Google Scholar]
- Howe Kerstin, Clark Matthew D., Torroja Carlos F., Torrance James, Berthelot Camille, Muffato Matthieu, Collins John E., et al. 2013. “The Zebrafish Reference Genome Sequence and Its Relationship to the Human Genome.” Nature 496 (7446): 498–503. 10.1038/nature12111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hsieh Jui-Hua, Ryan Kristen, Sedykh Alexander, Lin Ja-An, Shapiro Andrew J, Parham Frederick, and Behl Mamta. 2019. “Application of Benchmark Concentration (BMC) Analysis on Zebrafish Data: A New Perspective for Quantifying Toxicity in Alternative Animal Models.” Toxicological Sciences 167 (1): 92–104. 10.1093/toxsci/kfy258. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Irons TD, MacPhail RC, Hunter DL, and Padilla S. 2010. “Acute Neuroactive Drug Exposures Alter Locomotor Activity in Larval Zebrafish.” Neurotoxicology and Teratology, Emerging high throughput and complementary model screens for neurotoxicology, 32 (1): 84–90. 10.1016/j.ntt.2009.04.066. [DOI] [Google Scholar]
- Jarema Kimberly A., Hunter Deborah L., Hill Bridgett N., Olin Jeanene K., Britton Katy N., Waalkes Matthew R., and Padilla Stephanie. 2022. “Developmental Neurotoxicity and Behavioral Screening in Larval Zebrafish with a Comparison to Other Published Results.” Toxics 10 (5): 256. 10.3390/toxics10050256. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jeffers Angela, Konrad Kathryn, Larson Gary, Allen-Moyer Katherine, Cunny Helen, and Shockley Shockley. 2024. “Simulation Methodologies to Determine Statistical Power in Laboratory Animal Research Studies.” Laboratory Animals 58 (5). 10.1177/00236772241273002. [DOI] [Google Scholar]
- Kopp Renate, Legler Juliette, and Legradi Jessica. 2018. “Alterations in Locomotor Activity of Feeding Zebrafish Larvae as a Consequence of Exposure to Different Environmental Factors.” Environmental Science and Pollution Research International 25 (5): 4085–93. 10.1007/s11356-016-6704-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Levin Edward D. 2011. “Zebrafish Assessment of Cognitive Improvement and Anxiolysis: Filling the Gap between in Vitro and Rodent Models for Drug Development.” Reviews in the Neurosciences 22 (1): 75–84. 10.1515/RNS.2011.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- MacPhail RC, Brooks J, Hunter DL, Padnos B, Irons TD, and Padilla S. 2009. “Locomotion in Larval Zebrafish: Influence of Time of Day, Lighting and Ethanol.” NeuroToxicology 30 (1): 52–58. 10.1016/j.neuro.2008.09.011.
- Mansoury Morva, Hamed Maya, Karmustaji Rashid, Al Hannan Fatima, and Safrany Stephen T. 2021. “The Edge Effect: A Global Problem. The Trouble with Culturing Cells in 96-Well Plates.” Biochemistry and Biophysics Reports 26 (July):100987. 10.1016/j.bbrep.2021.100987.
- Margiotta-Casaluci Luigi, Owen Stewart F., Rand-Weaver Mariann, and Winter Matthew J. 2019. “Testing the Translational Power of the Zebrafish: An Interspecies Analysis of Responses to Cardiovascular Drugs.” Frontiers in Pharmacology 10 (August):893. 10.3389/fphar.2019.00893.
- Nishimura Yuhei, Murakami Soichiro, Ashikawa Yoshifumi, Sasagawa Shota, Umemoto Noriko, Shimada Yasuhito, and Tanaka Toshio. 2015. “Zebrafish as a Systems Toxicology Model for Developmental Neurotoxicity Testing.” Congenital Anomalies 55 (1): 1–16. 10.1111/cga.12079.
- OECD. 2023. Initial Recommendations on Evaluation of Data from the Developmental Neurotoxicity (DNT) In-Vitro Testing Battery. OECD Series on Testing and Assessment. OECD. 10.1787/91964ef3-en.
- Padilla S, Hunter DL, Padnos B, Frady S, and MacPhail RC. 2011. “Assessing Locomotor Activity in Larval Zebrafish: Influence of Extrinsic and Intrinsic Variables.” Neurotoxicology and Teratology, Special issue: Zebrafish: Current discoveries and emerging technologies for neurobehavioral toxicology and teratology, 33 (6): 624–30. 10.1016/j.ntt.2011.08.005.
- Quevedo Celia, Behl Mamta, Ryan Kristen, Paules Richard S., Alday Aintzane, Muriana Arantza, and Alzualde Ainhoa. 2019. “Detection and Prioritization of Developmentally Neurotoxic and/or Neurotoxic Compounds Using Zebrafish.” Toxicological Sciences: An Official Journal of the Society of Toxicology 168 (1): 225–40. 10.1093/toxsci/kfy291.
- Reynolds Penny S. 2023. “A Bestiary of Effect Sizes.” In A Guide to Sample Size for Animal-Based Studies, 167–80. John Wiley & Sons, Ltd. 10.1002/9781119800002.ch15.
- Ricci Cristian, Baumgartner Jeannine, Malan Linda, and Smuts Cornelius M. 2020. “Determining Sample Size Adequacy for Animal Model Studies in Nutrition Research: Limits and Ethical Challenges of Ordinary Power Calculation Procedures.” International Journal of Food Sciences and Nutrition 71 (2): 256–64. 10.1080/09637486.2019.1646714.
- Rousseaux Colin G., Shockley Keith R., and Gad Shayne C. 2022. “Experimental Design and Statistical Analysis for Toxicologic Pathologists.” In Haschek and Rousseaux’s Handbook of Toxicologic Pathology, 545–649. Elsevier. 10.1016/B978-0-12-821044-4.00002-9.
- Serdar Ceyhan Ceran, Cihan Murat, Yücel Doğan, and Serdar Muhittin A. 2020. “Sample Size, Power and Effect Size Revisited: Simplified and Practical Approaches in Pre-Clinical, Clinical and Laboratory Studies.” Biochemia Medica 31 (1): 010502. 10.11613/BM.2021.010502.
- Smirnova Lena, Hogberg Helena T., Leist Marcel, and Hartung Thomas. 2024. “Revolutionizing Developmental Neurotoxicity Testing – A Journey from Animal Models to Advanced in Vitro Systems.” ALTEX - Alternatives to Animal Experimentation 41 (2): 152–78. 10.14573/altex.2403281.
- Stroup Walter W. 2016. Generalized Linear Mixed Models: Modern Concepts, Methods and Applications. Boca Raton: CRC Press. 10.1201/b13151.
- Venkatachalam Ananda Baskaran, Levesque Bailey, Achenbach John C., Pappas Jane J., and Ellis Lee D. 2023. “Long and Short Duration Exposures to the Selective Serotonin Reuptake Inhibitors (SSRIs) Fluoxetine, Paroxetine and Sertraline at Environmentally Relevant Concentrations Lead to Adverse Effects on Zebrafish Behaviour and Reproduction.” Toxics 11 (2): 151. 10.3390/toxics11020151.
- Vesterinen HM, Sena ES, Egan KJ, Hirst TC, Churolov L, Currie GL, Antonic A, Howells DW, and Macleod MR. 2014. “Meta-Analysis of Data from Animal Studies: A Practical Guide.” Journal of Neuroscience Methods 221 (January):92–102. 10.1016/j.jneumeth.2013.09.010.
- Vorhees Charles V., Williams Michael T., Hawkey Andrew B., and Levin Edward D. 2021. “Translating Neurobehavioral Toxicity Across Species From Zebrafish to Rats to Humans: Implications for Risk Assessment.” Frontiers in Toxicology 3 (February). 10.3389/ftox.2021.629229.
- Yang Huiting, Liang Xuefang, Zhao Yanyan, Gu Xiaohong, Mao Zhigang, Zeng Qingfei, Chen Huihui, and Martyniuk Christopher J. 2021. “Molecular and Behavioral Responses of Zebrafish Embryos/Larvae after Sertraline Exposure.” Ecotoxicology and Environmental Safety 208 (January):111700. 10.1016/j.ecoenv.2020.111700.
Supplementary Materials
Supplement 1: Code Companion. This supplement includes links to the data and the code companion.
Supplement 2: Edge and Plate Effects. This supplement shows the three plates that had potential edge and plate effects.
Supplement 3: Endpoint Formulas. This supplement shows the formulas used to calculate the Distance Change, Distance Shift, and Movement Similarity endpoints.
Supplement 4: Mixed Model Power. This supplement describes the modeling and results for using a mixed model with a random effect for laboratory.
Supplement 5: Summary Statistics and t-test Power. This Excel file contains tables of summary statistics and of t-test power calculations for detecting a 20% change at 80% power.
