Abstract
Developmental neurotoxicity (DNT) studies could benefit from revisions to study design, data analysis, and some behavioral test methods to enhance reproducibility. The Environmental Protection Agency (EPA) reviewed 69 studies submitted to the Office of Pesticide Programs. Two of the behavioral tests identified the lowest observable adverse effect level (LOAEL) 20 and 13 times, respectively, whereas the other two tests identified the LOAEL only 3 and 4 times, respectively. The EPA review showed that the functional observational battery (FOB) was least effective at detecting the LOAEL, whereas tests of learning and memory (L&M) had methodological shortcomings. Human neurodevelopmental toxicity studies over the past 30 years show that most of the adverse effects are on higher cognitive functions such as L&M. The results of human studies, together with structure-function relationships from neuroscience, suggest that tests of working memory, spatial navigation/memory, and egocentric navigation/memory should be added to guideline studies. Collectively, the above suggest that EPA and EU DNT studies would better reflect human findings and be more relevant to children by aligning L&M tests to the same domains that are affected in children, removing less useful methods (FOB), and using newer statistical models to better account for the random factors of litter and litter × sex. Common issues in study design and data analysis are discussed: sample size, random group assignment, blinding, elimination of subjective rating methods, avoiding confirmation bias, and more complete reporting of species, housing, test protocols, age, test order, and litter effects. Litter in DNT studies should at least be included as a random factor in ANOVA models, and analyses may benefit from the inclusion of litter × sex as an additional random factor.
Keywords: mixed linear ANOVA, litter effects, random factors, random litter × sex factor, blinding, random assignment, neurocognitive effects in children, working memory, spatial navigation/memory, Morris water maze, egocentric navigation/memory, Cincinnati water maze
1. INTRODUCTION
Developmental (DNT) and adult (ANT) neurotoxicological research has made much progress in the last 40 years. Many neurotoxic chemicals were uncovered by laboratory and human epidemiological studies. These findings resulted in regulations restricting exposure to hazardous chemicals such as methylmercury, polychlorinated biphenyls (PCBs), polybrominated diphenyl ethers (PBDEs), pesticides, and other chemicals. Chemicals that may become regulated in the future are too many to enumerate here, but include manganese, trichloroethylene, diesel particles, and other pesticides. Drugs were also identified that cause developmental neurotoxicity. These include alcohol, tobacco, isotretinoin, anticonvulsants, amphetamines, marijuana, and (+)-fenfluramine, while others are suspected to be developmentally neurotoxic but not proven, such as antidepressants (Ansorge et al., 2008; Ansorge et al., 2004; Croen et al., 2011; Gentile, 2015; Healy et al., 2016; Oberlander et al., 2008; Oberlander and Vigod, 2016; Rai et al., 2013; Sprowles et al., 2016; Sprowles et al., 2017) and antipsychotics (Gentile, 2010; Singh et al., 2016). Despite these successes, the field faces challenges, including (1) how adult neurotoxicity assessments can best inform other regulatory studies to provide optimal protection to humans, and (2) how DNT studies can be improved to enhance their value in protecting children.
The U.S. Environmental Protection Agency (EPA) has a test guideline specifically for DNT. It includes outcomes for neuropathology and behavior. For behavior, the DNT guideline includes four test methods: functional observations (the EPA Functional Observational Battery (FOB) (Moser, 1990; Moser, 2000) or a related method), open-field locomotor activity (motor activity), acoustic startle habituation, and learning and memory (L&M). FOB testing is to be performed on postnatal day (P) 4, 11, 21, 35, 45, and ~P60 (adults), open field on P13, 17, 21, and adults, acoustic startle on P21 and adults, and L&M on P21 and adults (Tsuji and Crofton, 2012). The guideline has been in place for 30 years, having been published in 1991 and re-issued with minor revisions in 1998 (Makris et al., 2009). The history and basis for the EPA DNT guideline were reviewed elsewhere (Makris et al., 2009). Two of the tests, open field and acoustic startle habituation, are non-controversial. The FOB (Graham et al., 2012; Vorhees et al., 2021) and L&M tests are the subject of concern (Bushnell, 2015; Crofton et al., 2004; Raffaele et al., 2004; Raffaele et al., 2010; Vorhees and Makris, 2015; Vorhees and Williams, 2014b). Given the issues surrounding the FOB and L&M tests, what is the basis for these concerns and how might they be addressed? We begin with L&M tests.
2. ISSUES AND IMPROVEMENTS OF LEARNING AND MEMORY TESTS IN DEVELOPMENTAL NEUROTOXICITY STUDIES
In 2010 the EPA reviewed 69 DNT guideline studies submitted to the Agency (Raffaele et al., 2010). The review did not address the validity of the tests but addressed which tests most often detected DNT at a dose lower than or equal to other toxicity test methods. The studies were received under the auspices of the EPA Office of Pesticide Programs (OPP). In the review, the EPA determined the lowest observable adverse effect level (LOAEL) in each study and used it as the point of departure (POD) for risk assessment. The EPA concluded that DNT studies were useful for risk assessment because some tests had the lowest, or equal to the lowest, POD compared with data from other studies of reproductive, developmental, or general toxicity. Two behavioral tests set the POD frequently and two infrequently. Open field set the POD in 20 of the 69 studies and acoustic startle in 13 of the 69 studies. At the other end, the FOB set the POD in 3 of the 69 studies and L&M tests in 4 of the 69 studies.
The 69 studies investigated insecticides or chemicals suspected to be neurotoxic. It stands to reason that if a test to detect DNT is used with a group of neurotoxic chemicals, one would expect many adverse outcomes (a high "hit" rate). If such a test shows few hits, one may ask whether it is performing appropriately. A neurotoxicity test that does a poor job of detecting neurotoxic effects will not protect children against future developmental neurotoxic agents. Based on this review, the data showed that the FOB and L&M tests did not perform well; open field and acoustic startle performed much better. In addition, the EPA expressed methodological concerns about the L&M tests but not about the FOB, presumably because the FOB has written protocols whereas the guideline leaves L&M methods up to the test laboratory. The result is wide variation in the types of L&M tests being used, along with methodological and procedural problems; this, combined with a hit rate of 5.8% (4/69) for POD determination, suggests that either the 69 test chemicals do not affect L&M, or the L&M tests were deficient in design or sensitivity and missed effects that were present. Collectively, these concerns cast doubt on L&M testing, which leads to the question of how to improve L&M tests so they better detect neurotoxic effects. What data are there that would inform the choice of L&M tests for EPA or EU DNT guideline studies (Organisation for Economic Co-operation and Development (OECD) TG 426)?
2A. REASONS FOR LEARNING AND MEMORY TESTS IN DEVELOPMENTAL NEUROTOXICITY STUDIES
Given the goal of protecting children’s health, it follows that the starting point for selecting L&M tests should be existing DNT data on children. When the first EPA DNT guideline was developed there were few data on adverse neurodevelopmental effects in children, but that has changed over the last 30 years. Today, there are many studies in children that provide a rich database to draw upon to inform how DNT L&M tests can be improved. The human findings were reviewed elsewhere (Grandjean and Landrigan, 2006; Grandjean and Landrigan, 2014; Vorhees et al., 2018). From the human data, is it possible to identify animal tests that reflect the same underlying processes (domains) as are affected in children? Yes. For example, in humans there are different kinds of memory mediated by different brain regions and circuits (Buzsaki and Moser, 2013; Ferbinteanu, 2020). There are analogous domains and circuits in rodents. These relationships are well characterized in neuroscience. For instance, in humans there is working or short-term memory and reference or long-term memory. Reference memory consists of explicit and implicit memory. Explicit memory consists of episodic, spatial, and declarative memory. Implicit memory consists of egocentric/procedural and stimulus-response memory. Egocentric memory consists of path integration and route-based memory.
Rodents have homologous forms of memory. Rodents have working and reference memory. Reference memory in rodents consists of explicit and implicit memory, as in humans. Rats have episodic memory and aspects of declarative memory (spatial and object). In people, declarative memory is for people, places, things, and events. Rodents obviously do not have memory for people, but they do for things, events, and places. Therefore, if a chemical impairs memory for place in children, then this type of memory would be predicted to be affected in rodents. The converse is also likely: if place/spatial memory is impaired in rodents, it is likely to be affected in children, a testable hypothesis. Ideally, the animal data would come first and prevent children from being exposed to an agent that adversely affects rodents.
The hippocampus and surrounding structures (subiculum, entorhinal cortex, perirhinal cortex) mediate spatial navigation and reference/recognition memory in rats, non-human primates, and people. Not only is there a correspondence between these structures across species, but there are tests in rats that were adapted for use in humans (Brown et al., 2016; Brown et al., 2014; Cornwell et al., 2008). For example, the Morris water maze (MWM) was developed for rats, but it was modified to a virtual spatial maze for people that, in combination with fMRI data, shows homologous regions activated in both species during allocentric navigation (Brown et al., 2014). Adaptations of other tests exist for working and egocentric memory (Baumann and Mattingley, 2010; Cullen and Taube, 2017). In rats, working memory may be assessed with the radial-arm maze (RAM). In people, tests of working memory include digit span, letter-number sequencing, and others. In both species, working memory activates the hippocampus and prefrontal cortex (PFC). In rats, path finding can be assessed using the Cincinnati water maze (CWM), and in people, using virtual city-scape maps. In both species these tasks activate the neostriatum (Botreau and Gisquet-Verrier, 2010; Braun et al., 2015; Braun et al., 2016; Braun et al., 2012; Delcasso et al., 2014; Hartley et al., 2003; Jog et al., 1999). These brain-behavior relationships are supported by many lines of evidence, including electrophysiological, brain imaging, neuropharmacological, and molecular evidence. Collectively, such data form the foundation for animal tests that better correspond to the human brain functions identified in children adversely affected by developmental neurotoxic chemicals and that are better aligned with the human data than the L&M tests the EPA has received over the last 30 years.
Compounds that are developmentally neurotoxic in children include lead (Pb), methylmercury, manganese, pesticides, PCBs, PBDEs, bisphenol A, airborne particulates, cocaine, alcohol, marijuana, methamphetamine, and nicotine; their effects include impairments of memory, attention, spatial processing, learning, executive function, and IQ, and increased rates of attention deficit hyperactivity disorder (ADHD). Less often, effects are found on motor coordination, anxiety, and social cognition. Given that epidemiological data show that memory is often affected in children exposed to developmental neurotoxic agents (Grandjean and Landrigan, 2006; Grandjean and Landrigan, 2014; Vorhees et al., 2018), and given the correspondence between human findings and homologous brain structure-function tests in rats and humans, these connections form the basis for a more rational set of cognitive tests in rats that parallel those affected in children. Unfortunately, so far the human epidemiological and animal approaches have not been used to choose L&M tests in DNT studies. Labs that do regulatory DNT testing have yet to bring the human developmental neurotoxicity data to bear on the selection of rodent L&M tests, and we suggest this is long overdue. Rudimentary functions assessed by the FOB (i.e., urination, salivation, ear twitch, whisker movement, paw placement, reaching, muscle tone, foot splay, or similar basic functions), as well as surface righting, inclined plane, pivoting, swimming ontogeny, wire hanging, or other motoric developmental endpoints often assessed in rodent DNT studies, are not domains affected in children. Rather, the effects in children are on learning, memory, attention, and executive functions. Rats can perform many types of L&M tests and some executive function tests, sometimes even in mazes (such as cognitive flexibility) or by using schedule-controlled operant conditioning.
With operant conditioning, after training, rats can be put on complex schedules of reinforcement and assessed for discrimination, delayed matching to sample, delayed non-matching to sample, differential reinforcement of low rates of responding (DRL), vigilance, signal detection, delay discounting, and other complex schedules. Operant methods come with a cost because they require extensive training and extended testing. Nevertheless, operant methods can assess many executive functions that water mazes cannot. However, water mazes can assess some executive functions, such as cognitive flexibility in the MWM, and where applicable, water mazes are more efficient than appetitive methods.
Our recommendation on how to improve L&M tests in DNT studies is to select methods that assess parallel functions to those known to be affected in children. Since extensive human data are now available, they should be used to maximum effect.
2B. TESTS FOR LEARNING AND MEMORY IN DEVELOPMENTAL NEUROTOXICITY STUDIES
As noted, epidemiological studies find that most of the effects in children from neurotoxic exposures are on working memory, spatial memory, procedural memory, attention (ADHD-like behavior), and IQ. Tests suitable for assessing these functions in rats are numerous. For working memory, they include the RAM (appetitive or water), spontaneous alternation, novel object recognition (configured for short-term retention), and contextual fear conditioning. For spatial learning and memory, tests include the MWM (Morris, 1984; Morris, 1981; Stewart and Morris, 1993; Vorhees and Williams, 2006), the Barnes maze (Barnes et al., 1997), some T-maze procedures (Ferbinteanu, 2016, 2020), and novel object recognition (configured for long-term memory). For procedural/egocentric learning and memory, tests include the CWM (Vorhees and Williams, 2016), the Whishaw retrieval test (Whishaw, 1998), proximal-cue water mazes, and cued T-mazes (Ferbinteanu, 2020).
To avoid increasing DNT costs, it is worth considering adjustments to the current EPA guideline. One possibility would be to remove tests that had few POD hits. One would be the FOB because this method had the fewest hits in the EPA review and because the functions it assesses show no parallel in affected children (Cory-Slechta et al., 2001). Moreover, the FOB tests are subjective. Other tests should be retained. Open field should be kept, but there is no clear rationale for testing it six times. The frequency of its use could be reduced to two ages. One could eliminate the P13 and P17 test ages since rats at these ages are not very active. Acoustic startle should be kept but could be enhanced by including prepulse inhibition of startle to obtain more information for the same time and effort it takes to do startle habituation. Prepulse inhibition is correlated with human neuropsychiatric disorders (Li et al., 2009) and human and rat startle can be compared across species.
For L&M, the EPA DNT guideline requires assessment in young and adult rats. Young rats can learn tests such as the MWM if the pool is small relative to platform size, but the learning curve is often shallow if rats are under P25 (Carman et al., 2002; Rudy and Paylor, 1988; Rudy et al., 1987; Schenk, 1985; Tonkiss et al., 1992; Vorhees and Williams, 2014a). Since spatial difficulty in a search task is a function of search area relative to target size, it is easy to configure a maze with a small surface-to-platform ratio that young rats can learn, but the question is what strategy they are using: is it spatial, egocentric, or patterning (e.g., swimming in a zig-zag pattern until they collide with the platform)? There are data showing that when the ratio of pool to platform area is small, rats often use non-spatial strategies (Schenk, 1985). However, if rats are P50 or older, they perform the task appropriately, using distal cues to navigate spatially to the platform. For adult rats, pool diameters of 183 cm, 200–213 cm (preferable), or even larger (244 cm) make the test more sensitive to spatial impairments; pools 150 cm in diameter and smaller are subject to non-spatial strategies in rats (Schenk, 1985). Pools 122–150 cm in diameter are appropriate for mice, and mice do not learn well in mazes larger than 150 cm in diameter (Schaefer et al., 2009).
Successful navigation of the CWM, when run in complete darkness to prevent use of distal cues, leaving only internal (egocentric) cues, also depends on test age. Rats younger than P40 cannot learn the task. Even if the maze is simplified to 5 or 6 Ts rather than 9–10 Ts, young rats perform poorly (Jablonski et al., 2017). However, by P50, rats have developed the cognitive capacity to find the goal after extensive searching. When run in the light, the CWM requires only about 5 days for rats to learn whereas in the dark, it takes 18–20 days, 2 trials/day in a 9-T maze (Vorhees and Williams, 2016).
In sum, the data from the EPA DNT guideline study review (Raffaele et al., 2010) show that of the four required behavioral tests, open-field and startle were effective at detecting the LOAEL in many cases and were useful for risk assessment, whereas the FOB and L&M tests had few LOAEL findings and were less useful. Of the latter two, the L&M tests were regarded as problematic and in need of improvement. We suggest the use of more relevant L&M tests could be achieved by drawing on the extensive neurodevelopmental studies done in children over the last 30 or more years. To use these data effectively, the L&M tests chosen for rodents should parallel the cognitive domains adversely affected in children, such as those that assess working memory, spatial learning/reference memory, and procedural/egocentric memory.
3. ISSUES IN STUDY DESIGN AND STATISTICS
3A. CONTROLLING LITTER EFFECTS
1. Statistical control for litter effects
Multiparous species present unique issues for maintaining subject independence. This has been discussed many times. A recent review by Golub and Sobin (2020) summarizes the background on this issue and makes recommendations to control litter effects using newer statistical models. To illustrate the importance of litter effects, Scott et al. (2008) showed that, even for an outcome as simple as survival in mice, controlling for litter reduced apparent effects by 30% in simulations based on 2579 untreated mice. The importance of litter in developmental studies has been emphasized before, yet studies continue to be published without this factor controlled [see (Forstmeier et al., 2017; Golub and Sobin, 2020; Holson and Pearce, 1992; Jimenez and Zylka, 2021; Lazic and Essioux, 2013; Vorhees and Williams, 2020)]. If the data are appropriate for parametric analyses, including litter as a random factor is the optimal approach (Golub and Sobin, 2020; Vorhees and Williams, 2020). If the data are not amenable to parametric analysis, then one should treat each litter as a unit and use one value to represent the data from that litter. It could be the litter average, or the average of the males and the average of the females, in a non-parametric ANOVA such as the Kruskal-Wallis test.
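To make the parametric approach concrete, litter can be entered as a random intercept in a mixed linear model. The sketch below uses Python's statsmodels (`MixedLM`) on simulated data; the litter counts, group labels, and effect sizes are hypothetical stand-ins for a real DNT dataset:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical dataset: 20 litters, 1 male + 1 female per litter per group
rows = []
for litter in range(20):
    litter_effect = rng.normal(0, 1.0)  # variance shared by littermates
    for group in ["control", "treated"]:
        for sex in ["M", "F"]:
            score = (10 + (1.5 if group == "treated" else 0)
                     + litter_effect + rng.normal(0, 1.0))
            rows.append({"litter": litter, "group": group,
                         "sex": sex, "score": score})
df = pd.DataFrame(rows)

# Litter as a random intercept; group and sex as fixed effects
model = smf.mixedlm("score ~ group * sex", df, groups=df["litter"])
fit = model.fit()
print(fit.summary())
```

The summary reports the fixed effects for group and sex alongside a litter variance component, which absorbs the shared-litter variance that would otherwise be misattributed to treatment.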
2. Litter × sex as a random effect
A factor never considered in developmental studies is litter × sex influences. To illustrate how this can change statistical outcomes, in a recent study we had 19 litters with 10 pups per litter consisting of 5 males and 5 females, with one male and one female assigned to each of 5 groups for a total of 190 pups. Three pups died, so the final dataset had 187 offspring. The pups were tested in the CWM for 10 days; hence, the RM-ANOVA was a 5-treatment × 2-sex × 10-day design. Analysis of the data with litter as a random factor and sex as a between factor resulted in significant main effects for treatment (F(4,180) = 4.01, p < 0.004), sex (F(1,180) = 5.39, p < 0.03), and day (F(9,1397) = 50.9, p < 0.0001). When litter and litter × sex were both included as random factors, the sex effect changed. The effects in this analysis were treatment (F(4,160) = 4.29, p < 0.003), sex (F(1,19.3) = 3.77, p < 0.07), and day (F(9,1397) = 51.0, p < 0.0001). The sex main effect went from p < 0.03 to p < 0.07, not a large shift but enough to influence interpretation of the data (other interactions are not shown since they remained essentially the same). There was little change in the effects of treatment or day, yet the sex main effect changed. In effect, how the variance around sex was partitioned made a difference. The influence of sex in the first analysis would likely have received attention in the interpretation of the data, but in the second analysis probably not. The second outcome is attributable to the fact that pups within litters are not independent with respect to sex, as the model assumes, since both sexes are littermates. Adding litter × sex as a random factor accounted for their relatedness and reduced the variance attributed to the sex main effect. This additional random factor has not made its way into the literature to the best of our knowledge but is worth consideration, as it might change the way sex effects are interpreted in developmental neurobehavioral studies.
Not all behavioral outcomes may see a change by using both litter and litter × sex as random factors but exploring this approach may be informative.
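One way to explore this approach is to fit the same model twice, once with litter alone and once with litter plus a litter × sex variance component, and compare how the sex effect changes. Below is a minimal sketch in Python's statsmodels, using simulated data loosely patterned on the 19-litter design described above; all values and effect sizes are hypothetical, not the study's data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Hypothetical data: 19 litters x 5 treatment groups x 2 sexes (1 pup per cell)
rows = []
for litter in range(19):
    litter_eff = rng.normal(0, 1.0)  # litter random effect
    # litter x sex random effect: a separate deviation per sex within litter
    sex_eff = {"M": rng.normal(0, 0.7), "F": rng.normal(0, 0.7)}
    for group in range(5):
        for sex in ["M", "F"]:
            y = 20 - group + litter_eff + sex_eff[sex] + rng.normal(0, 1.0)
            rows.append({"litter": litter, "group": f"g{group}",
                         "sex": sex, "y": y})
df = pd.DataFrame(rows)

# Model 1: litter as the only random factor
m1 = smf.mixedlm("y ~ group * sex", df, groups=df["litter"]).fit()

# Model 2: litter (random intercept) plus litter x sex as a variance component
m2 = smf.mixedlm("y ~ group * sex", df, groups=df["litter"],
                 vc_formula={"litter_x_sex": "0 + C(sex)"}).fit()

print(m1.pvalues.filter(like="sex"))
print(m2.pvalues.filter(like="sex"))
```

When littermates of the same sex truly share variance, the second model partitions it away from the sex main effect, analogous to the shift from p < 0.03 to p < 0.07 described above.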
3B. ADDITIONAL METHODOLOGICAL ISSUES THAT IMPACT REPRODUCIBILITY
DNT studies, like studies in other fields, benefit from proper study design and data analysis, as these are crucial to experimental reproducibility. The lack of reproducibility is an issue that has plagued biomedical research in recent years (Alberts et al., 2014; Jimenez and Zylka, 2021). Improving this requires more than better design and data analysis; it should include better descriptions of methods. This means clearer information about species, strain, and genetic background; details about housing (cage type and size, group versus single housing, bedding materials, food type and source, water (tap or filtered), room temperature and humidity, lighting cycle, age, sex, and animal supplier); and, of course, random assignment of animals to groups, blinding of lab personnel to treatment group, and control of litter effects. Next we consider sample size.
1. Sample size
Small sample size is a threat to reproducibility (Bishop, 2020; Button et al., 2013; Forstmeier et al., 2017; Scott et al., 2008). The sample mean in an experiment is an estimate of the population mean, and the law of large numbers shows that the larger the sample, the closer the sample mean approaches the population mean. Small samples run the risk of being unrepresentative and less reproducible. In the study by Scott et al. (2008) on mortality in mice, simulations from 2579 untreated mice using group sizes ranging from 4 to 50 per sex per group for random uncensored data showed that the frequency of apparent differences was >50% with 4 males and 4 females per group, 40% with 5 per sex per group, 30% with 10, 15% with 20, and 8% with 25 per sex per group. Group size affects noise/variability in the sample, which in turn affects reproducibility.
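The pattern Scott et al. observed can be reproduced in spirit with a simple null simulation: draw two groups from the same population and count how often their means differ by a fixed margin. The distribution and threshold below are illustrative assumptions, not values from Scott et al.:

```python
import numpy as np

rng = np.random.default_rng(42)

def apparent_difference_rate(n_per_group, threshold=0.5, n_sims=5000):
    """Fraction of null simulations in which two samples drawn from the
    same population (mean 0, SD 1) differ in means by > threshold SDs."""
    a = rng.normal(0, 1, size=(n_sims, n_per_group)).mean(axis=1)
    b = rng.normal(0, 1, size=(n_sims, n_per_group)).mean(axis=1)
    return float(np.mean(np.abs(a - b) > threshold))

# Rate of spurious "differences" shrinks as group size grows
for n in (4, 5, 10, 20, 25):
    print(n, apparent_difference_rate(n))
```

The rate of apparent differences falls steeply as group size rises, qualitatively mirroring the decline from >50% to 8% reported above.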
Although mortality cannot be generalized to other outcomes, the effect of sample size is a general one. Such problems can be reduced by using power calculations. Concepts and methods for power calculations are described by Cohen (1988). The method allows one to determine sample sizes in advance. The calculations are informed by experience since one needs some basis on which to estimate effect size. The p-value is chosen by the experimenter and can be as stringent as one wishes. However, in a regulatory context, power calculations are bypassed by requirements for group sizes set by the Agency’s guidelines. For example, sample sizes of 20 males and 20 females per group are generally required for developmental toxicity and DNT studies. FDA and EPA experience with such sample sizes has shown them to be reliable, but in the absence of requirements, power calculations are the best guide for sample size determination.
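In the absence of mandated group sizes, a power calculation of the kind Cohen describes takes only a few lines. The sketch below uses statsmodels' `TTestIndPower` for a simple two-group comparison; the effect size of d = 0.8 is an illustrative assumption an experimenter would base on prior data:

```python
from statsmodels.stats.power import TTestIndPower

# Hypothetical planning example: expected effect size d = 0.8,
# alpha = 0.05 two-sided, desired power = 0.8
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.8, alpha=0.05,
                                   power=0.8, alternative="two-sided")
print(round(n_per_group))  # ~26 per group, matching Cohen's tables
```

Tightening alpha or shrinking the expected effect size raises the required n, which is why the estimate of effect size should be informed by experience.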
2. Inclusion of both sexes
In regulatory and basic research, factorial designs with group and sex as factors are standard. However, a common practice in regulatory studies is to analyze the data for males and females separately. This practice doubles the number of statistical tests, which is undesirable, but more importantly, it prevents the detection of possible treatment × sex interactions. Separate analyses are also inconsistent with the design of an experiment that includes males and females. Overall, it is necessary to choose an appropriate sample size, randomly assign animals to groups, blind test personnel to group membership, control litter effects, and use statistics that match the design, with sex as a factor in the model.
A typical DNT study might have 2 between-subject factors: treatment with 4 levels (e.g., control, low, mid, and high dose) and sex with 2 levels. The group factor requires that treatment groups be independent of one another; this is accomplished by using unrelated animals and randomly assigning them to treatment groups. Sex is a preexisting biological variable and, although treated as if independent in rodent studies, it obviously is not, as discussed above. Factorial designs may have any number of dimensions, such as a 4 × 2 × 3 design with 4 dose groups, 2 sexes, and 3 conditions.
ANOVAs require assumptions. These include that groups are orthogonal to one another, that the data are normally or approximately normally distributed, and that error variances are equal or approximately equal between groups. ANOVAs perform well when these assumptions are met or approximated. Exactly how much deviation an ANOVA can have before being misleading is usually unknown. For a design where animals are assessed multiple times on a given test, repeated measure ANOVA (RM-ANOVA) is appropriate. RM-ANOVA has an added assumption, i.e., that the structure of the covariance matrix for time be specified. Experimenters generally do not know the shape of the covariance matrix, but it can be tested against different models to find one that best fits the data. RM designs can also be analyzed by multivariate methods (MANOVA).
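The normality and equal-variance assumptions can be screened before running an ANOVA. The sketch below uses SciPy's Shapiro-Wilk and Levene tests on simulated group data; the group means and SDs are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical scores for three dose groups, n = 20 per group
groups = [rng.normal(10 + d, 2.0, size=20) for d in (0, 1, 2)]

# Normality within each group via Shapiro-Wilk
for i, g in enumerate(groups):
    w, p_sw = stats.shapiro(g)
    print(f"group {i}: Shapiro-Wilk p = {p_sw:.3f}")

# Homogeneity of variance across groups via Levene's test
stat, p_lev = stats.levene(*groups)
print(f"Levene p = {p_lev:.3f}")

# One-way ANOVA if the assumptions look reasonable
f, p = stats.f_oneway(*groups)
print(f"ANOVA F = {f:.2f}, p = {p:.4f}")
```

Large p-values on the screening tests support a parametric analysis; failing them points toward transformation or the non-parametric options discussed below.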
In the Statistical Analysis System (SAS), general linear models (GLM) with Type III F-tests provide unadjusted or adjusted F-tests. For repeated measure factors and their interactions, adjusted F-tests are generally needed to reduce inflated F-values caused by data that do not meet the covariance structure requirements. These include the Huynh-Feldt (H-F) and Greenhouse-Geisser (G-G) adjustments. GLM models are fairly robust; they handle repeated measure factors and random factors and accommodate both between- and within-subject factors. GLM does not estimate missing data points for repeated measure factors, and therefore subjects with incomplete data on the repeated measure are eliminated from the analysis. To avoid this, missing values should be estimated so the dataset is complete.
Mixed linear models (e.g., SAS: Proc Mixed or Proc Glimmix) are also useful, but one limitation is that they limit how many repeated measure factors can be analyzed. Mixed models require that a covariance structure be set prior to analysis (Wolfinger, 1993). There are different covariance models from which to choose. These include variance components, compound symmetry, autoregressive, autoregressive moving average, etc. There is also an unstructured model where no covariance model is specified, and SAS uses an algorithm to fit the data to a model through a series of iterations. However, the unstructured model requires extensive computational time and often fails to converge, in which case the analysis does not run. To determine the covariance model of best fit, one can compare several using fit statistics such as the corrected Akaike Information Criterion (AICC) and choose the one that fits best.
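SAS Proc Mixed compares covariance structures via AICC; a rough analogue in Python is to fit candidate random-effects structures with statsmodels `MixedLM` (by maximum likelihood rather than REML so the information criteria are comparable) and compare AIC. All data below are simulated and the subject counts are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# Hypothetical repeated-measures data: 30 subjects x 10 days of a learning task
rows = []
for subj in range(30):
    intercept = rng.normal(0, 2.0)     # subject-level baseline
    slope = rng.normal(-0.5, 0.3)      # subject-level learning rate
    for day in range(10):
        rows.append({"subj": subj, "day": day,
                     "latency": 60 + intercept + slope * day + rng.normal(0, 2.0)})
df = pd.DataFrame(rows)

# Candidate structures: random intercept only vs random intercept + slope
m_int = smf.mixedlm("latency ~ day", df, groups=df["subj"]).fit(reml=False)
m_slope = smf.mixedlm("latency ~ day", df, groups=df["subj"],
                      re_formula="~day").fit(reml=False)

# Lower AIC indicates the better fit for the added complexity
print("intercept-only AIC:", m_int.aic)
print("intercept+slope AIC:", m_slope.aic)
```

Since the simulated subjects genuinely differ in learning rate, the intercept-plus-slope structure should win here; on real data, whichever structure yields the lowest criterion would be retained.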
Mixed models test the variables using a maximum likelihood method that can provide a better test of the effects of the independent variable than GLM. Given that the experimenter is testing dose levels in DNT studies, mixed models generally provide a better test for group differences. Mixed models also accommodate missing data, since they fit a model to all available observations rather than requiring a complete value for each data point.
If a significant interaction is found using an omnibus Mixed ANOVA model, the mean-square error can be used in follow-up analyses to conduct a posteriori comparisons using slice-effect ANOVAs provided by SAS. The slice-effect ANOVA holds alpha constant while testing along one dimension at a time. For example, if one finds a group × day interaction on a learning test, one can conduct slice-effect ANOVAs on each day to see which days have significant group differences. If group is significant on some but not all days, one can further analyze for group differences within each day by pairwise comparisons. There are many post hoc tests, such as Tukey-Kramer, Sidak, False Discovery Rate, Hochberg step-down, Bonferroni, etc. that control alpha for multiple comparisons, but others such as Duncan, Tukey HSD, and LSD do not control alpha at the same p-value when there are more than 3 groups.
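SAS slice tests use the pooled error term from the omnibus model; the Python sketch below approximates the logic with separate one-way ANOVAs within each day, followed by a Holm step-down correction across the slices. The groups, subject counts, and day counts are hypothetical:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(11)

# Hypothetical learning data: 3 groups x 8 subjects x 5 days; the "high"
# group learns more slowly, so group differences grow across days
n, days = 8, 5
data = {g: [rng.normal(50 - 5 * d + (3 * d if g == "high" else 0), 6, n)
            for d in range(days)]
        for g in ("control", "low", "high")}

# "Slice" the group x day interaction: one-way ANOVA across groups each day
p_per_day = [stats.f_oneway(*(data[g][d] for g in data))[1]
             for d in range(days)]

# Hold alpha across the 5 slice tests (Holm shown; FDR etc. also available)
reject, p_adj, _, _ = multipletests(p_per_day, alpha=0.05, method="holm")
for d in range(days):
    print(f"day {d + 1}: raw p = {p_per_day[d]:.4f}, "
          f"adjusted p = {p_adj[d]:.4f}, significant = {reject[d]}")
```

Days surviving the correction would then proceed to alpha-controlled pairwise comparisons, as described above.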
Not all data are appropriate for parametric analysis. Some datasets are not normally distributed and cannot be normalized even when transformed. Some datasets fail a test for homogeneity of variance. In these cases, non-parametric statistics, such as the Kruskal-Wallis ANOVA, are useful, but control for litter effects is still essential. Since the Kruskal-Wallis test does not have random effect capability, litter can be controlled by using one value, or one male and one female value, per litter.
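A minimal sketch of the litter-as-unit approach: collapse each litter to its mean, then run the Kruskal-Wallis test on the litter means. This uses SciPy on simulated data; the litter counts and dose-related shifts are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Hypothetical design: 3 treatment groups x 10 litters, 8 pups per litter
def litter_means(shift, n_litters=10, pups=8):
    """One mean per litter; each litter gets its own random offset."""
    return [rng.normal(100 + shift + rng.normal(0, 5), 10, size=pups).mean()
            for _ in range(n_litters)]

control = litter_means(0)
low = litter_means(-4)
high = litter_means(-10)

# One value per litter keeps litters, not pups, as the unit of analysis
h, p = stats.kruskal(control, low, high)
print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.4f}")
```

Using the litter mean (or sex-specific litter means as separate observations, as suggested above) avoids treating correlated littermates as independent subjects.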
3. Random assignment
Inferential statistics assume that subjects are independent of one another. This is accomplished by randomly assigning animals to groups. Unfortunately, many studies fail to state that randomization was done or, if stated, do not state how it was done. Arbitrary assignment (e.g., reaching into a cage and pulling out an animal) is not random. Researchers should use a random number generator or random number table to assign animals to groups. An alternative is to balance groups by body weight: rank order the animals by weight, then assign them to groups in rotation in ascending or descending order. Or, if there is a key outcome in a study, animals can be tested without treatment for that outcome, rank ordered by the magnitude of the effect of interest, and assigned to groups in rotation matched on this outcome. An example is matching rats for the acoustic startle response before treatment is administered (Geyer et al., 1978; Williams et al., 2018, 2019).
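Both assignment schemes take only a few lines with a proper random number generator. The sketch below shows pure random assignment via a random permutation and a weight-balanced rotation; the animal counts and body weights are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2024)
n_animals, n_groups = 40, 4
ids = np.arange(n_animals)

# 1) Pure random assignment: shuffle, then deal into groups in rotation
shuffled = rng.permutation(ids)
random_groups = {g: shuffled[g::n_groups] for g in range(n_groups)}

# 2) Weight-balanced assignment: rank by body weight, then rotate through
#    groups so each group samples the full weight range
weights = rng.normal(250, 20, n_animals)   # hypothetical body weights (g)
order = np.argsort(weights)                # ascending rank order
balanced_groups = {g: ids[order][g::n_groups] for g in range(n_groups)}

for g in range(n_groups):
    print(f"group {g}: mean weight = {weights[balanced_groups[g]].mean():.1f} g")
```

The same rotation logic applies to matching on a pre-treatment outcome such as baseline startle amplitude: rank by the measured response instead of weight.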
In developmental studies with multiparous species (rodents) and with treatment during development, random assignment will depend on the study design. In a whole litter design (such as a prenatal study), each dam is randomly assigned to a treatment group since the dam is given the treatment during gestation. The same would apply to treatments given during gestation and lactation, during lactation only through the mother, or the same treatment given to all pups within a litter. Data from whole litter studies should be analyzed by litter, either as litter average or pup nested within litter. To reduce variability from differences in litter size, standardizing the number of pups per litter is advisable. This must also be done randomly. The experimenter assigns each pup within a litter an arbitrary number stratified by sex. Then, using a random number system, animals are assigned to groups by sex. Inevitably, there will be small litters or litters with imbalanced sex ratios. One can exclude such litters or in-foster one or two pups from a litter born within 48 h of the target litter if there is a donor litter with extra pups.
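A stdlib sketch of random, sex-stratified litter standardization (pup IDs, litter sizes, and the target of 5 per sex are hypothetical):

```python
import random

random.seed(0)  # reproducibility only

def standardize_litter(males, females, per_sex=5):
    """Randomly cull a litter to a fixed size stratified by sex.

    Returns (kept, culled) pup-ID lists, or None if the litter cannot
    supply enough pups of each sex (exclude it or in-foster pups from a
    donor litter born within 48 h in that case).
    """
    if len(males) < per_sex or len(females) < per_sex:
        return None
    kept = random.sample(males, per_sex) + random.sample(females, per_sex)
    culled = [p for p in males + females if p not in kept]
    return kept, culled

result = standardize_litter(
    males=[f"m{i}" for i in range(7)],
    females=[f"f{i}" for i in range(6)],
)
kept, culled = result
print(len(kept), "kept;", len(culled), "culled")
```

Using `random.sample` rather than hand-picking pups is what keeps the culling random rather than arbitrary.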
There are also split-litter designs where different pups (by sex) are assigned to different treatment groups within each litter. Split-litter designs are efficient because the sample size needed is the number of litters, whereas in a whole (between-litter) study the sample size is the number of dams. If a study has 4 dose groups (0, low, mid, and high) and the unit of randomization is the dam, the needed sample size is 20 dams per group or 80 litters, whereas in a split-litter design only 20 litters are needed (one male and one female per group per litter given each dose). This design has the further advantage that litter is controlled, with the exception of litter × sex. Skeptics worry that there might be differential maternal-pup or pup-pup interactions across dose groups if the drug affects pup behavior to such a degree that the dam reacts differentially to pups receiving different doses. Other concerns are that there may be drug leakage from the injection site, or that if the drug is eaten or gavaged, littermates may be exposed to the test article through coprophagia. However, these concerns about split-litter designs are speculative at best. The likelihood of inter-pup transfer of significant amounts of a chemical or drug to blur dose levels is remote. Studies on methylenedioxymethamphetamine (MDMA) with between-litter and split-litter designs showed that regardless of which design was used, spatial learning and memory were comparably impaired (cf. Vorhees et al., 2004; Williams et al., 2003).
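The sample-size arithmetic above can be made explicit (illustrative only; it counts litters per design and ignores any power differences between the designs):

```python
def litters_needed(n_per_group, n_groups, design):
    """Litters (dams) required for a given design.

    'between': the dam is the unit of randomization, so each dose group
    needs its own litters. 'split': every litter supplies one pup per sex
    per dose group, so the litter count equals n_per_group.
    """
    if design == "between":
        return n_per_group * n_groups
    if design == "split":
        return n_per_group
    raise ValueError(f"unknown design: {design}")

# Example from the text: 4 dose groups (0, low, mid, high), n = 20 per group.
print(litters_needed(20, 4, "between"))  # 80 litters
print(litters_needed(20, 4, "split"))    # 20 litters
```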
3C. ADDITIONAL METHODOLOGICAL ISSUES THAT IMPACT REPRODUCIBILITY
1. Personnel Blinding
Many published studies continue to fail to mention whether personnel were blinded to treatment group. Sometimes this is an oversight, but blinding should be stated and, if it was not done, the reason should be given. One way to blind personnel is to code the vials for each dose so that personnel do not know which group received which treatment. When the test agent causes visible differences, it may be impossible to hide treatment identity during dosing, but after the treatment period is over, the investigator can re-code cages to reduce the likelihood that experimenters can detect which group received which treatment.
Treatments that impair growth are not infrequent in DNT studies and this can make blinding difficult. Nevertheless, it still helps in such cases to change cage cards after weaning or after treatment has stopped. In the case of transgenic animals if the mutation causes visible differences, blinding may be impossible. When this occurs, report that this was the case rather than ignoring it.
2. Correlated variables
A problem in many DNT studies with multiple behavioral tests, each with several dependent measures, is that these variables are not independent but are correlated to one degree or another. An example is maze tests in which both errors and latency to reach the goal are measured; these are highly correlated. Similarly, on tests of acoustic startle, peak response amplitude and average response amplitude are correlated >0.95. There is nothing wrong with analyzing all outcome variables from the same test so long as the investigators acknowledge that they are correlated. Less obvious are correlations between tasks. When a compound has effects on an organism, it is not uncommon to see patterns in the outcome variables, and these are likely to stem from intercorrelations. This applies to biochemical effects as well. In multiplex assays for cytokine expression in different brain regions, for example, one should treat brain region as a repeated-measure factor since each region is from the same animal. Moreover, cytokines do not act in isolation but interact with one another; such interactions make them non-independent.
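Before interpreting several outcomes from one test as independent findings, it is worth computing their intercorrelations. A minimal sketch with hypothetical maze data, where latency is largely driven by the number of errors:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical maze data for 60 animals: latency is roughly errors times
# a per-error time cost plus noise, so the two outcome measures are
# strongly intercorrelated rather than independent.
errors = rng.poisson(8, size=60).astype(float)
latency = errors * 4.0 + rng.normal(0, 4, size=60)

r = np.corrcoef(errors, latency)[0, 1]
print(f"errors-latency correlation: r = {r:.2f}")
```

A correlation this high means the two measures largely carry the same information, and significant effects on both should be reported as one finding, not two.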
3. Objective versus subjective test methods
Randomization, blinding, and non-interference (see below) all contribute to rigorous study design, as does making tests objective rather than observational and reliant on human judgment. Automated tests can also contribute to objectivity. Not all tasks can be automated, but in some cases objectivity can still be maintained. An example is maze tests, such as the CWM (Vorhees and Williams, 2016), in which errors or similar behaviors can be precisely defined. Here, entry into a blind alley of the maze is easily defined by the boundary where the dead-end channel begins. Once an animal crosses the imaginary line at the entrance to a cul-de-sac it has erred, and the only issue is how far into the channel it must go to be classified as an error (head; head and front legs; or whole body). This is in marked contrast to other types of manually scored behavior, such as trying to rate on a 3- or 4-point scale how tense an animal feels when it is removed from its cage. Such ratings are vague, and inter-rater agreement is difficult to establish and maintain.
Another problem occurs in tests with time limits if many animals reach the limit. When this occurs, the upper range of the data is truncated, i.e., the data are censored. There are statistical methods for dealing with censored data, but an alternative is to divide the data: analyze the observations clustered at the time limit separately from the rest of the learning data, in which the animals are improving across trials and/or days, since the learning curve is usually the data of interest.
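One way to implement the divide-the-data approach is to compare the proportion of timed-out (censored) trials by group, then analyze only the uncensored latencies. A sketch with scipy and hypothetical latency data (the 120-s limit and group means are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
TIME_LIMIT = 120.0  # hypothetical trial limit in seconds

# Hypothetical latencies, clipped at the limit (clipped values = censored).
control = np.minimum(rng.normal(70, 25, 20), TIME_LIMIT)
treated = np.minimum(rng.normal(110, 25, 20), TIME_LIMIT)

# Step 1: compare the number of censored (timed-out) trials per group.
table = [[int(np.sum(g == TIME_LIMIT)), int(np.sum(g < TIME_LIMIT))]
         for g in (control, treated)]
odds, p_censored = stats.fisher_exact(table)

# Step 2: compare only the uncensored latencies (the learning data),
# using a rank test since the clipped distribution is not normal.
p_latency = stats.mannwhitneyu(control[control < TIME_LIMIT],
                               treated[treated < TIME_LIMIT]).pvalue

print(f"censored-fraction p = {p_censored:.4f}, latency p = {p_latency:.4f}")
```

This keeps the boundary-clustered observations from distorting means and variances in the learning-curve analysis.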
4. No-assist approaches during learning tasks
A premise in preventing bias is to minimize experimenter interactions with animals during a test. Despite this goal there are sometimes interventions in studies, often with water mazes. In studies using the MWM, experimenters encounter animals that reach the time limit without finding the platform, so they lead these animals to the goal (Bekinschtein et al., 2007; Blokland et al., 2004; Burwell et al., 2004; Carman and Mactutus, 2001; Commins et al., 2003; Conrad and Roy, 1995; Deacon et al., 2002; Devan et al., 1992; Guzowski et al., 2001; Jo et al., 2007; Kubik and Fenton, 2005; Rauch et al., 1989a, b; Rudy and Paylor, 1987; Rudy and Paylor, 1988; Silva et al., 1992; Teixeira et al., 2006; Tonkiss et al., 1992; van Rijzingen et al., 1995; Wesierska et al., 1990; Zhang et al., 2008). This common practice should be avoided for several reasons. First, it is not possible to lead every animal the same way and differential leading could introduce bias between how experimental animals are guided versus controls. Not every animal will follow an object the experimenter uses to try to lead it to the goal. It is possible that animals in the experimental group require more guidance than controls. To prevent such possibilities, it is better to not guide animals. Second, leading animals to the goal is potentially counterproductive. If leading animals to the goal assists slow learners and if there are more slow learners in the experimental group, then assisting these animals reduces group differences. Reducing group differences has the effect of underestimating the true treatment effect. While the intent of intervention is to ‘help’ the rat or mouse, if it reduces group differences it undermines the test. Conversely, if a treatment makes the experimental group anxious or fearful, the object used to guide animals may be frightening rather than attract them to follow it. 
In that case, animals might avoid the object and take longer to reach the goal than control rats that follow the object. In the final analysis, even if the effect under investigation is significant and guidance was used, it may distort the data. Moreover, guidance is not needed. If an animal reaches the time limit without finding the goal, the experimenter should pick it up and place it on the platform or goal and let it learn on subsequent trials at its own pace with no help.
In mouse MWM experiments another intervention in the literature is poking or prodding mice that float. Prodding runs the risk that the experimental group might receive more prods than controls. This intervention is also unnecessary. If a mouse floats for the entire trial (which is unusual), it should be removed and given another trial a few hours later or the next day. Most strains of mice that are prone to floating when first placed in water will start searching if retested later. If a few mice never search, they can be excluded and that number reported, to determine whether there are significant differences in the number of non-performers between groups. Training trials given before maze trials also reduce floating in mice: trials to a visibly marked platform on the day before hidden-platform trials begin reduce floating, apparently because they introduce mice to the parameters of the task and reduce off-task behavior.
The EPA DNT guideline endorses automated methods, and as noted above, automated methods have advantages when the system accurately detects the behavior of interest. Tests frequently automated are open-field activity, startle response, rotarod, and video tracking for maze learning and object exploration. Automated tests reduce experimenter judgment but do not necessarily reduce error variance. For example, video tracking systems in water mazes sometimes lose track of the animal. This creates tracking artifacts that can be difficult to edit from affected files. As tracking software improves, this problem is becoming less common, but tracking software remains sensitive to light levels and angles. If lighting is not indirect, it can cause bright spots that compete with the animal for tracking. Deflections of the tracker away from the animal can also be caused by the waves the animal makes as it swims. These errors introduce artifacts that experimenters must redact meticulously.
In sum, personnel should not intervene to ‘help’ animals during a test. When the time limit is reached, it is better to pick the animal up from wherever it is and place it at the goal or in a holding cage until the next trial. And automated tests are not perfect and need to be cross-checked for accuracy by experienced experimenters after the data are captured.
5. Avoiding over-interpretation of unexpected effects/interactions
Button et al. (2013) note that experimenters tend to under-appreciate that small sample sizes not only “reduce the chance of detecting a true effect”, they also reduce “the likelihood that a statistically significant result reflects a true effect” [see also (Forstmeier et al., 2017)]. A related issue is that experimenters may fail to appreciate how noisy data can be when sample sizes are small, leading to the view that the treatment had no effect. When data are variable, an experimenter may run multiple small (pilot or preliminary) experiments trying to find an effect, but because each of these lacks adequate statistical power, this approach risks missing a real effect or finding an effect that is not reliable (Bishop, 2020). Even when a new effect is found, follow-up experiments often show smaller effect sizes than the first. Therefore, powering the first study correctly reduces the risk of not being able to replicate a finding (Forstmeier et al., 2017). The risk of Type II errors (missing a true effect) rises when statistical power (1-β) is low, and power depends directly on sample size.
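The sample sizes required for adequate power are often larger than intuition suggests. A stdlib-only sketch of the standard normal-approximation formula for a two-group, two-sided comparison at Cohen's benchmark effect sizes (the approximation runs about one subject per group below the exact t-based answers of 26 and 64):

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample t-test.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2.
    It slightly underestimates the exact t-based n, so round up and add
    a subject or two in practice.
    """
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2)

# Cohen's "large" (d = 0.8) and "medium" (d = 0.5) effects at 80% power:
print(n_per_group(0.8), n_per_group(0.5))  # 25 and 63 per group
```

Even a large effect needs roughly 25 animals (or litters) per group, which is far more than many small pilot experiments use.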
As mentioned above, the number of statistical tests performed in a large study presents a problem: the more statistical tests that are performed, the more effects one is likely to find, and this is magnified if the dependent variables are correlated, which they often are. Family-wise error is rarely controlled, which probably results in more effects being reported as significant than truly are (Forstmeier et al., 2017). For this reason, experimenters should interpret significant effects from large experiments cautiously and not assume every significant p-value represents a reliable effect.
Forstmeier et al. (2017) have identified a set of problems that lead to statistically significant effects that may not be reproducible: (a) small sample size, (b) novelty seeking (looking for anything significant, even among interactions that were not hypothesized), (c) multiple statistical testing, (d) hindsight bias, (e) repeating experiments iteratively until a significant effect is obtained, (f) removing outliers without reporting or justifying the removal, (g) reporting different outcomes without acknowledging likely intercorrelations, (h) over-interpreting interactions, (i) finding what was predicted when personnel were not blind to treatment group (confirmation bias), and (j) not replicating findings before publishing a new effect. Forstmeier et al. (2017) illustrate these and other ways experimenters make inadvertent mistakes in interpreting data and describe how to avoid them. In addition, there is an initiative underway, Enhancing Quality In Preclinical Data (EQIPD), to develop a toolbox to help investigators improve the rigor of basic research studies (Bespalov et al., 2021). Bespalov et al. (2021) describe the approach in general terms; the online toolbox is still under development but will be accessible to anyone for guidance on study design and data analysis.
4. TEST ORDER
A common issue in DNT studies in which each animal receives multiple tests is the order in which the tests should be given. The question is whether experience on preceding tests affects performance on subsequent tests and, more importantly, whether it affects controls differently than experimental animals and becomes a confound. At some level, all experiences have the potential to affect downstream performance on each successive test, but it is impractical to treat new groups and give each only one test. Therefore, multiple testing is the only practical way of doing these experiments, but how should test order be determined? In general, tests are ordered by presumed stressfulness, from lowest to highest; this mitigates transfer effects by giving the least demanding tests first. Tests of anxiety are known to be sensitive to previous testing; therefore, if tests such as the elevated plus-maze, elevated zero-maze, or light-dark test are used, they should be given first. A moderately stressful test, such as open-field locomotor activity, should be given after tests of anxiety-like behavior. A learning test with no external reinforcement, such as novel object recognition (or one of its variants: novel place recognition, novelty in place, or temporal order), might be next. These would precede tests such as acoustic and tactile startle, since those involve strong stimuli. If food deprivation is used for an appetitive task, this could come after startle, since food restriction is stressful. Alternatively, one could give swimming tests next, since swimming induces some transient stress but does not require the training that appetitive tests do. The order of diet-restriction/positive-reinforcement tests versus swimming tests is somewhat arbitrary.
Water mazes are quicker than appetitive tasks, and that may be a reason to put them ahead of food-reinforced tests that require more trials and days to obtain an adequate learning curve. For example, the acquisition phase of the MWM takes 5–7 days. Even a four-phase MWM with acquisition, reversal, shift, and cued phases takes only 23 days. The CWM tested under standard lighting takes 5 days and under infrared light 18 days. A radial-arm swimming maze can be done in 2–3 days. Some appetitive tasks are also efficient: a diet-restricted, appetitive radial-arm maze (RAM) with 8 arms can be done in 5–7 days. However, schedule-controlled appetitive operant tasks can take months. Therefore, the length of a test is a consideration when a test battery is used. The most stressful tests are those that use foot-shock. These tests, such as active and passive avoidance and conditioned freezing, should be done near the end of the study. Finally, if drug challenges are used, these should be last since drugs can have residual effects. However, with a proper washout period it is possible to do different drug challenges if they are spaced apart, so the drug is cleared before the next drug challenge.
6. CONCLUSIONS
DNT studies, especially those done for risk assessment under a regulatory guideline, would benefit from being updated. There are now enough neurodevelopmental studies in children exposed to neurotoxic agents that these data, along with data from neuroscience, can be used to select better cognitive tests than are currently used in DNT studies. Better alignment between human outcomes and rodent tests for homologous functional domains will provide better protection for children’s brain development in the future, but only if such changes are made. EPA and OECD DNT studies need better alignment with epidemiological data in children, and this includes revisions to every aspect of these protocols to eliminate collection of low-value data. Controlling litter effects remains a persistent problem in DNT studies despite how long this issue has been known. Mixed linear ANOVAs with litter as a random factor are the best way to handle litter effects, and inclusion of litter × sex as a second random factor should be explored. Study design and interpretation can be strengthened with adequate sample size, inclusion of both sexes, random group assignment, personnel blinding, recognizing correlated variables, reducing the number of statistical tests, refraining from assisting animals that fail to find a goal, not over-interpreting unexpected effects, especially interactions, and most importantly, replicating findings to ensure reproducibility before reporting a new effect.
HIGHLIGHTS.
Developmental neurotoxicity (DNT) studies need to improve rigor and reproducibility.
Regulatory DNT study guidelines need revision based on data in children.
Small N, lack of litter control, and correlated variables compromise DNT studies.
Litter and litter × sex should be random factors in ANOVA models.
Randomization, blinding, confirmation bias, and “helping” animals remain issues.
Conflict of Interest
The authors declare no conflict of interest in what is reported in this paper.
References
- Alberts B, Kirschner MW, Tilghman S, Varmus H, 2014. Rescuing US biomedical research from its systemic flaws. Proc. Natl. Acad. Sci. U. S. A 111(16), 5773–5777.
- Ansorge MS, Morelli E, Gingrich JA, 2008. Inhibition of serotonin but not norepinephrine transport during development produces delayed, persistent perturbations of emotional behaviors in mice. J. Neurosci 28(1), 199–207.
- Ansorge MS, Zhou M, Lira A, Hen R, Gingrich JA, 2004. Early-life blockade of the 5-HT transporter alters emotional behavior in adult mice. Science 306(5697), 879–881.
- Barnes CA, Suster MS, Shen J, McNaughton BL, 1997. Multistability of cognitive maps in the hippocampus of old rats. Nature 388(6639), 272–275.
- Baumann O, Mattingley JB, 2010. Medial parietal cortex encodes perceived heading direction in humans. J. Neurosci 30(39), 12897–12901.
- Bekinschtein P, Cammarota M, Igaz LM, Bevilaqua LR, Izquierdo I, Medina JH, 2007. Persistence of long-term memory storage requires a late protein synthesis- and BDNF-dependent phase in the hippocampus. Neuron 53(2), 261–277.
- Bespalov A, Bernard R, Gilis A, Gerlach B, Guillen J, Castagne V, Lefevre IA, Ducrey F, Monk L, Bongiovanni S, Altevogt B, Arroyo-Araujo M, Bikovski L, de Bruin N, Castanos-Velez E, Dityatev A, Emmerich CH, Fares R, Ferland-Beckham C, Froger-Colleaux C, Gailus-Durner V, Holter SM, Hofmann MC, Kabitzke P, Kas MJ, Kurreck C, Moser P, Pietraszek M, Popik P, Potschka H, Prado Montes de Oca E, Restivo L, Riedel G, Ritskes-Hoitinga M, Samardzic J, Schunn M, Stoger C, Voikar V, Vollert J, Wever KE, Wuyts K, MacLeod MR, Dirnagl U, Steckler T, 2021. Introduction to the EQIPD quality system. Elife 10.
- Bishop D, 2020. How scientists can stop fooling themselves over statistics. Nature 584(7819), 9.
- Blokland A, Geraerts E, Been M, 2004. A detailed analysis of rats’ spatial memory in a probe trial of a Morris task. Behav. Brain Res 154(1), 71–75.
- Botreau F, Gisquet-Verrier P, 2010. Re-thinking the role of the dorsal striatum in egocentric/response strategy. Front Behav. Neurosci 4, 7.
- Braun AA, Amos-Kroohs RM, Gutierrez A, Lundgren KH, Seroogy KB, Skelton MR, Vorhees CV, Williams MT, 2015. Dopamine depletion in either the dorsomedial or dorsolateral striatum impairs egocentric Cincinnati water maze performance while sparing allocentric Morris water maze learning. Neurobiol. Learn. Mem 118, 55–63.
- Braun AA, Amos-Kroohs RM, Gutierrez A, Lundgren KH, Seroogy KB, Vorhees CV, Williams MT, 2016. 6-Hydroxydopamine-induced dopamine reductions in the nucleus accumbens, but not the medial prefrontal cortex, impair Cincinnati water maze egocentric and Morris water maze allocentric navigation in male Sprague-Dawley rats. Neurotox. Res 30(2), 199–212.
- Braun AA, Graham DL, Schaefer TL, Vorhees CV, Williams MT, 2012. Dorsal striatal dopamine depletion impairs both allocentric and egocentric navigation in rats. Neurobiol. Learn. Mem 97(4), 402–408.
- Brown TI, Carr VA, LaRocque KF, Favila SE, Gordon AM, Bowles B, Bailenson JN, Wagner AD, 2016. Prospective representation of navigational goals in the human hippocampus. Science 352(6291), 1323–1326.
- Brown TI, Whiteman AS, Aselcioglu I, Stern CE, 2014. Structural differences in hippocampal and prefrontal gray matter volume support flexible context-dependent navigation ability. J. Neurosci 34(6), 2314–2320.
- Burwell RD, Saddoris MP, Bucci DJ, Wiig KA, 2004. Corticohippocampal contributions to spatial and contextual learning. J. Neurosci 24(15), 3826–3836.
- Bushnell PJ, 2015. Testing for cognitive function in animals in a regulatory context. Neurotoxicol. Teratol 52(Pt A), 68–77.
- Button KS, Ioannidis JP, Mokrysz C, Nosek BA, Flint J, Robinson ES, Munafo MR, 2013. Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci 14(5), 365–376.
- Buzsaki G, Moser EI, 2013. Memory, navigation and theta rhythm in the hippocampal-entorhinal system. Nat. Neurosci 16(2), 130–138.
- Carman HM, Booze RM, Mactutus CF, 2002. Long-term retention of spatial navigation by preweanling rats. Dev. Psychobiol 40(1), 68–77.
- Carman HM, Mactutus CF, 2001. Proximal versus distal cue utilization in spatial navigation: The role of visual acuity? Neurobiol. Learn. Memory 78, 332–346.
- Cohen J, 1988. Statistical Power Analysis for the Behavioral Sciences, Second Edition. Lawrence Erlbaum Associates, Hillsdale, NJ.
- Commins S, Cunningham L, Harvey D, Walsh D, 2003. Massed but not spaced training impairs spatial memory. Behav. Brain Res 139(1–2), 215–223.
- Conrad CD, Roy EJ, 1995. Dentate gyrus destruction and spatial learning impairment after corticosteroid removal in young and middle-aged rats. Hippocampus 5(1), 1–15.
- Cornwell BR, Johnson LL, Holroyd T, Carver FW, Grillon C, 2008. Human hippocampal and parahippocampal theta during goal-directed spatial navigation predicts performance on a virtual Morris water maze. J. Neurosci 28(23), 5983–5990.
- Cory-Slechta DA, Crofton KM, Foran JA, Ross JF, Sheets LP, Weiss B, Mileson B, 2001. Methods to identify and characterize developmental neurotoxicity for human health risk assessment. I: behavioral effects. Environ. Health Perspect 109 Suppl 1, 79–91.
- Croen LA, Grether JK, Yoshida CK, Odouli R, Hendrick V, 2011. Antidepressant use during pregnancy and childhood autism spectrum disorders. Arch. Gen. Psychiatry.
- Crofton KM, Makris SL, Sette WF, Mendez E, Raffaele KC, 2004. A qualitative retrospective analysis of positive control data in developmental neurotoxicity studies. Neurotoxicol. Teratol 26(3), 345–352.
- Cullen KE, Taube JS, 2017. Our sense of direction: progress, controversies and challenges. Nat. Neurosci 20(11), 1465–1473.
- Deacon RM, Bannerman DM, Kirby BP, Croucher A, Rawlins JN, 2002. Effects of cytotoxic hippocampal lesions in mice on a cognitive test battery. Behav. Brain Res 133(1), 57–68.
- Delcasso S, Huh N, Byeon JS, Lee J, Jung MW, Lee I, 2014. Functional relationships between the hippocampus and dorsomedial striatum in learning a visual scene-based memory task in rats. J. Neurosci 34(47), 15534–15547.
- Devan BD, Blank GS, Petri HL, 1992. Place navigation in the Morris water task: Effects of reduced platform interval lighting and pseudorandom platform positioning. Psychobiology 20(2), 120–126.
- Ferbinteanu J, 2016. Contributions of Hippocampus and Striatum to Memory-Guided Behavior Depend on Past Experience. J. Neurosci 36(24), 6459–6470. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ferbinteanu J, 2020. The Hippocampus and Dorsolateral Striatum Integrate Distinct Types of Memories through Time and Space, Respectively. J. Neurosci 40(47), 9055–9065. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Forstmeier W, Wagenmakers EJ, Parker TH, 2017. Detecting and avoiding likely false-positive findings - a practical guide. Biol. Rev. Camb. Philos. Soc 92(4), 1941–1968. [DOI] [PubMed] [Google Scholar]
- Gentile S, 2010. Antipsychotic therapy during early and late pregnancy. A systematic review. Schizophr. Bull 36(3), 518–544. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gentile S, 2015. Prenatal antidepressant exposure and the risk of autism spectrum disorders in children. Are we looking at the fall of Gods? J. Affect. Disord 182, 132–137. [DOI] [PubMed] [Google Scholar]
- Geyer MA, Petersen LR, Rose GJ, Horwitt DD, Light RK, Adams LM, Zook JA, Hawkins RL, Mandell AJ, 1978. The effects of lysergic acid diethylamide and mescaline-derived hallucinogens on sensory-integrative function: tactile startle. J Pharmacol Exp Ther 207(3), 837–847. [PubMed] [Google Scholar]
- Golub MS, Sobin CA, 2020. Statistical modeling with litter as a random effect in mixed models to manage “intralitter likeness”. Neurotoxicol. Teratol 77, 106841. [DOI] [PubMed] [Google Scholar]
- Graham DL, Schaefer TL, Vorhees CV, 2012. Neurobehavioral testing for developmental toxicity, in: Hood RD (Ed.) Developmental and Reproductive Toxicology: A Practical Approach. Informa Press, London, pp. 346–387. [Google Scholar]
- Grandjean P, Landrigan PJ, 2006. Developmental neurotoxicity of industrial chemicals. Lancet 368(9553), 2167–2178. [DOI] [PubMed] [Google Scholar]
- Grandjean P, Landrigan PJ, 2014. Neurobehavioural effects of developmental toxicity. Lancet Neurol. 13(3), 330–338. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Guzowski JF, Setlow B, Wagner EK, McGaugh JL, 2001. Experience-dependent gene expression in the rat hippocampus after spatial learning: A comparison of the immediate-early genes Arc, c-fos, and zif268. J. Neurosci 21(14), 5089–5098. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hartley T, Maguire EA, Spiers HJ, Burgess N, 2003. The well-worn route and the path less traveled: distinct neural bases of route following and wayfinding in humans. Neuron 37(5), 877–888. [DOI] [PubMed] [Google Scholar]
- Healy D, Le Noury J, Mangin D, 2016. Links between serotonin reuptake inhibition during pregnancy and neurodevelopmental delay/spectrum disorders: A systematic review of epidemiological and physiological evidence. Int. J. Risk Saf. Med 28(3), 125–141. [DOI] [PubMed] [Google Scholar]
- Holson RR, Pearce B, 1992. Principles and pitfalls in the analysis of prenatal treatment effects in multiparous species. Neurotoxicol. Teratol 14, 221–228. [DOI] [PubMed] [Google Scholar]
- Jablonski SA, Williams MT, Vorhees CV, 2017. Learning and memory effects of neonatal methamphetamine exposure in rats: Role of reactive oxygen species and age at assessment. Synapse 71(11). [DOI] [PubMed] [Google Scholar]
- Jimenez JA, Zylka MJ, 2021. Controlling litter effects to enhance rigor and reproducibility with rodent models of neurodevelopmental disorders. J. Neurodev. Disord 13(1), 2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jo YS, Park EH, Kim IH, Park SK, Kim H, Kim HT, Choi JS, 2007. The medial prefrontal cortex is involved in spatial memory retrieval under partial-cue conditions. J. Neurosci 27(49), 13567–13578. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jog MS, Kubota Y, Connolly CI, Hillegaart V, Graybiel AM, 1999. Building neural representations of habits. Science 286(5445), 1745–1749. [DOI] [PubMed] [Google Scholar]
- Kubik S, Fenton AA, 2005. Behavioral evidence that segregation and representation are dissociable hippocampal functions. J. Neurosci 25(40), 9205–9212. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lazic SE, Essioux L, 2013. Improving basic and translational science by accounting for litter-to-litter variation in animal models. BMC. Neurosci 14, 37. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li L, Du Y, Li N, Wu X, Wu Y, 2009. Top-down modulation of prepulse inhibition of the startle reflex in humans and rats. Neurosci. Biobehav. Rev 33(8), 1157–1167. [DOI] [PubMed] [Google Scholar]
- Makris SL, Raffaele K, Allen S, Bowers WJ, Hass U, Alleva E, Calamandrei G, Sheets L, Amcoff P, Delrue N, Crofton KM, 2009. A retrospective performance assessment of the developmental neurotoxicity study in support of OECD test guideline 426. Environ. Health Perspect 117(1), 17–25. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Morris R, 1984. Developments of a water-maze procedure for studying spatial learning in the rat. J. Neurosci. Methods 11, 47–60. [DOI] [PubMed] [Google Scholar]
- Morris RGM, 1981. Spatial localization does not require the presence of local cues. Learn. Motiv 12, 239–260. [Google Scholar]
- Moser VC, 1990. Approaches for assessing the validity of a functional observational battery. Neurotoxicol. Teratol 12(5), 483–488. [DOI] [PubMed] [Google Scholar]
- Moser VC, 2000. The functional observational battery in adult and developing rats. Neurotoxicology 21(6), 989–996. [PubMed] [Google Scholar]
- Oberlander TF, Bonaguro RJ, Misri S, Papsdorf M, Ross CJ, Simpson EM, 2008. Infant serotonin transporter (SLC6A4) promoter genotype is associated with adverse neonatal outcomes after prenatal exposure to serotonin reuptake inhibitor medications. Mol. Psychiatry 13(1), 65–73. [DOI] [PubMed] [Google Scholar]
- Oberlander TF, Vigod SN, 2016. Developmental Effects of Prenatal Selective Serotonin Reuptake Inhibitor Exposure in Perspective: Are We Comparing Apples to Apples? J. Am. Acad. Child Adolesc. Psychiatry 55(5), 351–352. [DOI] [PubMed] [Google Scholar]
- Raffaele K, Gilbert M, Crofton K, Sette W, 2004. Learning and memory tests in developmental neurotoxicity testing: a cross-laboratory comparison of control data. The Toxicologist 78(S-1), 276. [Google Scholar]
- Raffaele KC, Rowland J, May B, Makris SL, Schumacher K, Scarano LJ, 2010. The use of developmental neurotoxicity data in pesticide risk assessments. Neurotoxicol. Teratol 32(5), 563–572. [DOI] [PubMed] [Google Scholar]
- Rai D, Lee BK, Dalman C, Golding J, Lewis G, Magnusson C, 2013. Parental depression, maternal antidepressant use during pregnancy, and risk of autism spectrum disorders: population based case-control study. BMJ 346, f2059. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rauch TM, Welch DI, Gallego L, 1989a. Hyperthermia impairs retrieval of an overtrained spatial task in the Morris water maze. Behav. Neural Biol 52, 321–330. [DOI] [PubMed] [Google Scholar]
- Rauch TM, Welch DI, Gallego L, 1989b. Hypothermia impairs performance in the Morris water maze. Physiol. Behav 45, 315–320. [DOI] [PubMed] [Google Scholar]
- Rudy JW, Paylor R, 1987. Development of interocular equivalence of place learning in the rat requires convergence sites established prior to training. Behav. Neurosci 101(5), 732–734. [DOI] [PubMed] [Google Scholar]
- Rudy JW, Paylor R, 1988. Reducing the temporal demands of the Morris place-learning task fails to ameliorate the place-learning impairment of preweanling rats. Psychobiology 16(2), 152–156. [Google Scholar]
- Rudy JW, Stadler-Morris S, Albert P, 1987. Ontogeny of spatial navigation behaviors in the rat: Dissociation of “proximal” and “distal”-cue-based behaviors. Behav. Neurosci 101(1), 62–73. [DOI] [PubMed] [Google Scholar]
- Schaefer TL, Vorhees CV, Williams MT, 2009. Mouse plasmacytoma-expressed transcript 1 knock out induced 5-HT disruption results in a lack of cognitive deficits and an anxiety phenotype complicated by hypoactivity and defensiveness. Neuroscience 164(4), 1431–1443. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schenk F, 1985. Development of place navigation in rats from weaning to puberty. Behav. Neural Biol 43, 69–85. [DOI] [PubMed] [Google Scholar]
- Scott S, Kranz JE, Cole J, Lincecum JM, Thompson K, Kelly N, Bostrom A, Theodoss J, Al-Nakhala BM, Vieira FG, Ramasubbu J, Heywood JA, 2008. Design, power, and interpretation of studies in the standard murine model of ALS. Amyotroph. Lateral. Scler 9(1), 4–15. [DOI] [PubMed] [Google Scholar]
- Silva AJ, Paylor R, Wehner JM, Tonegawa S, 1992. Impaired spatial learning in α-calcium-calmodulin kinase II mutant mice. Science 257(5067), 206–211. [DOI] [PubMed] [Google Scholar]
- Singh KP, Singh MK, Singh M, 2016. Effects of prenatal exposure to antipsychotic risperidone on developmental neurotoxicity, apoptotic neurodegeneration and neurobehavioral sequelae in rat offspring. Int. J. Dev. Neurosci 52, 13–23. [DOI] [PubMed] [Google Scholar]
- Sprowles JL, Hufgard JR, Gutierrez A, Bailey RA, Jablonski SA, Williams MT, Vorhees CV, 2016. Perinatal exposure to the selective serotonin reuptake inhibitor citalopram alters spatial learning and memory, anxiety, depression, and startle in Sprague-Dawley rats. Int. J. Dev. Neurosci 54, 39–52. [DOI] [PubMed] [Google Scholar]
- Sprowles JLN, Hufgard JR, Gutierrez A, Bailey RA, Jablonski SA, Williams MT, Vorhees CV, 2017. Differential effects of perinatal exposure to antidepressants on learning and memory, acoustic startle, anxiety, and open-field activity in Sprague-Dawley rats. Int. J. Dev. Neurosci 61, 92–111. [DOI] [PubMed] [Google Scholar]
- Stewart CA, Morris RGM, 1993. The watermaze, in: Sahgal A (Ed.) Behavioural Neuroscience, Volume I, A Practical Approach. IRL Press at Oxford University Press, Oxford, pp. 107–122. [Google Scholar]
- Teixeira CM, Pomedli SR, Maei HR, Kee N, Frankland PW, 2006. Involvement of the anterior cingulate cortex in the expression of remote spatial memory. J. Neurosci 26(29), 7555–7564. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tonkiss J, Shultz P, Galler JR, 1992. Long-Evans and Sprague-Dawley rats differ in their spatial navigation performance during ontogeny and at maturity. Dev. Psychobiol 25(8), 567–579. [DOI] [PubMed] [Google Scholar]
- Tsuji R, Crofton KM, 2012. Developmental neurotoxicity guideline study: issues with methodology, evaluation and regulation. Congenit Anom (Kyoto) 52(3), 122–128. [DOI] [PubMed] [Google Scholar]
- van Rijzingen IMS, Gispen WH, Spruijt BM, 1995. Olfactory bulbectomy temporarily impairs Morris maze performance: An ACTH (4–9) analog accelerates return of function. Physiol. Behav 58(1), 147–152. [DOI] [PubMed] [Google Scholar]
- Vorhees CV, Makris SL, 2015. Assessment of learning, memory, and attention in developmental neurotoxicity regulatory studies: synthesis, commentary, and recommendations. Neurotoxicol. Teratol 52(Pt A), 109–115. [DOI] [PubMed] [Google Scholar]
- Vorhees CV, Reed TM, Skelton MR, Williams MT, 2004. Exposure to 3,4-methylenedioxymethamphetamine (MDMA) on postnatal days 11–20 induces reference but not working memory deficits in the Morris water maze in rats: implications of prior learning. Int. J. Dev. Neurosci 22(5/6), 247–259. [DOI] [PubMed] [Google Scholar]
- Vorhees CV, Sprowles JN, Regan SL, Williams MT, 2018. A better approach to in vivo developmental neurotoxicity assessment: Alignment of rodent testing with effects seen in children after neurotoxic exposures. Toxicol. Appl. Pharmacol 354, 176–190. [DOI] [PubMed] [Google Scholar]
- Vorhees CV, Williams MT, 2006. Morris water maze: procedures for assessing spatial and related forms of learning and memory. Nat. Protocols 1(3), 848–858. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vorhees CV, Williams MT, 2014a. Assessing spatial learning and memory in rodents. ILAR. J 55(2), 310–332. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vorhees CV, Williams MT, 2014b. Value of water mazes for assessing spatial and egocentric learning and memory in rodent basic research and regulatory studies. Neurotoxicol. Teratol 45, 75–90. [DOI] [PubMed] [Google Scholar]
- Vorhees CV, Williams MT, 2016. Cincinnati water maze: A review of the development, methods, and evidence as a test of egocentric learning and memory. Neurotoxicol. Teratol 57, 1–19. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vorhees CV, Williams MT, 2020. Litter effects: Comments on Golub and Sobin’s “Statistical modeling of litter as a random effect in mixed models to manage ‘intralitter likeness’”. Neurotoxicol. Teratol 77, 106852. [DOI] [PubMed] [Google Scholar]
- Vorhees CV, Williams MT, Hawkey AB, Levin ED, 2021. Translating neurobehavioral toxicity across species from zebrafish to rats to humans: Implications for risk assessment. Front. Toxicol, in press. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wesierska M, Macias-Gonzalez R, Bures J, 1990. Differential effect of ketamine on the reference and working memory versions of the Morris water maze task. Behav. Neurosci 104(1), 74–83. [DOI] [PubMed] [Google Scholar]
- Whishaw IQ, 1998. Place learning in hippocampal rats and the path integration hypothesis. Neurosci. Biobehav. Rev 22(2), 209–220. [DOI] [PubMed] [Google Scholar]
- Williams MT, Gutierrez A, Vorhees CV, 2018. Effects of Acute Exposure of Permethrin in Adult and Developing Sprague-Dawley Rats on Acoustic Startle Response and Brain and Plasma Concentrations. Toxicol. Sci 165(2), 361–371. [DOI] [PubMed] [Google Scholar]
- Williams MT, Gutierrez A, Vorhees CV, 2019. Effects of Acute Deltamethrin Exposure in Adult and Developing Sprague Dawley Rats on Acoustic Startle Response in Relation to Deltamethrin Brain and Plasma Concentrations. Toxicol. Sci 168(1), 61–69. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Williams MT, Morford LL, Wood SL, Rock SL, McCrea AE, Fukumura M, Wallace TL, Broening HW, Moran MS, Vorhees CV, 2003. Developmental 3,4-methylenedioxymethamphetamine (MDMA) impairs sequential and spatial but not cued learning independent of growth, litter effects or injection stress. Brain Res 968(1), 89–101. [DOI] [PubMed] [Google Scholar]
- Wolfinger R, 1993. Covariance structure selection in general mixed models. Communications in Statistics - Simulation and Computation 22(4), 1079–1106. [Google Scholar]
- Zhang M, Moon C, Chan GC, Yang L, Zheng F, Conti AC, Muglia L, Muglia LJ, Storm DR, Wang H, 2008. Ca-stimulated type 8 adenylyl cyclase is required for rapid acquisition of novel spatial information and for working/episodic-like memory. J. Neurosci 28(18), 4736–4744. [DOI] [PMC free article] [PubMed] [Google Scholar]
