Abstract
Regulatory studies of developmental and reproductive toxicity (DART) have remained largely unchanged for decades, with exposures occurring at various phases of the reproductive cycle and toxicity evaluations at different ages/times depending on the study purpose. The NTP has conducted studies examining the power to detect adverse effects where there is a pre-natal exposure, but evaluations occur postnatally. In these studies, examination is required of only one male and one female pup from each litter beyond weaning. This provides poor resolving power to detect rare events (e.g., reproductive tract malformations). If an adverse effect is detected, there is little confidence in the shape of the dose-response curve (and the BMD or NOAEL). We have developed a new protocol to evaluate DART, the modified one generation study, with exposure commencing with pregnant animals and retention of 4 males and 4 females from each litter beyond weaning to improve statistical power. These animals can be allocated to specific cohorts that examine sub-chronic toxicity, teratology, littering and neurobehavioral toxicity in the same study. This approach also results in a reduction in animal numbers used, compared with individual stand-alone studies, and offers increased numbers of end points evaluated compared with recent OECD proposals.
Keywords: Female reproduction, Male reproduction, Reproductive system, Safety assessment, Teratology, Toxicity
Introduction
The classical studies employed to evaluate developmental and reproductive toxicology (DART) to generate important information for human risk assessment have remained largely unchanged for several decades. The exposure period within the reproductive cycle (Figure 1), and the timing and type of end points evaluated, depend on the study objectives. The most common approach for pharmaceutical agents is a segmented design in which the exposure periods are divided within the reproductive cycle (Collins et al., 1999) to reflect the precise indication of the drug under development. However, for most environmental agents, it is more common to have exposure throughout the reproductive cycle, especially as the precise exposure paradigm that may occur in humans is not known (i.e., human exposure is unintentional). Thus, the study most often used to evaluate reproductive (and postnatal developmental) toxicity for these types of agents is the multigeneration reproduction study (USEPA, 1998) (Figures 2 and 3), where reproductive performance is evaluated over two breeding generations, typically in the rat. This is the only standard design where exposure to a test article occurs throughout the reproductive cycle (see Figure 2). The major objective of the study has been the evaluation of fertility and fecundity of both parents and their offspring. Since the overall intent is to evaluate the ability of animals to mate and produce viable offspring, the study produces large numbers of animals for evaluation. Indeed, it is the largest and one of the most logistically complex regulatory-type studies routinely conducted to produce data for human health risk assessment.
Figure 1.
Diagrammatic representation of the Mammalian Reproductive Cycle. From (Foster and Gray, 2013)
Figure 2.
Diagrammatic representation of a typical multigeneration study with reference to the mammalian reproductive cycle. Dosing is continuous throughout the cycle with assessments made at multiple life stages. From (Foster and Gray, 2013)
Figure 3.
Diagrammatic representation of the current EPA test guideline for fertility and reproductive effects (OPPTS 870.3800). From (Foster and Gray, 2013)
Key: Q = quarantine; PBE = pre-breed exposure; ECE = estrous cycle evaluation; M = mating; G = gestation; L = lactation; W = weaning; N = necropsy; C = cull; VO = vaginal opening; PPS = preputial separation; PND = postnatal day; AGD = Anogenital distance.
As our knowledge of critical windows of exposure has expanded, particularly with the increased focus on agents that may have endocrine-like activity, the last 20 years have shown the need for a study with a greater focus on the evaluation of potential postnatal adverse outcomes, particularly where there may have been early life exposure. Thus, there have been updates to standard designs to incorporate more functional end points (e.g., sperm and oocyte analysis, vaginal cytology, indices of puberty and sexual differentiation) to improve the detection of agents affecting reproduction and the endocrine status of animals. In particular, several research groups determined that current study designs are underpowered to routinely evaluate (i.e., detect, analyze the dose response, and establish with confidence a Benchmark Dose (BMD) or NOAEL for) abnormalities of the reproductive tract (e.g., malformations of the epididymis, seminal vesicles, prostate and external genitalia) following in utero exposure to agents with endocrine activity (Blystone et al., 2010; Hotchkiss et al., 2008; McIntyre et al., 2002). For example, in an evaluation of prenatal developmental toxicity, every fetus is examined for potential abnormalities (typically ~250 fetuses per group), whereas in the multigeneration study (and most other regulatory studies that involve breeding and following the offspring as they mature), only one male and one female pup per litter from a minimum of 20 litters is examined at adulthood for adverse pathological events (i.e., only 40 of the potential 250 animals/group produced).
Some of the National Toxicology Program’s (NTP) own studies have shown the added value and increased statistical power of evaluating more offspring per litter by retaining them to adulthood (see Figure 4), rather than discarding animals already produced, or performing only a gross examination at weaning, when the reproductive organs are not fully differentiated or developed (Blystone et al., 2010). The NTP has already adopted the improved use of these animals in its multigeneration study design by carrying more animals through to adulthood for examination, rather than removing them. A major role of the NTP has been the development of new test methods. Following a workshop on the evaluation of tumors of the endocrine system (Thayer and Foster, 2007), the NTP adopted a new default paradigm for its rat cancer bioassays that incorporates exposure during the perinatal period (i.e., gestation and lactation). The NTP has conducted a number of these “perinatal bioassays” in the past, but each normally required a specific scientific justification. Our current paradigm is that a perinatal study will be conducted unless there is a specific scientific reason why this should not occur.
Figure 4.
Power curves for the detection of rare adverse reproductive malformations from a DART study. The study is assumed to have 20 litters/group and a 0% background incidence. The vertical dotted line indicates a 10% incidence of an adverse outcome, which would be detected 4.7% of the time with 1 pup/litter, 66.4% with 3 pups/litter and 86.5% with 4 pups/litter. Adapted from (Blystone et al., 2010)
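The effect of retaining more pups per litter on detection power can be illustrated with a simplified calculation. The sketch below is not the method of Blystone et al. (2010): it assumes that pups are statistically independent (no litter effect), that the control group is the same size as the treated group with no affected animals (0% background), and that a one-sided Fisher exact test at alpha = 0.05 is used. The function names are hypothetical, and the values it produces should not be expected to match Figure 4 exactly.

```python
from math import comb


def fisher_p_zero_background(affected, n):
    """One-sided Fisher exact p-value for `affected` of n treated animals
    versus 0 of n control animals (hypergeometric tail, which here is a
    single term because the control count cannot go below zero)."""
    return comb(n, affected) / comb(2 * n, affected)


def power_independent_pups(litters, pups_per_litter, incidence, alpha=0.05):
    """Probability of a significant result, treating every retained pup as
    an independent observation (an assumption; litter effects are ignored)."""
    n = litters * pups_per_litter
    power = 0.0
    for affected in range(n + 1):
        if fisher_p_zero_background(affected, n) <= alpha:
            # Binomial probability of seeing exactly `affected` animals
            power += comb(n, affected) * incidence**affected * (1 - incidence) ** (n - affected)
    return power


if __name__ == "__main__":
    for pups in (1, 3, 4):
        p = power_independent_pups(litters=20, pups_per_litter=pups, incidence=0.10)
        print(f"{pups} pup(s)/litter: power = {p:.3f}")
```

Even under these simplifying assumptions, power at a 10% incidence is very low with one pup per litter and rises steeply when three or four pups per litter are retained.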
Before embarking on such a study, it would be customary to undertake a preliminary study that evaluated target organ toxicity (for a conventional cancer study, this would be the 90-day toxicity study) and enabled suitable dose levels to be selected for the cancer bioassay. Thus, for a long-term study involving exposure during pregnancy and lactation, a shorter duration study that involved exposure during these critical developmental windows would be required. The NTP realized that in performing the necessary setting of dose levels and identification of target organ toxicity in order to undertake a perinatal cancer bioassay, it was possible at the same time to use animals already produced following exposure during gestation and lactation to develop additional, high quality DART information in a single design, which we have termed the modified one generation study (MOG).
The Modified One Generation reproduction study
The MOG design (see Figure 5) employs pregnant animals, with dosing commencing at implantation (gestation day [GD] 6 in the rat), and continually exposes the dams throughout gestation and lactation. At weaning, the offspring continue to be administered the test article at the same dose level as their respective dam and are subsequently assigned to a number of different testing cohorts. These cohorts can be considered interchangeable “cassettes” that can be included, or not, based on the study objectives or other available information. These cassettes are essentially protocols used on other standard studies and would normally include:
Figure 5.
Diagrammatic representation of the NTP modified one generation reproduction study.
Only 10 pups per sex (on reaching adulthood) are required for the subchronic cohort and thus sufficient numbers of animals would be available for evaluations of other developmental toxicity that may include effects on the developing immune or nervous systems (see Figure 6).
Key: G = gestation; L = lactation; PND = postnatal day; GD = gestation day; M = mating; VO = vaginal opening; PPS = balanopreputial separation; EC = estrous cyclicity evaluation.
An evaluation of target organ toxicity, pathology, clinical pathology etc., similar to a current 90-day toxicity protocol – a subchronic toxicity cohort. This would normally require 10 animals to be evaluated per sex, per dose group obtained from different litters.
An evaluation of prenatal developmental toxicity – a teratology cohort. One male and one female offspring from each litter would be selected and non-sibling matings would be performed in each group on reaching sexual maturity (~postnatal day [PND] 110). Just prior to expected delivery, a laparotomy would be performed on the pregnant dams for a standard evaluation of external, visceral and skeletal abnormalities of the fetuses.
An evaluation of breeding performance – a littering cohort. One male and female offspring from each litter would be selected and non-sibling matings would be performed in each group on reaching sexual maturity (~PND 110). The pregnant dams would be allowed to deliver their litters and raise them to weaning.
On a case-by-case basis, other cassettes could be added, or substituted, in the protocol, including an assessment of developmental neurotoxicity (see Figure 6) and/or developmental immunotoxicity, using protocols similar to those used for other regulatory submissions.
Figure 6.
Diagrammatic representation of the NTP modified one generation reproduction study with a developmental neurotoxicity cohort.
The study would normally be conducted in a rat strain with robust reproductive performance (the NTP uses the Sprague Dawley) and would commence with a sufficient number of time-mated animals to ensure a minimum of 20 litters per dose group, with normally at least 3 dose levels plus a vehicle control. The normal route of exposure employed on such studies would be oral (e.g., dosed feed, drinking water, or gavage) and treatment would be continuous throughout the study (for gavage, direct dosing of pups may be required at least from PND 12, or as appropriate from toxicokinetic [TK] information). If no other toxicity information is available, a pilot study with a small number of pregnant dams would be required to set dose levels and potentially acquire preliminary TK data in pregnancy and lactation.
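The assignment of F1 pups to cassettes within one dose group can be sketched as follows. This is an illustrative outline only, not the NTP procedure: it assumes 20 litters with 4 males and 4 females retained per litter, takes one male and one female per litter for each of the teratology and littering cohorts, draws the 10 animals per sex for the subchronic cohort from 10 different litters, and places the remaining pups in a placeholder "optional" cohort (for example, developmental neurotoxicity). The function name, pup identifiers, and random selection are hypothetical, and non-sibling mating is not modeled.

```python
import random
from collections import Counter


def allocate_dose_group(n_litters=20, pups_per_sex=4, n_subchronic=10, seed=0):
    """Return a dict mapping each pup ID to a cohort name for one dose group."""
    rng = random.Random(seed)
    litters = list(range(1, n_litters + 1))
    # Subchronic animals come from different litters (one per sex per litter)
    subchronic_litters = set(rng.sample(litters, n_subchronic))
    allocation = {}
    for litter in litters:
        for sex in ("M", "F"):
            pups = [f"L{litter:02d}-{sex}{i}" for i in range(1, pups_per_sex + 1)]
            rng.shuffle(pups)
            allocation[pups[0]] = "teratology"
            allocation[pups[1]] = "littering"
            remaining = pups[2:]
            if litter in subchronic_litters:
                allocation[remaining.pop(0)] = "subchronic"
            for pup in remaining:
                allocation[pup] = "optional"
    return allocation


if __name__ == "__main__":
    counts = Counter(allocate_dose_group().values())
    for cohort, n in sorted(counts.items()):
        print(f"{cohort}: {n} animals")
```

Under these assumptions each dose group yields 40 animals each for the teratology and littering cohorts and 20 for the subchronic cohort, leaving 60 for optional cassettes.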
This design emphasizes a full evaluation of the F1 animals in the study. These animals represent a unique exposure group compared to other toxicity studies (i.e., exposure from implantation until adulthood). The design uses significantly fewer animals than the Reproductive Assessment by Continuous Breeding multigeneration reproduction study that NTP has routinely employed and generates important information on both reproduction and postnatal development, together with a pathological evaluation of all the offspring retained after PND 4 (when litters are standardized to eight pups, 4 males and 4 females) when they reach adulthood, improving our ability to detect and determine dose-response relationships for postnatal effects. A major addition will be the information obtained on prenatal developmental toxicity within the study. The teratology and littering cohorts will also allow the evaluation of fertility and fecundity in two cohorts of animals and, importantly, we can maintain the relationship between structural changes in the reproductive organs and any functional outcomes that may occur in the same animals. Table 1 provides an assessment of the approximate number of animals generated on the MOG in comparison with the number required if one were to conduct 90-day toxicity, teratology, reproduction and developmental neurotoxicity “stand-alone” studies.
Table 1.
Estimated numbers of animals employed on standard, stand-alone NTP toxicity studies in comparison to the modified one generation study (MOG)
| Study Type | # of Groups | Animals/Group | Total at Study Start | Estimated Total Generated |
|---|---|---|---|---|
| 90-day sub-chronic | 6 | 20 | 120 | 120 |
| Multigeneration Reproduction | 4 | 40 | 160 | 2400 |
| Pre-natal dev. toxicity | 4 | 20 | 80 | 1200 |
| Developmental Neurotoxicity | 4 | 20 | 80 | 1200 |
| Combined | | | 440 | 4920 |
| MOG | 4 | 20 | 80 | 3400 |
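
The totals in Table 1 can be checked with a few lines of arithmetic. The sketch below simply re-adds the tabulated values, reading the multigeneration row as 160 animals at study start and 2400 generated (the reading that agrees with the Combined row). The variable names are illustrative only.

```python
# (animals at study start, estimated total animals generated), from Table 1
stand_alone = {
    "90-day sub-chronic": (120, 120),
    "Multigeneration reproduction": (160, 2400),
    "Pre-natal developmental toxicity": (80, 1200),
    "Developmental neurotoxicity": (80, 1200),
}
mog_start, mog_generated = 80, 3400

combined_start = sum(start for start, _ in stand_alone.values())
combined_generated = sum(generated for _, generated in stand_alone.values())

# Difference between the stand-alone studies and the MOG
animals_saved = combined_generated - mog_generated
fraction_saved = animals_saved / combined_generated

print(f"Combined at study start: {combined_start}")
print(f"Combined generated:      {combined_generated}")
print(f"MOG generated:           {mog_generated}")
print(f"Reduction:               {animals_saved} animals ({fraction_saved:.0%})")
```

On these figures the MOG generates about 1500 fewer animals than the four stand-alone studies combined, roughly a 30% reduction.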
Approaches in the case of significant developmental toxicity
The basis for the conduct of the MOG is to have sufficient numbers of pregnant dams producing litters to populate the subsequent cohorts of animals. If, however, the test article under study has significant developmental toxicity, this would preclude testing, as there would be insufficient numbers of pups from which to select the F1 cohorts to have an adequately powered study. We have explored several options to proceed with the MOG if this is the case. The first option would be to modify the dose level during the critical developmental windows of pregnancy and lactation, for example to decrease the dose level in this period to prevent early pup death and then raise the dose level after weaning to sufficiently challenge post-weaning animals. This approach was successfully employed in the perinatal bioassay of perfluorooctanoate by NTP that is currently undergoing review. The second option would be to modify the exposure paradigm. In this approach, exposure of pregnant dams could be commenced later in gestation to overcome early fetal loss due to a potent developmental toxicant and yet still have sufficient animals available to continue postnatal evaluations. NTP has applied such an approach in the study of hydroxyurea, the only FDA-approved medication for sickle cell disease in adults that also appears to have efficacy in infants and children (Wang, 2016), but for which information on potential long-term consequences would help clinicians to weigh the benefits of treatment against the risk. The last option, always available, is to undertake separate studies. While not a preferred option, and probably a rare occurrence, the choice to use separate studies would likely depend on the outcome of any preliminary dose range finding study. In particular, separate studies would be indicated by marked treatment-related loss of offspring (fetuses, pups or both) that would preclude, or make extremely difficult, employing significantly lower dose levels in the MOG study.
It is also possible that a developmental toxicant may be so potent that neither a reduction in dose level nor an adjustment of the developmental exposure window would allow sufficient numbers of offspring to be produced. This would compromise the overall objectives of the MOG study, and a separate study of developmental toxicity would be warranted.
Other international efforts to modify DART study designs
In parallel with some of these NTP efforts, other initiatives had been taken internationally to refine reproductive toxicity testing for agrochemicals (Cooper et al., 2006), for which a large database of toxicity information in animals is usually available. The study design published by Cooper et al. had been proposed as a replacement for the EPA/OECD multigeneration reproduction study and in addition explored the incorporation of some additional end points (particularly developmental neurotoxicity and immunotoxicity) that would normally be conducted in separate, triggered assays. In Europe, the advent of REACH (Registration, Evaluation, Authorization and restriction of CHemical substances) was likely to require increased toxicity testing of chemicals at the same time that other efforts were seeking to reduce experimental animal usage. Significant attention was therefore focused on those assays that required (or produced) the largest number of experimental animals – the multigeneration reproduction study and the prenatal developmental toxicity study (usually conducted in two species). The OECD took up the challenge of finding a study that would provide adequate information for the evaluation of reproductive toxicity, but would also reduce animal numbers, and has adopted a study (the extended one-generation reproduction study, based on the Cooper et al. design) as a formal guideline (see Figure 7; OECD, 2012). The design is proposed to be used for all chemicals (not just agrochemicals), where a much reduced toxicity database is likely to be available. Some of the major shortfalls of this approach in comparison to the MOG were noted in a previous commentary (Foster, 2014). These included the shortening of the pre-breed period for parental animals to 2 weeks (from the standard 10 weeks on the multigeneration study).
This can be problematic where sub-chronic (90-day) information is unlikely to be available, as for some industrial and environmental chemicals (unlike agrochemicals). A short pre-breed exposure would not expose the parental males to all stages of spermatogenesis and allow the germ cells to be available for fertilization in a single breeding trial.
Figure 7.
Diagrammatic representation of the OECD 443 Extended One Generation Reproduction Study (OECD, 2012).
Key: P = parental generation; M = males; F = females. F1 = first filial generation
The OECD protocol also employed internal triggers in the design, both for the end points selected and for the use of specific groups of animals. For example, the breeding of the F1 animals (to produce F2 pups) was one of the major triggered procedures within the protocol. In their advice to users of the protocol, OECD suggested that implementation of the trigger to breed the F1 animals would be undertaken only if any functional change (in fertility or fecundity) was noted in the absence of changes in parental gonadal histopathology. This could prove challenging to conduct in a routine fashion, since it is likely that there would be a study stagger (that is, not all animals in a specific group would be cohabited at the same time and not all parental animals would achieve a pregnancy at the same time). Parental gonadal histology could not take place until after all the F1 animals had been weaned, and then tissues would need to be removed at necropsy, processed and slides prepared, evaluated by a trained pathologist, data collected, statistically analyzed and then any findings relayed to a sponsor or a regulatory agency before the F1 animals would be bred. The proposed cohorts for evaluation of developmental immunotoxicity and developmental neurotoxicity were also underpowered (only 10 per sex per group) and, for neurotoxicity evaluation, did not include any evaluation of learning and memory. Moreover, these specific cohorts were also deemed optional.
A second major trigger for breeding the F1 animals was if a dose-related effect on a developmental landmark (e.g., indices of puberty) was noted in the absence of any body weight mediated effects. This trigger could also be difficult to implement where acceleration in a developmental index is observed. For example, a classic response to an estrogen is the advancement of vaginal opening (female puberty in rats) in the F1 offspring. These animals are in a rapid growth phase as they approach and attain puberty, and so an early occurrence (of, say, 4 to 5 days) will inevitably occur in younger animals that will weigh less than the corresponding controls. The question then becomes: was the effect due to toxicity producing a weight loss, or were the animals simply younger on attaining the end point? In its recent guidance on end points, the European Chemicals Agency (ECHA) has requested that certain changes be made to OECD 443 to make it acceptable (ECHA, 2015) for use in the EU for providing information on DART end points. These have included maintaining the 10-week pre-breed exposure period (unless a scientific justification can be made) and removing triggers where possible (i.e., breeding the F1 animals). The changes proposed by ECHA address many of the issues raised in the previous commentary (Foster, 2014) and significantly improve the quality of the information generated on the study. However, these requests by ECHA move the design back closer to the previous multigeneration study, and the weaknesses in the evaluation of developmental neurotoxicity already noted within the protocol remain. It would be unlikely that any investigator would perform 2 separate studies, one for Europe and another for the rest of the OECD group. Doing so would defeat the overall objective of reducing the number of experimental animals employed.
Concluding remarks
The NTP MOG study design is a robust evaluation of reproductive, pre- and post-natal developmental toxicity. The design maximizes the utility of the animals already produced and available for study in generating high quality DART information. The design reduces the overall number of animals employed compared to a combination of “stand-alone” toxicity studies, and it also meets NTP’s requirement for information on sub-chronic target organ toxicity and dose setting before embarking on a rat perinatal carcinogenesis study.
By using this study design it is possible to generate high quality information on DART end points for use in risk assessment while at the same time being aware of the needs for implementation of the 3 R’s in our future testing requirements. That is, we will be able to:
Refine our toxicity study designs.
Replace certain other standard toxicity studies by folding them into this protocol.
Reduce overall animal use.
Acknowledgments
This work was supported by the NIH, National Institute of Environmental Health Sciences.
References
- Blystone CR, Kissling GE, Bishop JB, Chapin RE, Wolfe GW, Foster PM. Determination of the di-(2-ethylhexyl) phthalate NOAEL for reproductive development in the rat: importance of the retention of extra animals to adulthood. Toxicol Sci. 2010;116:640–646. doi: 10.1093/toxsci/kfq147.
- Collins TF, Sprando RL, Shackelford ME, Hansen DK, Welsh JJ. Food and Drug Administration proposed testing guidelines for reproduction studies. Revision Committee. FDA Guidelines for Developmental Toxicity and Reproduction, Food and Drug Administration. Regul Toxicol Pharmacol. 1999;30:29–38. doi: 10.1006/rtph.1999.1306.
- Cooper RL, Lamb JC, Barlow SM, Bentley K, Brady AM, Doerrer NG, Eisenbrandt DL, Fenner-Crisp PA, Hines RN, Irvine LF, Kimmel CA, Koeter H, Li AA, Makris SL, Sheets LP, Speijers G, Whitby KE. A tiered approach to life stages testing for agricultural chemical safety assessment. Crit Rev Toxicol. 2006;36:69–98. doi: 10.1080/10408440500541367.
- ECHA. Guidance on Information Requirements and Chemical Safety Assessment. Chapter R.7.a: Endpoint specific guidance, Appendix R.7.6. EOGRTS Study Design. European Commission; 2015.
- Foster PM. Regulatory Forum opinion piece: New testing paradigms for reproductive and developmental toxicity--the NTP modified one generation study and OECD 443. Toxicol Pathol. 2014;42:1165–1167. doi: 10.1177/0192623314534920.
- Foster PMD, Gray LE. Toxic Responses of the Reproductive System. In: Casarett & Doull’s Toxicology: The Basic Science of Poisons. 8th ed. McGraw-Hill Education; 2013.
- Hotchkiss AK, Rider CV, Blystone CR, Wilson VS, Hartig PC, Ankley GT, Foster PM, Gray CL, Gray LE. Fifteen years after “Wingspread”--environmental endocrine disrupters and human and wildlife health: where we are today and where we need to go. Toxicol Sci. 2008;105:235–259. doi: 10.1093/toxsci/kfn030.
- McIntyre BS, Barlow NJ, Foster PMD. Male rats exposed to linuron in utero exhibit permanent changes in anogenital distance, nipple retention, and epididymal malformations that result in subsequent testicular atrophy. Toxicol Sci. 2002;65:62–70. doi: 10.1093/toxsci/65.1.62.
- OECD. Test No. 443: Extended One-Generation Reproductive Toxicity Study. OECD Publishing; 2012.
- Thayer KA, Foster PM. Workgroup report: National Toxicology Program workshop on Hormonally Induced Reproductive Tumors - Relevance of Rodent Bioassays. Environ Health Perspect. 2007;115:1351–1356. doi: 10.1289/ehp.10135.
- USEPA. Health Effects Test Guidelines OPPTS 870.3800: Reproduction and Fertility Effects. 1998.
- Wang WC. Minireview: Prognostic factors and the response to hydroxyurea treatment in sickle cell disease. Exp Biol Med (Maywood). 2016;241:730–736. doi: 10.1177/1535370216642048.







