Scientific Reports. 2023 Feb 14;13:2605. doi: 10.1038/s41598-023-29623-8

Robustness of a multivariate composite score when evaluating distress of animal models for gastrointestinal diseases

Steven R Talbot 1, Simone Kumstel 2, Benjamin Schulz 2, Guanglin Tang 2, Ahmed Abdelrahman 2, Nico Seume 2, Edgar H U Wendt 2, Johanna Eichberg 2, Christine Häger 1, André Bleich 1, Brigitte Vollmar 2, Dietmar Zechner 2
PMCID: PMC9929045  PMID: 36788346

Abstract

Reliable distress parameters are the foundation of evidence-based severity assessment in laboratory animal science. Many readouts are used to evaluate animal distress and the severity of experimental procedures. We therefore analyzed four distinct parameters (body weight, burrowing behavior, nesting, and distress score) in four gastrointestinal animal models: pancreatic ductal adenocarcinoma (PDA), pancreatitis, CCl4 intoxication, and bile duct ligation (BDL). Further, we determined the parameters' robustness in various experimental subgroups that differed by slight variations such as drug treatment or telemeter implantation. We used non-parametric bootstrapping to obtain robust estimates and 95% confidence intervals for the experimental groups. The performance of the readout parameters was model-dependent, and the distress score was prone to experimental variation. On the other hand, burrowing and nesting can be more robust than, e.g., body weight when evaluating PDA. However, body weight was still highly robust in BDL, pancreatitis, and CCl4 intoxication. To address the complex nature of the multi-dimensional severity space, we used the Relative Severity Assessment (RELSA) procedure to combine multiple distress parameters into a score and mapped the subgroups and models against a defined reference set obtained by telemeter implantation. This approach allowed us to compare the severity of individual animals in the experimental subgroups using the maximum achieved severity (RELSAmax). With this, the following order of severity was found for the animal models: CCl4 < PDA ≈ Pancreatitis < BDL. Furthermore, the robustness of the RELSA procedure and outcome was externally validated with a reference set from another laboratory, also obtained from telemeter implantation. Since the RELSA procedure reflects the multi-dimensional severity information and is highly robust in estimating the quantitative severity within and between models, it can be deemed a valuable tool for laboratory animal severity assessment.

Subject terms: Computational biology and bioinformatics, Gastroenterology, Preclinical research, Translational research

Introduction

Laboratory animals have made significant contributions to biomedical research [1–4]. However, public and political concerns regarding experiments on animals are steadily increasing [5–8]. Several legislative advancements have been made to alleviate these concerns and to expand animal welfare, taking scientific requirements, ethics, and morals into account. These advancements led to the broad adoption of the 3R principles and the passing of the European Union Directive 2010/63/EU [9,10]. Articles 38, 39, and 54 and Annex VIII of Directive 2010/63/EU demand prospective and retrospective assessment of the severity of experimental procedures, classified into four categories (mild, moderate, severe, and non-recovery) [11].

Similarly, the USA's Institutional Animal Care and Use Committees enforce the local Animal Welfare Act and Animal Welfare Regulations [12]. China, estimated to be the top user of animals for experimental research, also implemented guidelines for ethical review of laboratory animal welfare in late 2018 [13]. Consequently, appropriate methods to assess severity and distress in animal research are of utmost importance. Although significant improvement has been made with regard to the refinement of procedures, such as appropriate analgesia [14] and humane endpoints [15], the lack of validated, evidence-based methodology hinders the advancement of animal welfare as well as the corresponding science [16,17].

Several readout parameters for animal distress evaluation have been identified in recent years. Researchers often use non-invasive methods to assess physical and physiological parameters, appearance, and behavior. For example, body weight is widely used to determine animal well-being and to refine humane endpoints in experimental procedures [18]. Score sheets are also routinely used in judging clinical signs to assess the level of distress in animals. Such scores often provide the basis for subsequent actions if predefined humane endpoints are reached [19,20]. Additionally, evaluating innate behaviors like burrowing and nesting activity has become a cornerstone of assessing animal well-being, as they are reduced in severe distress [21–23].

Although it is well accepted that multiple rather than single readout parameters should be used to describe and compare animal distress [24–26], single parameters are often not combined to form a single score [27–30]. However, a combined analysis of multiple readout parameters is vital to explore the multi-dimensional dependencies of the variables. Depending on the nature of the analysis, multivariate or multiple-variable approaches like regressions are used [31]. Publications have shown that multivariate methods such as Principal Component Analysis (PCA) (Ernst et al. 2020) [32] and k-means clustering [33,34] can be used to evaluate the performance and the importance of variables in animal models, i.e., as indicators of animal distress. However, methods like PCA are affected by collinearity, which often occurs in the measured variables, and therefore require careful analysis. More sophisticated strategies involve, e.g., machine learning [35] and adaptive modeling [36] to assess and select individual variables or combinations. The omnipresent high variance in animal experiments is not only a problem regarding reproducibility but also hampers the usefulness of statistical methods. Non-parametric methods like bootstrapping [37,38] can help to obtain more reliable parameter estimators and the corresponding confidence intervals [39]. This method was deemed superior to classical inferential statistics, especially in the clinical context [40,41].

While all readout parameters mentioned above proved helpful in the past, little is known about their robustness in different experimental settings. Robustness, in a broader scientific sense, means that conclusions remain stable even when experimental conditions are varied [42,43]. Furthermore, robustness has been defined as one of three critical aspects of the usefulness of an animal experiment [44,45]. Thus, it is essential for translational research and for ethical considerations when judging specific animal models. For example, if an animal model's distress can be measured robustly, it makes sense to argue in favor of or against that specific model to improve animal well-being by refining interventions and procedures.

Consequently, one purpose of this study was to evaluate the robustness of animal distress parameters (burrowing activity, nesting behavior, body weight, distress score) under varying experimental conditions. A second goal was to combine these readout parameters into a single metric, called RELSAmax [46], and to evaluate whether this score can be used to differentiate the distress levels of four animal models for gastrointestinal diseases (pancreatic cancer, chronic pancreatitis, liver fibrosis, and cholestasis). Finally, this study aimed to check the robustness of the RELSAmax score by employing different sets of reference data from independent laboratories.

Material and methods

Animal models

Animals

This study did not use new animals but re-evaluated data generated in previous projects with the novel focus of combining multiple distress parameters into a score to compare the severity of various animal models. All animal experiments were approved by the local authority: the Landesamt für Landwirtschaft, Lebensmittelsicherheit und Fischerei Mecklenburg-Vorpommern (licenses 1-019/15, 1-062/16, 1-002/17) or the Lower Saxony State Office for Consumer Protection and Food Safety (LAVES, license 15/1905). Mice in laboratory A were housed at the central animal facility of the University Medical Center Rostock in different type III cages (Zoonlab GmbH, Castrop-Rauxel, Germany) at a 12 h light/dark cycle (light period: 7:00–19:00), a temperature of 21 ± 2 °C, and relative humidity of 60 ± 20% with food (10 mm pellets, ssniff-Spezialdiäten GmbH, Soest, Germany) and tap water ad libitum. Enrichment was provided in the form of a paper roll (75 × 38 mm, H 0528-151, ssniff-Spezialdiäten GmbH), nesting material (shredded tissue paper, Verbandmittel GmbH, Frankenberg, Germany), and a wooden stick (40 × 16 × 10 mm, Abedd, Vienna, Austria). The health of the animal stock was routinely checked according to FELASA guidelines (Helicobacter sp., Rodentibacter pneumotropicus, and murine Norovirus were detected in a few mice within the last 2 years; these animals were not used for any experiments).

Mice for the reference data set B were pair-housed at the Central Animal Facility of the MHH in macrolon type-II cages (360 cm²; Tecniplast, Italy), which were changed once per week. Cages were bedded with autoclaved softwood shavings (poplar wood; AB 368P, AsBe-wood GmbH, Buxtehude, Germany), paper nesting material (AsBe-wood GmbH, Buxtehude, Germany), and two cotton nesting pads (AsBe-wood GmbH, Buxtehude, Germany). Room conditions were standardized (22 ± 1 °C; humidity: 50–60%; 14:10 h light/dark cycle). Mice were fed standard rodent food (Altromin 1324, Altromin, Lage, Germany) ad libitum, and autoclaved (135 °C/60 min) distilled water was provided ad libitum. All mice were randomly allocated to the experimental groups and habituated to the experimental environment before the surgical procedure. The mice were free of the viral, bacterial, and parasitic pathogens listed in the recommendations of the Federation of European Laboratory Animal Science Associations (FELASA).

ETA-F-10 transmitters (Data Sciences International, Minnesota, USA) in laboratory A were placed in the abdominal cavity of male C57BL/6J mice after anesthetizing them with 1–2 vol% isoflurane (n = 10). Analgesia was provided by one s.c. injection of 5 mg/kg carprofen (Rimadyl, Pfizer GmbH, Berlin, Germany) before surgical intervention and 1250 mg/l metamizole (Ratiopharm, Ulm, Germany) in the drinking water until the end of the experiment. This experiment's methodological details and data were published previously [26,47].

In laboratory B, transmitters (ETA-F10 or HD-X11; DSI, St Paul, MN, USA) were aseptically implanted into the intraperitoneal cavity of 9–10-week-old female C57BL/6J mice (n = 13) with electrodes placed subcutaneously for a bipolar lead II configuration under general isoflurane anesthesia. General anesthesia was induced in an induction chamber (15 × 10 × 10 cm) with 5 vol% isoflurane (Isofluran CP®, CP Pharma, Burgdorf, Germany) and an oxygen flow (100% oxygen) of 6 l/min. After confirmation of the absence of the righting reflex and removal from the chamber, anesthesia was maintained via an inhalation mask with 1.5–2.5 vol% isoflurane and an oxygen flow of 1 l/min. The corneal reflex was combined with the eyelid-closing reflex and the toe pinch reflex to determine the depth of anesthesia. The personnel involved were trained and experienced in performing these tests carefully and gently to avoid any damage. In addition, the eyes were moistened with eye ointment to protect them from drying (Bepanthen®, Bayer AG, Leverkusen, Germany). Once the mice were fully anesthetized, the surgical area was shaved, and the mice were placed in the surgical field in dorsal recumbency with the head towards the surgeon. During the entire duration of the anesthesia, the mice were placed on a heating pad at 37.0 ± 1.0 °C to prevent hypothermia. EMLA cream (25 mg/g lidocaine + 25 mg/g prilocaine; Aspen Germany GmbH, Munich, Germany) was used for local anesthesia at the incision sites. For analgesia, animals received either preoperative 200 mg/kg metamizole (Novaminsulfon 500 mg Lichtenstein, Zentiva Pharma GmbH, Frankfurt am Main, Germany) subcutaneously (s.c.) and postoperative 200 mg/kg metamizole orally via the drinking water until day 3, or preoperative 5 mg/kg carprofen (Rimadyl, Zoetis Deutschland GmbH, Berlin, Germany) s.c. and postoperative 2.5 mg/kg s.c. every 12 h until day 3. The methodological details and the data of this experiment were published previously [46].

Pancreatic cancer was established by slowly injecting 2.5 × 10⁵ 6606PDA cells into the pancreas of anesthetized male C57BL/6J mice. For all mice, analgesia was provided by one s.c. injection of 5 mg/kg carprofen (Rimadyl, Pfizer GmbH) before cell injection and 1250 mg/l metamizole (Ratiopharm) in the drinking water until the end of the experiment. Mice (n = 7) of one subgroup had an ETA-F-10 transmitter implanted 14 days before cell injection [47]. All other mice had no transmitter [48]. Starting on day 4 after cell injection, mice without transmitters were treated with combinatorial chemotherapies or the appropriate vehicles as controls (vehicle CHC/Met: n = 7, vehicle Gal/Met: n = 5). Either α-cyano-4-hydroxycinnamate (CHC, daily i.p. injection of 15 mg/kg, Tocris Bioscience, Bristol, UK) plus metformin (Met, daily i.p. injection of 125 mg/kg, Merck, Darmstadt, Germany) or galloflavin (Gal, i.p. injection of 20 mg/kg three times a week, Tocris Bioscience) plus metformin (Met, daily i.p. injection of 125 mg/kg, Merck) was applied as chemotherapy until day 37 after cell injection (CHC/Met: n = 7, Gal/Met: n = 7). This experiment's methodological details and data were published previously [26,48].

Male C57Bl/6J mice were treated with cerulein (Bachem H-3220.0005, Bubendorf, Switzerland) to induce chronic pancreatitis. Cerulein was dissolved in 0.9% sodium chloride and administered by consecutive intraperitoneal (i.p.) injections (50 μg/kg), three hourly injections/day on 3 days/week for 4 weeks. MicroRNA-21 inhibitor (miRCURY LNA™ microRNA-21a-5p inhibitor; sequence: TCAGTCTGATAAGCT) and its corresponding microRNA-21 control (miRCURY LNA™ microRNA-21a-5p control; sequence: TCAGTATTAGCAGCT) were purchased from Qiagen (Hilden, Germany), resuspended in PBS, and injected at a dose of 10 mg/kg (s.c.) on day 0 and day 14 after the first cerulein injection (inhibitor: n = 8, vehicle: n = 8). This experiment's methodological details and data were published elsewhere [26,49].

To induce liver damage, carbon tetrachloride (Merck Millipore, Eschborn, Germany) was diluted fourfold with corn oil (Sigma-Aldrich, code C8267), and 1 μl per g body weight of this solution (dosage: 0.25 ml/kg body weight) was injected (i.p.) into male BALB/cANCrl mice twice per week over 6 weeks. Analgesia was provided by 1250 mg/l metamizole (Ratiopharm) in the drinking water until the end of the experiment. 20 mg/kg MCC950 (Sigma Aldrich, St. Louis, USA) or aqua dest. ad inj. (vehicle control) was injected (i.p.) into mice (for nesting activity MCC950: n = 6, vehicle: n = 6; for burrowing MCC950: n = 7, vehicle: n = 3; for body weight MCC950: n = 13, vehicle: n = 9; for distress score MCC950: n = 7, vehicle: n = 3) daily from day 28 to day 41 after the first carbon tetrachloride injection. This experiment's methodological details and data were published elsewhere [26,50].

A laparotomy was performed on male BALB/cANCrl mice under anesthesia (1.2–2.5 vol% isoflurane) to induce cholestasis by bile duct ligation (BDL). The bile duct was ligated by three surgical knots and transected between the two distal ligations. To relieve pain, 5 mg/kg carprofen (Pfizer GmbH, Berlin, Germany) was injected (s.c.) before the operation, and 1250 mg/l metamizole (Ratiopharm) was provided in the drinking water until the end of the experiment. 20 mg/kg MCC950 or aqua dest. ad inj. (vehicle control) was injected (i.p.) into mice (for nesting activity MCC950: n = 7, vehicle: n = 7; for burrowing and distress score MCC950: n = 7, vehicle: n = 9; for body weight MCC950: n = 14, vehicle: n = 16) daily from day 1 before BDL to day 13 after BDL. This experiment's methodological details and data were published elsewhere [26,50].

Evaluation of distress

The body weight, burrowing activity, nesting, and distress score were evaluated to assess distress. In laboratory A, the burrowing activity was analyzed by filling a tube (length: 15 cm, diameter: 6.5 cm) with 200 g of food pellets, which was then placed into the mouse cage 2–3 h before the dark phase. The remaining pellets in the burrowing tube were weighed after 2 h (for C57Bl/6J mice) or 17 ± 2 h (for BALB/cANCrl mice), and the weight of the burrowed pellets was calculated. The percentage of burrowing activity and body weight was calculated by using the weight of burrowed pellets or body weight before any intervention as a reference.
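The baseline normalization described above can be illustrated with a minimal Python sketch. The function name and all numerical values are hypothetical and are not taken from the study data; the study itself used R.

```python
def percent_of_baseline(value, baseline):
    """Express a measurement as a percentage of its pre-intervention baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * value / baseline


# Hypothetical body weights in grams: 26.0 g before and 23.4 g after an intervention.
body_weight_pct = percent_of_baseline(23.4, 26.0)

# Burrowed pellets = pellets filled into the tube (200 g) minus pellets remaining.
# Hypothetical remainders: 50 g at baseline, 80 g after the intervention.
burrowed_baseline = 200.0 - 50.0
burrowed_after = 200.0 - 80.0
burrowing_pct = percent_of_baseline(burrowed_after, burrowed_baseline)
```

In this made-up example, the animal retains about 90% of its baseline body weight and shows 80% of its baseline burrowing activity.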

The nest-building behavior in laboratory A was analyzed by placing a cotton nestlet (5 cm square of pressed cotton batting, Zoonlab GmbH, Castrop-Rauxel, Germany) in the cage 30–60 min before the dark phase. The nests were scored at the end of the dark phase ± 2 h using a scoring system developed by Deacon [51], to which a 6th score point was added. A score of 6 defined a perfect nest: the nest looked like a crater, and more than 90% of the circumference of the nest wall was higher than the body height of the coiled-up mouse. Please note that the nesting activity of BALB/cANCrl mice was measured 1 day after evaluating the burrowing activity to avoid offering the animals two tasks at the same time.

In addition, the well-being of mice was assessed at laboratory A by evaluating multiple parameters with the help of a score sheet. This score sheet was based on other score sheets [19,20] and was previously published by our group [52]. The mice were, therefore, observed in their home cage for a few minutes, and the distress score was assessed when one or more defined criteria (e.g., spontaneous behavior, flight behavior, or general body conditions) were diagnosed.

Data obtained from laboratory B served as the reference set. Here, animal distress was assessed by analysis of burrowing behavior. Baseline measurements were taken on days two and one before surgery. A 250 ml plastic bottle with a length of 15 cm, a diameter of 5.5 cm, and a port diameter of 4 cm was used as a burrowing apparatus. It was filled with 140 g ± 1.5 g of the standard diet pellets of the mice (Altromin 1324, Lage, Germany). On days 1, 2, 3, 5, and 7 after surgery, mice were singly placed in a type-II macrolon cage with autoclaved hardwood shavings overnight. The burrowing bottles were placed in the left corner. Half of the used nesting material from the home cage was provided as a shelter in the right corner. The tests started 3 h before the dark phase. The amount of burrowed pellets was assessed after 2 h and 12 h. Body weight was evaluated 2 days before (baseline) and daily after transmitter implantation (for 1 week). This experiment's methodological details and data were published elsewhere [46].

Statistics

All statistical analyses were performed using the R software (v4.0.3) [53]. The continuous variables (body weight and burrowing activity) were standardized to 100% at baseline levels (day = −1). Variables representing animal distress (body weight, burrowing activity, nesting score, and the distress score; Figs. 1, 2, 3, 4) were bootstrapped 10,000-fold to obtain assumption-free estimates of the median (ŷ) and the 95% confidence intervals (rcompanion [54]). In addition, the distribution of the estimates was inspected visually and tested against the hypothesis of normal distribution using the Shapiro–Wilk test. In case of evidence for non-normally distributed data, the experimental subgroups were compared using the Kruskal–Wallis test. Pairwise comparisons were calculated with the Wilcoxon–Mann–Whitney test. The resulting p-values were adjusted for multiple comparisons with Holm's correction, i.e., to control the family-wise error rate. Time-dependent (intra-treatment) comparisons of non-parametric data were analyzed with the Friedman rank sum test. Subsequent baseline-level comparisons were calculated with Dunn's post hoc test and Holm's correction (see Tables T1–T4 in the supplement). In the case of normally distributed data, a one-way ANOVA or, for time-dependent data, a repeated-measures ANOVA with sphericity corrections [55] and "day" as the within-subjects variable was performed. Between-treatment analyses and comparisons to baseline levels (with the control group as day −1) were performed with Dunnett's test and the Holm correction. Results were considered statistically significant at the following levels: *p ≤ 0.05, **p ≤ 0.01, ***p ≤ 0.001, ****p ≤ 0.0001; multiplicity-adjusted p-values are shown as padj.
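The bootstrapping step can be sketched as follows. This is a minimal Python illustration of a non-parametric bootstrap of the median with a percentile interval; it does not reproduce the rcompanion R package used in the study, and the choice of the percentile interval, the function name, the seed, and the example values are assumptions made for illustration only.

```python
import random
import statistics


def bootstrap_median_ci(values, n_boot=10_000, alpha=0.05, seed=1):
    """Return (sample median, lower, upper) using a percentile bootstrap interval."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the median of every resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = medians[round((alpha / 2) * n_boot)]
    upper = medians[round((1 - alpha / 2) * n_boot) - 1]
    return statistics.median(values), lower, upper


# Hypothetical body weights (% of baseline) of seven animals on one day.
example_values = [95, 97, 98, 99, 100, 101, 103]
estimate, ci_low, ci_high = bootstrap_median_ci(example_values)
```

Because the resamples are drawn from the observed values only, the interval makes no distributional assumption, which is the property the text refers to as assumption-free.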

Figure 1.

Distress evaluation of an orthotopic pancreatic cancer model. Pancreatic cancer (PDA) was treated with α-cyano-4-hydroxycinnamate plus metformin (CHC/Met), galloflavin plus metformin (Gal/Met), or the respective vehicle solutions (V). In a separate experiment, telemetric transmitters were implanted into mice before 6606PDA cells were injected into the pancreas (Tel + PDA). The percentage of body weight (A), the percentage of burrowing activity (B), nesting (C), and the distress score (D) were evaluated on the indicated days. Differences between the groups in (A) were assessed with the Kruskal–Wallis test. Significant differences in body weight between the groups were detected (χ² = 44.871, df = 4, p < 0.0001), and the subsequent post hoc comparisons revealed differences between the groups PDA + CHC/Met and Tel + PDA (*padj = 0.035) as well as between PDA + Gal/Met and Tel + PDA (*padj = 0.035). Group differences were also present in the distress score (χ² = 12.67, df = 4, p = 0.013), between PDA + V (CHC/Met) and PDA + Gal/Met (*padj = 0.015), PDA + V (CHC/Met) and Tel + PDA (*padj = 0.031), PDA + CHC/Met and PDA + Gal/Met (*padj = 0.015), PDA + CHC/Met and Tel + PDA (*padj = 0.031) as well as between PDA + Gal/Met and Tel + PDA (*padj = 0.031). The graphs depict the bootstrapped median estimator on each experimental day and the 95% confidence intervals. (A–D) PDA + V (CHC/Met) n = 7; PDA + CHC/Met n = 7; PDA + V (Gal/Met) n = 5; PDA + Gal/Met n = 7; Tel + PDA n = 7.

Figure 2.

Distress evaluation of a chronic pancreatitis model. Chronic pancreatitis (Panc) was treated with a microRNA-21 inhibitor (miRNA-21 inh.) or the identical vehicle solution plus a respective control oligonucleotide (CR). The percentage of body weight (A), the percentage of burrowing activity (B), the nesting (C), and the distress score (D) were assessed on the indicated days. No significant differences between the two groups were determined using the Kruskal–Wallis test in (A) (χ² = 0.82, df = 1, p = 0.36), (B) (χ² = 0.97, df = 1, p = 0.32), (C) (χ² = 0.45, df = 1, p = 0.5) or (D) (χ² = 0.44, df = 1, p = 0.50). The graphs depict the bootstrapped median estimator on each experimental day and the 95% confidence intervals. (A–D) Panc + CR (miRNA-21 inh.) n = 8, Panc + miRNA-21 inh. n = 8.

Figure 3.

Distress evaluation of liver damage in a fibrosis model. Liver damage was induced by repetitive carbon tetrachloride application (CCl4), and mice were treated with an NLRP3 inflammasome inhibitor (MCC950) or the appropriate vehicle solution (V). The percentage of body weight (A), the percentage of burrowing activity (B), the nesting (C), and the distress score (D) were evaluated on the indicated days. No significant differences between the groups were found using the Kruskal–Wallis test in (A) (χ² = 0.48, df = 1, p = 0.49), (B) (χ² = 1.33, df = 1, p = 0.25), (C) (χ² = 0.047, df = 1, p = 0.83) or (D) (χ² = 2.33, df = 1, p = 0.13). The graphs depict the bootstrapped median estimator on each experimental day and the 95% confidence intervals. (A) MCC950: n = 13, vehicle: n = 9; (B) MCC950: n = 7, vehicle: n = 3; (C) MCC950: n = 6, vehicle: n = 6; (D) MCC950: n = 7, vehicle: n = 3.

Figure 4.

Distress evaluation of a cholestasis model in mice. Cholestasis was induced by bile duct ligation (BDL), and the animals were treated with an NLRP3 inflammasome inhibitor (MCC950) or the appropriate vehicle solution (V). The percentage of body weight (A), the percentage of burrowing activity (B), the nesting (C), and the distress score (D) were evaluated on the indicated days. No significant differences between the groups were determined using the Kruskal–Wallis test in (A) (χ² = 0.16, df = 1, p = 0.69), (B) (χ² = 2.58, df = 1, p = 0.11), (C) (χ² = 0.05, df = 1, p = 0.82) or (D) (χ² = 0.26, df = 1, p = 0.61). The graphs depict the bootstrapped median estimator on each experimental day and the 95% confidence intervals. (A) MCC950: n = 14, vehicle: n = 16; (B) MCC950: n = 7, vehicle: n = 9; (C) MCC950: n = 7, vehicle: n = 7; (D) MCC950: n = 7, vehicle: n = 9.

The distress of animal models and experimental subgroups (Figs. 5, 6) was calculated with the Relative Severity Assessment (RELSA) algorithm [46] from the RELSA R-package (https://talbotsr.com/RELSA/index.html). The RELSA score was determined using three input variables (body weight, burrowing activity, and distress score) mapped against a reference set (laboratory A) with the same variables but from an independent transmitter-implantation experiment with a defined qualitative severity (i.e., moderate severity in laboratory B). Further, RELSAmax was obtained for each individual as the maximum RELSA value during the treatment, representing the highest quantitative distress experienced. The resulting RELSAmax values in the experimental subgroups were bootstrapped 10,000-fold to obtain estimates of the median (R̂) and the corresponding 95% confidence intervals. Between-subgroup and between-model comparisons were calculated with the Wilcoxon–Mann–Whitney test in case of evidence for non-normal data, followed by Holm's correction, to determine differences in distress levels. In the case of normally distributed data, the adjusted t-test was used. The robustness of the severity estimation was tested with data from a second laboratory (B), using only body weight and burrowing activity as input variables. RELSAmax values > 1 were considered more severe than the reference model.
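The reduction of a RELSA time course to one RELSAmax value per animal, and the comparison against the reference level of 1, might be sketched as follows in Python. The record layout, the function name, the animal identifiers, and the scores are hypothetical; the RELSA R-package provides its own functions, which are not reproduced here.

```python
def relsa_max_per_animal(records):
    """Map each animal ID to its maximum RELSA score over all observed days.

    `records` is an iterable of (animal_id, day, relsa_score) tuples;
    missing scores are given as None and are skipped.
    """
    best = {}
    for animal_id, _day, score in records:
        if score is None:
            continue
        if animal_id not in best or score > best[animal_id]:
            best[animal_id] = score
    return best


# Hypothetical RELSA scores for two animals (not study data).
records = [
    ("m1", 0, 0.35), ("m1", 1, 0.82), ("m1", 2, 0.40),
    ("m2", 0, 1.12), ("m2", 1, None), ("m2", 2, 0.64),
]
relsa_max = relsa_max_per_animal(records)

# Values above 1 exceed the maximum severity observed in the reference set.
above_reference = {animal: value > 1.0 for animal, value in relsa_max.items()}
```

Taking the maximum makes the comparison independent of when the peak occurs, so models with different time courses can be placed on the same scale.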

Figure 5.

Within-model comparisons of distress using the variables body weight, burrowing activity, and the distress score, represented as the RELSAmax metric. The red line denotes the maximally experienced severity in the telemetry experiment of laboratory A (reference line), also on the RELSA scale. In each animal model, the distress was assessed between treatment groups. The panels show the PDA (A), pancreatitis (B), carbon tetrachloride (CCl4) (C), and cholestasis (BDL) (D) models, always in comparison to the reference level. Distinct treatments with drugs (CHC/Met, Gal/Met, miRNA-21 inh., MCC950) or treatment with appropriate vehicle controls (V) are indicated. The 95% confidence intervals in all treatment groups remain below the reference line, indicating no evidence that any analyzed treatment has a higher severity than the surgery model. The Kruskal–Wallis test revealed a significant difference between the treatment groups only in panel (A) (χ² = 16.12, df = 4, p = 0.003), more specifically between PDA + Gal/Met and Tel + PDA (*padj = 0.021). The graphs depict the RELSAmax values obtained from individual RELSA analyses, the bootstrapped median estimators, and the 95% confidence intervals. (A) PDA + V (CHC/Met) n = 7, PDA + CHC/Met n = 7, PDA + V (Gal/Met) n = 5, PDA + Gal/Met n = 7, Tel + PDA n = 7; (B) Panc + CR (miRNA-21 inh.) n = 8, Panc + miRNA-21 inh. n = 8; (C) CCl4 + V (MCC950) n = 9, CCl4 + MCC950 n = 13; (D) BDL + V (MCC950) n = 16, BDL + MCC950 n = 14.

Figure 6.

Between-model comparisons of distress after pooling the experimental subgroups. (A) The distress of the carbon tetrachloride (CCl4), pancreatitis (Panc), pancreatic cancer (PDA), and cholestasis (BDL) models was evaluated using the RELSAmax values, with the reference built from data of laboratory A (transmitter implantation model; measured variables: body weight, burrowing activity, and the distress score; the red line denotes the maximally experienced severity in the telemetry experiment). A Kruskal–Wallis test showed significant differences between the RELSAmax estimates of the animal models in panel (A) (χ² = 54.86, df = 3, p < 0.0001). Differences were observed between CCl4 and PDA (**padj < 0.01), CCl4 and BDL (****padj < 0.0001), pancreatitis and BDL (***padj < 0.001) as well as between PDA and BDL (****padj < 0.0001). Ordering the RELSAmax estimates yields the following order of model severity: CCl4 (R̂_CCl4 = 0.40, CI95% [0.29; 0.52]) < PDA (R̂_PDA = 0.61, CI95% [0.50; 0.71]) < Pancreatitis (R̂_Panc = 0.70, CI95% [0.56; 0.83]) < BDL (R̂_BDL = 1.10, CI95% [1.00; 1.20]). (B) Testing the robustness of the procedure by implementing a different reference set from laboratory B with the variables body weight and burrowing activity. The treatment groups also showed significant differences using the Kruskal–Wallis test (χ² = 53.15, df = 3, p < 0.0001). Pairwise differences were observed between CCl4 and PDA (***padj < 0.001), CCl4 and Pancreatitis (****padj < 0.0001), CCl4 and BDL (****padj < 0.0001), Pancreatitis and BDL (**padj < 0.01) as well as between PDA and BDL (****padj < 0.0001). Here, the order of RELSAmax estimates was as follows: CCl4 (R̂_CCl4 = 0.31, CI95% [0.22; 0.40]) < PDA (R̂_PDA = 0.58, CI95% [0.50; 0.67]) < Pancreatitis (R̂_Panc = 0.65, CI95% [0.54; 0.76]) < BDL (R̂_BDL = 0.89, CI95% [0.81; 0.96]). The graphs depict the RELSAmax values obtained from individual RELSA analyses (pooled subgroups for each animal model) and the bootstrapped median estimator together with the 95% confidence intervals. Data of vehicle and drug-treated groups were pooled. (A,B) CCl4 n = 22, PDA n = 26, Pancreatitis n = 16, BDL n = 30.

Ethics declaration

No additional animals were used in this study. All analyses were conducted with data from previously published studies, which were in concordance with ARRIVE guidelines. All methods were carried out in accordance with relevant guidelines and regulations.

Results

Evaluation of single readout parameters for distress

In specific animal models, we analyzed four readout parameters for animal distress (body weight change, burrowing activity, nesting behavior, and a distress score) to assess the robustness of distress evaluation. Within each animal model, distinct groups of mice experienced slight variations in the experimental procedures. For example, the animals were treated with various drugs or vehicle solutions or had an implanted telemetry transmitter.

When assessing an animal model for pancreatic ductal adenocarcinoma (PDA), no significant differences in body weight change were observed when comparing groups of mice treated with specific drugs or vehicle solutions (Fig. 1A). However, significant differences in body weight change were observed between mice that had a telemeter implanted and telemeter-free mice treated with specific drug combinations (Fig. 1A). Likewise, no significant differences were observed when analyzing burrowing activity and nesting behavior (Fig. 1B, C). Still, a significant difference was observed in the distress score when comparing distinct groups of mice (Fig. 1D). We concluded that some readout parameters, such as the distress score, are prone to experimental variation.

We also evaluated identical readout parameters for distress in an animal model for chronic pancreatitis and compared the distress between groups of mice treated either with a drug or vehicle control (Fig. 2). No significant differences between these two groups were detected when analyzing body weight change, burrowing activity, nesting behavior, or a distress score (Fig. 2).

In addition, we assessed two animal models for liver damage. Liver damage was caused either by repeated carbon tetrachloride (CCl4) administration or by cholestasis induced by bile duct ligation (BDL). When analyzing the CCl4 animal model, no significant differences between treatments were detected (Fig. 3). Also, no significant differences between treatments were found in the BDL model (Fig. 4).

These data demonstrated that the distress readout parameters in three out of four animal models were robust to slight variations in the experimental procedure (Figs. 2, 3, 4). In contrast, body weight and distress score gave different results in the PDA animal model when the experimental procedures varied slightly (Fig. 1). Since we noticed that other readout parameters for distress gave varying results (Fig. 1), we developed a method that combined multiple variables throughout a treatment into a singular score, the maximum of the Relative Severity Assessment score (RELSAmax)46.

Combining multiple variables into RELSA scores

RELSA is a weighted procedure that maps multiple input variables into a single metric, allowing the comparison of different animal models and measurements on a quantitative scale. The comparability between models was achieved by referencing the standardized data to a set with a well-defined qualitative and quantitative severity. In the reference data, three variables from an intraperitoneal transmitter implantation experiment were used (body weight change, burrowing activity, and the distress score, Fig. S1, laboratory A). The measurements showed that the loss in body weight, burrowing activity, and distress score was largest on day 0 (the first day after surgery). Over time, the values recovered back to baseline levels. According to the RELSA concept, the experimental measurements were compared against the maximum deviations in the reference set, scaled, and combined into the RELSA score. The RELSA procedure gives more attention to larger deviations so that potential noise gets less weight. The maximum RELSA value per treatment (RELSAmax) was then used to compare models on a time-independent basis.
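The scaling and combining steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the root-mean-square combination is an assumption (one common way to give larger deviations more attention), and the function names `relsa_score` and `relsa_max`, the variable names, the clipping of improvements to zero, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def relsa_score(deviations, reference_max):
    """Combine per-variable deviations from baseline into a single score.

    deviations:    {variable: deviation from baseline, positive = worse}
    reference_max: {variable: largest deviation seen in the reference set}
    """
    # Only variables that also exist in the reference set can be scaled.
    shared = [
        name for name in deviations
        if name in reference_max and reference_max[name] > 0
    ]
    if not shared:
        raise ValueError("no variable is shared with the reference set")

    # Scale each deviation: 1.0 means "as large as the reference maximum".
    # Improvements (negative deviations) are clipped to zero (assumption).
    weights = [max(deviations[name], 0.0) / reference_max[name] for name in shared]

    # Root mean square (assumption): squaring lets large weights dominate
    # small, possibly noisy ones.
    return math.sqrt(sum(w * w for w in weights) / len(weights))


def relsa_max(daily_deviations, reference_max):
    """Highest score an animal reaches over all observation days."""
    return max(relsa_score(day, reference_max) for day in daily_deviations)


# Hypothetical reference maxima and one hypothetical animal observed on three days.
reference_max = {"body_weight": 10.0, "burrowing": 80.0, "distress_score": 4.0}
animal = [
    {"body_weight": 5.0, "burrowing": 40.0, "distress_score": 2.0, "nesting": 1.0},
    {"body_weight": 10.0, "burrowing": 80.0, "distress_score": 4.0, "nesting": 2.0},
    {"body_weight": 2.0, "burrowing": -5.0, "distress_score": 0.0, "nesting": 0.0},
]

print(relsa_max(animal, reference_max))
```

In this sketch, a score of 1.0 means that the animal deviates from baseline as strongly as the largest deviations in the reference set. Nesting is ignored because the hypothetical reference set does not contain it, which mirrors the requirement that tested models and reference data share at least some variables.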

Within-model comparisons of distress using RELSAmax

The bootstrapped RELSAmax estimates of the five pancreatic cancer experimental subgroups (Fig. 5A) were all below the RELSA reference level of laboratory A. Also, the 95% confidence intervals of the subgroup estimates did not cross the reference line. Therefore, it can be concluded that pancreatic cancer models show significantly lower distress than the telemetric implantation model. The between-subgroup comparisons showed that the highest distress was achieved in the PDA + Gal/Met subgroup. In addition, the RELSAmax of these animals was significantly higher than those within the Tel + PDA subgroup (Fig. 5A, padj = 0.021).

In the chronic pancreatitis model (Fig. 5B), the 95% confidence intervals of the bootstrapped RELSAmax estimates in the two subgroups were lower than the reference set, indicating significantly lower distress than in the telemetric implantation model. No significant difference between these subgroups was detected.

In the two CCl4 subgroups, the 95% confidence intervals of the bootstrapped RELSAmax estimates were also lower than the reference set, indicating significantly lower distress than in the telemetric implantation model (Fig. 5C). Also, in this animal model, no significant difference between the subgroups was detected (Fig. 5C).

In the BDL model, both estimates were above the reference level (Fig. 5D). However, the lower bounds of the 95% confidence intervals remained below the reference level. Therefore, there was insufficient evidence to support the hypothesis that the BDL model leads to more severe distress than the reference model. Nevertheless, two animals in the BDL + MCC950 subgroup experienced very high distress (RELSAmax = 1.73 and RELSAmax = 2.52). Both animals reached the humane endpoint (> 20% body weight reduction) at the end of the experiment. Again, no significant difference between the subgroups of this animal model was found.
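The interpretation rule used in the paragraphs above (an interval entirely below the reference line indicates significantly lower distress, an interval that crosses it gives insufficient evidence) can be written as a small Python sketch. The function name `compare_to_reference`, the default reference level of 1.0, and the interval values in the example are assumptions for illustration and are not taken from the figures.

```python
def compare_to_reference(ci_low, ci_high, reference=1.0):
    """Classify a 95% confidence interval relative to a reference level."""
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if ci_high < reference:
        # Whole interval below the line: significantly lower distress.
        return "below reference"
    if ci_low > reference:
        # Whole interval above the line: significantly higher distress.
        return "above reference"
    # Interval crosses the line: no conclusion either way.
    return "inconclusive"


# Hypothetical intervals, not values from the study.
print(compare_to_reference(0.40, 0.80))  # interval below the line
print(compare_to_reference(0.90, 1.60))  # interval crossing the line
```

The second call corresponds to the BDL situation described above: the estimate may lie above the reference level, but as long as the lower bound stays below it, a higher severity than the reference model is not supported.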

Between-model comparisons of distress using RELSAmax

The distinct subgroups were pooled to focus on comprehensive model comparisons rather than on individual experiments. Again, the RELSAmax procedure was used to compare distress levels between the four animal models (Fig. 6A). This was achieved by mapping the experimental data to the standardized severity of the reference data from laboratory A. The order of the estimates yielded a severity ranking that was used to classify the animal models in terms of distress magnitude. This ranking showed the following order of increasing severity based on the RELSAmax model estimates: CCl4 < PDA ≈ Pancreatitis < BDL. Significant differences were found when comparing these animal models (Fig. 6A). Pancreatitis showed significantly higher severity than the CCl4 model (padj ≤ 0.001), as did PDA (padj = 0.01). Further, the BDL model showed significantly higher severity than the CCl4 (padj ≤ 0.0001), pancreatitis (padj ≤ 0.001), and PDA (padj ≤ 0.0001) models.

The robustness of this severity ranking was tested with an independent reference set using two variables (body weight change and burrowing activity) from an intraperitoneal transmitter implantation experiment in laboratory B. Both variables showed the same pattern of estimates on day 0 with subsequent recovery during the following days (see raw data of laboratory A Fig. S1 and laboratory B Fig. S2).

Testing the four animal models against the reference data from laboratory B resulted in RELSAmax estimates that were slightly lower than those from the first analysis (see Fig. 6A,B). However, this validation analysis maintained the previous order of model severity: CCl4 < PDA ≈ Pancreatitis < BDL (Fig. 6B). Again, pancreatitis (padj ≤ 0.0001) and PDA (padj ≤ 0.001) showed significantly higher severity than the CCl4 model. Further, the BDL model showed significantly higher severity than the CCl4 (padj ≤ 0.0001), pancreatitis (padj ≤ 0.01), and PDA (padj ≤ 0.0001) models. Thus, the same pairs of animal models differed significantly with both reference sets. Therefore, judging the severity of animal models based on significant differences in distress using the RELSAmax method was robust to the choice of reference data.

Discussion

This study evaluated animal distress by assessing the body weight change, a distress score, and the burrowing and nesting behavior in four animal models for distinct gastrointestinal diseases (Figs. 1, 2, 3, 4). The robustness within each model was evaluated by minor experimental design modifications, e.g., different treatment strategies. The previously established RELSA procedure46 graded the maximum distress each animal experiences by mapping the multi-dimensional information of various distress parameters against a specific reference set of defined severity. The RELSAmax score proved robust in three out of four animal models when comparing each animal model under different experimental conditions (Fig. 5). The robustness was also given when using RELSA reference sets from two laboratories to estimate the order of severity in the analyzed animal models with the RELSAmax value (Fig. 6).

The basis for an evidence-based severity assessment of animal models is the use of reliable distress parameters16,24,25. A robust body weight reduction was observed in three out of four animal models (BDL, CCl4, and pancreatitis) as a response to surgical intervention or chemical induction (Figs. 1, 2, 3, 4), which implies this parameter's relevance to detecting distress in some gastrointestinal animal models. Body weight alone or combined with other criteria is helpful for humane endpoint determination in many animal models35,56–58. Therefore, its evaluation is highly recommended for severity grading by welfare assessment protocols24,25. A robust reduction of burrowing activity was observed in the BDL, pancreatitis, and PDA animal models, but not after CCl4 intoxication (thus, robustness was given in three out of four animal models). A robust reduction of nesting activity was also observed in three out of four animal models (BDL, CCl4, and pancreatitis). The results indicated that burrowing and nesting behavior were at least as robust in these experiments as body weight reduction. Indeed, both behavior tests have been reported to detect stress and suffering in many distinct animal models, for example, in animal models for colitis59, Parkinson's disease60, pancreatitis61, epilepsy62,63, and depression64. However, a robust increase in the distress score could only be detected in the BDL animal model. This result indicates that in some animal models, the distress score is less informative than body weight change or behavior tests. This conclusion is consistent with previous studies48.

However, we also want to describe specific limitations to the abovementioned summary and conclusions. For example, within the PDA model, significant differences in body weight change were detected between the Tel + PDA group and the PDA + CHC/Met or PDA + Gal/Met group (Fig. 1A). These differences were caused by an initial body weight reduction following the previous surgery for transmitter implantation, combined with no additional decrease in body weight in the Tel + PDA group after cell injection47. Thus, body weight reduction is robust if one only compares PDA animal models using different pharmacological treatments. Still, it is not robust when an additional surgical intervention (telemeter implantation) is included in the PDA model. Furthermore, low robustness was also observed when comparing the distress score between the PDA + Gal/Met group and other treatment groups. This result most likely reflects the side effects of specific drugs, as described in a previous study48. Thus, this example shows that the robustness of data within an animal model might be reduced if the changed variable, e.g., therapy, has an intense effect on the well-being of the animals. Besides therapy, different housing conditions, such as the number of animals per cage, the dark/light cycle, and nesting material, might also influence the animals' well-being. In addition, although no sex-specific difference in burrowing behavior was reported in a previous study65, it is still feasible that the sex of the animals might influence results when analyzing distress. However, we noticed in our reference data that both body weight change and burrowing behavior revealed similar significant changes after telemeter implantations. Further, these experiments were performed in two laboratories using different housing conditions with either male (Fig. S1) or female mice (Fig. S2). Thus, neither the housing conditions nor the sex of the animals had a major influence on the conclusions of our study.

Since we re-analyzed published data, a regular power analysis could not be conducted. Therefore, the low sample size in some experimental groups may affect the robustness of the statistical results and, e.g., the underlying assumptions for statistical testing. We used a non-parametric 10,000-fold bootstrapping method to address this issue as an alternative to classical inferential testing. This approach ensured the most robust estimates that could be obtained with the current data in our study.
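The bootstrapping approach mentioned above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the analysis code of the study: the percentile method for the interval, the use of the median of the bootstrap distribution as the point estimate, the function name `bootstrap_median_ci`, the fixed seed, and the example values are all hypothetical choices.

```python
import random
import statistics


def bootstrap_median_ci(values, n_boot=10_000, alpha=0.05, seed=0):
    """Non-parametric bootstrap of the median with a percentile interval.

    Returns (estimate, lower bound, upper bound).
    """
    if not values:
        raise ValueError("values must not be empty")

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)

    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )

    # Percentile method (assumption): cut alpha/2 from each tail.
    lo_idx = max(0, round((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)

    return statistics.median(medians), medians[lo_idx], medians[hi_idx]


# Hypothetical RELSAmax values for a small experimental group.
group = [0.21, 0.35, 0.40, 0.48, 0.52, 0.77]
estimate, ci_low, ci_high = bootstrap_median_ci(group)
print(estimate, ci_low, ci_high)
```

Because the resampling makes no distributional assumption, the sketch also runs on small groups, although the resulting interval can only be as informative as the few observed values allow.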

We observed low robustness of methods when only single parameters were used to compare the distress between animal models. For example, the BDL model causes a gradual increase in the distress score due to a continuous progression of the disease (Fig. 4D). In contrast, no increase in the distress score could be observed during chronic pancreatitis (Fig. 2D). Possibly, the methods measure distress in an animal-model-specific manner. This outcome was also noted by Mallien et al. when genetic, stress-based, and pharmacological mouse models of depression were compared64. These differences between animal models highlight the need to perform multi-parametric severity assessment when comparing different animal models16,46.

In the present study, a multi-parametric animal model assessment was done using the RELSA algorithm. This algorithm allowed an informed integration of various experimentally available read-out parameters into a single value. The maximum deviations per animal observed on the RELSA scale during an experiment were named RELSAmax. These values represent the utmost distress animals experience on a quantitative scale compared to a defined reference set46. Since it is recommended by the EU Commission to consider the highest distress an animal experiences for defining the severity grade of an experiment66, the RELSAmax is an excellent tool to determine and compare severity levels between animal models.

Please note that variables with large weights will contribute more information to the final RELSA score than variables with little weight. Thus, specific markers may dominate the RELSA values due to their model-specificity. However, these values are adequately mapped to the RELSA space due to the weighting system of the algorithm. Therefore, a holistic comparison of animal models and individual animals is possible with the limitation that the reference set must at least partially contain the same measured variables as the tested models.

No significant differences in the RELSAmax were observed, within each animal model, when comparing their varying experimental conditions, except when analyzing the PDA model (Fig. 5). Here, the significant difference between PDA + Gal/Met and Tel + PDA model can be explained by the lack of body weight reduction in the Tel + PDA model and a high distress score in the PDA + Gal/Met group (for a detailed discussion see previous text).

When pooling the data from various experiments for each animal model and comparing the severity of all animal models with the RELSAmax (Fig. 6), the CCl4, PDA, and pancreatitis models were below a RELSAmax of 1.0. This result indicates that these animal models cause less distress than transmitter implantation, which was used as a reference. Interestingly, the RELSAmax not only allows a comparison of the severity of various animal models to the reference model but is also an excellent tool to compare the distress between different animal models (Fig. 6). Based on RELSAmax, the CCl4 model indicates the lowest severity. In contrast, the PDA model is significantly more stressful for the animals. The pancreatitis model is shown to have a similar severity level to the PDA model. The BDL model has the highest severity, indicated by a significantly higher RELSAmax than the other gastrointestinal animal models. The progression of cholestasis leads to a substantial impairment of welfare, as characterized by major changes in all four single distress parameters (Fig. 4). That this animal model is quite severe is also supported by the low survival rate ranging from 64 to 70%67–69. In contrast, the survival rates of pancreatitis at 99%61, the CCl4 model at 100%70, and the PDA model at 83%71 were reported to be higher.

If one considers the implantation of a transmitter to be a surgical intervention that causes moderate distress, as suggested by the EU-Commission Guideline in Annex VIII11, one can define the distress caused by other animal models by implementing the RELSA procedure. Based on this concept, the CCl4, PDA, and pancreatitis models might cause mild to moderate distress, whereas the BDL model might cause moderate to severe distress. However, please note that we compare only the highest distress level that an animal reaches during the experiment at a single time point. When analyzing the RELSAmax values, we do not consider how long an animal is distressed during an experiment. Thus, cumulative suffering might have to be considered in addition to the RELSAmax when defining specific severity categories. Such longitudinal data can be presented using standard RELSA curves46. However, a concept of how to summarize cumulative suffering by an algorithm to allow direct comparison between, for example, short-term severe distress and long-term moderate distress still needs to be developed.

Conclusion

The present study characterized the robustness of distress assessment using multiple non-invasive methods. With the implementation of the RELSA procedure, the highest distress levels in animals during experiments were described mathematically with the RELSAmax value. This score allowed us to perform several between-animal model comparisons and reference tests. Since the results were very robust, the RELSAmax is reliable for comparing animal models. The algorithm might also be valuable when examining drug side effects or evaluating refinement measures during in vivo experiments.

Supplementary Information

Supplementary Information. (183.3KB, docx)
Supplementary Legends. (12.4KB, docx)

Acknowledgements

This study was supported by the Deutsche Forschungsgemeinschaft (DFG research group FOR 2591, ZE 712/1-1, ZE 712/1-2, VO 450/15-1, and VO 450/15-2 as well as HA6483/1-2, BL953/10-1 and 10-2, BL953/11-1 and 11-2). In addition, Guanglin Tang was supported by the China Scholarship Council (grant number: 201808080167).

Author contributions

D.Z., B.V., A.B., and C.H. developed the study concepts of each animal model. G.T., S.K., A.A., N.S., E.W., J.E., C.H., and D.Z. conducted the experiments and evaluated the data. S.R.T. curated the data, did the statistical/computational analysis, and prepared the figures. D.Z., S.R.T., S.K., and B.S. conceived this study and wrote the manuscript. All authors have reviewed the manuscript.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Data availability

Raw data can be downloaded from a GitHub repository under the following link: https://github.com/mytalbot/gastrointestinal_data.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-023-29623-8.

References

  • 1.Sims EK, Carr ALJ, Oram RA, DiMeglio LA, Evans-Molina C. 100 years of insulin: Celebrating the past, present and future of diabetes therapy. Nat. Med. 2021;27:1154–1164. doi: 10.1038/s41591-021-01418-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Barré-Sinoussi F, Montagutelli X. Animal models are essential to biological research: Issues and perspectives. Future Sci. OA. 2015 doi: 10.4155/fso.15.63. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.van Tilbeurgh M, et al. Predictive markers of immunogenicity and efficacy for human vaccines. Vaccines. 2021;9:579. doi: 10.3390/vaccines9060579. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Phillips NLH, Roth TL. Animal models and their contribution to our understanding of the relationship between environments, epigenetic modifications, and behavior. Genes. 2019;10:47. doi: 10.3390/genes10010047. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Ohl F, van der Staay FJ. Animal welfare: At the interface between science and society. Vet. J. 2012;192:13–19. doi: 10.1016/j.tvjl.2011.05.019. [DOI] [PubMed] [Google Scholar]
  • 6.Gross D, Tolba RH. Ethics in animal-based research. Eur. Surg. Res. 2015;55:43–57. doi: 10.1159/000377721. [DOI] [PubMed] [Google Scholar]
  • 7.Petetta F, Ciccocioppo R. Public perception of laboratory animal testing: Historical, philosophical, and ethical view. Addict. Biol. 2020 doi: 10.1111/adb.12991. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Codecasa E, Pageat P, Marcet-Rius M, Cozzi A. Legal frameworks and controls for the protection of research animals: A focus on the animal welfare body with a french case study. Animals. 2021;11:695. doi: 10.3390/ani11030695. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Lee KH, Lee DW, Kang BC. The ‘R’ principles in laboratory animal experiments. Lab. Anim. Res. 2020;36:45. doi: 10.1186/s42826-020-00078-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Olsson IAS, Silva SPD, Townend D, Sandøe P. Protecting animals and enabling research in the European Union: An overview of development and implementation of Directive 2010/63/EU. ILAR J. 2017;57:347–357. doi: 10.1093/ilar/ilw029. [DOI] [PubMed] [Google Scholar]
  • 11.European Parliament. Directive 2010/63/EU of the European Parliament and of the Council of 22 September 2010 on the protection of animals used for scientific purposes. Text with EEA relevance (2010).
  • 12.United States Department of Agriculture. USDA Animal Care: Animal Welfare Act and Animal Welfare Regulations. https://www.aphis.usda.gov/animal_welfare/downloads/bluebook-ac-awa.pdf (2019).
  • 13.MacArthur Clark JA, Sun D. Guidelines for the ethical review of laboratory animal welfare People’s Republic of China National Standard GB/T 35892–2018 [Issued 6 February 2018 Effective from 1 September 2018] Anim. Model Exp. Med. 2020;3:103–113. doi: 10.1002/ame2.12111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Foley PL, Kendall LV, Turner PV. Clinical management of pain in rodents. Comp. Med. 2019;69:468–489. doi: 10.30802/AALAS-CM-19-000048. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Herrmann K, Flecknell P. The application of humane endpoints and humane killing methods in animal research proposals: A retrospective review. Altern. Lab. Anim. 2018;46:317–333. doi: 10.1177/026119291804600606. [DOI] [PubMed] [Google Scholar]
  • 16.Keubler LM, et al. Where are we heading? Challenges in evidence-based severity assessment. Lab. Anim. 2020;54:50–62. doi: 10.1177/0023677219877216. [DOI] [PubMed] [Google Scholar]
  • 17.Bleich A, Tolba RH. How can we assess their suffering? German research consortium aims at defining a severity assessment framework for laboratory animals. Lab. Anim. 2017;51:667. doi: 10.1177/0023677217733010. [DOI] [PubMed] [Google Scholar]
  • 18.Talbot SR, et al. Defining body-weight reduction as a humane endpoint: A critical appraisal. Lab. Anim. 2020;54:99–110. doi: 10.1177/0023677219883319. [DOI] [PubMed] [Google Scholar]
  • 19.Morton DB, Griffiths PH. Guidelines on the recognition of pain, distress and discomfort in experimental animals and an hypothesis for assessment. Vet. Rec. 1985;116:431–436. doi: 10.1136/vr.116.16.431. [DOI] [PubMed] [Google Scholar]
  • 20.Paster EV, Villines KA, Hickman DL. Endpoints for mouse abdominal tumor models: Refinement of current criteria. Comp. Med. 2009;59:234–241. [PMC free article] [PubMed] [Google Scholar]
  • 21.Deacon RMJ. Burrowing in rodents: A sensitive method for detecting behavioral dysfunction. Nat. Protoc. 2006;1:118–121. doi: 10.1038/nprot.2006.19. [DOI] [PubMed] [Google Scholar]
  • 22.Gjendal K, Ottesen JL, Olsson IAS, Sørensen DB. Burrowing and nest building activity in mice after exposure to grid floor, isoflurane or ip injections. Physiol. Behav. 2019;206:59–66. doi: 10.1016/j.physbeh.2019.02.022. [DOI] [PubMed] [Google Scholar]
  • 23.Kahnau P, Habedank A, Diederich K, Lewejohann L. Behavioral methods for severity assessment. Animals. 2020;10:1136. doi: 10.3390/ani10071136. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Smith D, et al. Classification and reporting of severity experienced by animals used in scientific procedures: FELASA/ECLAM/ESLAV Working Group report. Lab. Anim. 2018;52:5–57. doi: 10.1177/0023677217744587. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Hawkins P, et al. A guide to defining and implementing protocols for the welfare assessment of laboratory animals: Eleventh report of the BVAAWF/FRAME/RSPCA/UFAW Joint Working Group on Refinement. Lab. Anim. 2011;45:1–13. doi: 10.1258/la.2010.010031. [DOI] [PubMed] [Google Scholar]
  • 26.Zechner D, et al. Generalizability, robustness and replicability when evaluating wellbeing of laboratory mice with various methods. Animals. 2022;12:2927. doi: 10.3390/ani12212927. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Harikrishnan VS, Hansen AK, Abelson KSP, Sørensen DB. A comparison of various methods of blood sampling in mice and rats: Effects on animal welfare. Lab. Anim. 2018;52:253–264. doi: 10.1177/0023677217741332. [DOI] [PubMed] [Google Scholar]
  • 28.Hurst JL, West RS. Taming anxiety in laboratory mice. Nat. Methods. 2010;7:825–826. doi: 10.1038/nmeth.1500. [DOI] [PubMed] [Google Scholar]
  • 29.Lofgren J, et al. Analgesics promote welfare and sustain tumour growth in orthotopic 4T1 and B16 mouse cancer models. Lab. Anim. 2018;52:351–364. doi: 10.1177/0023677217739934. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Peng M, et al. Battery of behavioral tests in mice to study postoperative delirium. Sci. Rep. 2016 doi: 10.1038/srep29874. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Ebrahimi Kalan M, Jebai R, Zarafshan E, Bursac Z. Distinction between two statistical terms: Multivariable and multivariate logistic regression. Nicotine Tob. Res. 2021;23:1446–1447. doi: 10.1093/ntr/ntaa055. [DOI] [PubMed] [Google Scholar]
  • 32.Ernst L, et al. Severity assessment in mice subjected to carbon tetrachloride. Sci. Rep. 2020;10:15790. doi: 10.1038/s41598-020-72801-1. [DOI] [PMC free article] [PubMed]
  • 33.Wassermann L, et al. Monitoring of heart rate and activity using telemetry allows grading of experimental procedures used in neuroscientific rat models. Front. Neurosci. 2020;14:587760. doi: 10.3389/fnins.2020.587760. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Häger C, et al. Running in the wheel: Defining individual severity levels in mice. PLoS Biol. 2018;16:e2006159. doi: 10.1371/journal.pbio.2006159. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Helgers SOA, et al. Body weight algorithm predicts humane endpoint in an intracranial rat glioma model. Sci. Rep. 2020;10:9020. doi: 10.1038/s41598-020-65783-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Bruch S, Ernst L, Schulz M, Zieglowski L, Tolba RH. Best variable identification by means of data-mining and cooperative game theory. J. Biomed. Inform. 2021;113:103625. doi: 10.1016/j.jbi.2020.103625. [DOI] [PubMed] [Google Scholar]
  • 37.Efron B. Bootstrap methods: Another look at the jackknife. Ann. Stat. 1979 doi: 10.1214/aos/1176344552. [DOI] [Google Scholar]
  • 38.Narinç D, Aygün A, Küçükönder H, Aksoy T, Gürcan EK. Hayvancılık Alanında Bootstrap Tekniğinin Bir Uygulaması: Yumurta Sarı Rengi Örneği. Kafkas Univ. Vet. Fak. Derg. 2015 doi: 10.9775/kvfd.2014.12693. [DOI] [Google Scholar]
  • 39.Wood M. Statistical inference using bootstrap confidence intervals. Significance. 2004;1:180–182. doi: 10.1111/j.1740-9713.2004.00067.x. [DOI] [Google Scholar]
  • 40.Lee DK. Alternatives to P value: Confidence interval and effect size. Korean J. Anesthesiol. 2016;69:555–562. doi: 10.4097/kjae.2016.69.6.555. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Sim J, Reid N. Statistical inference by confidence intervals: Issues of interpretation and utilization. Phys. Ther. 1999;79:186–195. doi: 10.1093/ptj/79.2.186. [DOI] [PubMed] [Google Scholar]
  • 42.Goodman SN, Fanelli D, Ioannidis JPA. What does research reproducibility mean? Sci. Transl. Med. 2016;8:341ps12. doi: 10.1126/scitranslmed.aaf5027. [DOI] [PubMed] [Google Scholar]
  • 43.Erdogan BR, Michel MC. In: Good Research Practice in Non-clinical Pharmacology and Biomedicine. Bespalov A, Michel MC, Steckler T, editors. Springer Open; 2020. pp. 163–175. [Google Scholar]
  • 44.Pallocca G, Rovida C, Leist M. On the usefulness of animals as a model system (part I): Overview of criteria and focus on robustness. Altex. 2022;39:347–353. doi: 10.14573/altex.2203291. [DOI] [PubMed] [Google Scholar]
  • 45.Strech D, Dirnagl U. 3Rs missing: Animal research without scientific value is unethical. BMJ Open Sci. 2019;3:bmjos-2018-000048. doi: 10.1136/bmjos-2018-000048. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Talbot SR, et al. RELSA—A multidimensional procedure for the comparative assessment of well-being and the quantitative determination of severity in experimental procedures. Front. Vet. Sci. 2022 doi: 10.3389/fvets.2022.937711. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Kumstel S, et al. Benefits of non-invasive methods compared to telemetry for distress analysis in a murine model of pancreatic cancer. J. Adv. Res. 2020;21:35–47. doi: 10.1016/j.jare.2019.09.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Kumstel S, et al. Grading animal distress and side effects of therapies. Ann. N. Y. Acad. Sci. 2020;1473:20–34. doi: 10.1111/nyas.14338. [DOI] [PubMed] [Google Scholar]
  • 49.Abdelrahman A, et al. A novel multi-parametric analysis of non-invasive methods to assess animal distress during chronic pancreatitis. Sci. Rep. 2019;9:14084. doi: 10.1038/s41598-019-50682-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Tang G, et al. Comparing distress of mouse models for liver damage. Sci. Rep. 2020;10:19814. doi: 10.1038/s41598-020-76391-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Deacon R. Assessing burrowing, nest construction, and hoarding in mice. J. Vis. Exp. JoVE. 2012;59:e2607. doi: 10.3791/2607. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Kumstel S, et al. Grading distress of different animal models for gastrointestinal diseases based on plasma corticosterone kinetics. Animals. 2019 doi: 10.3390/ani9040145. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.R Core Team. R: A language and environment for statistical computing. https://www.R-project.org/ (R Foundation for Statistical Computing, 2020).
  • 54.Mangiafico, S. rcompanion: Functions to support extension education program evaluation. R package version 2.3.27. http://rcompanion.org/ (2021).
  • 55.Kassambara, A. rstatix: Pipe-friendly framework for basic statistical tests. https://github.com/kassambara/rstatix (2021).
  • 56.How to determine humane endpoints for research animals. Lab. Anim. 2016;45:19. doi: 10.1038/laban.908. [DOI] [PubMed]
  • 57.Hankenson FC, et al. Weight loss and reduced body temperature determine humane endpoints in a mouse model of ocular herpesvirus infection. J. Am. Assoc. Lab. Anim. Sci. 2013;52:277–285. [PMC free article] [PubMed] [Google Scholar]
  • 58.Mei J, et al. Refining humane endpoints in mouse models of disease by systematic review and machine learning-based endpoint definition. Altex. 2019;36:555–571. doi: 10.14573/altex.1812231. [DOI] [PubMed] [Google Scholar]
  • 59.Cheatham SM, et al. Morphine exacerbates experimental colitis-induced depression of nesting in mice. Front. Pain Res. (Lausanne, Switzerland) 2021;2:738499. doi: 10.3389/fpain.2021.738499. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Sager TN, et al. Nest building performance following MPTP toxicity in mice. Behav. Brain Res. 2010;208:444–449. doi: 10.1016/j.bbr.2009.12.014. [DOI] [PubMed] [Google Scholar]
  • 61.Durst M, et al. Analysis of pain and analgesia protocols in acute cerulein-induced pancreatitis in male C57BL/6 mice. Front. Physiol. 2021;12:744638. doi: 10.3389/fphys.2021.744638. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Boldt L, et al. Toward evidence-based severity assessment in mouse models with repeated seizures: I. Electrical kindling. Epilepsy Behav. 2021;115:107689. doi: 10.1016/j.yebeh.2020.107689. [DOI] [PubMed] [Google Scholar]
  • 63.van Dijk RM, et al. Design of composite measure schemes for comparative severity assessment in animal-based neuroscience research: A case study focussed on rat epilepsy models. PLoS One. 2020;15:e0230141. doi: 10.1371/journal.pone.0230141. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Mallien AS, et al. Comparative severity assessment of genetic, stress-based, and pharmacological mouse models of depression. Front. Behav. Neurosci. 2022;16:908366. doi: 10.3389/fnbeh.2022.908366. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Pond HL, et al. Digging behavior discrimination test to probe burrowing and exploratory digging in male and female mice. J. Neurosci. Res. 2021;99:2046–2058. doi: 10.1002/jnr.24857. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.European Commission. Caring for animals aiming for better science. Severity Assessment framework. https://ec.europa.eu/environment/chemicals/lab_animals/pdf/guidance/severity/en.pdf (2012).
  • 67.Sigal M, et al. Darbepoetin-α inhibits the perpetuation of necro-inflammation and delays the progression of cholestatic fibrosis in mice. Lab. Investig. 2010;90:1447–1456. doi: 10.1038/labinvest.2010.115. [DOI] [PubMed] [Google Scholar]
  • 68.Gäbele E, et al. TNFalpha is required for cholestasis-induced liver fibrosis in the mouse. Biochem. Biophys. Res. Commun. 2009;378:348–353. doi: 10.1016/j.bbrc.2008.10.155. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Zhang X, et al. A rational approach of early humane endpoint determination in a murine model for cholestasis. Altex. 2020;37:197–207. doi: 10.14573/altex.1909111. [DOI] [PubMed] [Google Scholar]
  • 70.Duncan MB, et al. Type XVIII collagen is essential for survival during acute liver injury in mice. Dis. Model. Mech. 2013;6:942–951. doi: 10.1242/dmm.011577. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Kumstel S, et al. Targeting pancreatic cancer with combinatorial treatment of CPI-613 and inhibitors of lactate metabolism. PLoS One. 2022;17:e0266601. doi: 10.1371/journal.pone.0266601. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data


Supplementary Materials

Supplementary Information. (183.3KB, docx)
Supplementary Legends. (12.4KB, docx)

Data Availability Statement

Raw data can be downloaded from a GitHub repository under the following link: https://github.com/mytalbot/gastrointestinal_data.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
