Evidence accumulated over the last 10 years suggests that reproducibility in biomedical animal research is alarmingly low (1-5). Several potential causes of poor reproducibility have been identified, including inadequate animal study design, poor scientific rigour, low statistical power, analytical flexibility, and publication bias (1-9). Because inadequate study design may be a major culprit, scientists and animal welfare regulators explicitly recommend environmental standardization as the best way to guarantee reproducible results in animal experiments (10). Yet even though studies with rodent models can be performed using genetically identical animals in highly controlled and standardized environments, the reproducibility of results obtained with such models has also emerged as a concern (11-13). Contrary to the common belief that standardization guarantees reproducibility (10), it has therefore been suggested that rigorous standardization (both genetic and environmental) may produce results that are idiosyncratic to the specific standardized conditions under which they were obtained, making standardization itself a potentially important cause of poor reproducibility (11-14).
The explanation for this phenomenon is that an animal's phenotype, which results from complex and dynamic interactions between its genotype and the environment in which it develops, contributes substantially to its response to an experimental treatment. Phenotypic plasticity caused by gene-by-environment interactions (G × E) therefore determines the range of variation (reaction norm) of an animal's response and should be considered an important mechanism of biological variation (14). Rather than being incorporated into the experimental design, however, such variation is treated as a nuisance, which scientists aim to eliminate through rigorous standardization of both the animals' genotype and the environmental conditions under which they are housed and tested (14,15).
However, despite efforts to standardize conditions even across laboratories, laboratories always differ in many environmental factors that affect the animals' phenotype (e.g. noise, odours, microbiota, or personnel (16–19)). Different laboratories will therefore inevitably standardize to different, lab-specific conditions, potentially producing animals with lab-specific phenotypes. Taken together, this suggests that a failure to replicate the results of a study might indicate that the replication studies tested animals of a different phenotype (11, 13).
To investigate this further, we designed a multi-laboratory study to test whether differences in housing and husbandry conditions between rearing facilities induce variation in hypothalamic-pituitary-adrenal (HPA) stress reactivity and anxiety-related behaviour. Because differences in stress responses could be mediated by epigenetic mechanisms (20), which act as molecular modulators between genetic make-up and environment, we will also perform epigenomic analyses of the ventral hippocampus. These epigenetic analyses will be exploratory and restricted to mice from the laboratories showing the greatest differences in HPA-axis reactivity. Differences between the cohorts of mice from the different laboratories will thus reveal the range of phenotypic variation induced by common differences in laboratory conditions. Our findings will have implications for the reproducibility of results in animal research.
1. C. G. Begley, L. M. Ellis, Drug development: Raise standards for preclinical cancer research. Nature. 483, 531–533 (2012).
2. F. Prinz, T. Schlange, K. Asadullah, Believe it or not: How much can we rely on published data on potential drug targets? Nat. Rev. Drug Discov. 10, 712 (2011).
3. M. R. Munafò, B. A. Nosek, D. V. M. Bishop, K. S. Button, C. D. Chambers, N. Percie du Sert, U. Simonsohn, E.-J. Wagenmakers, J. J. Ware, J. P. A. Ioannidis, A manifesto for reproducible science. Nat. Hum. Behav. 1, 0021 (2017).
4. J. P. A. Ioannidis, Why most published research findings are false. PLoS Med. 2, 696–701 (2005).
5. E. Loken, A. Gelman, Measurement error and the replication crisis. Science. 355, 584–585 (2017).
6. L. P. Freedman, M. C. Gibson, The impact of preclinical irreproducibility on drug development. Clin. Pharmacol. Ther. 97, 16–18 (2015).
7. J. P. A. Ioannidis, D. Fanelli, D. D. Dunne, S. N. Goodman, Meta-research: Evaluation and improvement of research methods and practices. PLOS Biol. 13, e1002264 (2015).
8. S. N. Goodman, D. Fanelli, J. P. A. Ioannidis, What does research reproducibility mean? Sci. Transl. Med. 8, 341ps12 (2016).
9. D. Bishop, Rein in the four horsemen of irreproducibility. Nature. 568, 435 (2019). doi:10.1038/d41586-019-01307-2.
10. A. C. Beynen, K. Gärtner, L. F. M. van Zutphen, in Principles of laboratory animal science, L. F. M. van Zutphen, V. Baumans, A. C. Beynen, Eds. (Elsevier Ltd, Amsterdam, ed. 2, 2003), pp. 103–110.
11. H. Würbel, Behaviour and the standardization fallacy. Nat. Genet. 26, 263 (2000).
12. S. H. Richter, J. P. Garner, C. Auer, J. Kunert, H. Würbel, Systematic variation improves reproducibility of animal experiments. Nat. Methods. 7, 167–168 (2010).
13. S. H. Richter, J. P. Garner, H. Würbel, Environmental standardization: Cure or cause of poor reproducibility in animal experiments? Nat. Methods. 6, 257–261 (2009).
14. B. Voelkl, H. Würbel, Reproducibility crisis: Are we ignoring reaction norms? Trends Pharmacol. Sci. 37, 509–510 (2016).
15. B. Voelkl, L. Vogt, E. S. Sena, H. Würbel, Reproducibility of preclinical animal research improves with heterogeneity of study samples. PLoS Biol. 16, e2003693 (2018).
16. C. L. Franklin, A. C. Ericsson, Microbiota and reproducibility of rodent models. Lab Anim. (NY). 46, 114–122 (2017).
17. S. G. Parkar, A. Kalsbeek, J. F. Cheeseman, Potential role for the gut microbiota in modulating host circadian rhythms and metabolic health. Microorganisms. 7, 41 (2019).
18. T. S. Stappenbeck, H. W. Virgin, Accounting for reciprocal host–microbiome interactions in experimental science. Nature. 534, 191–199 (2016).
19. E. M. Velazquez, H. Nguyen, K. T. Heasley, C. H. Saechao, L. M. Gil, et al., Endogenous Enterobacteriaceae underlie variation in susceptibility to Salmonella infection. Nat. Microbiol. (2019).
20. C. Anacker, K. J. O'Donnell, M. J. Meaney, Early life adversity and the epigenetic programming of hypothalamic-pituitary-adrenal function. Dialogues Clin. Neurosci. 16, 321–333 (2014).
This experiment will test the hypothesis that differences in environmental conditions (housing and husbandry) between rearing laboratories cause substantial differences in the phenotype of mice, specifically in HPA stress reactivity and anxiety-related behaviour.
We will use a multi-laboratory study design to model differences in environmental conditions between different rearing laboratories as realistically as possible. An overview of the study design is attached as Figure 1.
Ninety time-mated pregnant C57BL/6JRj females in the last third of pregnancy, all derived from the same breeding stock of a commercial breeder (Janvier Labs, Le Genest-Saint-Isle, France), will be randomly allocated to 5 different laboratories (n = 18 per lab). Ordering all animals from the same breeder and deriving all pregnant females from the same breeding stock will ensure that the cohorts of mice reared by the different laboratories are as genetically similar as possible, so that all differences between them can be attributed to the different rearing environments. At weaning, up to 12 litters with at least 3 pups of each sex will be selected at random from all litters in each rearing laboratory. If necessary to reach n = 12 litters, these will be complemented by litters with at least 2 pups of each sex. From each litter, 3 (or 2) pups per sex will be selected at random and reared together until 8 weeks of age (PND 56) according to the specific housing and husbandry protocols of each of the 5 animal facilities. On PND 57, one mouse per sex per cage will be sacrificed from all cages housing 3 mice, and various samples will be collected for analysis of secondary outcome measures to control for changes induced by transport to, and housing in, the test laboratory (see the list of study outcomes below).
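The allocation and litter-selection rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's actual procedure: the lab labels `A` to `E`, the dam IDs, and the functions `allocate_dams` and `select_litters` are hypothetical, and Python's `random` module stands in for whatever randomization tool is used in practice.

```python
import random

# Hypothetical labels for the 5 rearing laboratories.
LABS = ["A", "B", "C", "D", "E"]


def allocate_dams(dam_ids, labs, seed=None):
    """Randomly split dams into equal-sized groups, one group per lab."""
    dams = list(dam_ids)
    if len(dams) % len(labs) != 0:
        raise ValueError("number of dams must be divisible by the number of labs")
    rng = random.Random(seed)
    rng.shuffle(dams)
    per_lab = len(dams) // len(labs)  # 90 // 5 == 18
    return {lab: dams[i * per_lab:(i + 1) * per_lab] for i, lab in enumerate(labs)}


def select_litters(litters, n_target=12, seed=None):
    """Select up to n_target litters at weaning.

    litters maps litter ID -> (n_males, n_females). Litters with >= 3 pups
    of each sex are preferred; litters whose rarer sex has exactly 2 pups
    are used only to fill remaining places. Returns litter ID -> pups kept
    per sex (3 or 2).
    """
    rng = random.Random(seed)
    preferred = [lid for lid, (m, f) in litters.items() if min(m, f) >= 3]
    fallback = [lid for lid, (m, f) in litters.items() if min(m, f) == 2]
    rng.shuffle(preferred)
    rng.shuffle(fallback)
    chosen = preferred[:n_target]
    chosen += fallback[:n_target - len(chosen)]
    return {lid: 3 if min(litters[lid]) >= 3 else 2 for lid in chosen}


allocation = allocate_dams(range(1, 91), LABS, seed=1)
print({lab: len(dams) for lab, dams in allocation.items()})  # 18 dams per lab
```

Seeding each random generator makes the allocation reproducible and auditable, which matters when the allocation itself is part of the preregistered design.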
The remaining pairs of male and female offspring (n = 240) will be transported from the 5 rearing facilities to the testing facility at the University of Bern. After an acclimation period of about 2.5 weeks, one mouse per sex per litter (n = 120) will be tested for phenotypic differences in HPA stress reactivity (primary outcome variable) and anxiety-related behaviour (secondary outcome measures), while their cagemates (n = 120) will be sacrificed and the ventral hippocampus dissected and prepared for genome-wide DNA methylation profiling to determine epigenetic changes related to phenotypic differences between laboratories. Test-naïve mice will be used for this purpose to avoid effects of testing on the epigenetic profile.
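The sample sizes follow directly from the selection rules. The hypothetical helper `cohort_sizes` below tallies mice at each stage, assuming exactly 12 litters are selected per lab. It shows that the transported cohort is 240 mice whether 3 or 2 pups per sex are kept from a litter, because the PND 57 sacrifice reduces every 3-mouse cage to 2.

```python
def cohort_sizes(pups_per_sex_by_lab):
    """Tally mice at each stage of the design.

    pups_per_sex_by_lab holds one list per lab with the number of pups kept
    per sex (3 or 2) from each selected litter.
    """
    weaned = culled_pnd57 = 0
    for lab in pups_per_sex_by_lab:
        for k in lab:
            weaned += 2 * k  # k males plus k females from this litter
            if k == 3:
                culled_pnd57 += 2  # one male and one female sacrificed on PND 57
    transported = weaned - culled_pnd57  # 2 mice per sex per litter remain
    tested = transported // 2  # one mouse per sex per litter is tested
    methylation = transported - tested  # test-naive cagemates
    return {
        "weaned": weaned,
        "culled_pnd57": culled_pnd57,
        "transported": transported,
        "tested": tested,
        "methylation": methylation,
    }


# Case where all 12 litters per lab have at least 3 pups of each sex.
print(cohort_sizes([[3] * 12] * 5))
```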
Two established tests of anxiety-related behaviour, the open-field test and the light-dark box test, will be conducted in that order with a 7-day break in between, followed, after another 7-day break, by assessment of HPA stress reactivity in response to 20 min of physical restraint.
One week after the end of testing (at ~14.5 weeks of age), all mice will be euthanized by cervical dislocation followed by decapitation for post-mortem analysis.
The genome-wide DNA methylation analysis, as well as the molecular and histological analyses, will be restricted to mice from the two laboratories showing the greatest differences in HPA stress reactivity. The final choice of next-generation sequencing assays and the exact methods, including the bioinformatics pipeline, will be determined at a later stage.
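The protocol does not specify how "greatest differences" will be quantified. As a purely illustrative sketch, assuming the criterion is the largest absolute difference between lab means of the corticosterone AUC (sexes pooled), the pair of labs could be chosen as below. The function name `most_divergent_labs` is hypothetical, and a formal statistical comparison may well be preferred in practice.

```python
from itertools import combinations
from statistics import mean


def most_divergent_labs(auc_by_lab):
    """Return the pair of labs whose mean corticosterone AUC differs most.

    auc_by_lab maps lab label -> list of per-mouse AUC values. The criterion
    (absolute difference of lab means, sexes pooled) is an assumption.
    """
    means = {lab: mean(values) for lab, values in auc_by_lab.items()}
    # Compare every unordered pair of labs and keep the most divergent one.
    return max(
        combinations(sorted(means), 2),
        key=lambda pair: abs(means[pair[0]] - means[pair[1]]),
    )
```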
List of study outcomes
1) Primary outcome: HPA stress reactivity (measured as the area under the curve (AUC) of changes in plasma corticosterone levels in response to a standard stressor, 20 minutes of acute restraint stress).
2) Secondary outcomes:
(a) Measures to control for the effects of transport to, and housing and testing in, the test laboratory
- Basal corticosterone levels in the blood plasma
- Body weights
- Weight of adrenal glands
- Histological examination of the adrenal glands
- Candidate gene expression analysis in the ventral hippocampus
- Structural changes in the hypothalamus and ventral hippocampus
(b) Secondary measures of HPA reactivity and anxiety-related behaviour
- Anxiety-related behaviour (measured in the light-dark box test and open-field test)
- Weight of adrenal glands
- Histological examination of the adrenal glands
- Basal corticosterone levels in the blood plasma
- Structural changes in the hypothalamus and ventral hippocampus
- Genome-wide DNA methylation analysis in ventral hippocampal neurons
- Candidate gene expression analysis in the ventral hippocampus
(c) Further secondary measures
- Histological examination of the liver and thymus
- Caecal and faecal samples to assess the gut microbiome
- Histological examination of germ cell development
- Molecular analysis of epigenetic regulation in germ cells
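The primary outcome is an AUC of corticosterone changes. Blood sampling time points are not given here, so the sketch below assumes hypothetical samples at 0, 20, 60 and 120 min with illustrative concentrations. It applies the trapezoidal rule both to the raw values (`auc_trapezoid`, total area) and to values relative to the first sample (`auc_increase`), which is one common way of expressing an AUC of changes; which variant the study will use is an assumption.

```python
def auc_trapezoid(times, values):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired time points")
    if any(t1 <= t0 for t0, t1 in zip(times, times[1:])):
        raise ValueError("time points must be strictly increasing")
    # Each interval contributes its width times the mean of its endpoints.
    return sum(
        (t1 - t0) * (v0 + v1) / 2
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )


def auc_increase(times, values):
    """AUC of the change relative to the first (baseline) sample."""
    baseline = values[0]
    return auc_trapezoid(times, [v - baseline for v in values])


# Hypothetical sampling times (min) and corticosterone concentrations.
times = [0, 20, 60, 120]
cort = [10, 100, 60, 20]
print(auc_trapezoid(times, cort), auc_increase(times, cort))  # 6700.0 5500.0
```

The two measures differ by baseline times total duration (here 10 × 120 = 1200), so the baseline-corrected AUC isolates the stress-induced response from differences in basal levels, which are listed separately as a secondary outcome.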
All experimenters performing the stress reactivity tests, behavioural tests, and post-mortem analyses will be blind to the "treatment", i.e. the rearing facility from which the animals originated. Blinding will be carried out by two colleagues not otherwise involved in the execution of the experiments. Cages will be assigned new identification numbers, and the positions of cages within and between racks will be randomly reshuffled so that the experimenters cannot deduce the origin of a cage (i.e. the treatment) from its ID number or position. ID numbers and cage positions will be assigned by a script written in Mathematica (version 11) using its built-in random number generator.
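The study implements cage blinding in Mathematica. Purely to illustrate the logic, a Python sketch follows; `blind_cages`, the cage IDs, the 4-digit code range and the rack slot labels are all hypothetical. New IDs are drawn independently of origin, and cages are placed in randomly sampled slots, so neither the ID nor the position carries information about the rearing lab. The key linking new IDs to origin would stay with the two blinding colleagues.

```python
import random


def blind_cages(cage_origin, rack_slots, seed=None):
    """Assign random new cage IDs and random rack positions.

    cage_origin maps original cage ID -> rearing lab. Returns
    (positions, key): positions maps new ID -> rack slot and can be shared
    with the experimenters; key maps new ID -> (original ID, lab) and is
    kept only by the blinding colleagues.
    """
    if len(rack_slots) < len(cage_origin):
        raise ValueError("not enough rack slots for all cages")
    rng = random.Random(seed)
    original_ids = list(cage_origin)
    # Random 4-digit codes drawn without replacement, unrelated to origin.
    new_ids = rng.sample(range(1000, 10000), len(original_ids))
    # Random slots drawn without replacement, so position carries no lab info.
    slots = rng.sample(list(rack_slots), len(original_ids))
    positions, key = {}, {}
    for old_id, new_id, slot in zip(original_ids, new_ids, slots):
        positions[new_id] = slot
        key[new_id] = (old_id, cage_origin[old_id])
    return positions, key
```

Separating the shareable `positions` map from the `key` mirrors the division of roles described above: experimenters work only with new IDs and slots, while the origin stays with people not involved in testing.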
The order of the cages during habituation and testing of all mice will be randomized using the random number generator of Mathematica (version 11). Each of the 3 experimenters will be randomly assigned 20 male and 20 female mice to handle during behavioural testing. Testing is always carried out by two experimenters in parallel (at the same time, but spatially separated). Each experimenter tests both sexes in 3 blocks of 5 animals each. The randomization and allocation procedure is restricted so that each block of each experimenter contains exactly one mouse from each lab in random order, and so that animals tested at the same time are never from the same lab. The allocation and randomization script is attached as the supplementary file “Test Allocation.pdf”.
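The restriction on parallel testing can be met by rejection sampling: draw a random lab order for one experimenter's block, then redraw the other experimenter's order until no position shares a lab. The sketch below is a hypothetical Python illustration (the function name `parallel_block_orders` and the lab labels are assumptions), not the attached Mathematica script. With 5 labs, 44 of the 120 possible orders satisfy the constraint for any given first order, so only a few draws are needed, and every valid pair of orders is equally likely.

```python
import random


def parallel_block_orders(labs, rng, max_tries=10_000):
    """Lab orders for two blocks tested in parallel by two experimenters.

    Each order contains every lab exactly once; at no position (i.e. at no
    time slot) do the two experimenters test mice from the same lab.
    """
    first = list(labs)
    rng.shuffle(first)
    for _ in range(max_tries):
        second = list(labs)
        rng.shuffle(second)
        # Accept only if the two orders never coincide at the same slot.
        if all(a != b for a, b in zip(first, second)):
            return first, second
    raise RuntimeError("no valid order found; check the lab list")


rng = random.Random(2024)
order_1, order_2 = parallel_block_orders(["A", "B", "C", "D", "E"], rng)
print(order_1)
print(order_2)
```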