Nucleic Acids Research. 2025 Nov 13;54(D1):D1133–D1142. doi: 10.1093/nar/gkaf1148

International Mouse Phenotyping Consortium Portal: facilitating investigation of gene function and providing insights into human disease

Robert Wilson 1, Tuğba Bülbül Ataç 2, Tsz Kwan Cheng 3, Anthony Frost 4, Osman Güneş 5, Marina Kan 6, Piia Keskivali-Bond 7, Federico López Gómez 8, James McLaughlin 9, Jakub Mucha 10, Tawanda Munava 11, Carla Oliveira 12, Diego Pava 13, Jose Francisco Peña Estrada 14, Ewan Selkirk 15, Bora Vardal 16, Sara Wells 17, Pilar Cacheiro 18, Damian Smedley 19, Helen Parkinson 20
PMCID: PMC12807668  PMID: 41231752

Abstract

The International Mouse Phenotyping Consortium (IMPC; https://www.mousephenotype.org/) web portal contains phenotype data for mouse protein-coding genes, derived from systematic, high-throughput analysis of knock-out lines produced by IMPC. The project has produced >1400 candidate mouse models of human disease that recapitulate phenotypes observed in patients. Over 8000 papers rely on data or reagents generated by IMPC, demonstrating the impact of the project on the research and clinical communities, and IMPC data is incorporated into other resources, such as MGI, Open Targets, and UniProt. Data release DR23.0 (2025) contains >100 million data points from 9277 genes and identifies 113 803 significant phenotypes. To manage efficient access to this quantity of high dimensional data, the IMPC web portal has been rebuilt using a cloud native architecture. The modern user interface retains the look and feel of the original portal, with improvements identified through a usability study. New data visualizations and training materials for large scale data access through the API have also been developed to make the resource easier to use.

Graphical Abstract

Introduction

The International Mouse Phenotyping Consortium (IMPC) is a global effort established in 2011 that involves 21 institutes in 15 countries collaborating to generate a comprehensive catalogue of mammalian gene function by knocking out and systematically phenotyping every protein-coding gene in the mouse on an inbred C57BL/6N genetic background [1–3]. The project also improves understanding of human disease via >1400 candidate models of human disease genes (see the ‘Disease models’ section) which recapitulate the phenotypes seen in human patients. IMPC data has been used to identify >100 validated rare disease-gene associations [4] and has opened new avenues to explore for the therapeutic treatment of diseases [5–8]. IMPC phenotyping protocols are standardized and harmonized across the project and publicly shared through IMPReSS [3]. The phenotyping assays provide reproducible, sex- and life-stage-specific phenotypes for 16 categories, including anatomy, clinical chemistry, and ageing. Genes studied by IMPC are prioritized for targeting using an unbiased algorithmic strategy, most recently targeting under-studied genes [9–11]. Gene essentiality is the primary factor influencing the ability to generate a null allele [12]. Analysis of IMPC viability data and comparison of IMPC data with cell derived datasets has provided a framework to investigate the spectrum of intolerance to loss-of-function variation by classifying genes into the following essentiality categories: cellular lethal, developmentally lethal, subviable, viable with phenotype, and viable with no phenotype [13]. These gene essentiality categories align with human disease gene classifications, and refinement of gene essentiality analysis [14] has helped prioritize predicted pathogenic variants in genes not currently linked to Mendelian conditions, enhancing the approach that has already been shown to improve the genetic diagnosis of human diseases [15].

The IMPC portal is used by biomedical researchers, common and rare disease researchers, data scientists, informatics users and resource developers [3, 16]. FAIRness is ensured by reusing unique persistent identifiers for genes from authoritative sources, such as Mouse Genome Informatics (MGI) [17] and the HUGO Gene Nomenclature Committee [18], and widely adopted community ontologies, such as the Mammalian Phenotype (MP) ontology [19] and the Human Phenotype Ontology (HPO) [20]. Information about the IMPC project is disseminated under the CC-BY 4.0 licence through the web portal, API, FTP data downloads [3], and a knowledge graph (KG; see below). These resources provide the scientific community with free and unrestricted access to the primary and secondary data, the genotype-phenotype annotations made by IMPC, disease associations, the standard operating protocols used to perform the broad based phenotyping of the animals, and links to stock centres where the mice and reagents generated by IMPC can be obtained. The latest release (DR23.0, 2025) comprises >100 million experimental observations for 9277 genes with 113 803 significant phenotype calls (51 190 for embryonic stages, 57 918 for early adults <16 weeks, and 4695 for mid/late adults) [21], assayed in homozygous, hemizygous and heterozygous animals from 9994 independently established lines. Statistics presented in subsequent sections derive from DR23.0 unless otherwise stated.

In this article we focus on new approaches to the identification of candidate disease models, web portal improvements addressing user feedback, portal sustainability and data visualization, improved training tools and materials for scalable data access, and improvements to data capture and standardization that have enhanced the quality of the data. These changes to the portal have made it more sustainable, easily extensible and better placed to showcase the large quantity of high dimensional data that IMPC produces.

Improved data acquisition

Data acquisition and quality control (QC) processes and reporting have been improved by moving services under a unified interface with harmonized API calls that track data through the different processes. The interface makes it easier to generate new reports and modify existing ones, which are automatically updated for the data generating centres. This strategy has been applied to QC reporting and, alongside automation of some QC checks and standardization of issue titles and definitions, allows faster resolution of QC issues. Updates have also been made to the data completeness process and reporting, making it easier to ensure all data has been submitted. These processes and reports ensure faster turnaround of the data for the IMPC portal.

Disease models

The IMPC viability data and the gene essentiality analysis have revealed associations between essentiality categories and specific types of disorders [13, 22], ultimately leading to the identification of novel genes associated with neurodevelopmental disorders [23, 24]. In addition to capturing Mendelian disease associations from Online Mendelian Inheritance in Man (OMIM) [25], Orphanet [26], and DECIPHER [27] for the corresponding human orthologous genes, the automated identification of candidate mouse models of human disease is facilitated by the PhenoDigm algorithm [28]. This algorithm calculates a similarity score for all pairwise combinations of mouse knockouts and human single gene disorders that reflects the ability of the mouse model to mimic the phenotypes observed in humans. In DR23 there were 2938 candidate mouse models for human one-to-one orthologs associated with known Mendelian diseases. The human disease pipeline requires association of MP [19] annotations with the mouse knockouts and HPO [20] terms for diseases, resulting in analysis of 2581 genes by the PhenoDigm algorithm and the identification of 1435 candidate mouse models that recapitulate at least one of the human disease phenotypes. The portal presents evidence on which cross-species phenotypic matches have been detected and it is left to the user to decide whether there is already enough data to support the use of the IMPC mouse as a human disease model for mechanistic or therapeutic studies, or whether further characterization of the mouse is required. For the 1503 genes where no human disease phenotype match is currently detected, several factors influence the ability of mouse models to mimic the clinical features of human disease. These include the absence of assays for certain phenotypes, the pleiotropy and severity of the disease, differences in physiology and gene regulation between the two organisms, as well as variations in viability and zygosity of the disease-matched mouse knockout (see [4]). A summary of the similarity scores to human disorders associated by orthology, as well as potential novel disease associations based on phenotypic similarity, is displayed on the portal for each gene (Fig. 1), and can also be accessed and downloaded in bulk through the IMPC disease models portal (https://diseasemodels.research.its.qmul.ac.uk/) [4].

Figure 1.

Example human disease section for Gp9. (A) The left tab contains human diseases associated by orthology as curated by OMIM [25], Orphanet [26], and/or DECIPHER [27]. The disease name, identifier, PhenoDigm score, and human phenotypes matching the candidate mouse model’s phenotype are displayed in the table. The right tab contains human diseases predicted to be associated with the gene through phenotypic similarity. (B) The Phenogrid is a collaboration between the Monarch Initiative [29] and the IMPC. It displays the phenotypic similarity of individual phenotypes between the human disease (y-axis) and the matched candidate mouse models (x-axis) as a grid. In this example, the match with the highest similarity score for the autosomal recessive human bleeding disorder Bernard–Soulier syndrome (OMIM:231200, characterized by thrombocytopenia and large platelets), and its associated gene, GP9, was the IMPC orthologous early adult mouse model [Gp9<tm1.1(KOMP)Vlcg> hom early] with a score of 66.91%. Darker boxes indicate a higher similarity between human and mouse phenotypes; for example, the term giant platelets (HP:0001902) has a near perfect match with increased mean platelet volume (MP:0002599). Further information on PhenoDigm scores is available on the help and data analysis pages: https://www.mousephenotype.org/help/data-visualization/gene-pages/disease-models/; https://www.mousephenotype.org/help/data-analysis/disease-associations/.

Knowledge graph

To enable cross-cutting queries that connect IMPC to other databases, we have constructed a Neo4j KG derived from the IMPC data and a selection of other relevant data sources, including the NHGRI-EBI GWAS Catalog [30] and Open Targets [7]. To integrate data across species we have loaded the Unified Phenotype Ontology (uPheno) [31] to map from the MP ontology [19] to other ontologies including HPO [20]. We have also loaded LLM embeddings of phenotype terms, generated using the OpenAI text-embedding-3 model, so that similar phenotypes can be connected based on cosine embedding distance where exact mappings are not available. The KG has been tested with queries such as connecting mouse genes to human disease associations and linking shared phenotypes to orthologous mouse and human genes. It remains in development while we gather user feedback (KG example queries (source code): https://github.com/EBISPOT/GrEBI/tree/dev/query_templates).
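The embedding-based linking described above can be illustrated with a minimal sketch. It assumes each phenotype term already has an embedding vector; the three-dimensional vectors below are invented toy values (real embeddings have far more dimensions), and the function names and the "EXAMPLE:unrelated" entry are hypothetical rather than part of the KG.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def nearest_terms(query_id, embeddings, top_n=1):
    """Rank all other terms by cosine similarity to the query term."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vector), term_id)
        for term_id, vector in embeddings.items()
        if term_id != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(term_id, score) for score, term_id in scored[:top_n]]

# Toy vectors invented for illustration only; they are not real embeddings.
toy_embeddings = {
    "MP:0002599": [0.9, 0.1, 0.0],         # increased mean platelet volume
    "HP:0001902": [0.8, 0.2, 0.1],         # giant platelets
    "EXAMPLE:unrelated": [0.0, 0.1, 0.9],  # hypothetical unrelated term
}

best_match = nearest_terms("MP:0002599", toy_embeddings)[0][0]
```

Cosine distance is simply one minus this similarity, so ranking by highest similarity is equivalent to ranking by smallest distance.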

Web portal improvements

Infrastructure

The portal has been rebuilt as a cloud native application designed to address the current scale of the data/access needs and offer improvements to portal usability and extensibility. To improve page load times the portal uses the component model in which each section of the page loads data from a separate microservice in parallel. We also reviewed the structure of the data to optimize the information for each section so the data could be fetched in a single request. Where a significant amount of data was required, for example for the disease section of the gene page, we fetched a small amount of data initially and loaded the rest after the page had completed loading. These strategies significantly improved the performance of the pages for users.
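The parallel loading pattern can be sketched as follows. This is an illustration only, not the portal's implementation: the section names and delays are hypothetical, and `asyncio.sleep` stands in for an HTTP request to a microservice.

```python
import asyncio

async def fetch_section(name, delay):
    """Stand-in for one request to the microservice backing a page section."""
    await asyncio.sleep(delay)  # simulates network latency
    return name, {"section": name, "status": "loaded"}

async def load_gene_page(sections):
    """Request every section concurrently instead of one after another."""
    results = await asyncio.gather(
        *(fetch_section(name, delay) for name, delay in sections.items())
    )
    return dict(results)

def render(sections):
    """Synchronous entry point for the sketch."""
    return asyncio.run(load_gene_page(sections))

# Hypothetical sections with simulated latencies in seconds.
page = render({"summary": 0.02, "phenotypes": 0.03, "diseases": 0.01})
```

Because the requests run concurrently, the total wait is governed by the slowest section rather than by the sum of all sections, which is the motivation for giving each section its own data source.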

Design and navigation

The portal design was conserved to retain the look and feel for users. User experience (UX) testing with 13 existing and new users and a usability review (see https://www.nngroup.com/articles/ten-usability-heuristics/) provided user-led feedback and retesting with users after revision improved our design. Consistency of design for links and breadcrumbs was reviewed, and the visibility of buttons, labels, tables and forms was increased by improving colour contrast, which is a simple but effective way to enhance the communication of information. Improved visibility of gene-phenotype P-values for statistical data and new filters and sort functions were added for data tables which improved the UX especially for large tables. Site navigation was also improved as the IMPC content has grown over time and become complex. New features were evaluated using moderated usability testing with a subset of the first UX group to determine if the site was sufficiently improved.

Data presentation and navigation

IMPC’s data is large and complex; therefore, the summary of phenotypes found for each mouse gene was improved, with increased visibility for links to deeper information elsewhere on the page. For example, users expect to see a link to physiological system(s) when significant phenotypes are present (Fig. 2A). The ability to order materials from IMPC’s associated stock centres (MMRRC [32] and EMMA [33]) is a requirement for users; therefore, this user journey has been redesigned, with a newly located and named ‘Order Mice’ button on the gene query results page and a new ‘View Allele Products’ button at the top of each gene page. The increased visibility of these links, along with links to other sections on the gene page, offers users rapid navigation and easier access to data of interest.

Figure 2.

Improvements to the gene report pages. (A) The summary section for the Myo6 gene report page showing the physiological systems impacted by the knock-out, which link to the phenotypes section of the page to show the data for the system. Links to data collections for the gene, such as body-weight and viability data, also improve navigation, and access to allele products is more prominent. (B) The phenotypes section contains a number of tabs to explore IMPC data. The significant phenotypes tab contains a prominent link to the data supporting the phenotype, and filters for zygosity, allele, and physiological system make it easier to identify phenotypes of interest. (C) The table summarizing all experimental observations and P-values at the top of the supporting data page. In some cases, such as hyperactivity, multiple experimental observations support the phenotype call. Clicking on a row in the table shows the data for the experimental observation below the summary table. Significant P-values are shown in orange text and the P-value threshold is stated above the table.

For the phenotype section we made several key changes to simplify navigation to the supporting data that were guided by usability testing. We added an explicit link to the data supporting the phenotype association for the specific gene, which was previously accessed by clicking on the row, and placed this link next to the phenotype term to increase its visibility. We also added a filtering system to this section to enable selection of phenotypes by physiological system in response to feedback during testing (see Fig. 2B). These changes make it easier for users to find the supporting data behind the phenotype calls and identify phenotypes of interest.

On the gene page we report the most significant P-value identified for the phenotypes. IMPC, however, evaluates sexual dichotomy [34], and there is not always a simple one-to-one association between a phenotype and an experimental observation. For example, a hyperactivity phenotype can be assigned based on any one of a number of different open field test measurements that assess anxiety and exploratory behaviours, such as centre average speed and periphery distance travelled. To clearly communicate the data supporting the phenotype assignment we now indicate the number of observations in the supporting datasets link, and summarize all experimental observations and P-values in a table at the top of the supporting data page (Fig. 2C). Based on user feedback we clarified the P-value threshold that IMPC uses to assign a phenotype, and significant P-values are highlighted in the summary table to better communicate them to users. Selection of a row in the summary table makes it active, updating the section below to display the charts presenting the results for that particular experimental measurement. This reorganization of the supporting data page keeps the page compact and provides users with an immediate overview of how the experimental data supports the phenotype call.
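The relationship between a phenotype call and its supporting observations can be sketched as below. This is a minimal illustration under stated assumptions: the `Observation` class, the `summarize` function, the P-values, the third parameter name, and the threshold value are all hypothetical; the portal states the threshold actually used above the summary table.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    parameter: str  # experimental measurement supporting the phenotype
    p_value: float

def summarize(observations, threshold):
    """Build a summary: rows sorted by P-value, each flagged if significant."""
    rows = sorted(observations, key=lambda obs: obs.p_value)
    table = [(obs.parameter, obs.p_value, obs.p_value < threshold) for obs in rows]
    return {
        "most_significant": rows[0].p_value if rows else None,
        "n_observations": len(rows),
        "n_significant": sum(1 for _, _, significant in table if significant),
        "table": table,
    }

# Invented values for a hyperactivity call supported by open field measurements.
observations = [
    Observation("periphery distance travelled", 2e-5),
    Observation("centre average speed", 3e-6),
    Observation("hypothetical third measurement", 0.03),
]
ASSUMED_THRESHOLD = 1e-4  # placeholder; check the value stated on the portal
summary = summarize(observations, ASSUMED_THRESHOLD)
```

Here `summary["most_significant"]` corresponds to the single value reported on the gene page, while `summary["table"]` corresponds to the per-observation rows shown at the top of the supporting data page.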

Improved graphical analysis tools

Experimental observations are plotted in relation to the magnitude of the P-value (Fig. 3A). Each dot is coloured according to the physiological system, and the P-value threshold used to make a phenotype call is shown as a dotted line. Hovering over a dot provides more data, including: the number of mutants, zygosity, and effect size. Zooming in and out and scrolling allow intuitive and dynamic exploration of the data.
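A common way to plot observations by the magnitude of their P-value is a negative base-10 logarithm, so that smaller P-values sit higher on the chart. The sketch below assumes that transformation, which the text does not specify; the function names and the threshold used in the example calls are hypothetical. The placeholder of 1 × 10⁻¹⁵ for manual annotations is taken from the Figure 3 legend.

```python
import math

# Placeholder P-value for manual annotations, as stated in the Figure 3 legend.
MANUAL_ANNOTATION_P = 1e-15

def plot_height(p_value=None):
    """Return -log10(P); a missing P-value is treated as a manual annotation."""
    p = MANUAL_ANNOTATION_P if p_value is None else p_value
    if not 0 < p <= 1:
        raise ValueError("P-value must be in the interval (0, 1]")
    return -math.log10(p)

def is_above_threshold(p_value, threshold):
    """True when the point would be drawn above the dotted threshold line."""
    return plot_height(p_value) > plot_height(threshold)
```

With a hypothetical threshold of 1e-4, a point with P = 1e-6 would be drawn above the dotted line and a point with P = 0.01 below it.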

Figure 3.

New and improved visualization. (A) The graphical analysis tab of the phenotypes section displays the magnitude of the P-values for experimental observations in relation to the P-value threshold used to assign phenotypes. On the right a resizable sliding window allows changing the region viewed and the zoom level. An information panel, as shown for pre-pulse inhibition - PPI4, is displayed when hovering over a data point. Manual annotations are shown as triangles and assigned an arbitrary value of 1 × 10⁻¹⁵ to include them on the chart. (B) When >1 allele has been examined the Allele comparison tab enables phenotypes to be compared for different combinations of alleles, represented as the set of connected dots below the histogram. Selecting a bar in the histogram, as shown for Dbn1<tm1b(KOMP)Wtsi> and Dbn1<tm1a(KOMP)Wtsi>, displays the phenotypes for that set below the graph. Filters for zygosity, life stage and sex enable different slices of the data to be selected, and the phenotypes can be summarized to compare across physiological systems. (C) The genome browser reveals how IMPC alleles impact gene models. RefSeq gene models are shown by default, but optionally GENCODE, UniProt SwissProt, and TrEMBL annotations can be displayed. For IMPC CRISPR alleles, tracks show the location of the CRISPR guides, alignment of the FASTA sequence and the calculated deletion. For ES Cell alleles the IKMC allele track from the UCSC genome browser was migrated from the mm9 to the GRCm39 assembly. By default, only the ES Cell derived products available to order through the IMPC website are shown.

Phenotype comparisons

Some IMPC genes have multiple alleles, and these can be associated with different phenotypes. Therefore, we developed a comparison tool that enables users to see shared phenotypes (Fig. 3B). An UpSet visualization was selected because of the complexity of the data [35]. A histogram indicating the number of phenotypes is shown for each allele, indicated by a solid dot in the table, and for different combinations of alleles, shown as a line intersecting solid dots. Filters enable refinement of the data according to the zygosity of the alleles, the life stage, or the sex of the animals. For added flexibility, users can compare the data by the physiological system impacted by the phenotypes as well as by the specific phenotype terms associated with the alleles (Fig. 3B).
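The counts behind an UpSet-style plot can be derived by grouping each phenotype under the exact combination of alleles that share it. The sketch below shows that grouping only; the function name and the phenotype labels are placeholders, not IMPC data, and the allele symbols are borrowed from the Figure 3 legend purely as keys.

```python
from collections import Counter

def exclusive_intersections(phenotypes_by_allele):
    """Count phenotypes for each exact combination of alleles that share them."""
    membership = {}
    for allele, phenotypes in phenotypes_by_allele.items():
        for phenotype in phenotypes:
            membership.setdefault(phenotype, set()).add(allele)
    # One histogram bar per distinct allele combination.
    return Counter(frozenset(alleles) for alleles in membership.values())

TM1A = "Dbn1<tm1a(KOMP)Wtsi>"
TM1B = "Dbn1<tm1b(KOMP)Wtsi>"

# Placeholder phenotype labels, invented for illustration.
bars = exclusive_intersections({
    TM1A: {"phenotype A", "phenotype B", "phenotype C"},
    TM1B: {"phenotype C", "phenotype D"},
})
```

Each key of `bars` corresponds to one set of connected dots, and its count to the height of the bar above it; applying a zygosity, life stage, or sex filter would amount to filtering the input sets before calling the function.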

Visualization of ES and CRISPR derived alleles

IKMC ES cell derived alleles, used to generate IMPC mouse lines, are currently visible via a track on the UCSC genome browser (https://bit.ly/433eANq) [36]; however, the NCBI37/mm9 assembly is outdated and none of the IMPC CRISPR derived alleles are represented, meaning users cannot assess the impact of deletions against current genome annotations. Therefore, the data for the ES cell derived alleles were migrated to GRCm39 using LiftOver. For CRISPR derived alleles, the guides were aligned to the genome using BLAT [37] and BEDTools [38], and deletion coordinates were derived using BEDOPS [39]. The tracks are displayed on allele pages using Integrative Genomics Viewer [40] with RefSeq [41], GENCODE [42], and UniProt [43] annotations (Fig. 3C). A track hub is also available (https://bit.ly/48h925v) for use in UCSC [36] and Ensembl [44] genome browsers. These resources enable users to evaluate the exons deleted and whether the allele is likely to represent a null given the current gene models.
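To give a feel for the interval arithmetic involved, the sketch below approximates a deletion as the region spanned by the outermost aligned guides and then lists the exons it overlaps. This is a simplification and not necessarily how the IMPC deletion coordinates were derived; the function names and all coordinates are hypothetical, and intervals follow the BED convention (0-based start, exclusive end).

```python
def deletion_span(guide_hits):
    """Approximate a deletion as the span covered by the outermost guides."""
    if len(guide_hits) < 2:
        raise ValueError("at least two guide alignments are needed")
    chroms = {chrom for chrom, _, _ in guide_hits}
    if len(chroms) != 1:
        raise ValueError("guides must align to a single chromosome")
    start = min(start for _, start, _ in guide_hits)
    end = max(end for _, _, end in guide_hits)
    return chroms.pop(), start, end

def overlaps(a, b):
    """True if two BED-style intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def deleted_exons(deletion, exons):
    """Names of exons that overlap the deletion interval."""
    return [name for name, interval in exons.items() if overlaps(deletion, interval)]

# Invented coordinates for illustration only.
guides = [("chr1", 1000, 1020), ("chr1", 1480, 1500)]
exons = {
    "exon1": ("chr1", 500, 700),
    "exon2": ("chr1", 1100, 1300),
    "exon3": ("chr1", 1900, 2100),
}
deletion = deletion_span(guides)
affected = deleted_exons(deletion, exons)
```

Comparing the deletion interval against current exon annotations in this way mirrors the question the genome browser tracks help users answer visually: which exons are removed, and is the allele likely to be a null.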

Improved programmatic data access

In conjunction with rebuilding the IMPC website we have improved and simplified programmatic access to the IMPC data. We have focused on helping users to integrate IMPC data into their workflows by reducing the steps required between data access and analysis and by widening access for non-technical users. IMPC data is available from an Apache Solr API service. To make it easier for users to access IMPC data programmatically (without knowledge of Solr), a Python library (https://github.com/mpi2/impc-api) now wraps the API, and a training course, “Accessing Mouse Phenotypes and Disease Associations with the IMPC Solr API” (https://bit.ly/4nD3kzV), explains how to use the library and outlines common query patterns and best practice.
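For readers unfamiliar with Solr, the sketch below builds a select URL using the standard Solr query parameters (`q`, `fl`, `rows`, `wt`). It does not use the IMPC Python library and makes no network request. The base URL, the core name, and the field names are assumptions made for illustration and should be checked against the IMPC API documentation and training course before use.

```python
from urllib.parse import urlencode

# Assumed base URL for the IMPC Solr service; verify against the IMPC docs.
BASE_URL = "https://www.ebi.ac.uk/mi/impc/solr"

def build_solr_url(core, query, fields=None, rows=10, base_url=BASE_URL):
    """Compose a Solr select URL from standard query parameters."""
    params = {"q": query, "rows": rows, "wt": "json"}
    if fields:
        params["fl"] = ",".join(fields)  # restrict the fields returned
    return f"{base_url}/{core}/select?{urlencode(params)}"

# Core and field names below are assumptions, not confirmed by this article.
url = build_solr_url(
    core="genotype-phenotype",
    query="marker_symbol:Myo6",
    fields=["marker_symbol", "mp_term_name", "p_value"],
)
```

The resulting URL could be passed to any HTTP client; a wrapper library mainly saves users from assembling these parameters by hand and from paging through large result sets.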

Data archiving

IMPC has gathered a unique collection of 852K images from 28 different procedures during the phenotyping process. The set of 465K X-ray images is now archived in the BioImage Archive (BIA; https://www.ebi.ac.uk/biostudies/studies/S-BIAD2244), and we are in the process of depositing the remaining images. The BIA is a permanent repository for image data [45]. Depositing the images there ensures the long term availability of these large and complex datasets and makes them available to the wider image analysis community as a reference dataset for AI.

Discussion

The quantity of high quality IMPC data that has passed a rigorous QC process (see [3] and the improved data acquisition section) has grown by 15 million data points since 2023 to just over 100 million data points and is anticipated to continue to grow at a similar rate. Modernization of the portal means the resource is now in a much better position to manage this quantity of high dimensional data while retaining a performant user interface. The new infrastructure has made the resource more sustainable: Docker images enable deployment of the resource in different cloud environments, and the microservice architecture makes it easier to add new features. The modern user interface keeps all the functionality and look and feel of the original site, while improving the graphical analysis tool and adding new features, such as the genome browser and the allele comparison tool. A usability study to evaluate the interface improved the layout of the summary section, resulting in an effective path to finding allele information, a key function of the site, and helped improve the navigation between phenotypes and supporting data. Testing also refined the layout and communication of information, producing a web portal that is easier to use. Accompanying the revision of the website, the documentation and online training courses were updated and improved, notably for programmatic access to data through the API, for which a Python library was created to make it easier to access the data.

IMPC tracks the impact of the project through published papers that involve reagents generated by the project; currently >8200 have been curated. This helps us understand the communities using IMPC data, which could help foster collaborations. For example, a tool to quantify asymmetry in embryos and assess the contribution of genes to abnormal asymmetry has recently been developed based on IMPC 3D image data, and could help understand developmental and neurological disorders including schizophrenia, autism spectrum disorder, and Down’s syndrome [46]. A summary of the impact of the publications identified through the IMPC tracking system, along with additional examples involving biomarker discovery, investigation of gene therapy interventions and drug efficacy among others, has recently been published [8].

While the data for each IMPC data release is available from the EBI FTP site, the deposition of IMPC image data into the BioImage Archive [45] provides an additional way to disseminate IMPC data and will promote further reuse of the data by a wider audience, including image machine learning specialists. IMPC data is also incorporated into other resources, such as MGI [17], Open Targets [7], and UniProt [43] increasing access to the data for a wider user community. For example, IMPC phenotype data in Open Targets [7] has been used as part of the validation strategy to prioritize candidate disease genes based on tissue specific protein-protein associations [47]. The IMPC track hub, which can be loaded into either UCSC [36] or Ensembl [44] genome browsers, offers users of those resources the ability to view IMPC alleles in the context of other genomic features or their own experimental data.

The KG, which includes IMPC and Genome-Wide Association Studies (GWAS) data [30], could be used to add new functionality to the IMPC web portal. While the current section on human disease associations and computational identification of candidate disease models is centred around Mendelian, single-gene disorders, ongoing efforts aim to map mouse phenotypes to Experimental Factor Ontology [48] encoded GWAS traits and integrate this information into the portal. For example, data could be extracted from the KG to power a disease based search that uses links to GWAS data and human orthologs of mouse genes. The IMPC KG is also being used to add IMPC mouse phenotype data to other KGs, such as the Common Fund Data Ecosystem Data Distillery KG [49] and UniProt [43] ProKN KG. To further enhance the distribution and reuse of IMPC data we will ensure IMPC data is presented effectively to large language models (LLMs) via a Model Context Protocol server and explore how we can integrate the KG with LLMs. One potential application of an LLM for the website would be to build an interactive help desk, as has been done for other resources such as the Proteomics Identifications Database [50], to provide users with a fast response to questions and an alternative way to access the website documentation. The value of engaging the user community in the design of the website should not be underestimated, and it is important to ensure existing features work well and are valued by users before considering adding new features.

Acknowledgements

We are extremely grateful to the individuals who gave up their time to help us with the usability study and thank the members of the IMPC and all users of our services. We thank Tudor Groza for initiating the cloud native project and Jonny Lu and Binoop Nanu for early prototyping. We acknowledge the EBI cloud consultants Santiago Insua and David Ernesto Gómez Gutiérrez for their help in deploying the website. We are grateful to the IMPC production working group for their work that made it possible to display the ES Cell and CRISPR alleles on the genome browser. We would also like to acknowledge NIH programme directors Colin Fletcher and Oleg Mirochnitchenko and programme analysts Maya Vanzanten and Sofia Martin for their help and support.

Author contributions: Robert Wilson (Conceptualization [equal], Data curation [equal], Software [equal], Supervision [equal], Writing – original draft [lead], Writing – review & editing [equal]), Tuğba Bülbül Ataç (Data curation [equal], Formal analysis [equal], Validation [equal]), Tsz Kwan Cheng (Data curation [equal], Formal analysis [equal], Validation [equal]), Anthony Frost (Conceptualization [equal], Data curation [equal], Software [equal], Supervision [equal], Validation [equal]), Osman Güneş (Data curation [equal], Software [equal]), Marina Kan (Data curation [equal], Formal analysis [equal], Software [equal], Visualization [equal]), Piia Keskivali-Bond (Conceptualization [equal], Data curation [equal], Formal analysis [equal], Supervision [equal], Validation [equal], Writing – original draft [equal]), Federico López Gómez (Data curation [equal], Formal analysis [equal], Software [equal], Visualization [equal]), James McLaughlin (Software [equal]), Jakub Mucha (Data curation [equal], Software [equal]), Tawanda Munava (Software [equal], Visualization [equal]), Carla Oliveira (Visualization [equal]), Diego Pava (Data curation [equal], Formal analysis [equal], Software [equal], Writing – original draft [equal], Writing – review & editing [equal]), Jose Francisco Peña Estrada (Software [equal], Visualization [equal]), Ewan Selkirk (Data curation [equal], Formal analysis [equal], Software [equal], Validation [equal]), Bora Vardal (Data curation [equal], Software [equal], Validation [equal]), Sara Wells (Conceptualization [equal], Funding acquisition [equal], Project administration [equal], Supervision [equal]), Pilar Cacheiro (Conceptualization [equal], Formal analysis [equal], Funding acquisition [equal], Project administration [equal], Software [equal], Supervision [equal], Writing – original draft [equal], Writing – review & editing [equal]), Damian Smedley (Conceptualization [equal], Funding acquisition [equal], Project administration [equal], Supervision [equal], Writing – review & editing [equal]), Helen Parkinson (Conceptualization [equal], Funding acquisition [equal], Project administration [equal], Supervision [equal], Writing – review & editing [equal])

Contributor Information

Robert Wilson, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom.

Tuğba Bülbül Ataç, Mary Lyon Centre at MRC Harwell, Harwell Campus, Oxfordshire OX11 0RD, United Kingdom.

Tsz Kwan Cheng, Mary Lyon Centre at MRC Harwell, Harwell Campus, Oxfordshire OX11 0RD, United Kingdom.

Anthony Frost, Mary Lyon Centre at MRC Harwell, Harwell Campus, Oxfordshire OX11 0RD, United Kingdom.

Osman Güneş, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom.

Marina Kan, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom.

Piia Keskivali-Bond, Mary Lyon Centre at MRC Harwell, Harwell Campus, Oxfordshire OX11 0RD, United Kingdom.

Federico López Gómez, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom.

James McLaughlin, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom.

Jakub Mucha, Mary Lyon Centre at MRC Harwell, Harwell Campus, Oxfordshire OX11 0RD, United Kingdom.

Tawanda Munava, Mary Lyon Centre at MRC Harwell, Harwell Campus, Oxfordshire OX11 0RD, United Kingdom.

Carla Oliveira, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom.

Diego Pava, Clinical Pharmacology and Precision Medicine, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, United Kingdom.

Jose Francisco Peña Estrada, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom.

Ewan Selkirk, Mary Lyon Centre at MRC Harwell, Harwell Campus, Oxfordshire OX11 0RD, United Kingdom.

Bora Vardal, Mary Lyon Centre at MRC Harwell, Harwell Campus, Oxfordshire OX11 0RD, United Kingdom.

Sara Wells, Mary Lyon Centre at MRC Harwell, Harwell Campus, Oxfordshire OX11 0RD, United Kingdom.

Pilar Cacheiro, Clinical Pharmacology and Precision Medicine, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, United Kingdom.

Damian Smedley, Clinical Pharmacology and Precision Medicine, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, United Kingdom.

Helen Parkinson, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom.

Conflict of interest

None declared.

Funding

National Institutes of Health (NIH) [2UM1HG006370-11, 5UM1HG006370-13, 3UM1HG006370-12S2, 1U24OD038424-01]; EMBL-EBI Core Funding. Funding to pay the Open Access publication charges for this article was provided by NIH.

Data availability

References

  • 1. Brown SD, Moore MW. The International Mouse Phenotyping Consortium: past and future perspectives on mouse phenotyping. Mamm Genome. 2012;23:632–40. 10.1007/s00335-012-9427-x.
  • 2. Lloyd KCK. Commentary: the International Mouse Phenotyping Consortium: high-throughput in vivo functional annotation of the mammalian genome. Mamm Genome. 2024;35:537–43. 10.1007/s00335-024-10068-x.
  • 3. Groza T, Gomez FL, Mashhadi HH et al. The International Mouse Phenotyping Consortium: comprehensive knockout phenotyping underpinning the study of human disease. Nucleic Acids Res. 2023;51:D1038–45. 10.1093/nar/gkac972.
  • 4. Cacheiro P, Pava D, Parkinson H et al. Computational identification of disease models through cross-species phenotype comparison. Dis Model Mech. 2024;17:dmm050604. 10.1242/dmm.050604.
  • 5. Ferrero E, Dunham I, Sanseau P. In silico prediction of novel therapeutic targets using gene-disease association data. J Transl Med. 2017;15:182. 10.1186/s12967-017-1285-6.
  • 6. Brommage R, Powell DR, Vogel P. Predicting human disease mutations and identifying drug targets from mouse gene knockout phenotyping campaigns. Dis Model Mech. 2019;12:dmm038224. 10.1242/dmm.038224.
  • 7. Buniello A, Suveges D, Cruz-Castillo C et al. Open Targets Platform: facilitating therapeutic hypotheses building in drug discovery. Nucleic Acids Res. 2025;53:D1467–75. 10.1093/nar/gkae1128.
  • 8. Hölter SM, Cacheiro P, Smedley D et al. IMPC impact on preclinical mouse models. Mamm Genome. 2025;36:384–9.
  • 9. Grady SK, Peterson KA, Murray SA et al. A graph theoretical approach to experimental prioritization in genome-scale investigations. Mamm Genome. 2024;35:724–33. 10.1007/s00335-024-10066-z.
  • 10. Pandey AK, Lu L, Wang X et al. Functionally enigmatic genes: a case study of the brain ignorome. PLoS One. 2014;9:e88889. 10.1371/journal.pone.0088889.
  • 11. Riba M, Garcia Manteiga JM, Bošnjak B et al. Revealing the acute asthma ignorome: characterization and validation of uninvestigated gene networks. Sci Rep. 2016;6:24647. 10.1038/srep24647.
  • 12. Elrick H, Peterson KA, Willis BJ et al. Impact of essential genes on the success of genome editing experiments generating 3313 new genetically engineered mouse lines. Sci Rep. 2024;14:22626. 10.1038/s41598-024-72418-8.
  • 13. Cacheiro P, Muñoz-Fuentes V, Murray SA et al. Human and mouse essentiality screens as a resource for disease gene discovery. Nat Commun. 2020;11:655. 10.1038/s41467-020-14284-2.
  • 14. Cacheiro P, Marengo G, Gorkin DU et al. The spectrum of gene intolerance to variation: insights from a rare disease cohort. 2025, medRxiv, 10.1101/2025.01.28.25321201, 29 January 2025, pre-print: not peer-reviewed.
  • 15. 100,000 Genomes Project Pilot Investigators; Smedley D, Smith KR, Martin A et al. 100,000 genomes pilot on rare-disease diagnosis in health care – preliminary report. N Engl J Med. 2021;385:1868–80.
  • 16. Koscielny G, Yaikhom G, Iyer V et al. The International Mouse Phenotyping Consortium Web Portal, a unified point of access for knockout mice and related phenotyping data. Nucleic Acids Res. 2014;42:D802–9. 10.1093/nar/gkt977.
  • 17. Baldarelli RM, Smith CL, Ringwald M et al. Mouse Genome Informatics: an integrated knowledgebase system for the laboratory mouse. Genetics. 2024;227:iyae031. 10.1093/genetics/iyae031.
  • 18. Seal RL, Braschi B, Gray K et al. Genenames.org: the HGNC resources in 2023. Nucleic Acids Res. 2023;51:D1003–9. 10.1093/nar/gkac888.
  • 19. Bello SM, Anagnostopoulos AV, Carmody LC et al. Expanding and refining the mammalian phenotype ontology to enhance disease model discovery. Dis Model Mech. 2025;18:dmm052385. 10.1242/dmm.052385.
  • 20. Gargano MA, Matentzoglu N, Coleman B et al. The Human Phenotype Ontology in 2024: phenotypes around the world. Nucleic Acids Res. 2024;52:D1333–46. 10.1093/nar/gkad1005.
  • 21. Haselimashhadi H, Mason JC, Mallon AM et al. OpenStats: a robust and scalable software package for reproducible analysis of high-throughput phenotypic data. PLoS One. 2020;15:e0242933. 10.1371/journal.pone.0242933.
  • 22. Cacheiro P, Westerberg CH, Mager J et al. Mendelian gene identification through mouse embryo viability screening. Genome Med. 2022;14:119. 10.1186/s13073-022-01118-7.
  • 23. Vetro A, Pelorosso C, Balestrini S et al. Stretch-activated ion channel TMEM63B associates with developmental and epileptic encephalopathies and progressive neurodegeneration. Am J Hum Genet. 2023;110:1356–76. 10.1016/j.ajhg.2023.06.008.
  • 24. Cousin MA, Creighton BA, Breau KA et al. Pathogenic SPTBN1 variants cause an autosomal dominant neurodevelopmental syndrome. Nat Genet. 2021;53:1006–21. 10.1038/s41588-021-00886-z.
  • 25. Amberger JS, Bocchini CA, Scott AF et al. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Res. 2019;47:D1038–43. 10.1093/nar/gky1151.
  • 26. Rath A, Olry A, Dhombres F et al. Representation of rare diseases in health information systems: the Orphanet approach to serve a wide range of end users. Hum Mutat. 2012;33:803–8. 10.1002/humu.22078.
  • 27. Foreman J, Perrett D, Mazaika E et al. DECIPHER: improving genetic diagnosis through dynamic integration of genomic and clinical data. Annu Rev Genomics Hum Genet. 2023;24:151–76. 10.1146/annurev-genom-102822-100509.
  • 28. Smedley D, Oellrich A, Köhler S et al. PhenoDigm: analyzing curated annotations to associate animal models with human diseases. Database (Oxford). 2013;2013:bat025. 10.1093/database/bat025.
  • 29. Putman TE, Schaper K, Matentzoglu N et al. The Monarch Initiative in 2024: an analytic platform integrating phenotypes, genes and diseases across species. Nucleic Acids Res. 2024;52:D938–49. 10.1093/nar/gkad1082.
  • 30. Cerezo M, Sollis E, Ji Y et al. The NHGRI-EBI GWAS Catalog: standards for reusability, sustainability and diversity. Nucleic Acids Res. 2025;53:D998–D1005. 10.1093/nar/gkae1070.
  • 31. Matentzoglu N, Bello SM, Stefancsik R et al. The Unified Phenotype Ontology: a framework for cross-species integrative phenomics. Genetics. 2025;229:iyaf027. 10.1093/genetics/iyaf027.
  • 32. Agca Y, Amos-Landgraf J, Araiza R et al. The Mutant Mouse Resource and Research Center (MMRRC) consortium: the US-based public mouse repository system. Mamm Genome. 2024;35:524–36. 10.1007/s00335-024-10070-3.
  • 33. Ali Khan A, Valera Vazquez G, Gustems M et al. INFRAFRONTIER: mouse model resources for modelling human diseases. Mamm Genome. 2023;34:408–17. 10.1007/s00335-023-10010-7.
  • 34. Karp NA, Mason J, Beaudet AL et al. Prevalence of sexual dimorphism in mammalian phenotypic traits. Nat Commun. 2017;8:15475. 10.1038/ncomms15475.
  • 35. Lex A, Gehlenborg N, Strobelt H et al. UpSet: visualization of intersecting sets. IEEE Trans Vis Comput Graph. 2014;20:1983–92. 10.1109/TVCG.2014.2346248.
  • 36. Perez G, Barber GP, Benet-Pages A et al. The UCSC Genome Browser database: 2025 update. Nucleic Acids Res. 2025;53:D1243–9. 10.1093/nar/gkae974.
  • 37. Kent WJ. BLAT–the BLAST-like alignment tool. Genome Res. 2002;12:656–64.
  • 38. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2. 10.1093/bioinformatics/btq033.
  • 39. Neph S, Kuehn MS, Reynolds AP et al. BEDOPS: high-performance genomic feature operations. Bioinformatics. 2012;28:1919–20. 10.1093/bioinformatics/bts277.
  • 40. Robinson JT, Thorvaldsdottir H, Turner D et al. igv.js: an embeddable JavaScript implementation of the Integrative Genomics Viewer (IGV). Bioinformatics. 2023;39:btac830. 10.1093/bioinformatics/btac830.
  • 41. Goldfarb T, Kodali VK, Pujar S et al. NCBI RefSeq: reference sequence standards through 25 years of curation and annotation. Nucleic Acids Res. 2025;53:D243–57. 10.1093/nar/gkae1038.
  • 42. Mudge JM, Carbonell-Sala S, Diekhans M et al. GENCODE 2025: reference gene annotation for human and mouse. Nucleic Acids Res. 2025;53:D966–75. 10.1093/nar/gkae1078.
  • 43. UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2025. Nucleic Acids Res. 2025;53:D609–17. 10.1093/nar/gkae1010.
  • 44. Dyer SC, Austine-Orimoloye O, Azov AG et al. Ensembl 2025. Nucleic Acids Res. 2025;53:D948–57. 10.1093/nar/gkae1071.
  • 45. Hartley M, Kleywegt GJ, Patwardhan A et al. The BioImage Archive - building a home for life-sciences microscopy data. J Mol Biol. 2022;434:167505. 10.1016/j.jmb.2022.167505.
  • 46. Rolfe SM, Mao D, Maga AM. Streamlining asymmetry quantification in fetal mouse imaging: a semi-automated pipeline supported by expert guidance. Dev Dyn. 2025;254:999–1010. 10.1002/dvdy.70028.
  • 47. Laman Trip DS, van Oostrum M, Memon D et al. A tissue-specific atlas of protein-protein associations enables prioritization of candidate disease genes. Nat Biotechnol. 2025. 10.1038/s41587-025-02659-z.
  • 48. Malone J, Holloway E, Adamusiak T et al. Modeling sample variables with an experimental factor ontology. Bioinformatics. 2010;26:1112–8. 10.1093/bioinformatics/btq099.
  • 49. Ahooyi TM, Stear B, Simmons JA et al. The Data Distillery: a graph framework for semantic integration and querying of biomedical data. 2025. bioRxiv, 10.1101/2025.08.11.666099, 4 September 2025, pre-print: not peer-reviewed.
  • 50. Bai J, Kamatchinathan S, Kundu DJ et al. Open-source large language models in action: a bioinformatics chatbot for PRIDE database. Proteomics. 2024;24:e2400005. 10.1002/pmic.202400005.

Data Availability Statement

IMPC has gathered a unique collection of 852K images from 28 different procedures performed during the phenotyping process. The set of 465K X-ray images is now archived in the BioImage Archive (BIA; https://www.ebi.ac.uk/biostudies/studies/S-BIAD2244), and we are in the process of depositing the remaining images. The BIA is a permanent repository for image data [45], which ensures the long-term availability of these large and complex image datasets and makes them available to the wider image analysis community as a reference dataset for AI.


Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
