Nucleic Acids Research. 2025 Oct 14;54(D1):D1119–D1132. doi: 10.1093/nar/gkaf1024

IMGT® at scale: FAIR, dynamic, and automated tools for immune locus analysis

Gaoussou Sanou 1,b, Guilhem Zeitoun 2,b, Taciana Manso 3, Milad Eidi 4, Shamsa Batool 5, Anjana Kushwaha 6, François Grand 7, Myriam Croze 8, Axel Vaillant 9, Chahrazed Debbagh 10, Nika Abdollahi 11, Maria Georga 12, Ariadni Papadaki 13, Ifigeneia Sideri 14, Turkan Samadova 15, Joumana Jabado-Michaloud 16, Géraldine Folch 17, Véronique Giudicelli 18, Patrice Duroux 19, Sofia Kossida 20,21
PMCID: PMC12807764  PMID: 41091930

Abstract

IMGT®, the international ImMunoGeneTics information system®, has advanced its comprehensive platform for the analysis of immunoglobulin (IG) and T-cell receptor (TR) genes through the development of new automated and scalable tools. This article presents major updates aligned with IMGT’s three axes of research. Axis I introduces dynamic resources such as IMGT/GeneTables, IMGT/AssemblyComparison, IMGT/CDRLengths, IMGT/MultiGenomeViewer, enabling real-time access to annotated genomic data, and IMGT/StatAssembly to assess quality of IG/TR loci in assemblies. Axis II enhances repertoire analysis with a redesigned IMGT/GeneFrequency tool and new customization features in IMGT/V-QUEST, supporting flexible exploration of IG and TR gene expression. Axis III improves the accurate prediction of peptide–MHC thanks to IMGT/RobustpMHC. Additionally, the IMGT knowledge graph and its therapeutic extension, IMGT/mAb-KG, provide semantically structured access to >100 million immunogenetic triplets, integrating IMGT databases and linking IMGT content to external biomedical resources. These developments promote standardization, interoperability, and integrative analysis across immunogenetics and clinical applications, reinforcing IMGT’s (accessible at https://www.imgt.org) role as a core reference in the era of FAIR data and personalized medicine.

Graphical Abstract

Introduction

IMGT®, the international ImMunoGeneTics information system®, offers a standardized and comprehensive platform for analyzing and understanding immunoglobulins (IG, antibodies) and T-cell receptors (TRs) [1]. IMGT integrates high-quality curated databases [2], advanced bioinformatics tools, and a unified vocabulary to ensure consistent and accurate immune repertoire analysis [3]. IMGT includes several specialized databases [4], such as IMGT/LIGM-DB for nucleotide sequences, IMGT/GENE-DB for genes and alleles [5], IMGT/2Dstructure-DB and IMGT/3Dstructure-DB for amino acid sequences, and their two-dimensional (2D) and three-dimensional (3D) structures [6], and IMGT/mAb-DB dedicated to therapeutic monoclonal antibodies and their clinical applications [7]. To address the evolving needs of the community and overcome specific challenges in immunogenetics research, IMGT has developed new tools and resources with specialized capabilities for analyzing, visualizing, and interpreting adaptive immune system data. The development of these resources and tools is driven by three key axes of research and development [8]. Each axis introduces new resources or tools designed to address specific research challenges, thereby expanding the overall functionality of the IMGT workflow.

Axis I of IMGT is dedicated to deciphering the adaptive immune response by analyzing the organization and structure of IG and TR loci across jawed vertebrates. This fundamental research involves identifying and characterizing IG and TR genes and alleles, which are crucial for understanding the genetic basis of immune diversity. A significant achievement of Axis I is the extraction of complete IG and TR loci from genome assemblies, facilitating a comprehensive understanding of the immune repertoire. The resulting data are used to update IMGT databases such as IMGT/LIGM-DB and IMGT/GENE-DB. Within this axis, data can be visualized using various web resources that compile annotated data and provide knowledge pages, such as those found in the IMGT Web Resources: IMGT Repertoire. Historically, these webpages were static and manually generated. Recently, several have been automated, and new dynamic webpages have been created, including IMGT/GeneTables, IMGT/AssemblyComparison, IMGT/CDRLengths, and IMGT/MultipleGenomeViewer. In addition, a software tool, IMGT/StatAssembly, has been created to assess the quality of the IG/TR loci in genome assemblies.

The data generated through Axis I are compiled into IMGT reference directories, which serve as essential resources for the subsequent axes of research and development. These directories provide standardized references that support the exploration of expressed IG and TR repertoires in Axis II and analysis of the structural aspects of adaptive immune proteins in Axis III.

Axis II focuses on investigating IG and TR gene diversity and expression patterns, which are essential for understanding adaptive immune responses in both physiological and pathological contexts. It provides indispensable tools for the analysis of expressed IG and TR sequences, enabling studies on V(D)J recombinations, somatic hypermutations, and clonal expansions based on nucleotide sequences using IMGT/V-QUEST and its high-throughput version IMGT/HighV-QUEST. Further analytical capabilities are offered by IMGT/StatClonotype and IMGT/JunctionAnalysis. IMGT/GeneFrequency has been enhanced and provides new features and functionalities for an intuitive way to explore the gene frequencies in different species. In addition, IMGT/V-QUEST has been enhanced with new features allowing the customization of the reference directory set.

The last research axis of IMGT, Axis III, focuses on analyzing the two-dimensional (2D) and three-dimensional (3D) structures of adaptive immune proteins, including IG and TR. By emphasizing the structural analysis of these proteins, Axis III enhances our understanding of the molecular mechanisms underlying immune responses. In addition to the existing tools in this axis, such as IMGT/DomainGapAlign and IMGT/Collier-de-Perles, IMGT has developed IMGT/RobustpMHC [9], a new machine learning-based tool to predict peptide–MHC [major histocompatibility (MH) complex] interactions, a crucial process for immune recognition.

The design and content of the IMGT databases, along with the tools and web resources, are the result of an extensive process of curation and standardization of immunogenetics knowledge, facilitated by IMGT-ONTOLOGY [3]. To bridge the gap between nucleotide and amino acid sequence databases, IMGT created the IMGT knowledge graph (IMGT-KG) [10], a comprehensive resource integrating the five IMGT databases and connecting to external related biomedical resources including Open Biological and Biomedical Ontology Foundry resources or National Cancer Institute Thesaurus (NCIt) [11]. Building on this foundation, IMGT/mAb-KG is the IMGT-KG for therapeutic monoclonal antibodies (mAbs), providing access to knowledge about the mAbs, their structures, their targets, the related clinical indications, and their mechanism of action [12].

In this article, we highlight the latest advancements in IMGT, developed through its three research and development axes, which collectively establish IMGT® as an indispensable system for studying the adaptive immune system, assisting in the design of therapeutic interventions, and paving the way for personalized medicine.

Axis I: dynamic web resources and quality assessment tools

IMGT/GeneTables

IMGT/GeneTables provides access to the repertoire of genes and alleles for a given species and gene type within a locus (IGHV, IGHD, etc.). For that, the IMGT/GeneTables separates the IMGT reference sequences from IMGT literature sequences [5]. The IMGT/GeneTables provides information about the alleles, including the functionality, the clone name, the rearranged or transcribed information, the accession number, and the sequence associated with the alleles. Additionally, bibliographic references and specific notes regarding alleles are included in this table. IMGT/GeneTables is part of IMGT Repertoire, the global immunogenetics web resources providing access to expert-curated data on the IG, TR, MH, and related proteins of the immune system [8]. Until recently, the gene table for each species and gene type was generated manually following locus annotation. This approach was time-consuming, prone to human error, and could quickly become outdated when new annotations or modifications were introduced.

Recently, the IMGT/GeneTables has been automated, and data from >100 gene types across >30 species are now available to the public. The data are extracted in real-time from IMGT databases, ensuring consistency and reducing errors. The design has been updated to include new information, such as the IMGT allele confirmation score, which reflects the number of literature-reported sequences available for a given allele. The data can be downloaded as an Excel file. In species such as mouse, it is also possible to identify alleles that are specific to a single strain. Figure 1 provides a visualization example of the locus IGHJ in Homo sapiens.

Figure 1.

Visualization of IGHJ locus of Homo sapiens as of 22 May 2025, accessible from IMGT webpage.

IMGT/AssemblyComparison

IMGT offers a wide range of annotated assemblies across different species. As of 17 July 2025, there are, for instance, 12 assemblies annotated for humans, 11 for dogs, and 7 for mice. These different assemblies can reveal gene variations, such as copy number variations (CNVs) [13] or differences in gene functionalities. To facilitate their comparison, we developed IMGT/AssemblyComparison, a new dynamic web interface for exploring locus gene repertoires. IMGT/AssemblyComparison is an automated tool that, for a given species and locus, displays all genes by subgroup and functionality, based on IMGT-annotated and localized assemblies from IMGT/GENE-DB genomic localizations. The tool identifies shared and distinct genes and variations in allele functionality. Data are retrieved in real time from IMGT databases to generate the assembly comparison. In the output (Fig. 2), functional differences between alleles of the same gene are highlighted, along with shared genes and other detected variations. The percentage of genes per functionality is also displayed as a bar chart. Additional information is provided in a pop-up window and a summary table at the end of the page, which can be downloaded as a CSV file.

Figure 2.

Visualization of the IGH locus assembly in H. sapiens as of 6 August 2025, accessible via the IMGT website (top). Details are shown for the IGHV1 subgroup of H. sapiens, including one functional gene (IGHV1-69) and two pseudogenes (IGHV1-67 and IGHV1-68), all conserved across IMGT-annotated and localized assemblies.
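
The comparison described above (shared genes, distinct genes, and functionality differences between assemblies) can be illustrated with a short Python sketch. It is not the IMGT implementation: the input layout (assembly label mapped to gene functionality), the assembly labels, and the genes GENE-X and GENE-Y are hypothetical and serve only to show a distinct gene and a functionality difference. The first three genes follow Fig. 2.

```python
from collections import Counter


def compare_assemblies(assemblies):
    """Compare gene repertoires between assemblies.

    `assemblies` maps an assembly label to {gene name: functionality}.
    Returns the genes shared by all assemblies, the genes missing from
    at least one assembly, and the shared genes whose functionality
    differs between assemblies.
    """
    gene_sets = [set(genes) for genes in assemblies.values()]
    shared = set.intersection(*gene_sets)
    distinct = set.union(*gene_sets) - shared
    differing = {
        gene for gene in shared
        if len({genes[gene] for genes in assemblies.values()}) > 1
    }
    return shared, distinct, differing


def functionality_percentages(genes):
    """Percentage of genes per functionality for one assembly."""
    counts = Counter(genes.values())
    total = sum(counts.values())
    return {func: 100.0 * n / total for func, n in counts.items()}


# Toy input: GENE-X and GENE-Y are hypothetical placeholders.
assemblies = {
    "assembly_A": {"IGHV1-69": "F", "IGHV1-67": "P", "IGHV1-68": "P",
                   "GENE-X": "F", "GENE-Y": "F"},
    "assembly_B": {"IGHV1-69": "F", "IGHV1-67": "P", "IGHV1-68": "P",
                   "GENE-Y": "P"},
}
shared, distinct, differing = compare_assemblies(assemblies)
```

The percentages returned by `functionality_percentages` correspond to the kind of per-functionality summary shown as a bar chart in the tool output.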

IMGT/CDRLengths

IG and TR variable genes are categorized in subgroups [14] based on the similarity of their coding region (V-REGION). The length of the complementarity-determining regions (CDR) of the variable genes is an important factor contributing to gene diversity. In recognition of this, we introduce IMGT/CDRLengths, dynamic web pages that visualize the diversity of CDR lengths within each subgroup [1] as well as differences among alleles. The data are also retrieved in real time from IMGT databases and can be downloaded as a CSV file (Fig. 3).

Figure 3.

Visualization of CDR lengths for the IGH locus of H. sapiens as of 22 May 2025, accessible from IMGT webpage.

IMGT/MultipleGenomeViewer

Inspired by the work of Richardson et al. [15], we implemented a new version of the Multiple Genome Viewer (MGV), IMGT/MultipleGenomeViewer (IMGT/MGV), based on IMGT/GENE-DB genomic localizations. In addition to displaying all IG/TR genes and loci on the chromosome for each chosen assembly, IMGT/MGV facilitates position retrieval, gene and allele comparison, functionality overview, and the comparison of assemblies within the same species (Fig. 4). Sequences can be downloaded as a FASTA file. As of 5 March 2025, data for four species are available.

Figure 4.

Visualization of the mHomSap3.mat (Homo sapiens) assembly on IMGT/MGV as of 5 March 2025 (top). The lower panel shows the list of IGHJ5 alleles across human assemblies, along with their position within each locus, accessible from IMGT webpage.

IMGT/StatAssembly

Description of the tool

Inspired by the work of Zhu et al. on the CloseRead tool [16], IMGT/StatAssembly [17] allows the analysis of an alignment file in BAM format. This file can be generated using the minimap2 tool [18] and should contain a cs tag, an MD tag, or a CIGAR string using the =/X operators, so that matches and mismatches can be distinguished.
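
As an illustration of this input requirement, the following Python sketch checks whether an alignment record carries match/mismatch information. It is a simplified example, not part of IMGT/StatAssembly: the function name and the way tags are passed (a plain tuple of tag names) are assumptions. In minimap2, the corresponding output is requested with the `--eqx`, `--cs`, or `--MD` options.

```python
import re

# One CIGAR operation: a length followed by an operator letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def distinguishes_mismatches(cigar, tags=()):
    """Return True if matches and mismatches can be told apart.

    This holds when the CIGAR string uses the '=' and 'X' operators
    instead of the ambiguous 'M', or when a 'cs' or 'MD' tag is present.
    """
    ops = {op for _, op in CIGAR_OP.findall(cigar)}
    uses_eqx = "M" not in ops and bool(ops & {"=", "X"})
    return uses_eqx or any(tag in ("cs", "MD") for tag in tags)
```

For example, `distinguishes_mismatches("50=1X49=")` is true, whereas `distinguishes_mismatches("100M")` is false unless an MD or cs tag accompanies the record.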

IMGT/StatAssembly takes as input the positions of the IG and/or TR loci, either extracted from the IMGT Locus Description (IMGT Repertoire) or identified through sequence similarity. The tool then analyses each IG/TR locus region provided and counts the number of reads based on their mapping score (Fig. 5). Regions covered by fewer than three primary reads are labeled as “break” positions. Secondary and supplementary alignments are displayed, along with the number of overlapping primary reads, i.e. reads covering the given position and its adjacent positions (Fig. 5). As described in the minimap2 article [18], secondary alignments are alignments of reads of the lowest quality, due to repeats in the assembly, and supplementary alignments are additional alignments of a read that cannot be represented as a single linear alignment, typically due to chimeric or split reads. The number of overlapping primary reads should be close to, if not equal to, the total number of primary reads. A decrease may indicate regions where reads fail to overlap, which could signal potential assembly issues. High-quality assemblies are expected to show consistent coverage with a predominance of green (high-mapping quality) reads and minimal numbers of secondary, supplementary, or low-mapping-quality primary reads.

Figure 5.

This figure shows the coverage of reads according to their mapping score on the IGH locus for the human T2T-CHM13v2.0 assembly. Upper: The number of primary reads with a mapping quality of 60 (green), between 1 and 59 (yellow), and 0 (red) is counted for each position on the locus. The number of secondary alignments for each position is shown in black, and the number of supplementary/chimeric alignments in blue. Overlapping reads are shown in yellow. Regions where secondary alignments exceed primary ones are the zones around the constant genes (left) and some V genes (right), respectively, called CNV7 and CNV3 [13]. Bottom: If the number of primary reads (in green, yellow, and red) goes below 3, the “break” region is shown by a red bar. None are observed in this assembly.
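
The per-position counting and the detection of “break” regions can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the IMGT/StatAssembly code: reads are given as hypothetical `(start, end, mapq)` tuples for primary alignments with 0-based, half-open coordinates, and the three bins follow the color classes of Fig. 5 (mapping quality 60, 1 to 59, and 0).

```python
def coverage_by_quality(reads, locus_start, locus_end):
    """Count primary reads per position, binned by mapping quality.

    Returns one dict per locus position with the keys 'high' (MAPQ 60),
    'mid' (MAPQ 1-59) and 'zero' (MAPQ 0).
    """
    cov = [{"high": 0, "mid": 0, "zero": 0}
           for _ in range(locus_end - locus_start)]
    for start, end, mapq in reads:
        key = "high" if mapq >= 60 else "mid" if mapq >= 1 else "zero"
        # Only the part of the read inside the locus is counted.
        for pos in range(max(start, locus_start), min(end, locus_end)):
            cov[pos - locus_start][key] += 1
    return cov


def break_regions(cov, locus_start, min_primary=3):
    """Merge consecutive positions covered by fewer than `min_primary` reads."""
    regions, current = [], None
    for offset, counts in enumerate(cov):
        pos = locus_start + offset
        if sum(counts.values()) < min_primary:
            if current is None:
                current = [pos, pos + 1]
            else:
                current[1] = pos + 1
        elif current is not None:
            regions.append(tuple(current))
            current = None
    if current is not None:
        regions.append(tuple(current))
    return regions


# Toy example: positions 6 and 7 are covered by only two primary reads.
reads = [(0, 10, 60), (0, 10, 60), (0, 6, 30), (8, 10, 0)]
cov = coverage_by_quality(reads, 0, 10)
breaks = break_regions(cov, 0)
```

Looping over every position keeps the sketch readable; for whole loci, a difference array or an indexed BAM reader would be the more practical choice.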

With IMGT/StatAssembly, users can also visualize, in the upper graph, the PHRED score (an indicator of base quality in DNA sequencing; right axis) and the mismatch ratio (number of primary reads not matching the base divided by the number of primary reads aligning at this base; left axis) (Fig. 6). The mismatch rate for the entire read is displayed in the lower graph. A high-quality assembly should exhibit a high PHRED score (high-fidelity reads/HiFi) and a low mismatch rate. In this case, the average mismatch rate does not exceed 0.5%, with an average of <10% of mismatches at each position. The average PHRED score is 80.

Figure 6.

Figure showing the PHRED score and mismatch rate per position (upper) and the mismatch rate over the whole read (lower) on the IGH locus for the human T2T-CHM13v2.0 assembly.
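
For readers less familiar with these two quantities, the sketch below shows the standard PHRED conversion (Q = -10 log10 P, where P is the probability that a base call is wrong) and a per-position mismatch ratio. The function names are illustrative, and the mismatch ratio is taken here as the fraction of primary reads at a position that differ from the assembly base, which is an interpretation made for this example.

```python
import math


def phred_to_error_probability(q):
    """Convert a PHRED quality score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Convert an error probability P to a PHRED score, Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def mismatch_ratio(mismatching_reads, aligned_reads):
    """Fraction of primary reads at a position that differ from the base."""
    if aligned_reads == 0:
        return 0.0  # no coverage: the ratio is reported as 0 in this sketch
    return mismatching_reads / aligned_reads
```

A PHRED score of 20 thus corresponds to an error probability of 1%, and a score of 30 to 0.1%.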

Finally, if gene positions are provided, the number of reads is displayed (marked 1), including matching reads (substitution without indels/marked 3) and identical reads (marked 2) for the precise gene positions, based on the reference sequence. By default, positions with fewer than 10 identical reads or an identical-to-total read ratio below 80% are flagged as warning positions and highlighted in yellow. Positions where this ratio drops below 60% are considered suspicious and marked in red. The black curve (marked 4) represents the number of reads that perfectly match (100% match) across this entire region (Fig. 7). A confident allele is defined as one with the highest number of reads that perfectly match the reference sequence relative to the total number of reads. The tolerance threshold can also be assessed: both sequence match and sequence identity should fall within 20% of the total number of reads, without triggering any warnings (unless fewer than 10 matching reads are present) or indicating suspicious positions.

Figure 7.

Number of reads aligned over the IGHV3-6 gene V-REGION [NC_060938.1 complement (100326646..100326940)] for the human T2T-CHM13v2.0 assembly. The blue curve (1) represents all reads aligned at each position within the IGHV3-6 V-REGION. The green curve (2) shows reads matching the reference base, while the dark blue curve (3) indicates reads with substitutions at that position. The black curve (4) represents reads that perfectly match the entire IGHV3-6 V-REGION.
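
The default flagging rules stated above (fewer than 10 identical reads or an identical-to-total ratio below 80% gives a warning; a ratio below 60% marks the position as suspicious) can be written as a small decision function. This is a sketch with hypothetical names; the handling of positions without any read is an assumption.

```python
def classify_position(identical_reads, total_reads, min_identical=10,
                      warning_ratio=0.80, suspicious_ratio=0.60):
    """Classify one gene position as 'ok', 'warning' or 'suspicious'."""
    if total_reads == 0:
        # Assumption: no coverage is treated as "fewer than 10 identical reads".
        return "warning"
    ratio = identical_reads / total_reads
    if ratio < suspicious_ratio:
        return "suspicious"
    if identical_reads < min_identical or ratio < warning_ratio:
        return "warning"
    return "ok"
```

The thresholds are exposed as keyword arguments because the tool lets users set their own values.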

Using IMGT/StatAssembly, users can also visualize these results as CSV files. Regarding genes, a file with multiple columns is generated (see Supplementary data).

In conclusion, IMGT/StatAssembly provides an effective means to assess the quality of a genome assembly based on read data. User-defined thresholds can be set as parameters to customize the analysis. The tool also enables evaluation of allele confidence based on read support. By combining IMGT/StatAssembly with the established IMGT assembly quality rules, both the overall assembly quality and the reliability of the associated alleles can be systematically assessed. This procedure has been adopted and incorporated in the IMGT standardized biocuration workflow and the standardized IMGT nomenclature.

Comparison with CloseRead

Although CloseRead (v1.0) produces assembly quality results, it does not evaluate the quality of reads for specific genes or alleles. In fact, CloseRead considers a read to be full-length only if there are no mutations across its entire length, excluding indels. However, due to the intrinsic error rate of sequencing reads, this strategy is often impractical. For example, we counted 9 reads that matched the entire IGHA1 C-GENE-UNIT at 100%, whereas CloseRead counted 25. For IGHV3-22, CloseRead reported 34 reads with a perfect match, whereas we counted 30. For IGHJ2, we found 41 matching reads, whereas CloseRead counted only 27. This discrepancy arises because the misalignment (indel) error rate (2.2 × 10⁻³) is ∼10 times higher than the mismatch rate (1.87 × 10⁻⁴) in this assembly.

Consequently, for short genes, IMGT/StatAssembly counts are generally higher, whereas for long genes, our counts are typically lower. Therefore, we consider it of paramount importance to recalculate and re-evaluate misalignment and mismatch rates. We define a read as perfectly mapped if it contains no mismatches or indels within the given gene region. This re-evaluation was made possible by modifying the library used for BAM file analysis, as the current libraries (hts-lib in Rust and pysam in Python) do not provide this functionality. The resulting library, named extended-htslib, is available under the MIT license. This represents a significant advantage of IMGT/StatAssembly over CloseRead, as it is essential for the accurate validation of new alleles by IMGT rules.
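
The definition of a perfectly mapped read (no mismatch and no indel within the gene region) can be illustrated by walking a CIGAR string that uses the =/X operators. The sketch below is pure Python and does not reproduce the extended-htslib API. The function name, the 0-based half-open coordinates, and the rule that an insertion counts only when it falls strictly inside the region are assumptions made for the example.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_perfect_over_region(read_start, cigar, region_start, region_end):
    """True if the read spans [region_start, region_end) with only '=' there.

    `read_start` is the reference position of the first aligned base.
    An 'M' operation inside the region is rejected because it does not
    tell matches from mismatches.
    """
    ref_pos = read_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "SHP":  # clipping and padding consume no reference base
            continue
        if op == "I":  # an insertion sits between two reference bases
            if region_start < ref_pos < region_end:
                return False
            continue
        op_end = ref_pos + length
        overlaps = ref_pos < region_end and op_end > region_start
        if overlaps and op != "=":  # X, D, N or ambiguous M in the region
            return False
        ref_pos = op_end
    # The read must cover the whole region.
    return read_start <= region_start and ref_pos >= region_end
```

With a read starting at position 100 and the CIGAR `50=1X49=`, the region 110 to 140 is perfectly matched although the read carries a mismatch further downstream. A criterion applied to the whole read would reject it, which is consistent with higher per-gene counts for short genes.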

Regarding processing speed, the CloseRead pipeline does not use indexes and can therefore take several hours, depending on the size of the BAM files. In contrast, IMGT/StatAssembly uses indexing and is implemented in a compiled language, allowing it to perform analyses in seconds for assemblies with a lower mismatch rate.

Regarding multiple mapping, minimap2 [18] may encounter reads that can be mapped to multiple positions. In such cases, the primary alignment is typically assigned a lower mapping quality, as reflected in both tools. However, CloseRead discards the alternative alignments known as secondary or supplementary alignments. Since these can reveal potential assembly errors, we chose to display and count them. By default, they are shown only in the coverage graph (Fig. 5).
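
In SAM/BAM files, primary, secondary, and supplementary alignments are distinguished by bits of the FLAG field (0x100 for secondary and 0x800 for supplementary, as defined in the SAM specification). The following sketch, with a hypothetical function name, shows how records could be sorted into these classes before counting them.

```python
from collections import Counter

UNMAPPED = 0x4          # SAM FLAG bit: segment unmapped
SECONDARY = 0x100       # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800   # SAM FLAG bit: supplementary alignment


def alignment_kind(flag):
    """Classify an alignment record from its SAM FLAG value."""
    if flag & UNMAPPED:
        return "unmapped"
    if flag & SECONDARY:
        return "secondary"
    if flag & SUPPLEMENTARY:
        return "supplementary"
    return "primary"


# Tally a few example FLAG values (16 = reverse strand, still primary).
tally = Counter(alignment_kind(flag) for flag in (0, 16, 256, 2048, 2064))
```

Keeping the secondary and supplementary classes instead of filtering them out is what makes it possible to plot them alongside the primary coverage.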

Finally, IMGT/StatAssembly generates a dedicated graph for each gene (Fig. 7) illustrating the number of matching and non-matching reads, along with the count of perfectly aligned reads. This representation offers a more intuitive and detailed visualization compared to CloseRead.

Perspectives

Our rules evaluate assemblies and alleles in IG/TR loci to ensure consistency, completeness, and accuracy of our databases. However, for some assemblies, the raw data may be sporadic or inaccessible. We therefore encourage a collective effort to make assembly sequencing and construction more transparent and standardized, so the scientific community can obtain the highest-quality assemblies possible. Hifiasm and Verkko are commonly used haplotype-resolved assemblers; Hifiasm is based on string graphs, whereas Verkko and LJA [19] are based on de Bruijn graphs. Nevertheless, no new version of LJA has been released in over three years. Some consortia, such as the Vertebrate Genomes Project [20], aim to achieve this goal and describe procedures for assembly construction. Complete and error-free genomes are particularly important for research and medicine, especially for IG/TR loci.

IMGT/StatAssembly provides information about assembly quality. However, several additional criteria must be taken into account and are part of IMGT assembly quality rules—such as the number of assemblies with the same gene order, the sequencing technology used, and the assembly method applied. Confidence in the accuracy of alleles and assemblies is higher when high-quality long reads (e.g. HiFi) [21], haplotype-resolved assembly methods [22], and complete genome assemblies such as telomere-to-telomere (e.g. T2T) assemblies [23] are used. Assemblies such as T2T-CHM13v2.0, which are derived from the haploid cell line CHM13htert, provide a complete haplotype but do not reflect human genomic diversity [24].

While huge progress has been made in generating complete assemblies [23], challenges remain in producing high-quality assemblies [25]. Moreover, some vertebrate species still have very few assemblies available. A balance must be achieved between assembly availability, sequencing methods, assembly construction, and haplotype resolution in order to accurately capture the full extent of genetic diversity in jawed vertebrates while minimizing false positives.

Axis II: IMGT tools for the analysis of IG and TR repertoires

IMGT/GeneFrequency

IMGT/GeneFrequency is an interactive tool that dynamically evaluates, from IMGT resources [8], the usage of IG and TR variable (V), diversity (D), joining (J), and constant (C) genes expressed in rearranged complementary DNA (cDNA) sequences and for which the molecule recognized by the antigen receptor may be known. On the one hand, this tool highlights how frequently a gene is used; on the other, for a given specificity, it shows which genes contribute to the synthesis of the molecules that recognize it.

The V, D, J, and C genes and alleles that code the IG and TR are managed in IMGT/GENE-DB [5] according to the concepts of Classification of IMGT-ONTOLOGY [3, 26, 27]. IG and TR cDNAs that have been submitted to public databases (GenBank, EMBL) are integrated and annotated in the IMGT nucleotide sequence database, IMGT/LIGM-DB [2]. The annotation includes the identification of the involved IG or TR genes and alleles, the identification of the keywords (concepts of identification [28]), the description of the features with IMGT labels (concepts of descriptions [29]), and their delimitation according to the IMGT unique numbering [30]. Annotation of cDNA rearranged sequences in IMGT/LIGM-DB can be done automatically for productive sequences with a classical organization by IMGT/Automat [31], or semi-manually by biocurators. In the latter case, the specificity is indicated when known.

The web interface of IMGT/GeneFrequency, accessible through the IMGT-KG [10] query page, is organized in two panels (Fig. 8). The right panel is dedicated to the selection of the genes for a given locus and displays the results. The left panel provides multi-selection parameters, allowing users to select the species, the locus, the specificity, and the gene type. The lower part of the left panel lists the available display options, of which several can be selected: bar plot, genes with a known position in the locus only, gene summary table, gene and specificity table, and specificity heatmap. By default, when loading the IMGT/GeneFrequency query page, the bar plot is displayed for the number of cDNA sequences expressing genes of the H. sapiens IGH locus, for the four gene types, regardless of the sequence specificity (Fig. 8).

Figure 8.

Number of cDNA sequences assigned per gene in the IGH locus of H. sapiens for any specificity.

The bar plots are provided per species, locus, and specificity. On the x-axis, we have IMGT gene names ordered by their IMGT gene order in the genomic locus. However, non-localized genes are listed on the far left part of the bar plot. In order to focus on localized genes only, the user might select the option “Display only genes with known position in the selected locus”. On the y-axis, the number of annotated cDNA sequences in which the genes are expressed is displayed. Only genes expressed in cDNA are shown.

The bars for V, D, J, and C genes are colored in accordance with the IMGT Color Menu for genes. Note that the bar for a given gene can comprise two parts: a colored part for the number of sequences for the gene that was identified without ambiguity, and a gray part that indicates that the sequence could be assigned to other genes of the same type (for example due to somatic mutations for IG, extensive trimming, and/or partial sequences). Hovering the mouse on the colored or gray bar displays information about the gene name, gene type, the gene order, and the number of annotated sequences.

The results can be displayed in tables per species and IMGT group, in which the genes are sorted by their IMGT gene order. The gene summary table indicates the number of IMGT/LIGM-DB sequences in which it is expressed as well as the names of expressed related alleles and their corresponding numbers. The table may comprise two lines per gene: one for the number of sequences in which it is expressed unambiguously (single, for single gene) and one for sequences where several possibilities exist (several, for several genes). The gene and specificity table indicates for each gene pair the specificity, the number of sequences in which it is identified to be expressed non-ambiguously (single gene assigned seq), ambiguously (several genes assigned seq nb; white part), and the total.
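
The distinction between unambiguous (“single”) and ambiguous (“several”) assignments, and the ordering of genes along the x-axis, can be sketched as follows. The code is an illustration only: the input layout, the gene names (GENE-A, GENE-B, GENE-U), and the choice to count an ambiguous sequence once for each candidate gene are assumptions, not a description of how IMGT-KG is queried.

```python
from collections import defaultdict


def gene_usage(assignments, gene_order):
    """Tally single and ambiguous ("several") assignments per gene.

    `assignments` holds, for each cDNA sequence, the list of candidate
    genes (a single gene when the assignment is unambiguous).
    `gene_order` maps a gene to its order in the locus, or to None when
    the gene is not localized.
    """
    counts = defaultdict(lambda: {"single": 0, "several": 0})
    for genes in assignments:
        kind = "single" if len(genes) == 1 else "several"
        for gene in genes:
            counts[gene][kind] += 1

    def sort_key(gene):
        order = gene_order.get(gene)
        # Non-localized genes sort first, i.e. on the far left of the plot.
        return (order is not None, order if order is not None else 0, gene)

    return [(gene, counts[gene]["single"], counts[gene]["several"])
            for gene in sorted(counts, key=sort_key)]


# Hypothetical genes: GENE-U has no known position in the locus.
assignments = [["GENE-B"], ["GENE-A"], ["GENE-A", "GENE-B"], ["GENE-U"]]
gene_order = {"GENE-A": 1, "GENE-B": 2, "GENE-U": None}
rows = gene_usage(assignments, gene_order)
```

Each returned row corresponds to one bar of the plot, with the single count as the colored part and the several count as the gray part.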

Clicking on a gene name in tables displays in a new page the “Table of annotated IMGT/LIGM-DB cDNA rearranged sequences” [Accession number, Allele name, Sequence length, Sequence definitions, Specificity(ies)] provided by IMGT/GENE-DB for that gene. Direct links to IMGT/LIGM-DB entries allow the sequence annotations to be displayed according to the IMGT-ONTOLOGY and IMGT/ScientificChart standards.

Finally, the specificity heatmap displays the total number of cDNA sequences (shown as color intensity) per specificity and per gene. Bar plots and heatmaps can be downloaded in SVG or PNG format, whereas tables can be exported as CSV files.
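
The counts behind such a heatmap amount to a two-way tally of sequences by specificity and gene. A minimal sketch is given below, assuming one hypothetical `(gene, specificity)` pair per annotated sequence; the gene and specificity labels are placeholders.

```python
from collections import Counter


def specificity_matrix(records):
    """Count cDNA sequences per (specificity, gene) pair.

    Returns (specificities, genes, matrix), where matrix[i][j] is the
    number of sequences for specificities[i] and genes[j].
    """
    counts = Counter((spec, gene) for gene, spec in records)
    specificities = sorted({spec for spec, _ in counts})
    genes = sorted({gene for _, gene in counts})
    matrix = [[counts[(spec, gene)] for gene in genes]
              for spec in specificities]
    return specificities, genes, matrix


records = [("GENE-A", "anti-X"), ("GENE-A", "anti-X"),
           ("GENE-B", "anti-X"), ("GENE-B", "anti-Y")]
specificities, genes, matrix = specificity_matrix(records)
```

The matrix can then be passed to any plotting library, with the cell value mapped to color intensity.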

IMGT/GeneFrequency results are dynamically computed from IMGT-KG and will be updated with the evolution of the content of IMGT-KG and the underlying databases IMGT/GENE-DB and IMGT/LIGM-DB regarding human and mouse species.

To summarize, IMGT/GeneFrequency, based on IMGT-KG, provides an overview of IG and TR gene expression through the annotated cDNA sequences of IMGT/LIGM-DB, which is enriched with the annotated nucleotide sequences from INSDC (GenBank or ENA). The tool will be made available for other species if the number of cDNA sequences integrated in IMGT/LIGM-DB is judged sufficient to represent the expression of the genes of a given locus. Under the same requirement (sufficient sequence representation), the granularity of the results could also be extended to gene expression at the allele level. It should be noted that, in the current version, IMGT/LIGM-DB does not include rearranged gDNA or cDNA sequence sets from SRA.

IMGT/V-QUEST: customize the reference directory set feature

IMGT/V-QUEST is IMGT’s flagship bioinformatics tool, specifically designed for the accurate analysis of IG and TR rearranged nucleotide sequences [32]. IMGT/V-QUEST empowers academic researchers and clinicians by providing standardized, high-quality immunogenetic annotations of rearranged IG and TR sequences, enabling the study of expressed repertoires in both normal and pathological contexts. The tool aligns and compares the user-submitted sequences with the IMGT reference sequences of V, D, and J genes and alleles curated from the IG and TR loci for all supported species and strains.

The IMGT reference directory can be modified or customized based on allele functionality, the inclusion or exclusion of orphan gene sequences, and the number of alleles to be considered (all alleles or ‘01’ alleles only) (Fig. 9). The restriction of the IMGT reference directory for inbred animal models to sequences from a given strain (Arlee or Swanson for trout IGH and TRB, C57BL/6J strain for mouse IG and TR) has also been progressively introduced.

Figure 9.

Evolution of reference directory management in IMGT/V-QUEST. The default reference directory set is derived from IMGT/GENE-DB and can be refined through species/strain and receptor type or locus selection, gene set options, and allele selection criteria. Newly introduced features (in dashed boxes) enable user-driven inclusion and exclusion of specific alleles, enhancing customization and flexibility.

Starting with version 3.7.0, as part of ongoing efforts to enhance the flexibility and user-centric functionality of IMGT/V-QUEST, we have introduced a major new feature that allows greater personalization of the IMGT/V-QUEST reference directory and analysis protocol. This feature, accessible through the “Advanced parameters,” offers users two important customization options of the IMGT/V-QUEST reference directory: (i) selective exclusion of allele reference sequences from the standard IMGT reference directory and (ii) inclusion of FASTA sequences representing potentially novel alleles provided by the user (Figs 9 and 10) [8, 33].

Figure 10.

The IMGT/V-QUEST interface illustrating the steps to activate the “Customize the reference directory set” feature: (A) select the species and receptor type or locus; (B) enable the customization of the reference sequence directory; (C) exclude and/or include specific sequences from the IMGT/V-QUEST reference directory set.

This option allows users to control the reference pool by excluding specific sequences from the standard IMGT reference dataset. Exclusion can be based on gene type, IMGT gene name, IMGT allele name, IMGT functionality (e.g. F, ORF, pseudogene), IMGT partiality for incomplete V, D, or J-REGION, and—for human datasets—IMGT allele confirmation (evaluating the number of IMGT genomic sequences in which the allele was confirmed).

This option supports scenarios where a smaller or more focused reference dataset is required, such as removing pseudogenes or alleles that are not sufficiently confirmed for a particular study (Fig. 10C). Notably, to maintain consistency with IMGT reference standards, users cannot exclude all available alleles for a given chain or gene. In addition, users can now enter and integrate their own custom nucleotide sequences in FASTA format directly into the IMGT/V-QUEST analysis pipeline. This new option significantly enhances the tool’s usability, allowing users to explore the addition of potential new genes and/or alleles to the current IMGT reference database [5, 14].
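
The effect of the exclusion criteria on a reference set can be pictured with the following sketch. It is not the IMGT/V-QUEST implementation: the data layout, the allele names, and the function name are hypothetical, and the safeguard is applied per gene type here, which is only one possible reading of the rule that not all alleles can be excluded.

```python
def apply_exclusions(reference, excluded_functionalities=(),
                     excluded_alleles=(), only_01=False):
    """Return the alleles kept after applying the exclusion criteria.

    `reference` maps an allele name (e.g. 'GENE-A*01') to a dict with
    the keys 'gene_type' and 'functionality'.
    """
    kept = {}
    for allele, info in reference.items():
        if info["functionality"] in excluded_functionalities:
            continue
        if allele in excluded_alleles:
            continue
        if only_01 and not allele.endswith("*01"):
            continue
        kept[allele] = info
    # Safeguard (assumption): every gene type must keep at least one allele.
    remaining_types = {info["gene_type"] for info in kept.values()}
    for gene_type in {info["gene_type"] for info in reference.values()}:
        if gene_type not in remaining_types:
            raise ValueError(f"all {gene_type} alleles would be excluded")
    return kept


# Hypothetical reference set used only for illustration.
reference = {
    "GENE-A*01": {"gene_type": "V", "functionality": "F"},
    "GENE-A*02": {"gene_type": "V", "functionality": "F"},
    "GENE-B*01": {"gene_type": "V", "functionality": "P"},
    "GENE-J*01": {"gene_type": "J", "functionality": "F"},
}
without_pseudogenes = apply_exclusions(reference, excluded_functionalities=("P",))
```

Raising an error instead of silently returning an empty set mirrors the idea that an analysis should not run against a reference directory lacking a whole category of genes.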

Researchers working on highly specialized or individualized repertoires will benefit from the ability to analyze custom sequences within the trusted and familiar framework of IMGT/V-QUEST. To ensure accurate interpretation, all submitted sequences must represent unrearranged germline configurations, fall within the expected length range for V and J gene types, and be free of truncations, partial regions, or indels when aligned with the IMGT unique numbering framework [14]. Furthermore, to ensure system integrity and optimal performance, a limit has been placed on the number of external reference sequences that can be submitted per analysis. This restriction is intended to balance end-user flexibility with computational efficiency while ensuring a consistent experience across different use cases.
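Before submission, users may wish to pre-screen their FASTA file locally. The sketch below is a hedged example, not part of IMGT/V-QUEST: the function names, the default length range (250 to 350 nucleotides), and the default limit of 10 sequences are placeholders chosen for illustration, since the actual limits are not stated here. It only checks the alphabet, the length, and the number of records; it cannot verify germline configuration or the absence of indels relative to the IMGT unique numbering.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_submission(text, max_sequences=10, length_range=(250, 350)):
    """Return a list of human-readable problems (empty if none were found).

    max_sequences and length_range are hypothetical placeholders.
    """
    low, high = length_range
    records = parse_fasta(text)
    problems = []
    if len(records) > max_sequences:
        problems.append(f"{len(records)} sequences submitted, limit is {max_sequences}")
    for header, seq in records:
        if not seq:
            problems.append(f"{header}: empty sequence")
        elif set(seq) - set("ACGT"):
            problems.append(f"{header}: non-nucleotide characters found")
        elif not low <= len(seq) <= high:
            problems.append(f"{header}: length {len(seq)} outside {low}-{high}")
    return problems


# Toy input: one plausible-length sequence and one malformed record.
example = ">candidate_1\n" + "ACGT" * 75 + "\n>candidate_2\nACGTXX\n"
for problem in check_submission(example):
    print(problem)
```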

Taken together, these enhancements represent a significant milestone in personalized immunogenetics research, providing users with greater flexibility while preserving the robustness of the IMGT/V-QUEST system.

To enhance reproducibility of the IMGT reference directory modification, IMGT/V-QUEST now provides functionality for saving the user-defined reference set configuration, including all the excluded alleles. Users can export their allele exclusion lists, enabling the exact reference customization to be saved, shared, or reloaded for future analyses. In addition, the output report explicitly documents modifications of the reference directory. This ensures full traceability of the analytical context and supports standardized interpretation and comparison of results across studies.

Users are advised to exercise caution when excluding sequences from the reference directory of IMGT/V-QUEST or including user-defined FASTA sequences. Changing the reference dataset, either by exclusion or inclusion, can significantly affect IMGT/V-QUEST analysis results, potentially impacting gene assignment, alignment values, and downstream interpretation. It is the user’s responsibility to ensure that their custom reference set is biologically appropriate and relevant to their specific use case.

To ensure user privacy and data confidentiality, IMGT/V-QUEST does not permanently store any sequences uploaded by users. Submitted user-defined FASTA sequences are kept only for the duration of the session and are automatically removed upon completion of the analysis.

In conclusion, the “customize the reference directory set” feature of IMGT/V-QUEST provides the user with greater control and analytical flexibility, enabling customized analyses through the exclusion of specific alleles or the inclusion of user-provided sequences. This capability enhances reproducibility, transparency, and relevance across a wide range of research and clinical applications. The same functionality is planned for future integration in IMGT/HighV-QUEST.

AXIS III—IMGT tools for structural analysis of immunoproteins

Axis III provides tools to analyze structural immunoproteins, including 2D structures, 3D structures, and therapeutic mAbs. In this axis, we introduced IMGT/mAb-KG, a knowledge graph (KG) for exploring therapeutic mAbs [12], and IMGT/RobustpMHC, a robust machine learning tool trained for class-I MHC peptide binding prediction [9].

IMGT/mAb-KG is the IMGT-KG specialized in representing and describing therapeutic proteins like mAbs, their therapeutic use, and their related clinical indications. IMGT/mAb-KG provides access to over 139 629 triplets describing 1489 mAbs and related proteins, ∼500 targets, and over 500 clinical indications [12]. In addition, IMGT/mAb-KG describes mAbs together with their mechanism of action, their construction, and their associated clinical studies. Linked with IMGT-KG, IMGT/mAb-KG provides external links to other domain resources including Thera-SAbDab (Therapeutic Structural Antibody Database) [34], PharmGKB (comprehensive resource curating knowledge on the impact of genetic variation on drug response) [35], PubMed, and HGNC (HUGO Gene Nomenclature Committee), positioning IMGT/mAb-KG as an essential resource for mAb engineering. To access IMGT/mAb-KG, users have two options: a SPARQL query interface for advanced users and an exploration interface for intuitive browsing of mAb-related knowledge, including targets, clinical indications, and mechanisms of action, presented through graph visualization. All the results can be downloaded as image files for visualization or as CSV files for SPARQL queries.

The accurate prediction of peptide–MHC class I and II binding probabilities is a critical endeavor in immunoinformatics, with broad implications for vaccine development and immunotherapies. To address this, we developed IMGT/RobustpMHC, which harnesses the potential of unlabeled data to improve the robustness of peptide–MHC binding predictions through a self-supervised learning strategy. The tool is available at https://www.imgt.org/RobustpMHC/ and predicts peptide–MHC I and II binding. On each prediction page, the user can upload sequences in a FASTA file or select existing HLA sequences as examples and subsequently execute the predictions. The results consist of a list of HLA and peptide pairs with prediction values, which can be downloaded as a CSV file.
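A downloaded result file can then be post-processed with standard tools. The short Python sketch below shows one way to do so; the column names (`HLA`, `peptide`, `prediction`), the 0.5 threshold, and the assumption that higher values indicate stronger predicted binding are all hypothetical and should be adapted to the actual CSV header. The rows are made-up toy values, not IMGT/RobustpMHC output.

```python
import csv
import io


def top_predictions(csv_text, n=3, threshold=0.5):
    """Return up to n (HLA, peptide, score) rows with score >= threshold,
    sorted from highest to lowest score.

    Column names are assumptions; adjust them to the real CSV header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["HLA"], r["peptide"], float(r["prediction"])) for r in reader]
    hits = [row for row in rows if row[2] >= threshold]
    return sorted(hits, key=lambda row: row[2], reverse=True)[:n]


# Toy CSV content with invented prediction values.
toy_csv = """HLA,peptide,prediction
HLA-A*02:01,AAAAAAAAA,0.91
HLA-A*02:01,CCCCCCCCC,0.12
HLA-B*07:02,GGGGGGGGG,0.67
"""

for hla, peptide, score in top_predictions(toy_csv):
    print(hla, peptide, score)
```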

IMGT-KG: a knowledge graph for immunogenetics data

KGs have emerged as one of the most effective methods for data or knowledge integration and federation, gaining widespread acceptance in both academic and industrial contexts [36]. In immunogenetics, the complex and interconnected nature of the field, combined with the need for sophisticated queries to address complex or integrative biomedical research questions, has rapidly accelerated the adoption of knowledge graph technologies [37].

To unify and connect various IMGT resources, we developed IMGT-KG, the first findable, accessible, interoperable, and reusable (FAIR) knowledge graph (KG) in immunogenetics, providing access to structured and enriched immunogenetic data [10]. IMGT-KG bridges the gap between nucleotide and protein sequences of IMGT databases. To do so, IMGT-KG acquires data from IMGT databases, then represents, describes, and structures immunogenetics entities and their interrelationships in a KG using semantic web standards and technologies. In addition, IMGT-KG is connected to external resources in the same domain, such as the Relation Ontology [38], Feature Annotation Location Description Ontology [39], NCIt [11], and Sequence Ontology [40], among others. IMGT-KG opens the way for effective queries and integrative immuno-omics analyses over more than 100 million triplets. Users can access the two KGs, IMGT-KG and IMGT/mAb-KG, using the SPARQL language through https://imgt.org/imgt-kg/.
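For readers unfamiliar with SPARQL access, the Python sketch below illustrates the general mechanics defined by the SPARQL 1.1 Protocol and the SPARQL JSON results format: a query is sent as the `query` parameter of an HTTP request, and the answer is a JSON document whose bindings can be flattened into rows. The endpoint used here (`https://example.org/sparql`) is a placeholder, since the exact endpoint address should be taken from https://imgt.org/imgt-kg/, and the query is a schema-agnostic pattern that makes no assumption about IMGT-KG classes or properties. No network request is made in this example.

```python
import json
from urllib.parse import urlencode

# Generic pattern valid on any RDF graph: list a handful of triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request_url(endpoint, query):
    """Build a SPARQL-protocol GET URL (the endpoint is supplied by the caller)."""
    return endpoint + "?" + urlencode({"query": query})


def rows_from_results(payload):
    """Flatten a SPARQL JSON results document into a list of tuples.

    Variables left unbound in a solution are returned as None.
    """
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    return [
        tuple(binding[v]["value"] if v in binding else None for v in variables)
        for binding in doc["results"]["bindings"]
    ]


# Placeholder endpoint, not the real IMGT-KG address.
url = build_request_url("https://example.org/sparql", QUERY)
print(url)

# Hand-written sample response in the standard SPARQL JSON results layout.
sample = json.dumps({
    "head": {"vars": ["s", "p", "o"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://example.org/a"},
         "p": {"type": "uri", "value": "http://example.org/rel"},
         "o": {"type": "literal", "value": "42"}},
    ]},
})
print(rows_from_results(sample))
```

In practice the URL would be fetched with an HTTP client while requesting `application/sparql-results+json`, and the response body passed to `rows_from_results`.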

Conclusion

To decipher the complex mechanisms behind the adaptive immune responses and the associated diversity of the involved receptors, we are witnessing a massive generation of immunogenetics data at different levels (genomics, transcriptomics, proteomics, …), thus plunging the immunogenetics domain into the era of Big Data [1, 41, 42]. To manage, analyze, and interpret these rich immunogenetics data, IMGT has set up various databases, tools, and resources, from nucleotide sequences to protein sequences, for analyzing, visualizing, and interpreting these data through three axes [8].

In response to the growing and evolving needs of the immunogenetics research community, IMGT has enhanced its existing tools and resources and developed new ones, either starting from scratch or by collaborating with the scientific community to build upon existing tools. In Axis I, new user-friendly, dynamic gene tables have replaced the previous time-consuming and error-prone gene tables and CDR length tables. New tools for analyzing assemblies, such as IMGT/AssemblyComparison and IMGT/MGV, have been created. Axis I also implements a set of rules and a new tool, IMGT/StatAssembly, which validates the quality of loci and alleles in the context of the large-scale generation of immunogenetics data.

In Axis II, we introduced a new generation of IMGT/GeneFrequency, with a modern interface and new features and functionalities. The new version of IMGT/GeneFrequency is based on the IMGT-KG [10] and introduces different visualizations (barplot and heatmap) and tables (gene summary, gene, and specificity). We also enhanced IMGT/V-QUEST by introducing the possibility to customize the reference directory set using the gene type, the gene and the allele name, the allele functionality, and the allele partiality. This new feature also allows the addition of user-provided sequences in the IMGT/V-QUEST analysis pipeline, thus enhancing the results for new genes and alleles.

In Axis III, we introduced IMGT/RobustpMHC [9], a new robust machine learning tool for peptide–MHC prediction. Based on a self-supervised learning strategy, IMGT/RobustpMHC aims to boost the development of immunotherapies and vaccines.

To enhance IMGT data accessibility and integrative immunogenetics analyses, IMGT has developed IMGT-KG [10], the first FAIR KG in the immunogenetics field, bridging the gap between nucleotide and protein sequences of IMGT databases while also connecting IMGT resources to external knowledge in the biomedical domain. In addition to IMGT-KG, we recently introduced a specialized KG dedicated to therapeutic monoclonal antibodies: IMGT/mAb-KG. Built on semantic web technologies, both IMGT-KG and IMGT/mAb-KG provide facilities to access IMGT immunogenetics data through an efficient and powerful query language (SPARQL) and, in the case of IMGT/mAb-KG, an intuitive visual query interface. Leveraging semantic web principles, IMGT-KG and IMGT/mAb-KG are pivotal in standardizing and publishing immunogenetics data on the web, thereby positioning IMGT and its associated axes within the global Linked Open Data framework [43].

Supplementary Material

gkaf1024_Supplemental_File

Acknowledgements

We would like to thank Anthony Boureux for their insightful discussions on k-mers. We are also grateful to the members of the China National Center for Bioinformation for their valuable input. We thank Yixin Zhu and Anton Bankevich for their helpful discussions on the CloseRead software. We acknowledge Hervé Seitz’s team at Montpellier MGX GenomiX and Christophe Klopp for valuable exchanges regarding sequencing and genome assembly reconstruction. We are also thankful to Anna Tran at MéLiS (UCBL)/Hospices Civils de Lyon for her initial work and to Joel Richardson and the team from MGI for sharing the MGV code with us. We are grateful to Maël Rollin for his initial developments for the customization of the reference directory in IMGT/V-QUEST. Finally, we thank all members of the IMGT® team for their expertise and constant motivation. IMGT® is a member of the French Infrastructure Institut Français de Bioinformatique, as well as of BioCampus, MAbImprove, and IBiSA.

Author contributions: Gaoussou Sanou (Data curation [equal], Software [equal], Visualization [equal], Writing—original draft [equal], Writing—review & editing [equal]), Guilhem Zeitoun (Data curation [equal], Software [equal], Visualization [equal], Writing—original draft [equal], Writing—review & editing [equal]), Taciana Manso (Data curation [equal], Writing—review & editing [equal]), Milad Eidi (Software [equal], Visualization [equal], Writing—original draft [equal]), Shamsa Batool (Data curation [equal], Writing—review & editing [equal]), Anjana Kushwaha (Data curation [equal], Software [equal], Visualization [equal], Writing—review & editing [equal]), François Grand (Software [equal], Writing—review & editing [equal]), Myriam Croze (Data curation [equal], Writing—review & editing [equal]), Axel Vaillant (Software [equal], Visualization [equal], Writing—review & editing [equal]), Chahrazed Debbagh (Data curation [equal], Writing—review & editing [equal]), Nika Abdollahi (Data curation [equal], Writing—review & editing [equal]), Maria Georga (Data curation [equal], Writing—review & editing [equal]), Ariadni Papadaki (Data curation [equal], Writing—review & editing [equal]), Ifigeneia Sideri (Data curation [equal], Writing—review & editing [equal]), Turkan Samadova (Data curation [equal], Writing—review & editing [equal]), Joumana Jabado-Michaloud (Data curation [equal], Writing—review & editing [equal]), Géraldine Folch (Data curation [equal], Writing—review & editing [equal]), Véronique Giudicelli (Software [equal], Supervision [equal], Visualization [equal], Writing—original draft [equal], Writing—review & editing [equal]), Patrice Duroux (Software [equal], Supervision [equal], Visualization [equal], Writing—review & editing [equal]), and Sofia Kossida (Funding acquisition [lead], Supervision [equal], Writing—original draft [equal], Writing—review & editing [lead]).

Contributor Information

Gaoussou Sanou, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

Guilhem Zeitoun, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

Taciana Manso, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

Milad Eidi, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

Shamsa Batool, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

Anjana Kushwaha, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

François Grand, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

Myriam Croze, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

Axel Vaillant, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

Chahrazed Debbagh, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

Nika Abdollahi, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

Maria Georga, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

Ariadni Papadaki, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

Ifigeneia Sideri, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

Turkan Samadova, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

Joumana Jabado-Michaloud, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

Géraldine Folch, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

Véronique Giudicelli, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

Patrice Duroux, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France.

Sofia Kossida, IMGT®, the international ImMunoGeneTics Information System®, Institute of Human Genetics, University of Montpellier, Scientific Research National Center (CNRS), Montpellier, France; Institut Universitaire de France (IUF), Paris, France.

Supplementary data

Supplementary data is available at NAR online.

Conflict of interest

None declared.

Funding

Scientific Research National Center (CNRS); University of Montpellier, France; Institut Universitaire de France (IUF), Paris, France. We acknowledge the support of Immun4Cure University Hospital Institute “Institute for innovative immunotherapies in autoimmune diseases” (France 2030 / ANR-23-IHUA-0009) and Institut du développement et des ressources en informatique scientifique (IDRIS) under the allocation 036029 (2010–2025) made by GENCI (Grand Équipement National de Calcul Intensif). Funding to pay the Open Access publication charges for this article was provided by Institut Universitaire de France (IUF).

Data availability

IMGT® is freely available online for academic and non-profit use at https://www.imgt.org/. All the resources referred to in this article are accessible from IMGT® webpages. IMGT/StatAssembly source code and documentation are available on GitLab (https://src.koda.cnrs.fr/imgt-igh/statassembly). Extended-htslib is available as an MIT-licensed crate at https://crates.io/crates/extended-htslib.

References

  • 1. Lefranc MP, Giudicelli V, Duroux P et al. IMGT®, the international ImMunoGeneTics information system® 25 years on. Nucleic Acids Res. 2015; 43:D413–22. 10.1093/nar/gku1056.
  • 2. Giudicelli V. IMGT/LIGM-DB, the IMGT® comprehensive database of immunoglobulin and T cell receptor nucleotide sequences. Nucleic Acids Res. 2006; 34:D781–4. 10.1093/nar/gkj088.
  • 3. Giudicelli V, Lefranc MP. IMGT-ONTOLOGY 2012. Front Genet. 2012; 3:1–16. 10.3389/fgene.2012.00079.
  • 4. Lefranc MP. IMGT, the international ImMunoGeneTics database®. Nucleic Acids Res. 2003; 31:307–10. 10.1093/nar/gkg085.
  • 5. Giudicelli V, Chaume D, Lefranc MP. IMGT/GENE-DB: a comprehensive database for human and mouse immunoglobulin and T cell receptor genes. Nucleic Acids Res. 2005; 33:256–61. 10.1093/nar/gki010.
  • 6. Kaas Q, Ruiz M, Lefranc MP. IMGT/3Dstructure-DB and IMGT/StructuralQuery, a database and a tool for immunoglobulin, T cell receptor and MHC structural data. Nucleic Acids Res. 2004; 32:D208–10. 10.1093/nar/gkh042.
  • 7. Manso T, Kushwaha A, Abdollahi N et al. Mechanisms of action of monoclonal antibodies in oncology integrated in IMGT/mAb-DB. Front Immunol. 2023; 14:1129323. 10.3389/fimmu.2023.1129323.
  • 8. Manso T, Folch G, Giudicelli V et al. IMGT® databases, related tools and web resources through three main axes of research and development. Nucleic Acids Res. 2022; 50:D1262–72. 10.1093/nar/gkab1136.
  • 9. Kushwaha A, Duroux P, Giudicelli V et al. IMGT/RobustpMHC: robust training for class-I MHC peptide binding prediction. Brief Bioinform. 2024; 25:bbae552. 10.1093/bib/bbae552.
  • 10. Sanou G, Giudicelli V, Abdollahi N et al. IMGT-KG: a knowledge graph for immunogenetics. Lect Notes Comput Sci. 2022; 13489:628–42. 10.1007/978-3-031-19433-7_36.
  • 11. Golbeck J, Fragoso G, Hartel F et al. The National Cancer Institute’s Thesaurus and Ontology. J Web Semantics. 2003; 1:75–80. 10.2139/ssrn.3199007.
  • 12. Sanou G, Manso T, Todorov K et al. IMGT/mAb-KG: the knowledge graph for therapeutic monoclonal antibodies. Front Immunol. 2024; 15:1393839. 10.3389/fimmu.2024.1393839.
  • 13. Lefranc MP, Lefranc G. IMGT® Homo sapiens IG and TR loci, gene order, CNV and haplotypes: new concepts as a paradigm for jawed vertebrates genome assemblies. Biomolecules. 2022; 12:381. 10.3390/biom12030381.
  • 14. Lefranc MP. Immunoglobulins: 25 Years of immunoinformatics and IMGT-ONTOLOGY. Biomolecules. 2014; 4:1102–39. 10.3390/biom4041102.
  • 15. Richardson JE, Baldarelli RM, Bult CJ. Multiple genome viewer (MGV): a new tool for visualization and comparison of multiple annotated genomes. Mamm Genome. 2022; 33:44–54. 10.1007/s00335-021-09904-1.
  • 16. Zhu Y, Watson C, Safonova Y et al. CloseRead: a tool for assessing assembly errors in immunoglobulin loci applied to vertebrate long-read genome assemblies. Genome Biol. 2025; 26:1–23. 10.1186/s13059-025-03594-7.
  • 17. Zeitoun G, Debbagh C, Georga M et al. IMGT/StatAssembly. Zenodo. 2025; 10.5281/zenodo.15396812.
  • 18. Li H. New strategies to improve minimap2 alignment accuracy. Bioinformatics. 2021; 37:4572–4. 10.1093/bioinformatics/btab705.
  • 19. Bankevich A, Bzikadze AV, Kolmogorov M et al. Multiplex de Bruijn graphs enable genome assembly from long, high-fidelity reads. Nat Biotechnol. 2022; 40:1075–81. 10.1038/s41587-022-01220-6.
  • 20. Rhie A, McCarthy SA, Fedrigo O et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature. 2021; 592:737. 10.1038/s41586-021-03451-0.
  • 21. Hon T, Mars K, Young G et al. Highly accurate long-read HiFi sequencing data for five complex genomes. Sci Data. 2020; 7:399. 10.1038/s41597-020-00743-4.
  • 22. Cheng H, Asri M, Lucas J et al. Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph. Nat Methods. 2024; 21:967–70. 10.1038/s41592-024-02269-8.
  • 23. Li H, Durbin R. Genome assembly in the telomere-to-telomere era. Nat Rev Genet. 2024; 25:658–70. 10.1038/s41576-024-00718-w.
  • 24. Nurk S, Koren S, Rhie A et al. The complete sequence of a human genome. Science. 2022; 376:44–53. 10.1126/science.abj6987.
  • 25. Espinosa E, Bautista R, Larrosa R et al. Advancements in long-read genome sequencing technologies and algorithms. Genomics. 2024; 116:110842. 10.1016/j.ygeno.2024.110842.
  • 26. Duroux P, Kaas Q, Brochet X et al. IMGT-Kaleidoscope, the formal IMGT-ONTOLOGY paradigm. Biochimie. 2008; 90:570–83. 10.1016/j.biochi.2007.09.003.
  • 27. Lefranc MP. From IMGT-ONTOLOGY CLASSIFICATION axiom to IMGT standardized gene and allele nomenclature: for immunoglobulins (IG) and T cell receptors (TR). Cold Spring Harb Protoc. 2011; 6:627–32. 10.1101/pdb.ip84.
  • 28. Lefranc MP. From IMGT-ONTOLOGY IDENTIFICATION axiom to IMGT standardized keywords: for immunoglobulins (IG), T cell receptors (TR), and conventional genes. Cold Spring Harb Protoc. 2011; 2011:604–13. 10.1101/pdb.ip82.
  • 29. Lefranc MP. From IMGT-ONTOLOGY DESCRIPTION axiom to IMGT standardized labels: for immunoglobulin (IG) and T cell receptor (TR) sequences and structures. Cold Spring Harb Protoc. 2011; 6:614–26. 10.1101/pdb.ip83.
  • 30. Lefranc MP. IMGT unique numbering for the variable (V), constant (C), and groove (G) domains of IG, TR, MH, IgSF, and MhSF. Cold Spring Harb Protoc. 2011; 6:633–42. 10.1101/pdb.ip85.
  • 31. Giudicelli V, Chaume D, Jabado-Michaloud J, Lefranc MP. Immunogenetics sequence annotation: the strategy of IMGT based on IMGT-ONTOLOGY. Stud Health Technol Inform. 2005; 116:3–8.
  • 32. Giudicelli V, Duroux P, Rollin M et al. IMGT® immunoinformatics tools for standardized V-DOMAIN analysis. Methods in Molecular Biology. New York, NY, United States: Humana Press Inc; 2022; 2453:477–531. 10.1007/978-1-0716-2115-8_24.
  • 33. Brochet X, Lefranc MP, Giudicelli V. IMGT/V-QUEST: the highly customized and integrated system for IG and TR standardized V-J and V-D-J sequence analysis. Nucleic Acids Res. 2008; 36:503–8. 10.1093/nar/gkn316.
  • 34. Raybould MIJ, Marks C, Lewis AP et al. Thera-SAbDab: The Therapeutic Structural Antibody Database. Nucleic Acids Res. 2020; 48:D383–8. 10.1093/nar/gkz827.
  • 35. Gong L, Whirl-Carrillo M, Klein TE. PharmGKB, an integrated resource of Pharmacogenomic Knowledge. Curr Protocol. 2021; 1:1–31. 10.1002/cpz1.226.
  • 36. Tiwari S, Al-Aswadi FN, Gaurav D. Recent trends in knowledge graphs: theory and practice. Soft Comput. 2021; 25:8337–55. 10.1007/s00500-021-05756-8.
  • 37. Chen H, Yu T, Chen JY. Semantic Web meets Integrative Biology: a survey. Brief Bioinform. 2013; 14:109–25. 10.1093/bib/bbs014.
  • 38. Guardia GDA, Vêncio RZN, de Farias CRG. A UML profile for the OBO relation ontology. BMC Genomics. 2012; 13:S3. 10.1186/1471-2164-13-S5-S3.
  • 39. Bolleman JT, Mungall CJ, Strozzi F et al. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation. J Biom Semant. 2016; 7:39. 10.1186/s13326-016-0067-z.
  • 40. Mungall CJ, Batchelor C, Eilbeck K. Evolution of the Sequence Ontology terms and relationships. J Biom Inf. 2011; 44:87–93. 10.1016/j.jbi.2010.03.002.
  • 41. Lefranc MP, Lefranc G. IMGT® and 30 years of immunoinformatics insight in antibody V and C domain structure and function. Antibodies. 2019; 8:29. 10.3390/antib8020029.
  • 42. Giudicelli V, Lefranc MP. Ontology for immunogenetics: the IMGT-ONTOLOGY. Bioinformatics. 1999; 15:1047–54.
  • 43. Bizer C, Heath T, Berners-Lee T. Linked data - the story so far. Int J Semantic Web Inf Syst. 2009; 5:1–22. 10.4018/jswis.2009081901.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
