Abstract
iDog (https://ngdc.cncb.ac.cn/idog/) is a comprehensive public resource for domestic dogs (Canis lupus familiaris) and wild canids, designed to integrate multi-omics data and provide data services for the worldwide canine research community. Notably, iDog 2.0 features a 15-fold increase in genomic samples, including 29.55 million single nucleotide polymorphisms (SNPs) and 16.54 million insertions/deletions (InDels) from 1929 modern samples and 29.09 million SNPs from 111 ancient Canis samples. Additionally, 43 487 breed-specific SNPs and 530 disease/trait-associated variants have been identified and integrated. The platform also includes data from 141 BioProjects involving gene expression analyses and a single-cell transcriptome module containing data from 105 057 Beagle hippocampus cells. iDog 2.0 also includes an epigenome module that evaluates DNA methylation patterns across 547 samples and chromatin accessibility across 87 samples for the analysis of gene expression regulation. Additionally, it provides phenotypic data for 897 dog diseases, 3207 genotype-to-phenotype (G2P) pairs, and 349 dog disease-associated genes, along with two newly constructed ontologies for breed and disease standardization. Finally, 13 new analytical tools have been added. Given these enhancements, the updated iDog 2.0 is an invaluable resource for the global canine research community.
Graphical Abstract
Introduction
iDog (https://ngdc.cncb.ac.cn/idog/) is an integrated data resource dedicated to providing publicly accessible data services and analytical tools for the scientific research of domestic dogs (Canis lupus familiaris) and wild canids (1). Launched in 2019, iDog has become a widely used resource for studying genetic variants and gene expression in canine functional genomics (2–4). It has played an important role in facilitating online data mining (5,6), contributing to a diverse range of research areas in canine science, including but not limited to canine diseases and population genetics (7,8). Growing interest in canine research has driven the accumulation of extensive omics data, including genomic, transcriptomic, and epigenomic profiles. This expanding dataset serves as a valuable foundation for future investigations, particularly those focusing on domestication processes (9), behavioral patterns, morphological development (10), and disease susceptibility (11). To further promote the use of canine data resources and enhance online data analysis functionalities, iDog has undergone a substantial update based on CanFam4 (12), including comprehensive analysis and integration of existing omics datasets and an extensive collection of diverse online tools. Here we introduce iDog 2.0 and briefly summarize its data expansion and improved functionalities. Compared to other canine-related resources, such as GVM (13), IAnimal (14) and CanISO (15), iDog 2.0 offers more comprehensive data integration (including modules for ancient DNA, disease-associated variants, single-cell transcriptomes, and phenotypes) along with a wider selection of commonly used online tools, such as Fst and Pi.
Materials and methods
Data collection
High-quality genomic variation data, including Variant Call Format (VCF) files for 1929 samples, Zoonomia phyloP score files, and imputation reference panel files, were downloaded from the international Dog10K project (16). Raw sequence data of genomes, transcriptomes, and epigenomes were obtained from the Sequence Read Archive (SRA) (17), Genome Sequence Archive (GSA) (18,19), and European Nucleotide Archive (ENA) (20). The CanFam4 genome served as a reference for omics data analysis, with annotation files downloaded from Ensembl (21). Newly identified genotype-to-phenotype (G2P) associations were manually curated from published genome-wide association studies (GWAS). Disease- and trait-associated variants were extracted from OMIA (22), while additional disease information, associated genes, and relevant literature were integrated from both OMIA and BioKA (23).
Data processing
For the genomic data, the VCF files of 1929 samples were annotated using the Ensembl Variant Effect Predictor (VEP) tool (24). Single nucleotide polymorphisms (SNPs) with an allele frequency ≥ 80% in one breed and ≤ 20% across other breeds were identified as breed-specific SNPs. For ancient canid genomic data, raw reads were mapped to the CanFam4 reference genome using BWA-MEM (v0.7.17-r1188) (25) and GATK (v4.8.1) (26). Subsequently, variants in ancient DNA were imputed using GLIMPSE2 (27) with the Dog10K phased VCF files serving as the reference panel (Supplementary Figure S1, Supplementary Figure S2). For transcriptomic analysis, transcript abundance was quantified using Kallisto (v0.46.0) (28), and raw reads were aligned to the genome using STAR (v2.7.3a) (29). Transcripts Per Million (TPM) values were calculated for each gene via RSEM (v5.32.1) (30), and then normalized as log2(TPM + 1) (Supplementary Figure S3). For DNA methylation analysis, raw reads were mapped to the CanFam4 genome and polymerase chain reaction (PCR) duplicates were removed using Bismark (v0.23.1) (31). After methylation calling for each cytosine, a methylation call report was produced using Bismark (Supplementary Figure S4A). For ATAC-seq analysis, clean reads were aligned to the CanFam4 genome using Bowtie2 (v2.5.3) (32), then sorted and deduplicated using SAMtools (v1.15.1) (33). After merging the biological replicates, peaks were called using MACS2 (v2.2.4) (34) (Supplementary Figure S4B). Homologous genes in the CanFam4 genome were identified via eggNOG-mapper (35), along with Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation. Genomic positions of variants from OMIA were converted to the CanFam4 genome using the UCSC LiftOver tool (36).
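The breed-specific SNP criterion described above can be illustrated with a minimal Python sketch. It is not the pipeline used for iDog: the function name, the example breeds, and the allele frequencies are hypothetical, and the sketch assumes that the ≤ 20% threshold is applied to each of the other breeds individually, which the text does not specify.

```python
from typing import Dict, Optional


def breed_specific(freqs: Dict[str, float],
                   high: float = 0.80,
                   low: float = 0.20) -> Optional[str]:
    """Return the breed for which a SNP is breed-specific, or None.

    freqs maps a breed name to the allele frequency of the SNP in that breed.
    Assumption: the `low` threshold must hold for every other breed.
    """
    for breed, af in freqs.items():
        others_low = all(other_af <= low
                         for other, other_af in freqs.items()
                         if other != breed)
        if af >= high and others_low:
            return breed
    return None


# Hypothetical allele frequencies for two SNPs (illustrative values only).
snp_a = {"Beagle": 0.92, "Poodle": 0.05, "Boxer": 0.10}
snp_b = {"Beagle": 0.85, "Poodle": 0.40, "Boxer": 0.10}

print(breed_specific(snp_a))  # Beagle
print(breed_specific(snp_b))  # None (Poodle exceeds the 20% threshold)
```

If the threshold were instead meant to apply to the pooled frequency across all other breeds, the `all(...)` test would be replaced by a single comparison on the pooled value.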
Dog Breed Ontology (DBO) was constructed based on the breeds integrated in iDog, while Dog Disease Trait Ontology (DDTO) was developed using Disease Ontology (https://disease-ontology.org/) (37) as a reference, integrating dog-specific diseases and traits from iDog.
Database contents
Data summary
In recent years, iDog has been notably updated in both data content and volume (Table 1), covering gene information, gene annotations, metadata, ontology standards, genomes, genomic variation, gene expression profiles, single-cell data, epigenomic profiles, phenotypic data and analytical tools (Figure 1). Compared to its initial version, iDog 2.0 features 44 417 KEGG annotations for 9538 genes and provides 5 829 228 homologous protein pairs across 3756 species. Additionally, metadata have been collected from 176 BioProjects, 5625 BioSamples, and 8309 disease-associated publications. Two new ontologies have been developed, DBO with 456 terms and DDTO with 1536 terms, to standardize breed and disease names. Multi-omics data have also expanded to include four genomes, 46 085 484 genomic variants, 29 089 701 imputed ancient SNPs, 43 487 breed-specific SNPs, and 530 disease/trait-associated variants. Additionally, the database now incorporates data from 141 RNA sequencing (RNA-seq) projects, 105 057 single-cell transcriptomes from 26 cell clusters, 547 DNA methylation samples, and 87 ATAC-seq samples. Moreover, iDog 2.0 features 897 curated and updated dog diseases, 3207 G2P pairs and 349 disease-associated genes related to canine phenotypes. In addition, 13 online tools have been provided to facilitate data exploration and analysis. The organization and presentation of data have also been refined to improve usability and accessibility for researchers.
Table 1.
Statistics and comparison between iDog version 1.0 and version 2.0
| Category | Data item | Version 2.0 | Version 1.0 (NAR 2019) |
|---|---|---|---|
| Gene, annotation and meta-data | Genes | 30 653 (CanFam4); 32 220 (CanFam3.1) | 32 220 (CanFam3.1) |
| | Dog disease-associated genes | 349 | 229 |
| | KEGG annotations | 44 417 annotations for 9538 genes | NA |
| | Homologous genes | 5 829 228 homologous proteins for 3756 species | 177 551 homologous genes for 10 species |
| | Literature | 8309 | 6535 |
| | BioProjects | 176 | NA |
| | BioSamples | 5625 | NA |
| Ontology Standards | Dog breed ontology | 456 terms | NA |
| | Dog disease ontology | 1536 terms | NA |
| Multi-omics data | Genomes | CanFam3.1 | NA |
| | | CanFam4 | NA |
| | Genome variations | 29 546 066 SNPs and 16 539 418 InDels from 1929 individuals | 42 871 184 SNPs from 127 individuals |
| | | 29 089 701 imputed ancient SNPs | NA |
| | | 43 487 breed-specific variants | NA |
| | | 27 579 124 variants with phyloP score | NA |
| | | 530 diseases/traits variations | NA |
| | Expression profiles | 141 BioProjects, 84 tissues, 2947 experiments | 7 BioProjects, 5 tissues, 83 experiments |
| | | Heatmap of genes expressed in 84 normal tissues, gene specificity calculations | NA |
| | | 29 615 genes expressed in 26 cell lines | NA |
| | | 29 656 genes expressed in 43 breeds | NA |
| | | Volcano and relationship maps of DEGs* in 31 dog diseases | NA |
| | Single-cell transcriptomes | 105 057 cells, 3,49 featured genes in the hippocampus | NA |
| | DNA methylation profiles | CpG methylation levels for 547 samples | NA |
| | | 8 682 824 peaks from 87 samples | NA |
| | Phenotypes & Diseases | 482 standard breeds | 473 standard breeds |
| | | 897 dog diseases | 783 dog diseases |
| | | 3207 G2P pairs | 594 G2P pairs |
| Tools | Artificial intelligence tools | Intelligent search¹ | NA |
| | | Dog breed image classification² | NA |
| | Genomic tools | Genomic coordinates conversion³ | NA |
| | Variation tools | Haplotype analysis⁴ | NA |
| | | Fst, Pi and Tajima's D | NA |
| | Expression tools | Expression profiling⁵ | NA |
| | | log2 (value + 1) normalization⁶ | NA |
| | | Z-score calculation⁷ | NA |
| | Functional enrichment tools | GO enrichment | NA |
| | | KEGG enrichment | NA |
*DEGs, differentially expressed genes. In the Tools module, 1 indicates DogRAG, 2 indicates Dog Visual Classification, 3 indicates Assembly Converter, 4 indicates Haplotype Block, 5 indicates ExpPattern, 6 indicates ExpNormalization (value can be COUNT/TPM/FPKM), and 7 indicates ExpStandardization.
Figure 1.
Overview of data content and organization in iDog 2.0.
Genomes, genes and annotations
The Genome module provides comprehensive genome and gene information for canids and includes two additional reference genomes, CanFam4 and CanFam3.1, comprising 30 653 and 32 220 genes from Ensembl (21), respectively. Six new annotated sections have been added to the gene detail pages, including disease-associated variant data, KEGG pathways, gene differential expression profiles, single-cell transcriptomic markers, CpG methylation landscapes across various tissues, and chromatin accessibility peaks. Gene annotation statistics are available from the gene table, with direct links to specific details.
Variation
The Variome module provides comprehensive information on canine genomic variations. A collection of 29 546 066 genomic SNPs and 16 539 418 InDels has been obtained from 1929 samples within the Dog10K project. An additional 29 089 701 imputed ancient SNPs from 111 ancient samples have been analyzed and incorporated. The geographic locations of these ancient samples can be visualized on a world map, alongside a detailed table containing imputed genotypes for each sample, which can be filtered by location or calibrated BP date (Figure 2A). Newly identified breed-specific genomic variations have been added, including 43 487 breed-specific SNPs annotated for 145 breeds, along with a detailed table that contains the position, genotype, calculated allele frequency, and annotated variation information for each specific breed (Figure 2B). To assess sequence conservation, Zoonomia phyloP constraint scores for 27 579 124 SNPs have been obtained; they are displayed in a table corresponding to the searched genomic region and visualized using JBrowse2 (38). Furthermore, 530 dog disease- and trait-associated variants have been newly collected from OMIA (22). All variation data files are available for download via FTP.
Figure 2.
Screenshots of iDog 2.0. (A) Worldwide distribution map for 111 ancient samples, alongside a detailed table containing geographic location, genotype and calibrated BP date. (B) Pie chart illustrating statistics of breed-specific SNPs for 145 breeds, alongside a detailed table containing position, genotype, calculated allele frequency, and annotated variation information. (C) Gene expression heatmap of various tissues, alongside a detailed table showing gene expression levels for each specific tissue. (D) Volcano plots and relationship maps visualizing differential gene expression, accompanied by a box plot illustrating gene expression between cases and controls. (E) Box plot showing CpG rates across genes, promoters, and 5′ and 3′UTRs, accompanied by a detailed table showing comprehensive CpG methylation levels for each specific gene region. (F) For chromatin accessibility, genome browser displays peaks, accompanied by a detailed table of peak information. (G) Ontology definition of a specific disease term featured in DDTO. (H) Screenshot of haplotype analysis results and linkage disequilibrium (LD) heatmap of genomic region encompassing chr1: 389.50–397.59 kb.
Gene expression
The Transcriptome module presents a comprehensive overview of the transcript landscape, aggregating data from 141 BioProjects and 2947 BioSamples. iDog 2.0 has identified 29 849 genes expressed in 84 tissues, 29 656 genes expressed in 43 breeds, and 29 615 genes expressed in 26 cell lines. Notably, 28 145 differentially expressed genes have been identified across 31 diseases, including 20 types of cancer and 11 other diseases. The module provides a heatmap displaying gene expression levels across various tissues, as well as a detailed table showing gene expression levels (e.g. mean and median TPM values) across normal tissues, cell lines, and breeds (Figure 2C). Gene specificity is also calculated using the tspex algorithm (https://github.com/apcamargo/tspex/) and provided for each gene in the normal tissue section. In the disease section, general disease information and differentially expressed genes between cases and controls are available. These data are visually represented through volcano plots, relationship maps, and box plots (Figure 2D). Additionally, a detailed table featuring average and median gene expression levels, fold changes, and false discovery rates (FDR) is provided, with direct links to Gene Ontology (GO) and KEGG analysis tools. All gene expression data can be downloaded from the FTP server for further analysis.
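To make the notion of gene specificity concrete, the following Python sketch computes the Tau index, one widely used tissue-specificity metric. The text does not state which metric is applied in iDog, so Tau is an assumption chosen for illustration only; the function name and the expression values are hypothetical, and the sketch does not call the tspex package itself.

```python
from typing import Sequence


def tau(expression: Sequence[float]) -> float:
    """Tau tissue-specificity index for one gene.

    expression holds non-negative expression values, one per tissue.
    Tau is 0 for uniform expression and 1 for expression in a single tissue.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("Tau needs expression values from at least two tissues")
    x_max = max(expression)
    if x_max == 0:
        # Convention adopted in this sketch: an unexpressed gene gets 0.
        return 0.0
    return sum(1 - x / x_max for x in expression) / (n - 1)


# Hypothetical expression values across four tissues.
print(tau([10, 0, 0, 0]))  # 1.0, expressed in one tissue only
print(tau([5, 5, 5, 5]))   # 0.0, uniformly expressed
```

Values between 0 and 1 indicate intermediate specificity; the cut-off used to call a gene tissue-specific is a separate choice that the sketch does not make.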
Single-cell expression
The newly integrated Single-Cell module houses transcriptomic data from 105 057 single-nucleus cells from the hippocampus of a Beagle dog (39). These cells are grouped into 26 clusters, with the top 100 markers for each cluster showcased based on adjusted P-values. Gene markers are further visualized through a word cloud map, with links to detailed gene information accessible by clicking the gene symbol. In addition, high-resolution immunofluorescent images of 10 genes are available for further exploration.
Epigenome
The newly added Epigenome module integrates data from DNA methylation and ATAC-seq, enabling the exploration of potential regulatory regions within the genome. In total, iDog 2.0 has evaluated the genomic methylation levels of 29 803 genes from 547 samples and assessed chromatin accessibility across 8 682 824 peaks from 87 samples. DNA methylation results are presented using a box plot to display CpG methylation rates across genes, promoters, and 5′ and 3′ untranslated regions (UTRs). Accompanying this, a detailed table provides comprehensive CpG methylation levels for each gene region (Figure 2E). Additionally, a genome browser is available, enabling users to examine CpG methylation levels for specific genes. Similarly, for chromatin accessibility, data are presented both as a table with detailed peak information and through a genome browser that displays the corresponding peaks (Figure 2F). All datasets are available for direct download from the FTP server for further analysis.
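As an illustration of how a CpG methylation level for a gene region can be derived from per-cytosine read counts, the Python sketch below pools methylated and unmethylated read counts over a region. It is only a sketch under stated assumptions: the record layout, the function name, the `min_depth` parameter, and the counts are hypothetical, and pooling reads across sites is one possible definition (averaging per-site levels is another); the text does not say which definition iDog uses.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical record: (chromosome, 1-based position,
#                       methylated read count, unmethylated read count)
Site = Tuple[str, int, int, int]


def region_methylation(sites: Iterable[Site], chrom: str,
                       start: int, end: int,
                       min_depth: int = 1) -> Optional[float]:
    """Pooled methylation level of the CpG sites inside chrom:start-end.

    Returns methylated / (methylated + unmethylated) over all sites that
    reach `min_depth` reads, or None if no site in the region qualifies.
    """
    meth = total = 0
    for c, pos, m, u in sites:
        if c == chrom and start <= pos <= end and (m + u) >= min_depth:
            meth += m
            total += m + u
    return meth / total if total else None


# Illustrative counts only.
sites = [
    ("chr1", 100, 8, 2),
    ("chr1", 150, 3, 7),
    ("chr1", 900, 10, 0),
    ("chr2", 120, 5, 5),
]
print(region_methylation(sites, "chr1", 50, 200))  # 0.55
```

The same function can be applied to promoter, UTR, or gene-body intervals by passing the corresponding coordinates.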
Phenotype
The Phenotype module mainly focuses on dog breeds, diseases, and traits. Currently, the DBO contains 456 terms, while the DDTO hosts 1536 terms, standardizing 482 breeds, 897 diseases and 3207 curated G2P pairs (Figure 2G). Compared to the previous version, iDog 2.0 now includes 114 additional dog diseases, 2613 new G2P pairs and 120 newly curated dog disease-associated genes. Statistical charts depicting the distribution of data related to breeds, diseases, and G2P pairs have been added. Furthermore, breed-specific variants, breed-associated disease variants, and gene expression data have been incorporated into the breed detail pages, while disease differential expression data and disease-associated variants have been integrated into the disease detail pages. For the 120 newly curated dog disease-associated genes, corresponding human homologs and their genetic phenotype information have been integrated from OMIM (40).
Online tools
The Tools module introduces novel tools for artificial intelligence-based retrieval and breed classification, in addition to integrating several widely used tools for genome, transcriptome, and gene function analysis (Table 1). The Retrieval-Augmented Generation search tool (DogRAG) employs a Large Language Model (LLM) built upon the knowledge and annotated data in iDog, enabling efficient inquiries about dog breeds, diseases, genetic variations, and gene expression. For breed classification, the Visual Breed Classification (DogVC) tool applies deep learning techniques to automatically identify and classify dog breeds from user-uploaded images with an accuracy rate of 94.5%. To support diverse genome data analysis needs, the module offers three positive selection tools (41), a haplotype block tool (42) (Figure 2H), and a genome coordinate conversion tool (43). For transcriptome analysis, the module features ExpPattern, which visualizes gene expression patterns across tissues and assesses gene specificity (https://github.com/apcamargo/tspex/). Two additional tools have also been designed to address batch effect issues commonly encountered in transcriptomic data processing. Finally, GO and KEGG (44) are employed for gene function enrichment analysis, allowing for detailed insights into the biological roles of genes.
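The two expression transformations listed for the Tools module in Table 1, log2 (value + 1) normalization and Z-score calculation, can be sketched in a few lines of Python. This is not the code behind ExpNormalization or ExpStandardization: the function names and input values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not specify it.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def log2_normalize(values: Sequence[float]) -> List[float]:
    """Apply log2(value + 1) to each COUNT/TPM/FPKM value."""
    return [math.log2(v + 1) for v in values]


def z_scores(values: Sequence[float]) -> List[float]:
    """Standardize values to zero mean and unit variance.

    Assumption: the population standard deviation is used; a constant
    vector is mapped to zeros to avoid division by zero.
    """
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


# Hypothetical TPM values of one gene across four samples.
tpm = [0, 1, 3, 7]
logged = log2_normalize(tpm)
print(logged)            # [0.0, 1.0, 2.0, 3.0]
print(z_scores(logged))  # symmetric around 0
```

Applying the logarithm before standardization compresses the dynamic range, so that a few highly expressed samples do not dominate the Z-scores.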
Discussion and future plans
iDog 2.0 serves as a comprehensive multi-omics resource for both domestic dogs and wild canids. Its well-structured data organization, user-friendly interfaces, and various online tools make it an indispensable resource for researchers, veterinarians, and dog owners engaged in canine studies and care. With the ongoing development of genomics and functional genomics, we anticipate rapid expansion of iDog in the coming years, providing high-density variation data critical for understanding population genetics, diseases, and dog breeding in both domestic dogs and wild canids. As the assembly and publication of genomes from various dog breeds and their closely related species increase, constructing a pan-genome inversion index and comprehensive variation dataset becomes a priority. Moreover, metagenomic and single-cell transcriptomic datasets are becoming increasingly important for more detailed analyses. In the future, we will continuously enhance iDog by integrating an even wider range of multi-omics datasets and constructing a comprehensive dog pan-genome dataset. Since DogRAG is currently in its foundational version, our future efforts will focus on expanding the scope and volume of data to provide more precise and professional answers, as well as upgrading our hardware to speed up response times. For DogVC, we also plan to develop a mobile application to make it more convenient for users to access its functionalities on portable devices. Additional enhancements will focus on refining web interfaces, improving genomic annotations, and expanding toolkits to provide even greater utility for the global canine research community.
Supplementary Material
Acknowledgements
We thank the high-performance computing platform of NGDC for providing the powerful computational resources.
Contributor Information
Yanhu Liu, Key Laboratory of Genetic Evolution & Animal Models and Yunnan Key Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650091, China.
Yibo Wang, National Genomics Data Center, China National Center for Bioinformation, Beijing 100049, China; Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100049, China; University of Chinese Academy of Sciences, Beijing 100049, China.
Jiani Sun, National Genomics Data Center, China National Center for Bioinformation, Beijing 100049, China; Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100049, China; University of Chinese Academy of Sciences, Beijing 100049, China; Sino-Danish College, University of Chinese Academy of Sciences, Beijing, China.
Demian Kong, National Genomics Data Center, China National Center for Bioinformation, Beijing 100049, China; Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100049, China; University of Chinese Academy of Sciences, Beijing 100049, China.
Bowen Zhou, Key Laboratory of Genetic Evolution & Animal Models and Yunnan Key Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650091, China.
Mengting Ding, Key Laboratory of Genetic Evolution & Animal Models and Yunnan Key Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650091, China.
Yuyan Meng, National Genomics Data Center, China National Center for Bioinformation, Beijing 100049, China; Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100049, China; University of Chinese Academy of Sciences, Beijing 100049, China; Sino-Danish College, University of Chinese Academy of Sciences, Beijing, China.
Guangya Duan, National Genomics Data Center, China National Center for Bioinformation, Beijing 100049, China; Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100049, China; University of Chinese Academy of Sciences, Beijing 100049, China.
Ying Cui, National Genomics Data Center, China National Center for Bioinformation, Beijing 100049, China; Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100049, China; University of Chinese Academy of Sciences, Beijing 100049, China.
Zhuojing Fan, National Genomics Data Center, China National Center for Bioinformation, Beijing 100049, China; Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100049, China.
Ya-Ping Zhang, Key Laboratory of Genetic Evolution & Animal Models and Yunnan Key Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650091, China.
Wenming Zhao, National Genomics Data Center, China National Center for Bioinformation, Beijing 100049, China; Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100049, China; University of Chinese Academy of Sciences, Beijing 100049, China.
Bixia Tang, National Genomics Data Center, China National Center for Bioinformation, Beijing 100049, China; Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100049, China.
Data availability
iDog 2.0 is available online for free at https://ngdc.cncb.ac.cn/idog and does not require user registration. The data analysis code for transcriptomics, differentially expressed genes in diseases, and epigenomics is freely available on GitHub (https://github.com/Br1anChou/idog) and FigShare. The DOI link for iDog code is https://doi.org/10.6084/m9.figshare.27238002.v1.
Supplementary data
Supplementary Data are available at NAR Online.
Funding
National Natural Science Foundation of China [32100506, 32388102]; Strategic Priority Research Program of Chinese Academy of Sciences [XDB38050300 to WZ]; STI2030-Major Projects [2021ZD0203900]; Spring City Plan: The High-level Talent Promotion and Training Project of Kunming [2022SCP001]. Y.H.L. is supported by the Youth Innovation Promotion Association, Chinese Academy of Sciences; Yunnan Revitalization Talent Support Program Young Talent Project. Funding for open access charge: National Natural Science Foundation of China.
Conflict of interest statement. The authors declare no conflicts of interest.
References
1. Tang B., Zhou Q., Dong L., Li W., Zhang X., Lan L., Zhai S., Xiao J., Zhang Z., Bao Y. et al. iDog: an integrated resource for domestic dogs and wild canids. Nucleic Acids Res. 2019; 47:D793–D800.
2. Arendt M.L., Sakthikumar S., Melin M., Elvers I., Rivera P., Larsen M., Saellström S., Lingaas F., Rönnberg H., Lindblad-Toh K. PIK3CA is recurrently mutated in canine mammary tumors, similarly to in human mammary neoplasia. Sci. Rep. 2023; 13:632–639.
3. Cho S.H., Seung B.J., Kim S.H., Bae M.K., Lim H.Y., Sur J.H. EGFR overexpression and sequence analysis of KRAS, BRAF, and EGFR mutation hot spots in canine intestinal adenocarcinoma. Vet. Pathol. 2021; 58:674–682.
4. Hadji Rasouliha S., Barrientos L., Anderegg L., Klesty C., Lorenz J., Chevallier L., Jagannathan V., Rösch S., Leeb T. A RAPGEF6 variant constitutes a major risk factor for laryngeal paralysis in dogs. PLoS Genet. 2019; 15:e1008416.
5. Friedrich J., Strandberg E., Arvelius P., Sánchez-Molano E., Pong-Wong R., Hickey J.M., Haskell M.J., Wiener P. Genetic dissection of complex behaviour traits in German Shepherd dogs. Heredity (Edinb). 2019; 123:746–758.
6. Peignier S., Calevro F. Gene self-expressive networks as a generalization-aware tool to model gene regulatory networks. Biomolecules. 2023; 13:526–558.
7. Wang X., Zhou B.W., Yang M.A., Yin T.T., Chen F.L., Ommeh S.C., Esmailizadeh A., Turner M.M., Poyarkov A.D., Savolainen P. et al. Canine transmissible venereal tumor genome reveals ancient introgression from coyotes to pre-contact dogs in North America. Cell Res. 2019; 29:592–595.
8. He H., Yang H., Foo R., Chan W., Zhu F., Liu Y., Zhou X., Ma L., Wang L.F., Zhai W. Population genomic analysis reveals distinct demographics and recent adaptation in the black flying fox (Pteropus alecto). J Genet Genomics. 2023; 50:554–562.
9. Mooney J.A., Yohannes A., Lohmueller K.E. The impact of identity by descent on fitness and disease in dogs. Proc. Natl. Acad. Sci. U.S.A. 2021; 118:e2019116118.
10. Mastrangelo S., Biscarini F., Riggio S., Ragatzu M., Spaterna A., Cendron F., Ciampolini R. Genome-wide association study for morphological and hunting-behavior traits in Braque Français Type Pyrénées dogs: a preliminary study. Vet. J. 2024; 306:106189.
11. Amin S.B., Anderson K.J., Boudreau C.E., Martinez-Ledesma E., Kocakavuk E., Johnson K.C., Barthel F.P., Varn F.S., Kassab C., Ling X. et al. Comparative molecular life history of spontaneous canine and human gliomas. Cancer Cell. 2020; 37:243–257.
12. Wang C., Wallerman O., Arendt M.-L., Sundström E., Karlsson Å., Nordin J., Mäkeläinen S., Pielberg G.R., Hanson J., Ohlsson Å. et al. A novel canine reference genome resolves genomic architecture and uncovers transcript complexity. Commun. Biol. 2021; 4:185–195.
13. Li C., Tian D., Tang B., Liu X., Teng X., Zhao W., Zhang Z., Song S. Genome Variation Map: a worldwide collection of genome variations across multiple species. Nucleic Acids Res. 2021; 49:D1186–D1191.
14. Fu Y., Liu H., Dou J., Wang Y., Liao Y., Huang X., Tang Z., Xu J., Yin D., Zhu S. et al. IAnimal: a cross-species omics knowledgebase for animals. Nucleic Acids Res. 2023; 51:D1312–D1324.
15. Yang I.S., Jang I., Yang J.O., Choi J., Kim M.S., Kim K.K., Seung B.J., Cheong J.H., Sur J.H., Nam H. et al. CanISO: a database of genomic and transcriptomic variations in domestic dog (Canis lupus familiaris). BMC Genomics. 2023; 24:613–624.
16. Meadows J.R.S., Kidd J.M., Wang G.-D., Parker H.G., Schall P.Z., Bianchi M., Christmas M.J., Bougiouri K., Buckley R.M., Hitte C. et al. Genome sequencing of 2000 canids by the Dog10K consortium advances the understanding of demography, genome function and architecture. Genome Biol. 2023; 24:187–227.
17. Katz K., Shutov O., Lapoint R., Kimelman M., Brister J.R., O’Sullivan C. The Sequence Read Archive: a decade more of explosive growth. Nucleic Acids Res. 2021; 50:D387–D390.
18. Wang Y., Song F., Zhu J., Zhang S., Yang Y., Chen T., Tang B., Dong L., Ding N., Zhang Q. et al. GSA: Genome Sequence Archive. Genomics Proteomics Bioinformatics. 2017; 15:14–18.
19. Zhang S.S., Chen T.T., Zhu J.W., Zhou Q., Chen X., Wang Y.Q., Zhao W.M. GSA: Genome Sequence Archive. Yi Chuan. 2018; 40:1044–1047.
20. Yuan D., Ahamed A., Burgin J., Cummins C., Devraj R., Gueye K., Gupta D., Gupta V., Haseeb M., Ihsan M. et al. The European Nucleotide Archive in 2023. Nucleic Acids Res. 2024; 52:D92–D97.
21. Harrison P.W., Amode M.R., Austine-Orimoloye O., Azov A.G., Barba M., Barnes I., Becker A., Bennett R., Berry A., Bhai J. et al. Ensembl 2024. Nucleic Acids Res. 2024; 52:D891–D899.
22. Nicholas F.W. Online Mendelian Inheritance in Animals (OMIA): a comparative knowledgebase of genetic disorders and other familial traits in non-laboratory animals. Nucleic Acids Res. 2003; 31:275–277.
23. Wang Y., Lin Y., Wu S., Sun J., Meng Y., Jin E., Kong D., Duan G., Bei S., Fan Z. et al. BioKA: a curated and integrated biomarker knowledgebase for animals. Nucleic Acids Res. 2024; 52:D1121–D1130.
24. McLaren W., Gil L., Hunt S.E., Riat H.S., Ritchie G.R., Thormann A., Flicek P., Cunningham F. The Ensembl Variant Effect Predictor. Genome Biol. 2016; 17:122–135.
25. Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009; 25:1754–1760.
26. McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K., Altshuler D., Gabriel S., Daly M. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010; 20:1297–1303.
27. Rubinacci S., Hofmeister R.J., Sousa da Mota B., Delaneau O. Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes. Nat. Genet. 2023; 55:1088–1090.
28. Bray N.L., Pimentel H., Melsted P., Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 2016; 34:525–527.
29. Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2012; 29:15–21.
30. Li B., Dewey C.N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinf. 2011; 12:323–338.
31. Krueger F., Andrews S.R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 2011; 27:1571–1572.
32. Langmead B., Salzberg S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012; 9:357–359.
33. Danecek P., Bonfield J.K., Liddle J., Marshall J., Ohan V., Pollard M.O., Whitwham A., Keane T., McCarthy S.A., Davies R.M. et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021; 10:giab008.
34. Zhang Y., Liu T., Meyer C.A., Eeckhoute J., Johnson D.S., Bernstein B.E., Nusbaum C., Myers R.M., Brown M., Li W. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008; 9:R137–R145.
35. Cantalapiedra C.P., Hernández-Plaza A., Letunic I., Bork P., Huerta-Cepas J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 2021; 38:5825–5829.
36. Hinrichs A.S. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 2006; 34:D590–D598.
37. Schriml L.M., Arze C., Nadendla S., Chang Y.W., Mazaitis M., Felix V., Feng G., Kibbe W.A. Disease Ontology: a backbone for disease semantic integration. Nucleic Acids Res. 2012; 40:D940–D946.
38. Diesh C., Stevens G.J., Xie P., De J., Martinez T., Hershberg E.A., Leung A., Guo E., Dider S., Zhang J. et al. JBrowse 2: a modular genome browser with views of synteny and structural variation. Genome Biol. 2023; 24:74–94.
- 39. Zhou Q.J., Liu X., Zhang L., Wang R., Yin T., Li X., Li G., He Y., Ding Z., Ma P. et al. A single-nucleus transcriptomic atlas of the dog hippocampus reveals the potential relationship between specific cell types and domestication. Natl. Sci. Rev. 2022; 9:nwac147. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40. Amberger J.S., Bocchini C.A., Schiettecatte F., Scott A.F., Hamosh A. OMIM.org: online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 2015; 43:D789–D798. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41. Danecek P., Auton A., Abecasis G., Albers C.A., Banks E., DePristo M.A., Handsaker R.E., Lunter G., Marth G.T., Sherry S.T. et al. The variant call format and VCFtools. Bioinformatics. 2011; 27:2156–2158. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42. Dong S.S., He W.M., Ji J.J., Zhang C., Guo Y., Yang T.L. LDBlockShow: a fast and convenient tool for visualizing linkage disequilibrium and haplotype blocks based on variant call format files. Brief Bioinform. 2021; 22:bbaa227. [DOI] [PubMed] [Google Scholar]
- 43. Zhao H., Sun Z., Wang J., Huang H., Kocher J.P., Wang L. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics. 2014; 30:1006–1007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44. Wu T., Hu E., Xu S., Chen M., Guo P., Dai Z., Feng T., Zhou L., Tang W., Zhan L. et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovation (Camb). 2021; 2:100141. [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
iDog 2.0 is freely available online at https://ngdc.cncb.ac.cn/idog and does not require user registration. The data analysis code for transcriptomics, differentially expressed genes in diseases, and epigenomics is freely available on GitHub (https://github.com/Br1anChou/idog) and FigShare (https://doi.org/10.6084/m9.figshare.27238002.v1).