Nucleic Acids Research. 2020 Nov 10;49(D1):D1186–D1191. doi: 10.1093/nar/gkaa1005

Genome Variation Map: a worldwide collection of genome variations across multiple species

Cuiping Li, Dongmei Tian, Bixia Tang, Xiaonan Liu, Xufei Teng, Wenming Zhao, Zhang Zhang, Shuhui Song
PMCID: PMC7778933  PMID: 33170268

Abstract

The Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm/) is a public data repository of genome variations. It aims to collect and integrate genome variations for a wide range of species, accepts submissions of different variation types from all over the world and provides free open access to all publicly available data in support of worldwide research activities. Compared with the previous version, the current version of GVM adds 22 species, 115 projects, 55 935 samples, 463 429 609 variants, 66 220 associations and 56 submissions (as of 7 September 2020). The current release houses a total of ∼960 million variants from 41 species, including 13 animals, 25 plants and 3 viruses. Moreover, it incorporates 64 819 individual genotypes and 260 393 manually curated high-quality genotype-to-phenotype associations. Since its inception, GVM has archived genomic variation data of 43 754 samples submitted by users worldwide and served >1 million data download requests. Collectively, as a core resource in the National Genomics Data Center, GVM provides valuable genome variations for a diversity of species and thus plays an important role in both functional genomics studies and molecular breeding.

INTRODUCTION

The Genome Variation Map (GVM; https://bigd.big.ac.cn/gvm/) is a public data repository of genome variations and a core resource of the National Genomics Data Center (CNCB-NGDC) (1), part of the China National Center for Bioinformation (CNCB). Since its inception in 2017 (2), GVM has served as a central public resource for genome variations and played an important role in both functional genomics studies and molecular breeding (3,4). For instance, variants and knowledge associations deposited in GVM have been used in several data resources (e.g. IC4R (5), SR4R (6), MBKbase for rice (7), GWAS Atlas (8) and Animal-ImputeDB (9)). Over the past several years, advances in high-throughput sequencing technologies have empowered large-scale population genome sequencing projects, leading to the identification of genome variations at unprecedented rates. Consequently, GVM has accepted >50 data submissions (10–12) from all over the world and, as of September 2020, houses a large number of genome variations from 41 species, including not only human but also domesticated animals, cultivated plants and viruses, particularly SARS-CoV-2, the coronavirus responsible for the ongoing global pandemic. Meanwhile, GVM has served >1 million data download requests (https://bigd.big.ac.cn/gvm/statistics). Importantly, to provide high-quality variant data and metadata and deliver user-friendly data services, GVM has been frequently updated in the past years by standardizing the curation model and process, improving the web functionalities for data submission, browsing and download, providing the database tutorial in PPT and video formats, and adding external links to other public databases, such as dbSNP (13), GWAS Catalog (14), NCBI genome (15), ENSEMBL (16), JGI (17), maizedb (18) and DRDB (19). Here we present an updated release of GVM and briefly describe its recent updates and data growth.

DATA COLLECTION AND METHODS

Whole-genome resequencing projects were collected from the published literature, and raw sequence data were downloaded from the Sequence Read Archive (SRA) (20) and the Genome Sequence Archive (GSA) (21,22). All collected raw sequence reads were subjected to quality control using Trimmomatic-0.36, and the cleaned reads were aligned to the reference genomes using BWA-MEM (23). The aligned reads were then merged into a single BAM file, sorted with SAMtools (19) and marked for duplicates using MarkDuplicates in GATK-4.0.5.0 (24). After removing duplicate reads, high-quality variants were identified by both GATK HaplotypeCaller and SAMtools mpileup, and base quality was recalibrated by Base Quality Score Recalibration (BQSR). Then, an intermediate genomic VCF (GVCF) file was produced for each sample by running HaplotypeCaller in GVCF mode, and GenotypeGVCFs in GATK was applied to pool all GVCF files together and create a VCF file containing all raw variants. These raw variants were further filtered using SelectVariants and VariantFiltration in GATK. Default parameters were used in variant calling. The effects of all variants were annotated using VEP (25) as well as in-house pipelines. Functional annotation of variants was performed based on GO (26), UniProt (27) and Pfam (28). Furthermore, genotype-to-phenotype (G2P) associations were manually curated from the published GWAS literature, and the relationships between sequence variants and phenotypic traits were established.
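The per-sample GVCF and joint-genotyping steps described above can be sketched as a small command builder. This is only an illustration under stated assumptions, not the GVM production pipeline: the file names and helper function names are hypothetical, and the CombineGVCFs step used to merge per-sample GVCFs before GenotypeGVCFs is an assumption, since the text does not say how the GVCF files were pooled. The commands are assembled as argument lists and are not executed.

```python
from typing import Dict, List


def haplotypecaller_gvcf_cmd(reference: str, bam: str, out_gvcf: str) -> List[str]:
    """Per-sample variant calling in GVCF mode (-ERC GVCF)."""
    return ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam,
            "-O", out_gvcf, "-ERC", "GVCF"]


def combine_gvcfs_cmd(reference: str, gvcfs: List[str], out_gvcf: str) -> List[str]:
    """Merge per-sample GVCFs into one multi-sample GVCF (assumed step)."""
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    return cmd + ["-O", out_gvcf]


def genotype_gvcfs_cmd(reference: str, combined_gvcf: str, out_vcf: str) -> List[str]:
    """Joint genotyping that produces the VCF of raw variants."""
    return ["gatk", "GenotypeGVCFs", "-R", reference,
            "-V", combined_gvcf, "-O", out_vcf]


def joint_calling_plan(reference: str, sample_bams: Dict[str, str]) -> List[List[str]]:
    """Return the ordered commands: one HaplotypeCaller run per sample,
    then CombineGVCFs, then GenotypeGVCFs. File names are hypothetical."""
    plan: List[List[str]] = []
    gvcfs: List[str] = []
    for sample in sorted(sample_bams):
        gvcf = f"{sample}.g.vcf.gz"
        gvcfs.append(gvcf)
        plan.append(haplotypecaller_gvcf_cmd(reference, sample_bams[sample], gvcf))
    plan.append(combine_gvcfs_cmd(reference, gvcfs, "cohort.g.vcf.gz"))
    plan.append(genotype_gvcfs_cmd(reference, "cohort.g.vcf.gz", "cohort.raw.vcf.gz"))
    return plan


if __name__ == "__main__":
    bams = {"S1": "S1.dedup.bam", "S2": "S2.dedup.bam"}
    for cmd in joint_calling_plan("reference.fa", bams):
        print(" ".join(cmd))
```

Building the commands as lists keeps each step inspectable before anything is run; in practice each list could be passed to a workflow manager or to subprocess.run.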

DATA MODULES AND DATA GROWTH

Over the past several years, GVM has been significantly updated regarding data modules and data volume. To better present genomic variants, all relevant entities and metadata in GVM are organized into six modules: species, project, sample, variant, association and submission. Moreover, the number of genomic variants hosted in GVM has grown rapidly from ∼497 million in 19 species in August 2017 to ∼960 million in 41 species in August 2020 (Table 1). An illustration of all collected species and data statistics is presented in Figure 1 (with details in Supplementary Table S1). Compared with the previous version, the current version of GVM adds 22 species, 115 projects, 55 935 samples, 463 429 609 variants, 66 220 associations and 56 submissions (as of 7 September 2020).

Table 1. Statistics and comparison between the two versions of GVM

| Item | GVM 2018 (as of September 2017) | GVM 2021 (as of September 2020) |
| --- | --- | --- |
| Species | 19 | 41 |
| Projects | 87 | 202 |
| Samples | 8884 | 64 819 |
| Number of SNPs | 434 525 449 | 818 769 204 |
| Number of indels | 62 454 393 | 141 640 247 |
| Associations | 194 173 | 260 393 |
| Submitted samples | NA | 43 754 |
| Variant annotation | GO/UniProt/ClinVar/OMIM | GO/UniProt/ClinVar/OMIM/Pfam |
| Global distribution of samples | NA | Available |
| FTP | NA | Available |
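The newly added counts quoted in the text follow directly from the two columns of Table 1. A minimal arithmetic check is sketched below; the dictionary keys and function name are illustrative choices, and the values are transcribed from the table with the digit-group spaces removed.

```python
# Counts transcribed from Table 1 (digit-group spaces removed).
GVM_2018 = {
    "species": 19,
    "projects": 87,
    "samples": 8884,
    "snps": 434525449,
    "indels": 62454393,
    "associations": 194173,
}
GVM_2021 = {
    "species": 41,
    "projects": 202,
    "samples": 64819,
    "snps": 818769204,
    "indels": 141640247,
    "associations": 260393,
}


def newly_added(old: dict, new: dict) -> dict:
    """Per-item difference between two releases, plus a combined variant count."""
    delta = {key: new[key] - old[key] for key in old}
    # "Variants" in the text means SNPs and indels taken together.
    delta["variants"] = delta["snps"] + delta["indels"]
    return delta


if __name__ == "__main__":
    for item, count in newly_added(GVM_2018, GVM_2021).items():
        print(f"{item}: {count}")
```

The same two dictionaries also give the release totals: summing SNPs and indels yields about 497 million variants for the earlier release and about 960 million for the current one.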

Figure 1. A skeleton view of all collected species in GVM and their corresponding samples and variants.

The Species module provides a comprehensive overview of all collected species as well as their associated projects, samples, variants and associations (if available), which are organized in a table and linked to internal and external resources (Figure 2A). As of 7 September 2020, there are a total of 41 species, including 13 animals, 25 plants and 3 viruses. The newly updated species include three animals (cat, horse and tarpan), 10 cultivated plants (carrot, cassava, common bean, cotton, cucumber, date palm, grape, apricot, rapeseed and wheat), five traditional Chinese medicine plants (Catharanthus roseus, Cannabis, Ganoderma, Jatropha and Salvia miltiorrhiza) and three coronaviruses (SARS-CoV-2, SARS-CoV and MERS-CoV). Since the outbreak of the severe respiratory disease COVID-19 in late December 2019, SARS-CoV-2 has spread rapidly and caused a global pandemic. To help researchers worldwide better understand the genome variation and transmission of SARS-CoV-2 (29), we analyzed genome sequences of SARS-CoV-2 as well as two close relatives (SARS-CoV and MERS-CoV) and made their genomic variants publicly available to the global research community through 2019nCoVR (19). As of 7 September 2020, a total of 16 934 variants have been identified from 52 466 high-quality SARS-CoV-2 assemblies, as well as 477 and 1742 variants from 105 SARS-CoV and 248 MERS-CoV assemblies, respectively.

Figure 2. Screenshots of data modules in terms of species, project, sample, variation, and association. (A) Species overview. (B) Project metadata. (C) Sample metadata and global distribution. (D) Genomic variants. (E) Genotype-to-phenotype association.

In the Project and Sample modules, we compiled the metadata of whole-genome resequencing projects (Figure 2B) and samples (Figure 2C), respectively. The Project module displays an overview of all resequencing projects, including sequenced sample size, sampling material, sequencing technology, data type and average sequencing coverage. In addition, bibliographic details (e.g. title, year, journal, PubMed ID) and a short description of each publication are summarized, which helps users quickly grasp the outline of the sequencing project(s) of interest. Likewise, the Sample module provides a detailed description of sequenced samples, including sample name, cultivar/breeder and geographic information (where the sequenced samples were collected). A unique accession ID is assigned to each sample, and the number of sampled materials for each geographic region is plotted on a world map, providing a worldwide landscape of the distribution of samples for each species and helping researchers evaluate sample representativeness and genetic diversity.

The Variant module (Figure 2D) provides a catalog of genome variations, including SNPs and indels, identified from a diversity of species (see DATA COLLECTION AND METHODS for details). Each variant is assigned a unique identifier and is presented with its related details, including variant coordinate, reference and alternative alleles, minor allele frequency, and hyperlinks to external databases (e.g. dbSNP, ClinVar). To help users prioritize potentially functional SNPs, GVM provides comprehensive annotations for each variant, including consequence type, variant effect, population frequency and phenotype association, and also incorporates functional domain information from UniProt and Pfam. Moreover, given the rapid accumulation of genomic variants, we further calculated the SNP density for each species and found that the number of SNP markers ranged from 1 to 64 per kb, with an average distance of 131 bp between adjacent SNP loci (Figure 1). In short, the SNP-based high-density genetic map for each species is critically important for a wide range of functional studies.
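The two summary statistics mentioned above, SNPs per kb and the average distance between adjacent SNP loci, can be computed as in the sketch below. The exact procedure used for GVM is not described in the text, so the function names, the single-chromosome toy positions and the use of total sequence length as the denominator are assumptions made for illustration.

```python
from typing import Sequence


def snp_density_per_kb(n_snps: int, sequence_length_bp: int) -> float:
    """Number of SNPs per 1000 bp of sequence."""
    if sequence_length_bp <= 0:
        raise ValueError("sequence length must be positive")
    return n_snps / (sequence_length_bp / 1000)


def mean_adjacent_distance(positions: Sequence[int]) -> float:
    """Mean gap (bp) between neighbouring SNP positions on one chromosome."""
    ordered = sorted(positions)
    if len(ordered) < 2:
        raise ValueError("need at least two positions")
    gaps = [right - left for left, right in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Toy data: four SNPs on a hypothetical 2000-bp chromosome.
    toy_positions = [100, 250, 400, 1000]
    print(snp_density_per_kb(len(toy_positions), 2000))  # 2.0 SNPs per kb
    print(mean_adjacent_distance(toy_positions))         # 300.0 bp
```

For a whole genome, the gaps would be computed per chromosome and pooled before averaging, so that distances are never measured across chromosome boundaries.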

The Association module (Figure 2E) integrates a total of 78 950 high-quality (P < 0.001) genotype-to-phenotype (G2P) GWAS associations for 12 non-human species, manually curated from 304 publications. These G2P associations cover 735 traits across seven cultivated plants (cotton, Japanese apricot, maize, rapeseed, rice, sorghum and soybean) and five domesticated animals (chicken, cattle, goat, pig and sheep). More importantly, all associations and traits have been further annotated and organized based on a suite of ontologies (Plant Trait Ontology, Animal Trait Ontology for Livestock, etc.) in GWAS Atlas (8), and these G2P associations are of great significance for genetic research on important traits and for breeding applications.

The Submission module offers online data submission services and accepts multiple data formats, including VCF, GVCF and HapMap. It allows variation submission for any species and from any genome, including organellar genomes (e.g. mitochondrial and chloroplast), and supports batch online submission from all over the world. Detailed instructions for data submission are available at https://bigd.big.ac.cn/gvm/instruction. In addition, the complete collection of released variants can be downloaded directly via FTP at ftp://download.big.ac.cn/GVM. As of 7 September 2020, GVM contains 56 submissions describing variations from 23 species (e.g. human, soybean and maize). According to our data statistics in GVM, more than half of the submitted samples (43 754) are human (26 956, 61.6%) and the rest are animals (8195, 18.7%) and plants (8601, 19.7%).

DOWNLOAD

GVM provides open access to all publicly available variants, which are downloadable in both VCF and FASTA formats at https://bigd.big.ac.cn/gvm/download. The brief VCF file contains the genomic position, reference and alternative alleles and variant quality, and the FASTA file provides 50-nt flanking sequences for each variant. In response to user feedback, we have added a detailed VCF file containing the genotype information for all samples, which is more useful for in-depth GWAS and functional analyses.
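As an illustration of how the downloadable files might be consumed, the sketch below parses one data line of a VCF file, labels the variant as a SNP or an indel, and extracts flanking sequence around a variant from a reference string. It assumes only the standard tab-separated VCF columns (CHROM, POS, ID, REF, ALT, QUAL); the class and function names are hypothetical, the SNP/indel rule is a simple length-based heuristic, and the exact layout of the GVM files should be checked against the downloads themselves.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUCLEOTIDES = set("ACGTN")


@dataclass
class Variant:
    chrom: str
    pos: int          # 1-based position of the first REF base
    vid: str
    ref: str
    alts: List[str]   # ALT may list several comma-separated alleles
    qual: str         # kept as text because "." means missing


def parse_vcf_line(line: str) -> Variant:
    """Parse the first six tab-separated columns of a VCF data line."""
    chrom, pos, vid, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
    return Variant(chrom, int(pos), vid, ref.upper(), alt.upper().split(","), qual)


def variant_type(variant: Variant) -> str:
    """Length-based heuristic: all alleles one base long means SNP."""
    alleles = [variant.ref] + variant.alts
    if any(not set(allele) <= NUCLEOTIDES for allele in alleles):
        return "other"  # symbolic or missing alleles such as <DEL> or *
    if all(len(allele) == 1 for allele in alleles):
        return "SNP"
    return "indel"


def flanks(sequence: str, pos: int, ref_len: int, size: int = 50) -> Tuple[str, str]:
    """Return up to `size` bases on each side of the REF allele."""
    start = pos - 1  # convert the 1-based VCF position to a 0-based index
    left = sequence[max(0, start - size):start]
    right = sequence[start + ref_len:start + ref_len + size]
    return left, right


if __name__ == "__main__":
    snp = parse_vcf_line("chr1\t5\tvar1\tA\tG\t50\n")
    print(variant_type(snp), flanks("ACGTACGTAC", snp.pos, len(snp.ref), size=3))
```

Flanks are truncated at the ends of the sequence instead of being padded, so variants near a chromosome end return fewer than `size` bases.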

FUTURE PLANS

GVM, as a public data repository of genomic variants, features comprehensive integration of different types of genome variations for a wide range of species. With the development of high-throughput sequencing technology, GVM is expected to continue to grow rapidly over the coming years. As GVM offers a high-density variation map for each species, these variants are of critical significance for population genetics, evolutionary analysis, association studies and genomic breeding. Thus, future developments will generate different reference SNP panels, including hapmapSNPs after data filtration and genotype imputation, tagSNPs after removing SNPs made redundant by linkage disequilibrium, fixedSNPs selected from genes exhibiting selective sweep signatures and barcodeSNPs selected from DNA fingerprinting simulation. In fact, this strategy has already been implemented in the 3000 Rice Genome Project (30) and SNP Ready for Rice (SR4R, http://sr4r.ic4r.org/) (6). Additionally, these SNP datasets will be readily usable for the optimal design of low-density (LD), medium-density (MD) or high-density (HD) SNP chips, which would help to develop a rapid, accurate and efficient method for genotyping several hundred or several thousand polymorphisms in large numbers of individuals. Furthermore, ongoing efforts will also include optimization of curation models and processes, integration of more variation datasets, enhancement of genomic variant annotation, and improvement of web interfaces and data analysis pipelines.

ACKNOWLEDGEMENTS

We thank our colleagues, students, and a number of users for reporting bugs and sending comments.

Contributor Information

Cuiping Li, China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.

Dongmei Tian, China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.

Bixia Tang, China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.

Xiaonan Liu, China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.

Xufei Teng, China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.

Wenming Zhao, China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.

Zhang Zhang, China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.

Shuhui Song, China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Strategic Priority Research Program of the Chinese Academy of Sciences [XDA24040201 to S.S., XDA19090116 to S.S., XDA19050302 to Z.Z.]; National Key R&D Program of China [2020YFC0848900, 2018YFD1000505, 2017YFC0907502]; 13th Five-year Informatization Plan of Chinese Academy of Sciences [XXH13505-05]; Genomics Data Center Construction of Chinese Academy of Sciences [XXH-13514-0202]; K. C. Wong Education Foundation (to Z.Z.); International Partnership Program of the Chinese Academy of Sciences [153F11KYSB20160008]; Youth Innovation Promotion Association of Chinese Academy of Sciences [2017141 to S.S.]. Funding for open access charge: Strategic Priority Research Program of the Chinese Academy of Sciences.

Conflict of interest statement. None declared.

REFERENCES

  • 1. National Genomics Data Center Members and Partners. Database resources of the national genomics data center in 2020. Nucleic Acids Res. 2020; 48:D24–D33.
  • 2. Song S., Tian D., Li C., Tang B., Dong L., Xiao J., Bao Y., Zhao W., He H., Zhang Z. Genome Variation Map: a data repository of genome variations in BIG Data Center. Nucleic Acids Res. 2018; 46:D944–D949.
  • 3. He G., Wang Z., Guo J., Wang M., Zou X., Tang R., Liu J., Zhang H., Li Y., Hu R. et al. Inferring the population history of Tai-Kadai-speaking people and southernmost Han Chinese on Hainan Island by genome-wide array genotyping. Eur. J. Hum. Genet. 2020; 28:1111–1123.
  • 4. Liu S., Li C., Wang H., Wang S., Yang S., Liu X., Yan J., Li B., Beatty M., Zastrow-Hayes G. et al. Mapping regulatory variants controlling gene expression in drought response and tolerance in maize. Genome Biol. 2020; 21:163.
  • 5. Sang J., Zou D., Wang Z., Wang F., Zhang Y., Xia L., Li Z., Ma L., Li M., Xu B. et al. IC4R-2.0: Rice genome reannotation using massive RNA-seq data. Genomics Proteomics Bioinformatics. 2020; 18:161–172.
  • 6. Yan J., Zou D., Li C., Zhang Z., Song S., Wang X. SR4R: An integrative SNP resource for genomic breeding and population research in rice. Genomics Proteomics Bioinformatics. 2020; 18:173–185.
  • 7. Peng H., Wang K., Chen Z., Cao Y., Gao Q., Li Y., Li X., Lu H., Du H., Lu M. et al. MBKbase for rice: an integrated omics knowledgebase for molecular breeding in rice. Nucleic Acids Res. 2020; 48:D1085–D1092.
  • 8. Tian D., Wang P., Tang B., Teng X., Li C., Liu X., Zou D., Song S., Zhang Z. GWAS Atlas: a curated resource of genome-wide variant-trait associations in plants and animals. Nucleic Acids Res. 2020; 48:D927–D932.
  • 9. Yang W., Yang Y., Zhao C., Yang K., Wang D., Yang J., Niu X., Gong J. Animal-ImputeDB: a comprehensive database with multiple animal reference panels for genotype imputation. Nucleic Acids Res. 2020; 48:D659–D667.
  • 10. Zhou Z., Li M., Cheng H., Fan W., Yuan Z., Gao Q., Xu Y., Guo Z., Zhang Y., Hu J. et al. An intercross population study reveals genes associated with body size and plumage color in ducks. Nat. Commun. 2018; 9:2648.
  • 11. Zhang Z., Jia Y., Chen Y., Wang L., Lv X., Yang F., He Y., Ning Z., Qu L. Genomic variation in Pekin duck populations developed in three different countries as revealed by whole-genome data. Anim. Genet. 2018; 49:132–136.
  • 12. Liu Y., Du H., Li P., Shen Y., Peng H., Liu S., Zhou G.A., Zhang H., Liu Z., Shi M. et al. Pan-Genome of wild and cultivated soybeans. Cell. 2020; 182:162–176.
  • 13. Sherry S.T., Ward M.H., Kholodov M., Baker J., Phan L., Smigielski E.M., Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001; 29:308–311.
  • 14. Buniello A., MacArthur J.A.L., Cerezo M., Harris L.W., Hayhurst J., Malangone C., McMahon A., Morales J., Mountjoy E., Sollis E. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019; 47:D1005–D1012.
  • 15. Pruitt K.D., Tatusova T., Brown G.R., Maglott D.R. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2012; 40:D130–D135.
  • 16. Yates A.D., Achuthan P., Akanni W., Allen J., Allen J., Alvarez-Jarreta J., Amode M.R., Armean I.M., Azov A.G., Bennett R. et al. Ensembl 2020. Nucleic Acids Res. 2020; 48:D682–D688.
  • 17. Nordberg H., Cantor M., Dusheyko S., Hua S., Poliakov A., Shabalov I., Smirnova T., Grigoriev I.V., Dubchak I. The genome portal of the Department of Energy Joint Genome Institute: 2014 updates. Nucleic Acids Res. 2014; 42:D26–D31.
  • 18. Polacco M., Coe E., Fang Z., Hancock D., Sanchez-Villeda H., Schroeder S. MaizeDB - a functional genomics perspective. Comp. Funct. Genomics. 2002; 3:128–131.
  • 19. Zhao W.M., Song S.H., Chen M.L., Zou D., Ma L.N., Ma Y.K., Li R.J., Hao L.L., Li C.P., Tian D.M. et al. The 2019 novel coronavirus resource. Yi Chuan. 2020; 42:212–221.
  • 20. Kodama Y., Shumway M., Leinonen R., International Nucleotide Sequence Database Collaboration. The Sequence Read Archive: explosive growth of sequencing data. Nucleic Acids Res. 2012; 40:D54–D56.
  • 21. Wang Y., Song F., Zhu J., Zhang S., Yang Y., Chen T., Tang B., Dong L., Ding N., Zhang Q. et al. GSA: Genome Sequence Archive. Genomics Proteomics Bioinformatics. 2017; 15:14–18.
  • 22. Zhang S.S., Chen T.T., Zhu J.W., Zhou Q., Chen X., Wang Y.Q., Zhao W.M. [GSA: Genome Sequence Archive]. Yi Chuan. 2018; 40:1044–1047.
  • 23. Li H., Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010; 26:589–595.
  • 24. McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K., Altshuler D., Gabriel S., Daly M. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010; 20:1297–1303.
  • 25. McLaren W., Gil L., Hunt S.E., Riat H.S., Ritchie G.R., Thormann A., Flicek P., Cunningham F. The Ensembl variant effect predictor. Genome Biol. 2016; 17:122.
  • 26. The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res. 2019; 47:D330–D338.
  • 27. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2018; 46:2699.
  • 28. El-Gebali S., Mistry J., Bateman A., Eddy S.R., Luciani A., Potter S.C., Qureshi M., Richardson L.J., Salazar G.A., Smart A. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 2019; 47:D427–D432.
  • 29. Zhang Z., Song S., Yu J., Zhao W., Xiao J., Bao Y. The Elements of Data Sharing. Genomics Proteomics Bioinformatics. 2020; 18:1–4.
  • 30. Wang W., Mauleon R., Hu Z., Chebotarov D., Tai S., Wu Z., Li M., Zheng T., Fuentes R.R., Zhang F. et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature. 2018; 557:43–49.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
