Nucleic Acids Research. 2011 Dec 1;40(Database issue):D1047–D1054. doi: 10.1093/nar/gkr1182

GWASdb: a database for human genetic variants identified by genome-wide association studies

Mulin Jun Li 1, Panwen Wang 1, Xiaorong Liu 1, Ee Lyn Lim 1,2, Zhangyong Wang 1,3, Meredith Yeager 4,5, Maria P Wong 6, Pak Chung Sham 7, Stephen J Chanock 5, Junwen Wang 1,*
PMCID: PMC3245026  PMID: 22139925

Abstract

Recent advances in genome-wide association studies (GWAS) have enabled us to identify thousands of genetic variants (GVs) that are associated with human diseases. As next-generation sequencing technologies become less expensive, more GVs will be discovered in the near future. Existing databases, such as the NHGRI GWAS Catalog, collect only GVs with genome-wide level significance. However, many true disease susceptibility loci have relatively moderate P values and are not included in these databases. We have developed GWASdb, which contains 20 times more data than the GWAS Catalog and includes less significant GVs (P < 1.0 × 10⁻³) manually curated from the literature. In addition, GWASdb provides comprehensive functional annotations for each GV, including genomic mapping information, regulatory effects (transcription factor binding sites, microRNA target sites and splicing sites), amino acid substitutions, evolution, gene expression and disease associations. Furthermore, GWASdb classifies these GVs according to diseases using Disease-Ontology Lite and Human Phenotype Ontology. It can conduct pathway enrichment and protein–protein interaction (PPI) network association analysis for these diseases. GWASdb provides an intuitive, multifunctional database for biologists and clinicians to explore GVs and their functional inferences. It is freely available at http://jjwanglab.org/gwasdb and will be updated frequently.

INTRODUCTION

Thousands of genetic variants (GVs) associated with human traits and diseases have been identified by genome-wide association studies (GWAS). The advent of high-throughput technologies, such as next-generation sequencing and very high-density microarrays, enables us to capture genome-wide variation on a much larger scale. With increasing sample sizes, GWAS based on these technologies will produce more information at higher resolutions. We will be able to detect many trait/disease-associated GVs, such as single nucleotide polymorphisms (SNPs), copy number variations (CNVs), and insertions and deletions (Indels) (1,2).

To understand the underlying regulatory and metabolic significance of these GVs, we have to consider biological evidence from different sources. However, in developing databases and web resources to integrate multidimensional functional annotations, researchers will inevitably encounter the following difficulties: (i) Searching and gathering GWAS results from published data for a specific trait/disease can be tedious and time-consuming. Researchers have to locate the publications by searching PubMed or other databases, and then gather GVs by manual curation either from the main text or from related supplementary materials for each publication. (ii) Individual curation lacks a universal criterion for data handling, which might cause data inconsistency and consequently affect the quality of the downstream analysis. (iii) Inferring the functional role of these GVs from heterogeneous databases is also a challenge. Information (genomic elements, genetic and disease-associated attributes) from different databases (such as dbSNP, HapMap, RefSeq, Ensembl and OMIM) needs to be gathered. If the information is not readily available, functional prediction will need to be performed using various available software or web servers.

Fortunately, several databases and tools have been developed to cope with these problems. The NHGRI GWAS Catalog has collected more than 5800 GVs from published GWAS (up to August 2011). The database used GWAS studies reporting at least one GV with P < 5.0 × 10⁻⁸, and the collected GVs were limited to P < 1.0 × 10⁻⁵ (3). This database also contains some statistical features including odds ratios and estimated risk allele frequencies. Johnson and O'Donnell have published a full gene-annotated GWAS database, which contains 56 411 GWAS genotype–phenotype associations with a threshold P < 1.0 × 10⁻³ (4). GWAS Central (previously named HGVbaseG2P) is another manually curated database that provides a centralized compilation of high level summary data from genetic association studies (5). Other databases also focus on data integration of GVs from GWAS, such as dbGaP PheGenI (6), Genetic Association Database (GAD) (7), HuGE Navigator (8), Varietas (9) and SNPedia. Many bioinformatics tools have been developed to quickly locate genome elements around GVs and to infer their putative functions, such as SNPselector (10), SNP Function Portal (11), F-SNP (12), SNPit (13), SNPLogic (14), SNPnexus (15), SCAN (16), GWAS analyzer (17) and pfSNP (18). These Web-based resources continually strive to provide a comprehensive knowledge base of the characteristics and functions of GVs.

However, existing resources also have limitations in satisfying the increasing demands of current GWAS research: (i) Many true disease susceptibility loci have relatively moderate P values which are ignored in existing databases. GVs with moderate effect sizes, usually filtered by strict cutoffs, can be directly related to diseases through gene–gene interaction in the context of regulatory networks and pathways (19). (ii) Most of the existing databases focus only on one or several aspects of the functional annotations, and not on GV-disease relationships. An integrative, comprehensive, up-to-date GWAS-based knowledge base that focuses on disease classification is needed.

Here, we present GWASdb, a user-friendly database that combines collections of GVs from GWAS together with their functional annotations and disease classifications. We aim to provide an integrative, multidimensional functional annotation portal to help researchers and clinicians maximize the usage of the most recent GWAS data. The database provides the following information: (i) In addition to all the GVs annotated in the NHGRI GWAS Catalog, we manually curated the GVs that are marginally significant (P < 1.0 × 10⁻³) collected from supplementary materials of each original publication. (ii) We provide extensive functional annotations for these GVs. (iii) The GVs have been manually classified according to disease using Disease-Ontology Lite (DOLite) and Human Phenotype Ontology (HPO). The database can be used to conduct gene-based pathway enrichment and PPI network association analysis for diseases with sufficient variants.

DATABASE DESIGN AND CONTENT

We provide an intuitive, well-organized and easy-to-use web interface that allows users to explore the GVs from different perspectives, including genome, disease, gene regulation and protein interactions. Users can quickly search and locate a queried region by inputting the dbSNP id, gene symbol and chromosome region, or by directly clicking the data point on the plot of the GWAS overview. We have also developed a web-based genome viewer (Gviewer) to dynamically display the related information. Furthermore, to facilitate communication with other servers, we provide web service interfaces for machine-based large-scale data retrieval. We anticipate the database can facilitate follow-up analysis of specific diseases and can help researchers generate hypotheses by integrating multidimensional information concerning the target GV. The overall structure of our database is shown in Figure 1.

Figure 1.

The overview of GWASdb database design. GWASdb consists of three main functions: precise scientific curation and resources integration on GWAS, comprehensive annotation of genetic variants and disease-oriented analysis in terms of DOLite and HPO.

Data curation and collection

One major source of data for GWASdb was the NHGRI GWAS Catalog. The GWAS Catalog has collected data on thousands of GVs from the literature, adopting stringent criteria to ensure data consistency and integrity. SNP-trait associations for each GV gathered from each paper were limited to P < 1.0 × 10⁻⁵, and the database also restricted the number of SNP-trait associations extracted from each paper to 50 (3). We extended the scope of this database by using a looser cutoff of P < 1.0 × 10⁻³ for data from each paper, and where possible GVs were included from supplementary materials. We used the same standards for the other criteria: P values were taken from the largest sample size, and the population was taken from a combined analysis or, failing that, the largest one. Our purpose was to incorporate more GVs with moderate P values and to have more comprehensive functional annotations. At this current stage, we have gathered 70 411 GVs, 64 000 more than in the NHGRI GWAS Catalog (see Supplementary Data). We also incorporated other well-organized GWAS databases: Johnson and O'Donnell (4), dbGaP PheGenI (6), GAD (7), GWASCentral (5) and PharmGKB (20). We found many overlapping GVs annotated in these databases, which we combined by selecting only the most significant ones from the redundant GVs. We also omitted the GVs that we had already included from the NHGRI GWAS Catalog. In total, we obtained 146 537 GVs from the consolidation of several databases, 20 times more than in the NHGRI GWAS Catalog (see Supplementary Data). All the GVs can be viewed at either the whole genome level or at the chromosome level using the circular genome plot.
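The consolidation step described above (apply the P < 1.0 × 10⁻³ cutoff, then keep only the most significant record among redundant GVs) can be sketched as follows. This is a minimal illustration, not the curation pipeline itself: the `Association` record, its field names and the choice of (rsid, trait) as the redundancy key are assumptions, and the example values in the usage note are invented.

```python
from typing import Iterable, NamedTuple


class Association(NamedTuple):
    """Hypothetical record layout for one curated SNP-trait association."""
    rsid: str
    trait: str
    p_value: float
    source: str


P_CUTOFF = 1.0e-3  # curation threshold stated in the text


def consolidate(records: Iterable[Association]) -> list[Association]:
    """Drop records with P >= cutoff and keep the smallest P per (rsid, trait).

    Using (rsid, trait) as the redundancy key is an assumption of this sketch.
    """
    best: dict[tuple[str, str], Association] = {}
    for rec in records:
        if rec.p_value >= P_CUTOFF:
            continue  # not significant enough to be curated
        key = (rec.rsid, rec.trait)
        current = best.get(key)
        if current is None or rec.p_value < current.p_value:
            best[key] = rec  # more significant duplicate wins
    # Return the survivors ordered from most to least significant.
    return sorted(best.values(), key=lambda r: r.p_value)
```

For instance, two invented records for the same variant and trait with P values of 1e-4 and 1e-6 collapse to the 1e-6 record, and a record with P = 0.01 is dropped.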

Constructing GV functional annotation

All the collected GVs were mapped to dbSNP build 132, the latest release at the time. We then integrated comprehensive annotations from various sources for these GVs. These annotations were systematically divided into seven categories as follows: GV summary, genomic mapping, regulatory effect, amino acid substitution, evolution, gene expression and disease annotation (Table 1). For each category, we investigated the possible functional roles of each selected GV. For example, in the category of regulatory effect, we computed the affinity changes caused by different alleles of each GV, such as the affinities between transcription factors and their binding sites (21–23), microRNAs and their targets (23), and predicted splicing sites (18). The statistical significance of the binding affinity changes was calculated based on permutations of the binding partners (24).
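To make the idea of an allele-specific affinity change concrete, the sketch below scores the reference and alternate alleles of a GV against a position weight matrix and reports the difference in the best log-odds score over all windows that cover the variant. It is an illustration under stated assumptions only: the toy matrix, the uniform background and the function names are invented here, the energy-based scoring used by GWASdb is not reproduced, and the permutation-based significance step is omitted.

```python
import math

# Toy position frequency matrix (hypothetical), one dict per motif position.
TOY_PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base composition


def window_score(pfm, window):
    """Log2-odds score of one window against the motif."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pfm, window))


def best_score(pfm, seq, snp_index):
    """Best score over all motif-length windows that include snp_index."""
    width = len(pfm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    if last < first:
        raise ValueError("sequence too short for this motif")
    return max(window_score(pfm, seq[s:s + width]) for s in range(first, last + 1))


def allele_score_change(pfm, flank5, ref, alt, flank3):
    """Score(alt) - Score(ref); negative values suggest weakened binding."""
    snp_index = len(flank5)
    ref_seq = flank5 + ref + flank3
    alt_seq = flank5 + alt + flank3
    return best_score(pfm, alt_seq, snp_index) - best_score(pfm, ref_seq, snp_index)
```

With the toy matrix, `allele_score_change(TOY_PFM, "TA", "G", "T", "CTT")` is negative because the G allele completes the consensus AGCT and the T allele breaks it.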

Table 1.

Description of annotations organized in GWASdb

| Level | Item | Description | Reference |
|---|---|---|---|
| SNP summary | General information | dbSNP 132 annotation for each GV | dbSNP-Q (32) |
| SNP summary | Genome-wide association | Manual curation and collection | GWASdb |
| SNP summary | 1000 Genome SNP | SNPs and indels in 1000 Genomes Project 1049 subjects (May 2011 release) | 1000 Genomes Project |
| SNP summary | LD plot | LD data from HapMap Phase II+III | HapMap |
| Genomic mapping | Reference gene | Gene annotation from NCBI RefSeq | NCBI RefSeq |
| Genomic mapping | Ensembl gene | Gene annotation from Ensembl | Ensembl |
| Genomic mapping | Known gene | Gene annotation from UCSC | UCSC |
| Genomic mapping | Small RNA | snoRNA and miRNA annotations from UCSC | UCSC |
| Genomic mapping | MicroRNA target | TargetScan generated miRNA target site predictions | UCSC |
| Genomic mapping | Transcription factor binding site | Transcription factor binding sites conserved in the human/mouse/rat alignment, based on TRANSFAC Matrix Database (v7.0) | UCSC |
| Genomic mapping | Enhancer | Human enhancer verified by experiment | VISTA Enhancer DB (33) |
| Genomic mapping | Insulator | CTCF binding site database for characterization of human genomic insulators | CTCFBSDB (34) |
| Regulatory effects | Transcription factor binding site affinity | GV affinity of TFBS prediction based on fold energy change with PWM scanning | GWASdb, TRANSFAC (35), JASPAR (36), UniPROBE (37) |
| Regulatory effects | MicroRNA target site affinity (for PITA) | GV affinity of miRNA target prediction based on fold and hybrid energy change for PITA top targets | GWASdb, PITA (38) |
| Regulatory effects | MicroRNA target site affinity (for miRanda) | GV affinity of miRNA target prediction based on hybrid energy change for miRanda targets | GWASdb, miRanda (39) |
| Regulatory effects | Splicing site affinity | GV affinity of splicing site prediction | ssSNPTarget (40) |
| Amino acid substitution | Non-synonymous SNP functional prediction | Non-synonymous GV deterioration prediction | dbNSFP (41) |
| Evolution | SNP positive selection | The estimation of FST and heterozygosity of GV for positive selection | SNP@Evolution (42) |
| Evolution | Gene positive selection | The estimation of FST and heterozygosity of gene for positive selection | SNP@Evolution |
| Evolution | Conserved functional RNA | Conserved functional RNA, through RNA secondary structure predictions made with the EvoFold program | UCSC |
| Evolution | Conserved elements | Conserved elements produced by the PhastCons program based on a whole-genome alignment of vertebrates | UCSC |
| Gene expression | Three way SNP expression association | Gene co-expression relationships with GV effect | SNPxGE2 (43) |
| Disease association | OMIM | Online Mendelian Inheritance in Man | OMIM |
| Disease association | DGV | Curated catalogue of structural variation in the human genome | Database of Genomic Variants |
| Disease association | GAD | Archive of human genetic association studies of complex diseases and disorders | Genetic Association Database |

We further calculated how the annotated GVs are distributed in different genomic regions. As shown in Figure 2a, 43.5% of all GVs are in the gene regions, such as intron, nonsense, missense, cds-indel, cds-synon, frameshift, 3′-UTR, 5′-UTR, 3′-nearGene and 5′-nearGene, as defined by dbSNP132. The rest of the GVs (∼56.5%) are located in intergenic regions, which are areas that contain enhancers, promoter elements and many other long range regulators, and thus may be involved in gene regulation and regulatory networks (25). The top 15 traits/diseases with the most abundant GVs in our database are shown in Figure 2b.

Figure 2.

Classifications of GVs from the genic regions and according to the traits/diseases in GWASdb. (a) The proportion of GV/gene transcripts with different functional properties in the genic regions (total representing 43.5% of all GVs in GWASdb). (b) The top 15 traits/diseases with the most significant GVs in the database, based on the DOLite catalog.

Mapping of GVs using DOLite and HPO

DOLite is a simplified annotation of gene–disease associations. It was constructed from the OBO Foundry Disease Ontology (26). DOLite uses 561 independent nodes to describe gene–disease associations and is highly suited for GV-disease mapping in our database. We were able to map 70% of our GVs to these nodes. However, DOLite does not include other phenotypes, such as height, weight and addiction, so another ontology database, HPO (27), was used. We were able to map the rest of the GVs in our database to HPO terms.

Disease-oriented analysis using DOLite and HPO

The mapping of GVs to diseases enables us to perform disease-level meta-analysis. It is important to understand the underlying mechanism of SNP–disease association, particularly in the context of pathways and networks. Our database allows users to perform meta-analysis on multiple studies targeting the same disease, defined by a unique term in DOLite or HPO. We used the KGG package (28) to search for enriched pathways or protein–protein interaction (PPI) networks. We omitted the disease terms that contained fewer than 400 GVs because pathway and PPI enrichment analyses need a large dataset of genes.
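The selection of disease terms for enrichment analysis can be pictured as a grouping step followed by a size filter. The sketch below is only an illustration of that filter: the input layout (pairs of rsID and term label) and the function name are assumptions, and the KGG analysis itself is not reproduced. Terms with fewer than 400 distinct GVs are dropped, following the rule stated above.

```python
from collections import defaultdict

MIN_GVS_PER_TERM = 400  # threshold stated in the text


def terms_eligible_for_enrichment(gv_terms, min_gvs=MIN_GVS_PER_TERM):
    """Group GVs by ontology term and keep terms with at least min_gvs GVs.

    gv_terms: iterable of (rsid, term) pairs, where term is a DOLite or HPO
    label (hypothetical input layout). Returns {term: sorted list of rsids}.
    """
    by_term = defaultdict(set)
    for rsid, term in gv_terms:
        by_term[term].add(rsid)  # a set ignores repeated reports of one GV
    return {
        term: sorted(rsids)
        for term, rsids in by_term.items()
        if len(rsids) >= min_gvs
    }
```

Counting distinct rsIDs rather than raw records is a design choice of this sketch, so that a GV reported by several studies of the same disease is counted once.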

WEB INTERFACE AND DATA QUERYING

The GWASdb web site provides six straightforward components: Guidance, GWAS overview, Gviewer, DOLite Viewer, HPO Tree Viewer and Customized Page. These help researchers locate and explore the GVs of interest and their related functional annotations.

The guidance page

The GWAS guidance page is the front page of the database. Users should first read this page to get a general idea of the contents and of how to use the various functions of the database. In the upper-left corner of the page, there is a sliding menu with items that users can start from. To view all the GVs at the whole genome level, they can click on the ‘Overview’ item. If they are interested in a particular disease, they can start from either the ‘DOLITE’ or the ‘HPO’ item. If they want to analyze a list of SNPs of their own, they can start from the ‘CUSTOMIZED’ item.

The GWAS overview page

The GWAS overview page displays a circular GWAS plot showing the global view of the top GVs in each human chromosome. The dots in the plot represent the top two GVs from each study and different colors represent different diseases (Figure 3a). Other information is shown as spectral plots in the inner circles of the plot, such as CNV hotspots, dbSNP density, HapMap density, 1000 Genomes density and OMIM gene distribution (Figure 3b). By clicking on the ideogram of a chromosome, the user is presented with a new circular plot of that single chromosome showing the top five GVs from each disease. By clicking on a single dot, users are brought to the Gviewer page, where general information on the GV is displayed, such as dbSNP id, P value, study source and DOLite catalog number.
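Selecting the dots for the two plots amounts to a top-N query within groups: the two most significant GVs per study for the genome-wide view, and the five most significant per disease for the single-chromosome view. A minimal sketch follows, assuming each GV is a dictionary with hypothetical keys `study`, `disease` and `p_value`; how GWASdb performs this selection internally is not described in the text.

```python
import heapq
from collections import defaultdict


def top_gvs_per_group(rows, group_field, n):
    """Return {group: the n rows with the smallest p_value in that group}.

    rows: iterable of dicts; group_field and "p_value" are hypothetical keys.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row)
    return {
        group: heapq.nsmallest(n, members, key=lambda r: r["p_value"])
        for group, members in groups.items()
    }


def overview_dots(rows):
    """Genome-wide view: top two GVs from each study."""
    return top_gvs_per_group(rows, "study", 2)


def chromosome_dots(rows, chrom):
    """Single-chromosome view: top five GVs from each disease on chrom."""
    on_chrom = [r for r in rows if r["chrom"] == chrom]
    return top_gvs_per_group(on_chrom, "disease", 5)
```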

Figure 3.

Illustration of the circular GWAS plot. (a) Overview of the circular GWAS plot, dots show the top two GVs for each study. (b) A description of each of the components in the plot.

The Gviewer

The Gviewer is a web-based genome browser that dynamically displays the different tracks related to the queried GV. Gviewer currently provides four tracks (GV, RefGene, OMIM Gene and DGV) that show the elements around the target GV. More tracks will be added in the future. Users can either click on the arrow buttons or drag the tracks to show the surrounding regions. By clicking elements on the track, users can get detailed information in a popup message box. When a GV is clicked, comprehensive functional annotations of this GV will be displayed on the right pane, which will update with the user actions in the Gviewer. To improve the user experience, selecting different tabs does not switch to another page and waiting time is kept to a minimum because the page loading is asynchronous and the page rendering is progressive.

For example, if a user enters the dbSNP id rs437179 in the top search bar, or clicks this GV in the GWAS overview plot, the user is automatically forwarded to the Gviewer. This shows the GV location in the gene body of SKIV2L, an OMIM gene (600478), together with copy number variants. By clicking on each annotation tab in the right pane, users obtain the following detailed functional annotations about this GV: (i) it is associated with rheumatoid arthritis (P value of 6.15E-20); (ii) it was reported in HapMap and the 1000 Genomes Project with an average heterozygosity of 0.39; (iii) it has an miRNA (hsa-mir-1236) located in its upstream region; (iv) its two alleles significantly change the transcription factor binding site affinities (transcription factors: LM105 and GAMYB); (v) it is a non-synonymous SNP; (vi) it is located in a conserved region undergoing positive selection; (vii) it is associated with the differential co-expression between two genes (DEFB4 and OAS1); (viii) it has extensive variant and disease associations (OMIM: 600478; DGV: 3602, 36507; GAD: 557471, 557472, 557473) (see Supplementary Figure S1).

The DOLite viewer and HPO viewer

To demonstrate the disease/trait classifications of these GVs, we provide the DOLite viewer and HPO viewer. GWASdb displays an interactive Manhattan Plot viewer for easy visualization of GVs mapped to a DOLite node or HPO tree. By selecting each disease or phenotype node, a Manhattan Plot will be instantly drawn on the left pane, with each dot representing a GV. The detailed information on all GVs associated with this disease is simultaneously shown on the right pane. Users can hover the mouse over the GV dot to view a brief description of the GV. When the GV dot is clicked, detailed information will be highlighted on the right pane. By clicking the arrow icon on the highlighted information, the user can continue to the Gviewer page to see the detailed functional annotation of this GV. We also provide pathway and PPI analysis for DOLite terms or HPO nodes that total more than 400 GVs. Two additional tabs can be accessed for gene-based pathway analysis calculated from the KGG package (28) and PPI network analysis rendered by Cytoscape (29) (Supplementary Figure S2).
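The coordinates of a Manhattan plot follow a simple rule: the x value is the genomic position laid out chromosome after chromosome, and the y value is −log10 of the P value. The sketch below computes these coordinates only; the function name and input layout are assumptions, the chromosome lengths in the usage note are toy numbers rather than real assembly lengths, and the interactive rendering used by GWASdb is not shown.

```python
import math


def manhattan_points(gvs, chrom_lengths):
    """Compute (x, y) plot coordinates for a list of GVs.

    gvs: iterable of (chrom, position, p_value) tuples.
    chrom_lengths: dict mapping chromosome name to length, in plotting order
    (dicts preserve insertion order in Python 3.7+).
    """
    offsets = {}
    running = 0
    for chrom, length in chrom_lengths.items():
        offsets[chrom] = running  # where this chromosome starts on the x axis
        running += length
    return [
        (offsets[chrom] + pos, -math.log10(p_value))
        for chrom, pos, p_value in gvs
    ]
```

With toy lengths `{"1": 1000, "2": 500}`, a GV at position 5 on chromosome 2 is drawn at x = 1005, to the right of every point on chromosome 1.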

Searching the GWASdb database

On the front page, users can perform a quick search in any of five search categories: dbSNP id, gene symbol, chromosome region, DOLite term and HPO phenotype term. The system shows instant hint messages when the user has entered only part of a search term, and shows an alert message if the search term is not recognized. After the user clicks the ‘Go’ button, the server displays different views depending on which search category was selected. For example, if a dbSNP id was queried, the system displays the highlighted SNP in the Gviewer pane together with comprehensive annotations on the right pane. If the SNP id is in an older version format, the system automatically converts it to the latest version and processes the query. For gene or genomic region searches, the system shows all GVs in that region in the Gviewer pane together with literature information on the right pane. For disease or phenotype queries, a Manhattan Plot is displayed on the left pane. The user can then click on a particular GV on the plot and the system will display the Gviewer page with that GV highlighted.
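As an illustration of how a query string might be validated before it is dispatched to one of these views, the sketch below recognizes dbSNP ids and chromosome regions by pattern and treats anything else as a gene symbol or ontology term. This is an assumption-laden example: in GWASdb the category is selected by the user, and the accepted region syntax (`chr6:31900000-32000000`, with optional thousands separators) and all names below are hypothetical.

```python
import re

RSID_RE = re.compile(r"^rs\d+$", re.IGNORECASE)
REGION_RE = re.compile(r"^(?:chr)?([0-9]{1,2}|X|Y|MT?):(\d+)-(\d+)$", re.IGNORECASE)


def classify_query(text):
    """Return (category, parsed value) for a search string.

    Categories: "dbsnp_id", "region", or "gene_or_term" (fallback).
    """
    query = text.strip()
    if not query:
        raise ValueError("empty query")
    if RSID_RE.match(query):
        return ("dbsnp_id", query.lower())
    match = REGION_RE.match(query.replace(",", ""))  # allow 31,900,000 style
    if match:
        chrom = match.group(1).upper()
        start, end = int(match.group(2)), int(match.group(3))
        if start > end:
            raise ValueError("region start is greater than region end")
        return ("region", (chrom, start, end))
    # Gene symbols and DOLite/HPO terms cannot be told apart by pattern alone.
    return ("gene_or_term", query)
```

A real front end would resolve the fallback category against its gene and ontology vocabularies, which is also where partial-term hints could be generated.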

The customized page

The customized page allows users to study a list of GVs of their choice. Users input the list of GVs and select their disease of interest, either as a DOLite term or an HPO node. The server searches our local database for all the SNPs associated with this disease and compares them with the input GVs. A hypergeometric test is then performed to test whether the input GVs overlap significantly with the GVs in the database. The overlapping and non-overlapping GVs are displayed in different colors in a Manhattan Plot. By clicking on the dots on the plot, users can further explore the functional annotations of each GV.
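The overlap test can be written down directly from the hypergeometric distribution: given a background of N GVs, of which K are associated with the selected disease, the P value is the probability that a random list of n GVs contains at least k of the disease-associated ones. The sketch below computes that upper tail with exact integer arithmetic. The choice of background population is an assumption of this example, since the text does not state which background GWASdb uses.

```python
from math import comb


def hypergeom_overlap_pvalue(population, disease_gvs, input_gvs, overlap):
    """Upper-tail probability P(X >= overlap) for X ~ Hypergeometric.

    population:  number of GVs in the background (N, assumed by the caller)
    disease_gvs: background GVs associated with the disease (K)
    input_gvs:   size of the user's list (n)
    overlap:     number of input GVs that are disease-associated (k)
    """
    if not (0 <= disease_gvs <= population and 0 <= input_gvs <= population):
        raise ValueError("K and n must lie between 0 and N")
    upper = min(disease_gvs, input_gvs)
    total = comb(population, input_gvs)
    tail = sum(
        comb(disease_gvs, k) * comb(population - disease_gvs, input_gvs - k)
        for k in range(max(overlap, 0), upper + 1)
    )
    return tail / total
```

For a background of 10 GVs with 4 disease-associated ones, a list of 3 GVs that hits 2 of them gives (6·6 + 4·1)/120 = 1/3.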

Database implementation and downloading

GWASdb is a web-based query tool designed with a service-oriented architecture (SOA). We used the jQuery and Raphaël JavaScript frameworks on the frontend to build Gviewer, which ensures high usability of the web pages, and we used MySQL as the backend database. Database sharding is used to handle the large amount of SNP data. To facilitate communication with other servers, we have provided web service interfaces for machine-based large-scale data retrieval, which were built using Apache CXF technology (see Supplementary Data). All the functional annotations in the Gviewer page can be downloaded in batch by clicking ‘Get All Information in JSON’ on the right panel of the Gviewer page.
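Database sharding splits one large table into several smaller ones and routes each lookup to the shard that holds the requested key. The text does not say which sharding key or scheme GWASdb uses, so the sketch below is purely illustrative: it assumes routing on the numeric part of the rsID modulo a fixed shard count, and the table-name prefix is hypothetical.

```python
def shard_index(rsid, n_shards):
    """Map an rsID such as "rs437179" to a shard number in [0, n_shards)."""
    if n_shards <= 0:
        raise ValueError("n_shards must be positive")
    if not rsid.lower().startswith("rs") or not rsid[2:].isdigit():
        raise ValueError(f"not a dbSNP rsID: {rsid!r}")
    return int(rsid[2:]) % n_shards


def shard_table(rsid, n_shards=8, prefix="gv_annotation_"):
    """Name of the (hypothetical) table that would hold this rsID."""
    return f"{prefix}{shard_index(rsid, n_shards):02d}"
```

Modulo routing spreads consecutive ids evenly across shards, at the cost of having to redistribute rows if the shard count changes; range- or chromosome-based keys are common alternatives.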

DISCUSSION

The GWASdb database can satisfy the demands of the scientific community for the exploration of ever increasing amounts of GWAS data. Many published bioinformatics tools have targeted functional annotations of GVs. We performed a function-oriented comparison with existing tools (see Supplementary Table S2). Using rich web application techniques, GWASdb offers great convenience to researchers for analysis of their GWAS data. Researchers can quickly locate and fetch the GVs of interest and examine the genetic information and functional annotations in great detail. Furthermore, they can explore pathway and PPI networks in the context of disease-oriented meta-analysis. This platform combined with other resources will be an effective tool to study the underlying disease mechanism in GVs. The GWASdb integrative database portal will be a valuable resource for researchers and clinicians.

GWASdb focuses on specific features and functions of GWAS GVs and their disease classifications. The database has collected GVs from six resources so far (NHGRI GWAS Catalog, Johnson and O'Donnell, dbGaP PheGenI, GAD, GWASCentral and PharmGKB). Owing to inconsistency in data formats and the difficulty of data curation, we did not delve into the experimental and sample description of each GWAS, such as population-related information, individual ratio, geographic region and mode of recruitment. Instead, we provide PubMed links for each GV in our database so that users can easily trace the information from the original publications. Since our purpose was to integrate potentially useful GVs from the literature, we used a predefined cutoff (P < 1.0 × 10⁻³) as our curation threshold. This cutoff was used because we found that most reported SNPs of moderate significance have GWAS P values between 10⁻² and 10⁻⁴ (19,30). Nevertheless, relaxing the P value cutoff will inevitably increase the number of false positives. Users can use the ‘customized’ page to hand-pick the GVs of interest. There are experimental methods that can reduce false positives, for example, validation of GWAS results in an independent cohort, or functional studies. Computational methods can also be used to reduce false positives. For example, it was recently reported that trait/disease-associated GVs are more likely to be expression quantitative trait loci (eQTLs). eQTLs can therefore be used to filter out false positives and reveal the true association profile of the study (31).

With the advent of personal genome sequencing projects such as the 1000 Genomes Project, many novel mutations and disease-causing loci will be discovered in the near future. We will continually add new GVs to our database as new GWAS data become available. At the same time, we will incorporate new bioinformatics algorithms and tools to improve the accuracy of functional annotations. In the next stage, we will incorporate SNPs that are not found by GWAS, but are in close linkage disequilibrium (LD) with the SNPs in GWASdb. This will greatly enhance the utility of this database because there are disease-causing GVs that were not covered by GWAS arrays. We also aim to collect data from important genome regions such as eQTLs, long non-coding RNAs and DNA methylation sites in the next version of GWASdb, because SNPs in those regions may have positive or negative effects on gene regulation. We will add more tracks to the Gviewer page to allow users to view more functional elements, such as SNP density, haplotype plot and important regulators. For GV annotation, we plan to integrate more data sources or pre-compute the functional predictions using recognized algorithms. The GWASdb database is freely available at http://jjwanglab.org/gwasdb and will be updated frequently.
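The planned extension to SNPs in close LD with GWASdb variants rests on a pairwise LD measure. A common one is r², computed from the frequency of the haplotype carrying both alleles and the two allele frequencies. The sketch below shows that standard formula together with a simple proxy filter; the 0.8 threshold and the function names are assumptions of this example, not values taken from the text.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic locus")
    return d * d / denom


def ld_proxies(candidates, threshold=0.8):
    """Keep candidate SNPs whose r^2 with a GWAS SNP meets the threshold.

    candidates: iterable of (rsid, p_ab, p_a, p_b); threshold is hypothetical.
    """
    return [
        rsid
        for rsid, p_ab, p_a, p_b in candidates
        if ld_r_squared(p_ab, p_a, p_b) >= threshold
    ]
```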

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Figures 1–2, Supplementary Tables 1–2.

FUNDING

The Small Project Fund (201007176262) of the University of Hong Kong; Research Grants Council of Hong Kong (781511M, 778609M, N_HKU752/10); Food and Health Bureau of Hong Kong (10091262); The intramural research program of the National Cancer Institute (NCI), NIH, USA. Funding for open access charge: Research Grants Council (781511M) of Hong Kong.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We thank Kevin Mao and Tina Yuen of The Royal College of Surgeons in Ireland for their assistance in data curation.

REFERENCES

  • 1.The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073. doi: 10.1038/nature09534. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, Yu F, Bonnen PE, de Bakker PI, Deloukas P, Gabriel SB, et al. Integrating common and rare genetic variation in diverse human populations. Nature. 2010;467:52–58. doi: 10.1038/nature09298. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA. 2009;106:9362–9367. doi: 10.1073/pnas.0903103106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Johnson AD, O'Donnell CJ. An open access database of genome-wide association results. BMC Med. Genet. 2009;10:6. doi: 10.1186/1471-2350-10-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Thorisson GA, Lancaster O, Free RC, Hastings RK, Sarmah P, Dash D, Brahmachari SK, Brookes AJ. HGVbaseG2P: a central genetic association database. Nucleic Acids Res. 2009;37:D797–D802. doi: 10.1093/nar/gkn748. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Mailman MD, Feolo M, Jin Y, Kimura M, Tryka K, Bagoutdinov R, Hao L, Kiang A, Paschall J, Phan L, et al. The NCBI dbGaP database of genotypes and phenotypes. Nat. Genet. 2007;39:1181–1186. doi: 10.1038/ng1007-1181. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Becker KG, Barnes KC, Bright TJ, Wang SA. The genetic association database. Nat. Genet. 2004;36:431–432. doi: 10.1038/ng0504-431. [DOI] [PubMed] [Google Scholar]
  • 8.Yu W, Gwinn M, Clyne M, Yesupriya A, Khoury MJ. A navigator for human genome epidemiology. Nat. Genet. 2008;40:124–125. doi: 10.1038/ng0208-124. [DOI] [PubMed] [Google Scholar]
  • 9.Paananen J, Ciszek R, Wong G. Varietas: a functional variation database portal. Database. 2010;2010:baq016. doi: 10.1093/database/baq016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Xu H, Gregory SG, Hauser ER, Stenger JE, Pericak-Vance MA, Vance JM, Zuchner S, Hauser MA. SNPselector: a web tool for selecting SNPs for genetic association studies. Bioinformatics. 2005;21:4181–4186. doi: 10.1093/bioinformatics/bti682. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Wang PL, Dai MH, Xuan WJ, McEachin RC, Jackson AU, Scott LJ, Athey B, Watson SJ, Meng F. SNP Function Portal: a web database for exploring the function implication of SNP alleles. Bioinformatics. 2006;22:E523–E529. doi: 10.1093/bioinformatics/btl241. [DOI] [PubMed] [Google Scholar]
  • 12.Lee PH, Shatkay H. F-SNP: computationally predicted functional SNPs for disease association studies. Nucleic Acids Res. 2008;36:D820–D824. doi: 10.1093/nar/gkm904. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Shen TH, Carlson CS, Tarczy-Hornoch P. SNPit: a federated data integration system for the purpose of functional SNP annotation. Comput. Methods Programs Biomed. 2009;95:181–189. doi: 10.1016/j.cmpb.2009.02.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Pico AR, Smirnov IV, Chang JS, Yeh RF, Wiemels JL, Wiencke JK, Tihan T, Conklin BR, Wrensch M. SNPLogic: an interactive single nucleotide polymorphism selection, annotation, and prioritization system. Nucleic Acids Res. 2009;37:D803–D809. doi: 10.1093/nar/gkn756. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Chelala C, Khan A, Lemoine NR. SNPnexus: a web database for functional annotation of newly discovered and public domain single nucleotide polymorphisms. Bioinformatics. 2009;25:655–661. doi: 10.1093/bioinformatics/btn653. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Gamazon ER, Zhang W, Konkashbaev A, Duan SW, Kistner EO, Nicolae DL, Dolan ME, Cox NJ. SCAN: SNP and copy number annotation. Bioinformatics. 2010;26:259–262. doi: 10.1093/bioinformatics/btp644. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Fong C, Ko DC, Wasnick M, Radey M, Miller SI, Brittnacher M. GWAS analyzer: integrating genotype, phenotype and public annotation data for genome-wide association study analysis. Bioinformatics. 2010;26:560–564. doi: 10.1093/bioinformatics/btp714. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Wang JB, Ronaghi M, Chong SS, Lee CGL. pfSNP: an integrated potentially functional SNP resource that facilitates hypotheses generation through knowledge syntheses. Hum. Mutat. 2011;32:19–24. doi: 10.1002/humu.21331. [DOI] [PubMed] [Google Scholar]
  • 19. Qin HD, Shugart YY, Bei JX, Pan QH, Chen L, Feng QS, Chen LZ, Huang W, Liu JJ, Jorgensen TJ, et al. Comprehensive pathway-based association study of DNA repair gene variants and the risk of nasopharyngeal carcinoma. Cancer Res. 2011;71:3000–3008. doi: 10.1158/0008-5472.CAN-10-0469.
  • 20. Altman RB. PharmGKB: a logical home for knowledge relating genotype to drug response phenotype. Nat. Genet. 2007;39:426. doi: 10.1038/ng0407-426.
  • 21. Wang JW, Zhang SL, Schultz RM, Tseng H. Search for basonuclin target genes. Biochem. Biophys. Res. Commun. 2006;348:1261–1271. doi: 10.1016/j.bbrc.2006.07.198.
  • 22. Qin J, Li MJ, Wang P, Zhang MQ, Wang J. ChIP-Array: combinatory analysis of ChIP-seq/chip and microarray gene expression data to discover direct/indirect targets of a transcription factor. Nucleic Acids Res. 2011;39:W430–W436. doi: 10.1093/nar/gkr332.
  • 23. Zhang G, Chen X, Chan L, Zhang M, Zhu B, Wang L, Zhu X, Zhang J, Zhou B, Wang J. An SNP selection strategy identified IL-22 associating with susceptibility to tuberculosis in Chinese. Sci. Rep. 2011;1:20. doi: 10.1038/srep00020.
  • 24. Li MJ, Sham PC, Wang JW. FastPval: a fast and memory efficient program to calculate very low P-values from empirical distribution. Bioinformatics. 2010;26:2897–2899. doi: 10.1093/bioinformatics/btq540.
  • 25. Wang JW, Hannenhalli S. A mammalian promoter model links cis elements to genetic networks. Biochem. Biophys. Res. Commun. 2006;347:166–177. doi: 10.1016/j.bbrc.2006.06.062.
  • 26. Du P, Feng G, Flatow J, Song J, Holko M, Kibbe WA, Lin SM. From disease ontology to disease-ontology lite: statistical methods to adapt a general-purpose ontology for the test of gene-ontology associations. Bioinformatics. 2009;25:i63–i68. doi: 10.1093/bioinformatics/btp193.
  • 27. Robinson PN, Kohler S, Bauer S, Seelow D, Horn D, Mundlos S. The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am. J. Hum. Genet. 2008;83:610–615. doi: 10.1016/j.ajhg.2008.09.017.
  • 28. Li MX, Sham PC, Cherny SS, Song YQ. A knowledge-based weighting framework to boost the power of genome-wide association studies. PLoS One. 2010;5:e14480. doi: 10.1371/journal.pone.0014480.
  • 29. Lopes CT, Franz M, Kazi F, Donaldson SL, Morris Q, Bader GD. Cytoscape Web: an interactive web-based network browser. Bioinformatics. 2010;26:2347–2348. doi: 10.1093/bioinformatics/btq430.
  • 30. Adeyemo A, Gerry N, Chen G, Herbert A, Doumatey A, Huang H, Zhou J, Lashley K, Chen Y, Christman M, et al. A genome-wide association study of hypertension and blood pressure in African Americans. PLoS Genet. 2009;5:e1000564. doi: 10.1371/journal.pgen.1000564.
  • 31. Nicolae DL, Gamazon E, Zhang W, Duan S, Dolan ME, Cox NJ. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 2010;6:e1000888. doi: 10.1371/journal.pgen.1000888.
  • 32. Saccone SF, Quan J, Mehta G, Bolze R, Thomas P, Deelman E, Tischfield JA, Rice JP. New tools and methods for direct programmatic access to the dbSNP relational database. Nucleic Acids Res. 2011;39:D901–D907. doi: 10.1093/nar/gkq1054.
  • 33. Pennacchio LA, Visel A, Minovitsky S, Dubchak I. VISTA Enhancer Browser—A database of tissue-specific human enhancers. Nucleic Acids Res. 2007;35:D88–D92. doi: 10.1093/nar/gkl822.
  • 34. Cui Y, Bao L, Zhou M. CTCFBSDB: a CTCF-binding site database for characterization of vertebrate genomic insulators. Nucleic Acids Res. 2008;36:D83–D87. doi: 10.1093/nar/gkm875.
  • 35. Matys V, Fricke E, Geffers R, Gossling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV, et al. TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 2003;31:374–378. doi: 10.1093/nar/gkg108.
  • 36. Sandelin A, Alkema W, Engstrom P, Wasserman WW, Lenhard B. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 2004;32:D91–D94. doi: 10.1093/nar/gkh012.
  • 37. Newburger DE, Bulyk ML. UniPROBE: an online database of protein binding microarray data on protein-DNA interactions. Nucleic Acids Res. 2009;37:D77–D82. doi: 10.1093/nar/gkn660.
  • 38. Kertesz M, Iovino N, Unnerstall U, Gaul U, Segal E. The role of site accessibility in microRNA target recognition. Nat. Genet. 2007;39:1278–1284. doi: 10.1038/ng2135.
  • 39. Betel D, Koppal A, Agius P, Sander C, Leslie C. Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites. Genome Biol. 2010;11:R90. doi: 10.1186/gb-2010-11-8-r90.
  • 40. Yang JO, Kim WY, Bhak J. ssSNPTarget: genome-wide splice-site single nucleotide polymorphism database. Hum. Mutat. 2009;30:E1010–E1020. doi: 10.1002/humu.21128.
  • 41. Liu X, Jian X, Boerwinkle E. dbNSFP: A lightweight database of human nonsynonymous SNPs and their functional predictions. Hum. Mutat. 2011;32:894–899. doi: 10.1002/humu.21517.
  • 42. Cheng F, Chen W, Richards E, Deng L, Zeng C. SNP@Evolution: a hierarchical database of positive selection on the human genome. BMC Evol. Biol. 2009;9:221. doi: 10.1186/1471-2148-9-221.
  • 43. Wang Y, Joseph SJ, Liu X, Kelly M, Rekaya R. SNPxGE2: a database for human 3-way SNP-expression associations. Nature Precedings. 2011. doi: 10.1038/npre.2011.5704.1 (epub ahead of print).

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
