Bioinformatics. 2011 Mar 3;27(9):1309–1310. doi: 10.1093/bioinformatics/btr114

GBOOST: a GPU-based tool for detecting gene–gene interactions in genome–wide case control studies

Ling Sing Yung 1,*, Can Yang 1, Xiang Wan 1, Weichuan Yu 1,*
PMCID: PMC3105448  PMID: 21372087

Abstract

Motivation: Collecting millions of genetic variations is feasible with advanced genotyping technology. With such a huge amount of genetic variation data in hand, developing efficient algorithms that carry out gene–gene interaction analysis in a timely manner has become one of the key problems in genome-wide association studies (GWAS). Boolean operation-based screening and testing (BOOST), a recent work in GWAS, completes gene–gene interaction analysis in 2.5 days on a desktop computer. Compared with central processing units (CPUs), graphics processing units (GPUs) are highly parallel hardware and provide massive computing resources. We are, therefore, motivated to use GPUs to further speed up the analysis of gene–gene interactions.

Results: We implement the BOOST method on a GPU framework and name it GBOOST. GBOOST achieves a 40-fold speedup compared with BOOST. It completes the analysis of the Wellcome Trust Case Control Consortium Type 2 Diabetes (WTCCC T2D) genome data within 1.34 h on a desktop computer equipped with an Nvidia GeForce GTX 285 display card.

Availability: GBOOST code is available at http://bioinformatics.ust.hk/BOOST.html#GBOOST.

Contact: timyung@ust.hk; eeyu@ust.hk

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

Interaction patterns of single-nucleotide polymorphisms (SNPs) can be used to interpret genetic disease risks in individuals. With advances in technology, the burden of collecting genome-wide DNA sequence variations has been lifted. The resulting burst of genotype data creates an urgent need for efficient algorithms that can analyze genome-wide association study (GWAS) data in a reasonable period of time. A comprehensive review (Cordell, 2009) summarized some popular methods for detecting gene–gene interactions. PLINK was recommended as the most computationally feasible method for detecting gene–gene interactions in genome-wide data. It was reported that PLINK finished the pairwise interaction examination of 89 294 SNPs selected from the WTCCC Crohn's disease dataset with ∼5000 samples in 14 days (Cordell, 2009).

Recently, Wan et al. (2010) proposed a fast method, named BOOST, to examine all pairwise interactions in genome-wide case–control studies. BOOST completed the pairwise interaction analysis of human genome data with ∼350 000 SNPs and ∼5000 samples in 60 h on a computer with a 3 GHz central processing unit (CPU) and 4 GB of memory.

However, we can foresee that the growth of data will overwhelm BOOST in the near future. Ma et al. (2008) suggested that the analysis time in GWAS can be largely reduced by parallel computing. The development of graphics processing units (GPUs) enables modern display cards to have hundreds of cores, providing high memory bandwidth at a low price. A recent GPU implementation of the multifactor dimensionality reduction (MDR) method (Greene et al., 2010) has significantly reduced the time required for detecting gene–gene interactions. Two properties of BOOST make it well suited to a GPU implementation: collecting contingency tables demands massive memory operations, and different SNP pairs can be analyzed independently of one another. Here we propose GBOOST, a GPU implementation of BOOST. GBOOST is able to finish the genome-wide interaction analysis of a typical dataset on a single workstation within a few hours.
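Because every SNP pair can be analyzed independently, the work can be split by giving each parallel worker a linear index and letting it recover its own pair. The following C++ sketch shows one common way to do this; it is an illustration only, not the indexing scheme documented for GBOOST. The function names pairCount, rowStart and pairFromIndex are hypothetical, and the code assumes p >= 2 and k < pairCount(p).

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <utility>

// Number of unordered SNP pairs (i, j), i < j, among p SNPs.
std::uint64_t pairCount(std::uint64_t p) {
    return p * (p - 1) / 2;
}

// Linear index of pair (i, i + 1) when pairs are listed row by row:
// (0,1), (0,2), ..., (0,p-1), (1,2), ...
std::uint64_t rowStart(std::uint64_t i, std::uint64_t p) {
    return i * (2 * p - i - 1) / 2;
}

// Recover the pair (i, j) that corresponds to linear index k.
// Assumes p >= 2 and k < pairCount(p).
std::pair<std::uint64_t, std::uint64_t> pairFromIndex(std::uint64_t k,
                                                      std::uint64_t p) {
    // Floating-point estimate of the row from the quadratic
    // i^2 - (2p - 1) i + 2k = 0, corrected below with exact integers.
    const double b = 2.0 * static_cast<double>(p) - 1.0;
    const double disc = std::max(b * b - 8.0 * static_cast<double>(k), 0.0);
    std::uint64_t i = static_cast<std::uint64_t>((b - std::sqrt(disc)) / 2.0);
    if (i > p - 2) i = p - 2;

    // Integer correction in case the estimate is off by a row.
    while (i > 0 && rowStart(i, p) > k) --i;
    while (i + 1 <= p - 2 && rowStart(i + 1, p) <= k) ++i;

    const std::uint64_t j = i + 1 + (k - rowStart(i, p));
    return {i, j};
}
```

For the ∼350 000-SNP datasets discussed here the number of pairs is on the order of 10^10, so 64-bit indices are needed; a 32-bit index would overflow.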

2 METHODS

GBOOST is a software package for gene–gene interaction analysis of large genome data. It is a C++ parallel implementation of the BOOST method using the Compute Unified Device Architecture (CUDA) runtime application programming interface (NVIDIA, 2008).

The computational burden of BOOST lies in the screening stage. Thus, GBOOST modifies input data structures and parallelizes computations in the screening stage. Figure 1 gives the key differences between BOOST and GBOOST. Please refer to the Supplementary data for the detailed implementation of GBOOST.
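To make the screening-stage workload concrete, the sketch below illustrates the Boolean representation idea from BOOST (Wan et al., 2010): each SNP is stored as three bit vectors, one per genotype, so that a cell of the 3 x 3 genotype contingency table for a SNP pair is obtained with a bitwise AND followed by a bit count. This is a minimal CPU-side C++ sketch, not the GBOOST kernel; the names SnpBits, packSnp and contingency are hypothetical, and genotype codes are assumed to be 0, 1 or 2 with no missing values. In a case–control study the same routine would be applied separately to the case samples and the control samples.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: one bit per sample, one bit vector per genotype.
struct SnpBits {
    std::array<std::vector<std::uint64_t>, 3> genotype;
};

// Pack per-sample genotype codes (each assumed to be 0, 1 or 2)
// into three bit vectors of 64-bit words.
SnpBits packSnp(const std::vector<int>& codes) {
    const std::size_t words = (codes.size() + 63) / 64;
    SnpBits snp;
    for (auto& v : snp.genotype) v.assign(words, 0);
    for (std::size_t s = 0; s < codes.size(); ++s) {
        snp.genotype[codes[s]][s / 64] |= std::uint64_t{1} << (s % 64);
    }
    return snp;
}

// 3 x 3 contingency table for one SNP pair within one sample group:
// cell (a, b) counts samples with genotype a at x and genotype b at y.
std::array<std::array<int, 3>, 3> contingency(const SnpBits& x,
                                              const SnpBits& y) {
    std::array<std::array<int, 3>, 3> table{};
    for (int a = 0; a < 3; ++a) {
        for (int b = 0; b < 3; ++b) {
            for (std::size_t w = 0; w < x.genotype[a].size(); ++w) {
                // AND selects samples carrying both genotypes; count() sums them.
                const std::uint64_t both = x.genotype[a][w] & y.genotype[b][w];
                table[a][b] += static_cast<int>(std::bitset<64>(both).count());
            }
        }
    }
    return table;
}
```

Each table costs only a handful of word-wise AND and count operations per 64 samples, which is why the screening stage is dominated by memory traffic rather than arithmetic and why it maps naturally onto many GPU threads working on different SNP pairs.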

Fig. 1.


Main steps in BOOST and GBOOST. GBOOST parallelizes the screening step in BOOST to achieve a speedup of 40.

3 RESULTS AND DISCUSSION

Table 1 shows the performance of BOOST and GBOOST on different datasets. GBOOST also provides basic visualization by using two publicly available libraries, JFreeChart (http://www.jfree.org/jfreechart) and JUNG (http://jung.sourceforge.net/). Figure 2 presents a pathway graph example generated from one GBOOST result.

Table 1.

Running time of BOOST and GBOOST on different datasets

Dataset                  BOOST    GBOOST
n = 5000, p = 5000       42 s     1.04 s
n = 5000, p = 10 000     170 s    4.11 s
n = 5003, p = 351 542    60 h     1.34 h

BOOST is tested on a computer with a 3 GHz CPU. GBOOST is tested on a computer with a GTX 285 display card. Here n denotes the sample size and p denotes the number of SNPs.
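The speedup implied by Table 1 can be checked with a few lines of arithmetic. The C++ sketch below is only a worked calculation on the reported running times; the helper names speedup and pairsPerSecond are hypothetical, and the throughput figure assumes that all p(p - 1)/2 SNP pairs are examined.

```cpp
#include <cstdint>

// Ratio of BOOST running time to GBOOST running time (same units for both).
double speedup(double boostTime, double gboostTime) {
    return boostTime / gboostTime;
}

// Average number of SNP pairs processed per second, assuming an
// exhaustive scan of all p * (p - 1) / 2 pairs in the given time.
double pairsPerSecond(std::uint64_t p, double seconds) {
    const double pairs =
        static_cast<double>(p) * static_cast<double>(p - 1) / 2.0;
    return pairs / seconds;
}
```

With the Table 1 figures, speedup(42, 1.04), speedup(170, 4.11) and speedup(60, 1.34) come out at roughly 40, 41 and 45, consistent with the reported 40-fold speedup. For the largest dataset, pairsPerSecond(351542, 1.34 * 3600) is on the order of 10^7 pairs per second.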

Fig. 2.


A snapshot of a pathway graph generated from 20 interaction pairs from one GBOOST result. Each node is labeled by its SNP name, and the node value is the marginal association score from the association analysis. The edge value is the interaction score of the linked nodes (i.e. SNP pairs). Scaling, rotation and translation are available in the pathway graph. Various functions are also available to remove or highlight components in the pathway graph. The layout can be interactively customized.

In our future work, we plan to extend GBOOST to support execution on multiple GPUs and explore new memory optimization techniques.

Supplementary Material

Supplementary Data

ACKNOWLEDGEMENTS

We thank the editor and the anonymous reviewers for their constructive suggestions and comments.

Funding: This work was partially supported by grants RPC10EG04 and PCF004.09/10 from the Hong Kong University of Science and Technology.

Conflict of Interest: none declared.

REFERENCES

  1. Cordell H. Detecting gene–gene interactions that underlie human diseases. Nat. Rev. Genet. 2009;10:392–404. doi: 10.1038/nrg2579.
  2. NVIDIA. NVIDIA compute unified device architecture programming guide version 2.1. Technical report. 2008. Available at: http://developer.download.nvidia.com/compute/cuda/2_1/toolkit/docs/NVIDIA_CUDA_Programming_Guide_2.1.pdf.
  3. Greene C., et al. Multifactor dimensionality reduction for graphics processing units enables genome-wide testing of epistasis in sporadic ALS. Bioinformatics. 2010;26:694. doi: 10.1093/bioinformatics/btq009.
  4. Ma L., et al. Parallel and serial computing tools for testing single-locus and epistatic SNP effects of quantitative traits in genome-wide association studies. BMC Bioinformatics. 2008;9:315. doi: 10.1186/1471-2105-9-315.
  5. Wan X., et al. BOOST: A boolean representation-based method for detecting SNP-SNP interactions in genome-wide association studies. Am. J. Hum. Genet. 2010;87:325–340. doi: 10.1016/j.ajhg.2010.07.021.


Articles from Bioinformatics are provided here courtesy of Oxford University Press
