Rejoinder to “A note on testing and estimation in marker-set association study using semiparametric quantile regression kernel machine”

Dehan Kong; Arnab Maity; Fang-Chi Hsu; Jung-Ying Tzeng

doi:10.1111/biom.12786

Author manuscript; available in PMC: 2018 Jun 27.

Published in final edited form as: Biometrics. 2017 Nov 2;74(2):767–768. doi: 10.1111/biom.12786

PMCID: PMC5932282 NIHMSID: NIHMS907052 PMID: 29096038

The authors replied as follows:

We applaud Zhan and Wu (2017) for presenting an alternative strategy for calculating p-values in the quantile regression kernel machine (QRKM) test (Kong et al., 2016). In the QRKM paper, we considered estimation and testing for association between a phenotype of interest and a marker-set under the QRKM framework, where the response values may contain outliers or follow a skewed or heavy-tailed distribution. Zhan and Wu (2017) considered the testing problem in our paper and developed a fast QRKM (FQRKM) test based on a fast permutation method, which substantially reduces the computation time.

The QRKM testing procedure is slow mainly because of the heavy computational cost of calculating the null distribution of the score-type statistic. To mimic the null distribution, we need to fit the full model once and obtain the residuals, and the bootstrap tuning in this step takes a significant amount of time. We then need to fit the null model N times and calculate the permuted statistic N times, where N is the number of permutations. This step is also time-consuming. Zhan and Wu (2017) proposed a neat idea to overcome this computational barrier. Instead of using explicit permutations to obtain the null distribution of the QRKM statistic, they considered a modified version, the FQRKM statistic, which is closely connected with the fast permutation testing strategies developed for RV-type statistics (Josse et al., 2008). The main advantage of the FQRKM statistic is that its null permutation distribution can be approximated by a simpler, known distribution. They adopted the Pearson type III approximation, which allows fast, analytical p-value calculation.
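To make the contrast concrete, the Python sketch below compares an explicit permutation p-value with a Pearson type III approximation matched to the first three moments of the permutation distribution. It is an illustration under stated assumptions only: the quadratic-form statistic T = ψᵀKψ with quantile scores ψ_i = τ − 1{r_i < 0}, the Gaussian kernel, and the simulated data are hypothetical stand-ins rather than the exact QRKM or FQRKM statistics; refitting the null model at each permutation is omitted; and the three moments are estimated here from the permutation draws themselves, whereas FQRKM obtains them analytically so that no permutations are needed. All function names are ours.

```python
import math
import random


def quad_stat(psi, K):
    """Hypothetical quadratic-form score statistic T = psi' K psi."""
    n = len(psi)
    return sum(psi[i] * K[i][j] * psi[j] for i in range(n) for j in range(n))


def permutation_pvalue(psi, K, n_perm=999, seed=1):
    """Explicit permutation p-value: reshuffle the scores and recompute T each time."""
    rng = random.Random(seed)
    t_obs = quad_stat(psi, K)
    perm_stats = []
    for _ in range(n_perm):
        shuffled = psi[:]
        rng.shuffle(shuffled)
        perm_stats.append(quad_stat(shuffled, K))
    exceed = sum(t >= t_obs for t in perm_stats)
    return (exceed + 1) / (n_perm + 1), perm_stats


def sample_moments(values):
    """Mean, variance and skewness of a list of statistics."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    skew = sum((v - mean) ** 3 for v in values) / n / var ** 1.5
    return mean, var, skew


def upper_reg_gamma(a, x):
    """Upper regularized incomplete gamma Q(a, x) via series or continued fraction."""
    if x <= 0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series for the lower function P(a, x); then Q = 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(1000):
            ap += 1
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Modified Lentz continued fraction for Q(a, x).
    tiny = 1e-300
    b = x + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def pearson3_upper_tail(t, mean, var, skew):
    """P(T >= t) for a Pearson type III law with the given mean, variance, skewness."""
    sd = math.sqrt(var)
    if abs(skew) < 1e-8:
        # Zero skewness: Pearson type III reduces to the normal distribution.
        return 0.5 * math.erfc((t - mean) / (sd * math.sqrt(2.0)))
    shape, scale = 4.0 / skew ** 2, sd * abs(skew) / 2.0
    edge = mean - 2.0 * sd / skew  # finite end point of the support
    if skew > 0:
        # Shifted gamma on [edge, inf).
        return upper_reg_gamma(shape, (t - edge) / scale)
    # Negative skew: reflected gamma on (-inf, edge].
    return 1.0 - upper_reg_gamma(shape, (edge - t) / scale)


if __name__ == "__main__":
    rng = random.Random(0)
    n, tau = 30, 0.5
    x = [rng.gauss(0, 1) for _ in range(n)]
    K = [[math.exp(-(xi - xj) ** 2) for xj in x] for xi in x]  # Gaussian kernel
    psi = [tau - (rng.gauss(0, 1) < 0) for _ in range(n)]  # simulated null scores
    p_perm, stats = permutation_pvalue(psi, K)
    p_p3 = pearson3_upper_tail(quad_stat(psi, K), *sample_moments(stats))
    print(f"permutation p = {p_perm:.3f}, Pearson III p = {p_p3:.3f}")
```

The Pearson type III law is a shifted gamma distribution (reflected when the skewness is negative) matched to the mean, variance and skewness, so a tail probability needs only the regularized incomplete gamma function; that is what makes an analytical p-value cheap once the moments are available.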

Zhan and Wu (2017) showed through simulations that the FQRKM test performs similarly to the QRKM test in terms of type I error and power. A minor issue with the FQRKM test is that its type I error is slightly inflated at lower or upper quantiles when the sample size is small. We suspect that the Pearson type III approximation may not be very accurate in these scenarios, which causes the slight inflation of the type I error. We have further applied the FQRKM test to the Vitamin Intervention for Stroke Prevention (VISP) trial studied in Kong et al. (2016). We performed the same analysis on the nine candidate genes with 1587 subjects, taking the homocysteine level at quantiles 0.1, 0.5 and 0.8 as the response variable, but using the FQRKM test instead of our original QRKM test. The total computation time of the FQRKM test was less than half an hour for all genes and all quantile levels, which is substantially faster than our original QRKM test, for which we had to use a separate computing node for each gene and each quantile. The test results are summarized in Table 1. The FQRKM test yields p-values similar to those of the QRKM test for each gene and each quantile level. After Bonferroni correction (i.e., a significance threshold of 0.05/(9 × 3) ≈ 0.00185), the gene CBS is significant at quantiles 0.5 and 0.8, and the gene TCN1 is significant at quantile 0.8. These conclusions are the same as those from our original QRKM test. The simulation results in Zhan and Wu (2017), together with the VISP data analysis, show that the FQRKM test performs very similarly to the QRKM test. Since the FQRKM test is much faster, we recommend that readers use it instead when testing for association between genetic covariates and the response under the QRKM framework, provided the sample size is not small. For estimation under the QRKM framework, readers may still use the procedure proposed in Kong et al. (2016).

Table 1.

Results from the VISP trial. Columns 2–4 display the p-values of the QRKM test for the nine genes at quantile levels τ = 0.1, 0.5 and 0.8; columns 5–7 display the corresponding p-values of the FQRKM test.

Gene	QRKM, τ = 0.1	QRKM, τ = 0.5	QRKM, τ = 0.8	FQRKM, τ = 0.1	FQRKM, τ = 0.5	FQRKM, τ = 0.8
BHMT	0.072	0.108	0.263	0.052	0.153	0.332
BHMT2	0.222	0.371	0.748	0.218	0.463	0.734
CBS	0.013	0.0004	0.0004	0.010	0.0002	0.0008
CTH	0.559	0.432	0.820	0.404	0.384	1.000
MTHFR	0.745	0.478	0.429	0.913	0.505	0.611
MTR	0.785	0.864	0.572	0.718	0.760	0.763
MTRR	0.020	0.070	0.860	0.018	0.062	0.999
TCN1	0.808	0.218	0.0006	0.841	0.195	0.0009
TCN2	0.455	0.622	0.835	0.398	0.703	0.937
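The Bonferroni screen applied to the VISP results can be reproduced directly from the p-values in Table 1. The Python sketch below transcribes the table into two dictionaries (this layout and the helper name `bonferroni_hits` are ours) and flags every gene and quantile whose p-value falls at or below 0.05 divided by the 27 tests.

```python
# p-values transcribed from Table 1 (VISP trial), ordered tau = 0.1, 0.5, 0.8.
TAUS = (0.1, 0.5, 0.8)
QRKM = {
    "BHMT": (0.072, 0.108, 0.263), "BHMT2": (0.222, 0.371, 0.748),
    "CBS": (0.013, 0.0004, 0.0004), "CTH": (0.559, 0.432, 0.820),
    "MTHFR": (0.745, 0.478, 0.429), "MTR": (0.785, 0.864, 0.572),
    "MTRR": (0.020, 0.070, 0.860), "TCN1": (0.808, 0.218, 0.0006),
    "TCN2": (0.455, 0.622, 0.835),
}
FQRKM = {
    "BHMT": (0.052, 0.153, 0.332), "BHMT2": (0.218, 0.463, 0.734),
    "CBS": (0.010, 0.0002, 0.0008), "CTH": (0.404, 0.384, 1.000),
    "MTHFR": (0.913, 0.505, 0.611), "MTR": (0.718, 0.760, 0.763),
    "MTRR": (0.018, 0.062, 0.999), "TCN1": (0.841, 0.195, 0.0009),
    "TCN2": (0.398, 0.703, 0.937),
}


def bonferroni_hits(pvals, alpha=0.05):
    """Return the Bonferroni threshold and the (gene, tau) pairs that pass it."""
    n_tests = sum(len(ps) for ps in pvals.values())  # 9 genes x 3 quantiles = 27
    threshold = alpha / n_tests
    hits = [(gene, tau) for gene, ps in pvals.items()
            for tau, p in zip(TAUS, ps) if p <= threshold]
    return threshold, hits


if __name__ == "__main__":
    for name, table in (("QRKM", QRKM), ("FQRKM", FQRKM)):
        threshold, hits = bonferroni_hits(table)
        # Both tests flag CBS at tau = 0.5 and 0.8, and TCN1 at tau = 0.8.
        print(f"{name}: threshold {threshold:.5f}, significant: {hits}")
```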

Finally, we want to draw the readers’ attention to two interesting problems for future research. The first problem concerns the tuning method in our QRKM paper. In our estimation procedure, we used bootstrap tuning, which has a high computational cost, especially when the sample size and the dimension of the covariates are large. We adopted bootstrap tuning mainly for two reasons. First, it is more stable. We had tried other tuning methods, such as the Schwarz information criterion (Schwarz, 1978), cross-validation and generalized approximate cross-validation (Yuan, 2006); although these methods worked well in some Monte Carlo runs, they failed in others, producing very poor estimates. There is thus a tradeoff between computational speed and the stability of the solution. Second, bootstrap tuning yields standard error estimates for β̂ and β̂₀ as a byproduct; however, standard error estimates could also be obtained if the asymptotic distribution of the estimates were derived. Studying the asymptotic theory of the QRKM would be an interesting research topic. The second problem in QRKM/FQRKM concerns the multiple testing procedure in the VISP data analysis. Currently, we use the Bonferroni correction, which may be very conservative because the estimates across different quantiles are in fact correlated. Multiple testing procedures across different quantiles warrant further investigation.
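A small simulation illustrates why the Bonferroni correction can be conservative when tests at different quantiles are correlated. Under the purely hypothetical assumption that, under the null, the three per-gene test statistics are standard normal with a common pairwise correlation ρ, the Python sketch below estimates the family-wise error rate (FWER) of the Bonferroni rule α/3. It is not a model of the actual QRKM statistics, and the function names are ours.

```python
import math
import random


def two_sided_p(z):
    """Two-sided normal p-value, P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni_fwer(rho, n_tests=3, alpha=0.05, n_rep=20000, seed=2017):
    """Monte Carlo FWER of Bonferroni for equicorrelated null statistics.

    Hypothetical model: Z_k = sqrt(rho) * W + sqrt(1 - rho) * E_k with W and
    E_k iid N(0, 1), so every pair (Z_j, Z_k) has correlation rho.
    """
    rng = random.Random(seed)
    threshold = alpha / n_tests
    errors = 0
    for _ in range(n_rep):
        shared = rng.gauss(0.0, 1.0)
        z = [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
             for _ in range(n_tests)]
        # A family-wise error occurs if any null test is rejected.
        if any(two_sided_p(zk) <= threshold for zk in z):
            errors += 1
    return errors / n_rep


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        print(f"rho = {rho}: estimated FWER = {bonferroni_fwer(rho):.4f}")
```

Under this toy model the estimated FWER should stay close to the nominal 0.05 when ρ = 0 and fall below it as ρ grows; a procedure that exploits the correlation could therefore use a less stringent threshold while controlling the same error rate.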

Acknowledgments

Kong’s research was partially supported by the Natural Sciences and Engineering Research Council of Canada. Maity’s research was partially supported by NIH grant R00 ES017744 and an NCSU Faculty Research and Professional Development (FRPD) grant. Hsu’s research was partially supported by NIH grant U01 HG005160. Tzeng’s research was partially supported by NIH grant P01 CA142538.

Contributor Information

Dehan Kong, Department of Statistical Sciences, University of Toronto, Ontario, Canada.

Arnab Maity, Department of Statistics, North Carolina State University, North Carolina, U.S.A.

Fang-Chi Hsu, Department of Biostatistical Sciences, Wake Forest University, North Carolina, U.S.A.

Jung-Ying Tzeng, Department of Statistics and Bioinformatics Research Center, North Carolina State University, North Carolina, U.S.A. Department of Statistics, National Cheng-Kung University, Taiwan.

References

Josse J, Pagès J, Husson F. Testing the significance of the RV coefficient. Computational Statistics & Data Analysis. 2008;53:82–91.
Kong D, Maity A, Hsu FC, Tzeng JY. Testing and estimation in marker-set association study using semiparametric quantile regression kernel machine. Biometrics. 2016;72:364–371. doi: 10.1111/biom.12438.
Schwarz G. Estimating the Dimension of a Model. The Annals of Statistics. 1978;6:461–464.
Yuan M. GACV for quantile smoothing splines. Computational Statistics & Data Analysis. 2006;50:813–829.
Zhan X, Wu M. A note on testing and estimation in marker-set association study using semiparametric quantile regression kernel machine. Biometrics. 2017. doi: 10.1111/biom.12785. To appear.