Genotype Value Decomposition: Simple Methods for the Computation of Kernel Statistics

Kazuharu Misawa

Advanced Genetics. 2022 Apr 5;3(3):2100066. doi: 10.1002/ggn2.202100066

PMCID: PMC9744480 PMID: 36620199

Abstract

Recent advances in sequencing technologies enable genome‐wide analyses for thousands of individuals. The sequence kernel association test (SKAT) is a widely used method to test for associations between a phenotype and a set of rare variants. As the sample size of human genetics studies increases, the computational time required to calculate a kernel matrix is becoming increasingly problematic. In this study, a new method to obtain kernel statistics without calculating a kernel matrix is proposed. Simple methods are given for the computation of two kernel statistics, namely, one based on a genetic relationship matrix (GRM) and one based on an identity by state (IBS) matrix. With this method, the kernel statistics can be calculated using vector operations alone, without matrix operations. The proposed method enables one to conduct SKAT for large samples in human genetics.

Keywords: genetic relationship matrix, identity by state, rare variants, sequence kernel association test

A simple method for the computation of two kernel statistics, one based on a genetic relationship matrix (GRM) and one based on an identity by state (IBS) matrix, is proposed. The proposed method can be used to conduct the sequence kernel association test (SKAT) for large human genetics datasets.


1. Introduction

A very large number of human genome sequences are now available for the study of human genetics, because of recent advances in genome sequencers. A recent study has shown that rare variants substantially contribute to phenotype variation.^[ ¹ ^] Because each linkage disequilibrium block can be analyzed independently, increases in the number of sites can be tackled with parallel computation.^[ ² ^] However, the statistical power of classical single‐marker association analysis for rare variants is quite limited.

To address this challenge, rare and low‐frequency variants are often grouped at the gene or pathway level, and the effects of multiple variants are evaluated jointly using collapsing methods.^[ ³ , ⁴ ^] The sequence kernel association test (SKAT)^[ ⁵ , ⁶ ^] is one such popular method. SKAT applies a test statistic S, which is defined by the quadratic form S = y ^T Ky , where K is the kernel matrix and y is the column vector of phenotypes defined by Equation (1).

y = \begin{pmatrix} y(1) & y(2) & \cdots & y(n) \end{pmatrix}^{T}

(1)

where y(i) is the phenotype value of the i‐th individual and n is the sample size. In the following section, we assume the average of the elements of y is 0.

Evaluating the probability density of the null distribution of S is important for conducting SKAT, but it requires computing a matrix related to the genotype covariance between markers, which takes a very long computational time. When the length of y is n, the size of the matrix K is n ². For example, when the number of people is 10 000, the kernel matrix has 100 000 000 elements. A genetic relationship matrix (GRM) among individuals is used in genome‐wide complex trait analysis^[ ⁷ ^] and in principal component analysis.^[ ⁸ ^] Identity by state (IBS) defines similarity between individuals as the number of shared alleles. The IBS kernel is used in linear regression^[ ⁹ , ¹⁰ ^] and SKAT.^[ ⁵ ^]

The aim of the present study is to develop simple methods for the computation of these two kernel statistics without calculating a GRM and an IBS matrix explicitly.

2. Theory

2.1. Genotype Value Vectors

In the present study, all sites are assumed to be biallelic, namely, each site has a reference allele and an alternative allele. Let us define genotype value vectors at site $k$ . Let $a_{k} (i)$ be 1 when the individual $i$ is a homozygote of a reference allele at site $k$ , otherwise $a_{k} (i) = 0$ . Let $b_{k} (i)$ be 1 when the individual $i$ is the heterozygote at site $k$ , otherwise $b_{k} (i) = 0$ . Let $c_{k} (i)$ be 1 when the individual $i$ is a homozygote of an alternative allele at site $k$ , otherwise $c_{k} (i) = 0$ . The vectors a_k , b_k , c_k are defined by Equation (2).

\begin{aligned}
a_{k} &= \begin{pmatrix} a_{k}(1) & a_{k}(2) & \cdots & a_{k}(n) \end{pmatrix}^{T} \\
b_{k} &= \begin{pmatrix} b_{k}(1) & b_{k}(2) & \cdots & b_{k}(n) \end{pmatrix}^{T} \\
c_{k} &= \begin{pmatrix} c_{k}(1) & c_{k}(2) & \cdots & c_{k}(n) \end{pmatrix}^{T}
\end{aligned}

(2)

Let us denote a_k , b_k , and c_k as the genotype value vectors. Because $a_{k} (i) + b_{k} (i) + c_{k} (i) = 1$ , it is worth noting that

a_{k} + b_{k} + c_{k} = 1

(3)

where 1 is the vector of ones defined by $1 = \begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}^{T}$ . The alternative allele frequency, $p$ , is obtained by $p = ({b_{k}}^{T} 1 + 2 {c_{k}}^{T} 1) / (2 n)$ . In the following section, the Hardy–Weinberg equilibrium is assumed for this site. Namely, the frequencies of heterozygotes and of homozygotes of the alternative allele are $2 p (1 - p)$ and $p^{2}$ , respectively.
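As an illustration of Equations (2) and (3), the following minimal sketch builds the genotype value vectors for one site and computes the alternative allele frequency. The function names and the input encoding (the number of alternative alleles per individual, 0, 1, or 2) are assumptions made for this example and are not taken from the original scripts.

```python
# Hypothetical helpers: build the indicator vectors a_k, b_k, c_k of Equation (2)
# from alternative-allele counts (0, 1, or 2) at one biallelic site.
def genotype_value_vectors(alt_counts):
    a = [1 if g == 0 else 0 for g in alt_counts]  # homozygote of the reference allele
    b = [1 if g == 1 else 0 for g in alt_counts]  # heterozygote
    c = [1 if g == 2 else 0 for g in alt_counts]  # homozygote of the alternative allele
    return a, b, c


def alt_allele_frequency(b, c):
    # p = (b_k^T 1 + 2 c_k^T 1) / (2 n)
    n = len(b)
    return (sum(b) + 2 * sum(c)) / (2 * n)


alt_counts = [0, 1, 0, 2, 0, 1, 0, 0]  # toy data for n = 8 individuals
a, b, c = genotype_value_vectors(alt_counts)

# Equation (3): exactly one of the three indicators is 1 for every individual.
assert all(ai + bi + ci == 1 for ai, bi, ci in zip(a, b, c))

p = alt_allele_frequency(b, c)
print(p)  # 0.25
```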

2.2. The GRM Kernel

The allele values are 0 for the reference allele, and 1 for the alternative allele. The separator between the alleles is “/” as used in the variant call format.^[ ¹¹ ^] Let $g_{k} (i)$ be the genotype value for individual $i$ at site $k$ . In the present study, $g_{k} (i)$ is the number of alternative alleles. The relationship between the genotype of the individual $i$ at site $k$ and $g_{k} (i)$ is shown in Table 1 . The vector g_k is defined by Equation (4).

g_{k} = \begin{pmatrix} g_{k}(1) & g_{k}(2) & \cdots & g_{k}(n) \end{pmatrix}^{T}

(4)

Table 1.

Relationship between genotype values and the number of alternative alleles

Genotype	a_{k} (i)	b_{k} (i)	c_{k} (i)	g_{k} (i)
0/0	1	0	0	0
0/1	0	1	0	1
1/1	0	0	1	2

Table 1 displays the relationships among allelic states $a_{k} (i)$ , $b_{k} (i), c_{k} (i)$ , and $g_{k} (i)$ . Because $g_{k} (i) = b_{k} (i) + 2 c_{k} (i)$ , g_k is obtained by g_k = b_k + 2 c_k .

Let us denote the GRM at site k as X _k . We subtract the mean $μ_{k} = {\sum_{i = 1}^{n} g_{k} (i)} / n$ to obtain a matrix with row sums equal to 0. The ij‐th element of X _k at site k is obtained using Equation (5).

\begin{aligned}
X_{k}(i, j) &= \{g_{k}(i) - μ_{k}\} \{g_{k}(j) - μ_{k}\} \\
&= g_{k}(i) g_{k}(j) - μ_{k} g_{k}(i) - μ_{k} g_{k}(j) + {μ_{k}}^{2}
\end{aligned}

(5)

\begin{aligned}
X_{k} ={}& \begin{pmatrix} g_{k}(1) g_{k}(1) & g_{k}(1) g_{k}(2) & \cdots & g_{k}(1) g_{k}(n) \\ g_{k}(2) g_{k}(1) & g_{k}(2) g_{k}(2) & \cdots & g_{k}(2) g_{k}(n) \\ \vdots & \vdots & \ddots & \vdots \\ g_{k}(n) g_{k}(1) & g_{k}(n) g_{k}(2) & \cdots & g_{k}(n) g_{k}(n) \end{pmatrix} \\
& - μ_{k} \begin{pmatrix} g_{k}(1) & g_{k}(1) & \cdots & g_{k}(1) \\ g_{k}(2) & g_{k}(2) & \cdots & g_{k}(2) \\ \vdots & \vdots & \ddots & \vdots \\ g_{k}(n) & g_{k}(n) & \cdots & g_{k}(n) \end{pmatrix} \\
& - μ_{k} \begin{pmatrix} g_{k}(1) & g_{k}(2) & \cdots & g_{k}(n) \\ g_{k}(1) & g_{k}(2) & \cdots & g_{k}(n) \\ \vdots & \vdots & \ddots & \vdots \\ g_{k}(1) & g_{k}(2) & \cdots & g_{k}(n) \end{pmatrix} \\
& + {μ_{k}}^{2} \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}
\end{aligned}

(6)

Subsequently, the matrix X _k can be obtained using the genotype value vectors.

Let us define a new matrix G _k . As shown in Table 1, the ij‐th element of G _k at site k can be calculated using Equation (7).

G_{k} (i, j) = g_{k} (i) g_{k} (j)

(7)

Subsequently, the matrix G _k would be obtained using the genotype value vectors.

\begin{aligned}
G_{k} &= \begin{pmatrix} g_{k}(1) g_{k}(1) & g_{k}(1) g_{k}(2) & \cdots & g_{k}(1) g_{k}(n) \\ g_{k}(2) g_{k}(1) & g_{k}(2) g_{k}(2) & \cdots & g_{k}(2) g_{k}(n) \\ \vdots & \vdots & \ddots & \vdots \\ g_{k}(n) g_{k}(1) & g_{k}(n) g_{k}(2) & \cdots & g_{k}(n) g_{k}(n) \end{pmatrix} \\
&= \begin{pmatrix} g_{k}(1) \\ g_{k}(2) \\ \vdots \\ g_{k}(n) \end{pmatrix} \begin{pmatrix} g_{k}(1) & g_{k}(2) & \cdots & g_{k}(n) \end{pmatrix} = g_{k} {g_{k}}^{T} \\
\begin{pmatrix} g_{k}(1) & g_{k}(1) & \cdots & g_{k}(1) \\ g_{k}(2) & g_{k}(2) & \cdots & g_{k}(2) \\ \vdots & \vdots & \ddots & \vdots \\ g_{k}(n) & g_{k}(n) & \cdots & g_{k}(n) \end{pmatrix} &= g_{k} 1^{T} \\
\begin{pmatrix} g_{k}(1) & g_{k}(2) & \cdots & g_{k}(n) \\ g_{k}(1) & g_{k}(2) & \cdots & g_{k}(n) \\ \vdots & \vdots & \ddots & \vdots \\ g_{k}(1) & g_{k}(2) & \cdots & g_{k}(n) \end{pmatrix} &= 1 {g_{k}}^{T} \\
\begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix} &= 1 1^{T}
\end{aligned}

(8)

Therefore, we obtain Equation (9).

X_{k} = G_{k} - μ_{k} g_{k} 1^{T} - μ_{k} 1 {g_{k}}^{T} + {μ_{k}}^{2} 1 1^{T}

(9)

It is worth noting that

y^{T} X_{k} y = y^{T} G_{k} y - μ_{k} y^{T} g_{k} 1^{T} y - μ_{k} y^{T} 1 {g_{k}}^{T} y + {μ_{k}}^{2} y^{T} 1 1^{T} y

(10)

Because $y^{T} 1 = 1^{T} y = 0$ , we obtain

y^{T} X_{k} y = y^{T} G_{k} y

(11)

Because g_k = b_k + 2 c_k , substituting into $G_{k} = g_{k} {g_{k}}^{T}$ gives

G_{k} = (b_{k} + 2 c_{k}) {(b_{k} + 2 c_{k})}^{T}

(12)

Q_k is a scalar value of site k defined by Equation (13):

Q_{k} = y^{T} G_{k} y = y^{T} (b_{k} + 2 c_{k}) {(b_{k} + 2 c_{k})}^{T} y = {(y^{T} b_{k} + 2 y^{T} c_{k})}^{2}

(13)

because the transpose of a product of matrices is the product, in the reverse order, of the transposes of the factors. Note $y^{T} b_{k} + 2 y^{T} c_{k}$ is a scalar that can be obtained as

y^{T} b_{k} + 2 y^{T} c_{k} = \sum_{i = 1}^{n} y (i) \{b_{k} (i) + 2 c_{k} (i)\}

(14)
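The identity in Equations (11), (13), and (14) can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions: the function names and the toy data are hypothetical, and the explicit n × n matrix is built solely to confirm that the vector form gives the same value.

```python
def q_statistic(y, b, c):
    # Equation (14): s = sum_i y(i) * (b_k(i) + 2 c_k(i)); Equation (13): Q_k = s^2.
    s = sum(yi * (bi + 2 * ci) for yi, bi, ci in zip(y, b, c))
    return s * s


def q_statistic_explicit(y, g):
    # Reference computation: build the centered matrix X_k of Equation (5),
    # then evaluate the quadratic form y^T X_k y element by element.
    n = len(g)
    mu = sum(g) / n
    x = [[(g[i] - mu) * (g[j] - mu) for j in range(n)] for i in range(n)]
    return sum(y[i] * x[i][j] * y[j] for i in range(n) for j in range(n))


g = [0, 1, 0, 2, 0, 1, 0, 0]           # toy alternative-allele counts
b = [1 if v == 1 else 0 for v in g]    # heterozygote indicator
c = [1 if v == 2 else 0 for v in g]    # alternative-homozygote indicator

raw = [5.0, 4.0, 6.0, 3.0, 5.5, 4.5, 6.5, 5.5]   # toy phenotype values
mean = sum(raw) / len(raw)
y = [v - mean for v in raw]            # center so that y^T 1 = 0

fast = q_statistic(y, b, c)            # O(n) vector computation
slow = q_statistic_explicit(y, g)      # O(n^2) matrix computation
assert abs(fast - slow) < 1e-9
```

The vector form needs a single pass over the n individuals, whereas the explicit form visits all n² matrix elements.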

2.3. The IBS Kernel

IBS defines similarity between individuals as the number of shared alleles. The IBS kernel is used in linear regression^[ ⁹ , ¹⁰ ^] and SKAT.^[ ⁵ ^] Let $IBS_{k} (i, j)$ be the ij‐th element of the IBS matrix, IBS _k , at site $k$ , which denotes the number of alleles shared by subjects $i$ and $j$ at site $k$ .

Table 2 displays the relationships between genotype values and IBS. From the table, we can observe the following relationship among genotype value vectors and the IBS matrix.

IBS_{k} (i, j) = 2 a_{k} (i) a_{k} (j) + b_{k} (i) + b_{k} (j) + 2 c_{k} (i) c_{k} (j)

(15)

Table 2.

Relationship between genotype values and identities by state (IBS)

Genotype of individual i	a_{k} (i)	b_{k} (i)	c_{k} (i)	Genotype of individual j	a_{k} (j)	b_{k} (j)	c_{k} (j)	IBS
0/0	1	0	0	0/0	1	0	0	2
0/0	1	0	0	0/1	0	1	0	1
0/0	1	0	0	1/1	0	0	1	0
0/1	0	1	0	0/1	0	1	0	2
0/1	0	1	0	1/1	0	0	1	1
1/1	0	0	1	1/1	0	0	1	2

Thus, the IBS matrix at site $k$ is obtained by

\begin{aligned}
IBS_{k} ={}& 2 \begin{pmatrix} a_{k}(1) a_{k}(1) & a_{k}(1) a_{k}(2) & \cdots & a_{k}(1) a_{k}(n) \\ a_{k}(2) a_{k}(1) & a_{k}(2) a_{k}(2) & \cdots & a_{k}(2) a_{k}(n) \\ \vdots & \vdots & \ddots & \vdots \\ a_{k}(n) a_{k}(1) & a_{k}(n) a_{k}(2) & \cdots & a_{k}(n) a_{k}(n) \end{pmatrix} \\
& + \begin{pmatrix} b_{k}(1) & b_{k}(1) & \cdots & b_{k}(1) \\ b_{k}(2) & b_{k}(2) & \cdots & b_{k}(2) \\ \vdots & \vdots & \ddots & \vdots \\ b_{k}(n) & b_{k}(n) & \cdots & b_{k}(n) \end{pmatrix} \\
& + \begin{pmatrix} b_{k}(1) & b_{k}(2) & \cdots & b_{k}(n) \\ b_{k}(1) & b_{k}(2) & \cdots & b_{k}(n) \\ \vdots & \vdots & \ddots & \vdots \\ b_{k}(1) & b_{k}(2) & \cdots & b_{k}(n) \end{pmatrix} \\
& + 2 \begin{pmatrix} c_{k}(1) c_{k}(1) & c_{k}(1) c_{k}(2) & \cdots & c_{k}(1) c_{k}(n) \\ c_{k}(2) c_{k}(1) & c_{k}(2) c_{k}(2) & \cdots & c_{k}(2) c_{k}(n) \\ \vdots & \vdots & \ddots & \vdots \\ c_{k}(n) c_{k}(1) & c_{k}(n) c_{k}(2) & \cdots & c_{k}(n) c_{k}(n) \end{pmatrix} \\
={}& 2 a_{k} {a_{k}}^{T} + b_{k} 1^{T} + 1 {b_{k}}^{T} + 2 c_{k} {c_{k}}^{T}
\end{aligned}

(16)

$R_{k}$ is a scalar value of site k defined by Equation (17).

R_{k} = y^{T} IBS_{k} y

(17)

By using the distributivity and associativity of matrix multiplication, together with $y^{T} 1 = 1^{T} y = 0$ , we obtain

\begin{aligned}
R_{k} &= y^{T} (2 a_{k} {a_{k}}^{T} + b_{k} 1^{T} + 1 {b_{k}}^{T} + 2 c_{k} {c_{k}}^{T}) y \\
&= 2 y^{T} (a_{k} {a_{k}}^{T}) y + y^{T} b_{k} 1^{T} y + y^{T} 1 {b_{k}}^{T} y + 2 y^{T} (c_{k} {c_{k}}^{T}) y \\
&= 2 {(y^{T} a_{k})}^{2} + 2 {(y^{T} c_{k})}^{2}
\end{aligned}

(18)

$y^{T} a_{k}$ and $y^{T} c_{k}$ are scalars that can be obtained using the inner product of two vectors. By using Equation (3), we can obtain $y^{T} a_{k} = y^{T} 1 - y^{T} b_{k} - y^{T} c_{k}$ , which reduces to $- (y^{T} b_{k} + y^{T} c_{k})$ because $y^{T} 1 = 0$ .
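Equation (18) can be checked in the same way as the GRM statistic. The following sketch, with hypothetical function names and toy data, compares the two inner products of Equation (18) against a quadratic form evaluated on an IBS matrix built explicitly from Equation (15).

```python
def r_statistic(y, a, c):
    # Equation (18): R_k = 2 (y^T a_k)^2 + 2 (y^T c_k)^2, valid when y^T 1 = 0.
    ya = sum(yi * ai for yi, ai in zip(y, a))
    yc = sum(yi * ci for yi, ci in zip(y, c))
    return 2 * ya * ya + 2 * yc * yc


def ibs_matrix(a, b, c):
    # Equation (15): number of alleles shared by individuals i and j.
    n = len(a)
    return [[2 * a[i] * a[j] + b[i] + b[j] + 2 * c[i] * c[j] for j in range(n)]
            for i in range(n)]


g = [0, 1, 0, 2, 0, 1, 0, 0]           # toy alternative-allele counts
a = [1 if v == 0 else 0 for v in g]
b = [1 if v == 1 else 0 for v in g]
c = [1 if v == 2 else 0 for v in g]
y = [0.0, -1.0, 1.0, -2.0, 0.5, -0.5, 1.5, 0.5]  # toy phenotypes with mean 0

fast = r_statistic(y, a, c)            # two inner products, O(n)
ibs = ibs_matrix(a, b, c)              # explicit n x n matrix, O(n^2)
n = len(y)
slow = sum(y[i] * ibs[i][j] * y[j] for i in range(n) for j in range(n))
assert abs(fast - slow) < 1e-9
```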

When multiple single‐nucleotide polymorphisms (SNPs) are investigated, the test statistics based on the entire GRM and IBS matrices are obtained using $S = \sum_{k = 1}^{l} w_{k} Q_{k}$ and $S = \sum_{k = 1}^{l} w_{k} R_{k}$ , respectively, where $w_{k}$ is the weight of site $k$ and $l$ is the number of sites. SKAT allows the incorporation of flexible weight functions.^[ ¹² ^] Weights can normalize each data column to have the same variance^[ ¹³ ^] and can increase the power of tests.^[ ¹³ ^]
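The weighted sums over sites can be accumulated site by site, as in the minimal sketch below. The function name, the toy genotypes, and the weights are hypothetical choices for illustration; the text does not prescribe a particular weight function.

```python
def skat_statistics(y, genotypes, weights):
    """Return (S_GRM, S_IBS) for centered phenotypes y.

    genotypes[k][i] is the number of alternative alleles (0, 1, or 2)
    carried by individual i at site k; weights[k] is the weight w_k.
    """
    s_grm = 0.0
    s_ibs = 0.0
    for g, w in zip(genotypes, weights):
        ya = sum(yi for yi, gi in zip(y, g) if gi == 0)   # y^T a_k
        yb = sum(yi for yi, gi in zip(y, g) if gi == 1)   # y^T b_k
        yc = sum(yi for yi, gi in zip(y, g) if gi == 2)   # y^T c_k
        s_grm += w * (yb + 2.0 * yc) ** 2                 # w_k * Q_k
        s_ibs += w * (2.0 * ya ** 2 + 2.0 * yc ** 2)      # w_k * R_k
    return s_grm, s_ibs


y = [0.0, -1.0, 1.0, -2.0, 0.5, -0.5, 1.5, 0.5]   # toy phenotypes with mean 0
genotypes = [
    [0, 1, 0, 2, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 1],
]
weights = [1.0, 2.0, 4.0]   # hypothetical weights, one per site

s_grm, s_ibs = skat_statistics(y, genotypes, weights)
print(s_grm, s_ibs)  # 39.25 50.5
```

The cost is one pass over n individuals per site, so l sites take time proportional to n × l and no n × n matrix is ever stored.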

2.4. Computer Simulations

To evaluate the new method, I performed computer simulations. The Python scripts used in the computer simulations are shown in Material 1, Supporting Information. Usage instructions are given in Material 2, Supporting Information. The program is ready for data analysis.

2.4.1. Genotype Selection

SNPs on the SLC22A12 gene that are known to affect uric acid levels^[ ¹ , ¹⁴ , ¹⁵ , ¹⁶ ^] were selected. Then, genotype data for these SNPs in 2504 individuals were downloaded from the 1000 Genomes Project. Monomorphic sites were excluded. As a result, the sites in Table 3 were used in the computer simulation.

Table 3.

SNPs used in the computer simulation

Chromosome	Position on hg19	rsID
11	64360996	rs552232030
11	64361124	rs201136391
11	64361219	rs121907892
11	64366298	rs150255373
11	64367290	rs563239942
11	64368212	rs200104135
11	64368968	rs528619562


2.4.2. Phenotype Generations

The uric acid levels of heterozygous individuals and of individuals homozygous for the alternative allele were set to be 1.0 µg dL⁻¹ lower than those of homozygotes of the reference allele. A random variable that follows the normal distribution with mean 0.0 µg dL⁻¹ and standard deviation 1.0 µg dL⁻¹ was added to the uric acid level of each individual in the simulation as an environmental factor affecting the uric acid level. These values are similar to the observed values.^[ ¹ ^]
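The phenotype model described above might be sketched as follows. This is not the original script: the function name, the rule that an individual counts as a carrier if it has an alternative allele at any of the sites, the baseline of 0.0 for non-carriers, and the final centering step (so that the mean of y is 0, as assumed in the Theory section) are all assumptions made for this illustration.

```python
import random


def simulate_phenotypes(genotypes, effect=-1.0, sd=1.0, seed=0):
    """Simulate centered phenotype values (hypothetical helper).

    genotypes[k][i] is the number of alternative alleles of individual i at site k.
    Carriers of an alternative allele at any site (an assumption) are shifted by
    `effect`; a normal random variable with mean 0 and standard deviation `sd`
    is added as the environmental factor.
    """
    rng = random.Random(seed)
    n = len(genotypes[0])
    y = []
    for i in range(n):
        carrier = any(site[i] > 0 for site in genotypes)
        genetic = effect if carrier else 0.0
        y.append(genetic + rng.gauss(0.0, sd))
    mean = sum(y) / n
    return [v - mean for v in y]   # center so that y^T 1 = 0


genotypes = [
    [0, 1, 0, 2, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
]
y = simulate_phenotypes(genotypes, effect=-1.0, sd=1.0, seed=1)
assert len(y) == 8
assert abs(sum(y)) < 1e-9
```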

2.4.3. Calculation of Test Statistics and Permutation Tests

For each of these phenotypes, the GRM and IBS test statistics were calculated. Permutation tests were then performed with 1 000 000 permutations to estimate the probability of exceeding the observed score. The significance level was set to 5 × 10⁻⁶ (i.e., 0.05/10⁴), because the number of tests in a genome‐wide SKAT will be 10⁴. Each permutation test was repeated ten times (n = 10).
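A permutation test of this kind can be sketched in a few lines. The code below is a simplified illustration rather than the original script: the function names are hypothetical, only the GRM statistic is shown, the p‐value is taken as the fraction of permutations whose statistic is at least as large as the observed one, and far fewer permutations are used than in the study.

```python
import random


def grm_statistic(y, genotypes, weights):
    # S = sum_k w_k * (y^T g_k)^2, with g_k = b_k + 2 c_k the alternative-allele counts.
    total = 0.0
    for g, w in zip(genotypes, weights):
        s = sum(yi * gi for yi, gi in zip(y, g))
        total += w * s * s
    return total


def permutation_p_value(y, genotypes, weights, n_perm=1000, seed=0):
    rng = random.Random(seed)
    observed = grm_statistic(y, genotypes, weights)
    perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)   # resample the phenotypes without replacement
        if grm_statistic(perm, genotypes, weights) >= observed:
            exceed += 1
    return exceed / n_perm


# Toy example: only the first individual carries the variant and has an extreme phenotype.
y = [3, -1, -1, -1]
genotypes = [[1, 0, 0, 0]]
weights = [1.0]
p_value = permutation_p_value(y, genotypes, weights, n_perm=1000, seed=0)
assert 0.0 <= p_value <= 1.0
```

Each permutation costs one pass over the individuals per site, which is what makes a large number of permutations feasible.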

3. Results

Table 4 shows that there is no significant difference between the GRM and IBS in the statistical power (the chi‐square test, n = 10, P > 5%). Table 4 also shows that permutation tests can be conducted in a short period of time by using the methods proposed in this study.

Table 4.

Power and Computational time of the GRM and IBS tests

Method	The number of tests that reject the null hypothesis	Time
GRM	2 out of 10	1 min 38 s
IBS	3 out of 10	1 min 47 s


4. Discussion

In the present study, we demonstrate that the test statistics needed for variant/phenotype association tests can be obtained without computing the eigenvalues and eigenvectors of GRM and IBS matrices. The method is referred to as genotype value decomposition. For each site, the new methods run in $O (n)$ time, where $n$ is the sample size. Notably, these new methods are applicable to common variants as well as rare variants, even though they were developed for association tests of rare variants. Sparse matrix computation can be used when all of the variants are rare.
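One way to exploit sparsity for rare variants is to store, for each site, only the indices of the individuals that carry an alternative allele. The sketch below illustrates this idea under stated assumptions: the function names and the index‐list representation are hypothetical, and the phenotypes are assumed to be centered so that $y^{T} a_{k}$ follows from Equation (3).

```python
def sparse_site(alt_counts):
    # Keep only the positions of carriers; for a rare variant these lists are short.
    het = [i for i, g in enumerate(alt_counts) if g == 1]   # nonzero entries of b_k
    hom = [i for i, g in enumerate(alt_counts) if g == 2]   # nonzero entries of c_k
    return het, hom


def q_r_from_sparse(y, het, hom):
    yb = sum(y[i] for i in het)          # y^T b_k
    yc = sum(y[i] for i in hom)          # y^T c_k
    ya = -(yb + yc)                      # y^T a_k from Equation (3) with y^T 1 = 0
    q = (yb + 2.0 * yc) ** 2             # Equation (13)
    r = 2.0 * ya ** 2 + 2.0 * yc ** 2    # Equation (18)
    return q, r


y = [0.0, -1.0, 1.0, -2.0, 0.5, -0.5, 1.5, 0.5]   # toy phenotypes with mean 0
het, hom = sparse_site([0, 1, 0, 2, 0, 1, 0, 0])
q, r = q_r_from_sparse(y, het, hom)
print(q, r)  # 30.25 32.5
```

With this representation the work per site is proportional to the number of carriers rather than to the sample size.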

When the alternative allele frequency is very small, homozygotes of the alternative allele are very rare, so that c_k is ignorable. In other words, $Q_{k}$ can be approximately obtained by $Q_{k} \approx {(y^{T} b_{k})}^{2}$ . Under the same condition, ${(y^{T} a_{k})}^{2} \approx {(y^{T} b_{k})}^{2}$ and ${(y^{T} c_{k})}^{2} \approx 0$ , so that $R_{k}$ is approximately equal to $2 Q_{k}$ .
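The approximation can be made explicit by treating the limiting case in which no homozygote of the alternative allele is observed at site $k$ , that is, c_k = 0. This short derivation is a worked restatement of Equations (3), (13), and (18) under that assumption, in which case the relation holds exactly.

```latex
% With c_k = 0, Equation (3) gives a_k = 1 - b_k, and y^T 1 = 0 gives
y^{T} a_{k} = y^{T} 1 - y^{T} b_{k} = - y^{T} b_{k}
% so that Equations (13) and (18) reduce to
Q_{k} = {(y^{T} b_{k})}^{2}, \qquad
R_{k} = 2 {(y^{T} a_{k})}^{2} + 2 {(y^{T} c_{k})}^{2} = 2 {(y^{T} b_{k})}^{2} = 2 Q_{k}
```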

On one hand, when all sites are independent, the necessary probability density functions can be calculated using convolution of the probability density functions of all sites. On the other hand, it is difficult to obtain convolution of the probability density functions when the sites are linked and dependent on each other. In such cases, a permutation test is used.^[ ⁶ ^]

Because the statistics calculated by the new method are not approximations but exact values, their null distributions are exactly the same as those of the test statistics computed with explicit GRM and IBS matrices. Wu et al.^[ ⁵ ^] showed that the test statistics approximately follow the chi‐square distribution. However, because that distribution is derived as an asymptotic distribution of the statistics, the p‐values for datasets with an insufficient number of samples may be inaccurate, which could cause inflation or power loss.^[ ^] In a permutation test, the null distribution of the test statistic is approximated by resampling the observed traits without replacement. The proposed method can be useful for reducing the computational time needed to obtain p‐values using resampling methods.

5. Conclusion

In the present paper, a method referred to as genotype value decomposition is proposed for handling kernel matrices. By using this method, calculation of the null distribution of the kernel statistics can be conducted with time complexity O(n). The proposed method enables one to conduct SKAT for large samples in human genetics.

Conflict of Interest

The author declares no conflict of interest.

Peer Review

The peer review history for this article is available in the Supporting Information for this article.

Supporting information

Supporting Information (PDF, 151.9 KB)

Supplementary Information: Record of Transparent Peer Review (PDF, 145.9 KB)

Acknowledgements

The author thanks Dr. Naomichi Matsumoto for his suggestions and encouragement. This work was supported by JSPS KAKENHI Grant Numbers JP17K08682, JP19K22647, JP20K07316. The author also thanks Steven M. Thompson, from Edanz Group for editing a draft of this manuscript.

Misawa K., Genotype Value Decomposition: Simple Methods for the Computation of Kernel Statistics. Advanced Genetics 2022, 3, 2100066. 10.1002/ggn2.202100066

Data Availability Statement

The python code used in the study is available at https://github.com/kazumisawa/paraHaplo5 under the MIT license.

References

1. Misawa K., Hasegawa T., Mishima E., Jutabha P., Ouchi M., Kojima K., Kawai Y., Matsuo M., Anzai N., Nagasaki M., Genetics 2020, 214, 1079.
2. Misawa K., Kamatani N., Source Code Biol. Med. 2009, 4, 7.
3. Lin D. Y., Tang Z. Z., Am. J. Hum. Genet. 2011, 89, 354.
4. Larson N. B., Chen J., Schaid D. J., Genet. Epidemiol. 2019, 43, 122.
5. Wu M. C., Lee S., Cai T., Li Y., Boehnke M., Lin X., Am. J. Hum. Genet. 2011, 89, 82.
6. Hasegawa T., Kojima K., Kawai Y., Misawa K., Mimori T., Nagasaki M., BMC Genomics 2016, 17, 745.
7. Yang J., Lee S. H., Goddard M. E., Visscher P. M., Am. J. Hum. Genet. 2011, 88, 76.
8. Price A. L., Patterson N. J., Plenge R. M., Weinblatt M. E., Shadick N. A., Reich D., Nat. Genet. 2006, 38, 904.
9. Kwee L. C., Liu D., Lin X., Ghosh D., Epstein M. P., Am. J. Hum. Genet. 2008, 82, 386.
10. Wessel J., Schork N. J., Am. J. Hum. Genet. 2006, 79, 792.
11. Danecek P., Auton A., Abecasis G., Albers C. A., Banks E., DePristo M. A., Handsaker R. E., Lunter G., Marth G. T., Sherry S. T., McVean G., Durbin R., 1000 Genomes Project Analysis Group, Bioinformatics 2011, 27, 2156.
12. Patterson N., Price A. L., Reich D., PLoS Genet. 2006, 2, e190.
13. Zhang J., Wu B., Sha Q., Zhang S., Wang X., Genet. Epidemiol. 2019, 43, 966.
14. Enomoto A., Kimura H., Chairoungdua A., Shigeta Y., Jutabha P., Cha S. H., Hosoyamada M., Takeda M., Sekine T., Igarashi T., Matsuo H., Kikuchi Y., Oda T., Ichida K., Hosoya T., Shimokata K., Niwa T., Kanai Y., Endou H., Nature 2002, 417, 447.
15. Tin A., Li Y., Brody J. A., Nutile T., Chu A. Y., Huffman J. E., Yang Q., Chen M. H., Robinson‐Cohen C., Mace A., Liu J., Demirkan A., Sorice R., Sedaghat S., Swen M., Yu B., Ghasemi S., Teumer A., Vollenweider P., Ciullo M., Li M., Uitterlinden A. G., Kraaij R., Amin N., van Rooij J., Kutalik Z., Dehghan A., McKnight B., van Duijn C. M., Morrison A., et al., Nat. Commun. 2018, 9, 4228.
16. Claverie‐Martin F., Trujillo‐Suarez J., Gonzalez‐Acosta H., Aparicio C., Justa Roldan M. L., Stiburkova B., Ichida K., Martin‐Gomez M. A., Herrero Goni M., Carrasco Hidalgo‐Barquero M., Inigo V., Enriquez R., Cordoba‐Lanus E., Garcia‐Nieto V. M., RenalTube G., Clin. Chim. Acta 2018, 481, 83.
