# In our simulations, let:
#   1. Y represent a continuous phenotype measure.
#   2. X represent the allele frequency of a particular SNP.
#   3. PC1 and PC2 represent the first two principal components of the genetic
#      relatedness matrix of the colonies.
#   4. E represent a normally distributed random error term with mean 0 and
#      variance σ^2.
#
# We generate observations according to the linear regression model:
#
#   Y = β0 + β1 X + β2 PC1 + β3 PC2 + E    (1)
#
# Under this regression model, σ^2 quantifies the residual variance of Y.
#
# We first generate n = 9 observations of X, PC1, and PC2. The values of X are
# generated from a uniform distribution between 0 and 0.5. PC1 and PC2 are
# generated from normal distributions and then transformed to be orthonormal to
# each other, consistent with the behavior of principal components. We fix the
# seed of the random number generator so that the simulation is reproducible.

## set seed of random number generator
set.seed(1)

## sample size
n = 9

## simulate orthonormal principal components
PC1 = rnorm(n)
PC1 = PC1 / sqrt(sum(PC1^2))        ## scale PC1 to unit length
PC2 = rnorm(n)
PC2 = PC2 - sum(PC1 * PC2) * PC1    ## remove the component of PC2 along PC1
PC2 = PC2 / sqrt(sum(PC2^2))        ## scale PC2 to unit length

## simulate allele frequency for one SNP
X = runif(n, 0, 0.5)

# Next, for different values of the residual standard deviation σ (the square
# root of the residual variance σ^2), we generate different sets of n = 9 values
# of E. For each set, we generate a corresponding set of values of Y according
# to model (1). We then fit a linear model regressing Y on X, PC1, and PC2, and
# record the resulting p-value for the significance of X.
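
# As a small illustration of the "record the p-value for X" step, the helper
# below looks up the p-value by row and column name instead of by position.
# The name x_pvalue is hypothetical (it is not part of the original script), and
# the helper assumes the fitted model contains a term named "X". It draws no
# random numbers, so defining it here does not change the simulated data.
x_pvalue = function(fit) {
  ## coefficient table from summary.lm: one row per term, with columns
  ## "Estimate", "Std. Error", "t value", "Pr(>|t|)"
  coefs = summary(fit)$coefficients
  coefs["X", "Pr(>|t|)"]
}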
## possible values of sigma (the residual standard deviation)
sigmas = c(1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8, 1e-9, 1e-10, 1e-11)

## store p-values
pvals = rep(NA, length(sigmas))

for (i in seq_along(sigmas)) {
  ## generate data using sigma = ith value of sigmas
  sigma = sigmas[i]

  ## generate epsilon (the error term E)
  epsilon = rnorm(n, sd = sigma)

  ## generate Y according to model (1), with β0 = 0 and β1 = β2 = β3 = 1
  Y = X + PC1 + PC2 + epsilon

  ## fit regression model
  fit = lm(Y ~ X + PC1 + PC2)

  ## save p-value corresponding to X (row 2 = X, column 4 = Pr(>|t|))
  pvals[i] = summary(fit)$coefficients[2, 4]
}

# The simulation can be repeated with different values of X, PC1, PC2, and E by
# changing the seed of the random number generator; the same qualitative results
# hold (S2).
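
# The repetition described above can be sketched as a function of the seed.
# This is a minimal sketch under stated assumptions, not the code behind (S2):
# the names simulate_pvals and pvals_by_seed are hypothetical, and the function
# assumes the same data-generating steps as the script above, in the same order.
# Note that calling it resets the seed of the random number generator.
simulate_pvals = function(seed, sigmas, n = 9) {
  set.seed(seed)

  ## orthonormal principal components, as above
  PC1 = rnorm(n)
  PC1 = PC1 / sqrt(sum(PC1^2))
  PC2 = rnorm(n)
  PC2 = PC2 - sum(PC1 * PC2) * PC1
  PC2 = PC2 / sqrt(sum(PC2^2))

  ## allele frequency for one SNP
  X = runif(n, 0, 0.5)

  ## one p-value for X per value of sigma
  sapply(sigmas, function(sigma) {
    Y = X + PC1 + PC2 + rnorm(n, sd = sigma)
    summary(lm(Y ~ X + PC1 + PC2))$coefficients[2, 4]
  })
}

## example: rows correspond to values of sigma, columns to seeds 1, 2, 3
pvals_by_seed = sapply(1:3, simulate_pvals, sigmas = c(1, 1e-2, 1e-4))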