2009 Dec 23;26(4):518–528. doi: 10.1093/bioinformatics/btp694

Table 1.

Parameter definitions

| Parameter name | Default | Description |
| --- | --- | --- |
| Mean length of a recurrently amplified stretch | 390 kb | The mean of an exponential distribution from which the length of a region that is recurrently amplified across samples is sampled. With the default mean of 390 kb, this distribution can still produce recurrent stretches longer than 1 Mb or even 2 Mb. |
| Number of recurrently amplified stretches | 5 | The number of recurrently amplified stretches in the genome, all of which contain causal (driver) SNPs. A value of five represents a realistic number of true positive regions. |
| Mean length of a non-recurrently amplified stretch | 2.5 Mb | The mean of an exponential distribution from which the length of a non-recurrent (sample-specific) amplified stretch is sampled for an individual sample. |
| Number of non-recurrently amplified stretches | 5 | The number of non-recurrently amplified stretches for a particular sample. No such stretch contains causal (driver) SNPs. |
| Probability of amplifying the driver allele | 0.90 | At a driver SNP within a recurrently amplified stretch, a driver allele is pre-selected to be the factor driving tumor development. For a sample j that is heterozygous at the SNP, this parameter is the probability of amplifying the phased haplotype within the stretch that contains the driver allele. Otherwise, the other phased haplotype is amplified instead. |
| Probability of amplifying a sample within a recurrently amplified stretch | 0.20 | The mean probability that a sample j (and therefore all of its calls) is amplified within a recurrently amplified region. The true probability of such amplification for a sample j is sampled from a normal distribution with μ = the parameter value and σ = 0.03. The default and σ values reflect what is observed in recurrently amplified regions in the real Illumina 550K data. |
| Bias to amplify driver allele homozygous calls | 0.70 | Either the major or the minor allele at a driver SNP can be selected to be the driver allele. This parameter determines the ‘proportion’ of homozygous calls (corresponding to the driver allele) that are to be amplified at the driver SNP. Its complement (the parameter value subtracted from 1.0) determines the ‘proportion’ of homozygous calls that are to be amplified for the complementary allele at the driver SNP. The ratio of the parameter value to its complement can be viewed as the relative risk (risk of amplification relative to the homozygous genotype); that is, it equals the ratio of the proportion of homozygous calls amplified for the driver allele to the proportion of homozygous calls amplified for the other allele. |
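
As a worked example of the ratio described for the last parameter, the default bias of 0.70 can be plugged in directly. The symbols b (the bias parameter) and RR (the relative risk) are introduced here for illustration only and do not appear in the original table.

```latex
\mathrm{RR} = \frac{b}{1 - b} = \frac{0.70}{1 - 0.70} = \frac{0.70}{0.30} \approx 2.33
```

Under this reading, the default corresponds to roughly 2.3 amplified homozygous calls for the driver allele for every amplified homozygous call for the other allele.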

The parameters used in the simulations are described above. The default values were obtained by observing parameter-specific properties in a real Illumina 550K dataset obtained from The Cancer Genome Atlas (TCGA). The derivation of the default parameter values is discussed in the Supplementary Material (section "Determination of simulation default parameters").
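
The sampling steps that Table 1 describes can be illustrated with a minimal Python sketch. It is not the authors' simulation code: every function and constant name is hypothetical, only the default values come from the table, and clipping the sampled probability to [0, 1] is an assumption added here so that the normal draw always yields a valid probability.

```python
import random

# Default values from Table 1. All names below are hypothetical.
MEAN_RECURRENT_LEN_BP = 390_000        # 390 kb
MEAN_NONRECURRENT_LEN_BP = 2_500_000   # 2.5 Mb
P_AMPLIFY_SAMPLE_MEAN = 0.20
P_AMPLIFY_SAMPLE_SD = 0.03
P_DRIVER_ALLELE = 0.90
HOMOZYGOUS_BIAS = 0.70


def sample_stretch_length(mean_bp, rng):
    """Draw a stretch length (in bp) from an exponential distribution with the given mean."""
    # random.expovariate takes the rate, i.e. 1 / mean.
    return rng.expovariate(1.0 / mean_bp)


def sample_amplification_probability(rng, mean=P_AMPLIFY_SAMPLE_MEAN, sd=P_AMPLIFY_SAMPLE_SD):
    """Draw the per-sample amplification probability from Normal(mean, sd)."""
    # Clipping to [0, 1] is an assumption of this sketch, not stated in the table.
    return min(1.0, max(0.0, rng.gauss(mean, sd)))


def amplified_haplotype(rng, p_driver=P_DRIVER_ALLELE):
    """For a heterozygous sample, choose which phased haplotype is amplified."""
    return "driver" if rng.random() < p_driver else "other"


def relative_risk(bias=HOMOZYGOUS_BIAS):
    """Ratio of the bias parameter to its complement."""
    return bias / (1.0 - bias)


def simulate_recurrent_region(n_samples, rng):
    """Simulate one recurrently amplified region across n_samples samples."""
    length_bp = sample_stretch_length(MEAN_RECURRENT_LEN_BP, rng)
    amplified = []
    for j in range(n_samples):
        p_j = sample_amplification_probability(rng)
        if rng.random() < p_j:
            amplified.append(j)
    return length_bp, amplified


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducibility
    length_bp, amplified = simulate_recurrent_region(100, rng)
    print(f"region length: {length_bp / 1000:.0f} kb")
    print(f"amplified samples: {len(amplified)} of 100")
```

Passing an explicit `random.Random` instance keeps each draw reproducible and makes it easy to compare runs with different parameter values.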