Supporting Figure 9a

Supporting Figure 9b

Fig. 9.

Variance decays because of length-dependent noise in higher dimensions. When singular values are calculated for data sets consisting of genes of similar length, the singular values for dimensions 5–41 fall off with increasing gene length. This decrease in dimensions 5–41 resembles that seen with a randomized data set. This finding further supports the hypothesis that within-genome variance for eukaryotic genomes is smaller in higher dimensions because their typical gene length is longer. Subsets of the original data set were formed so that each included only genes within a ≈10% length range. No further sampling by genome was done. A similar analysis was then performed on each subset, and the singular values were scaled by the number of genes in the subset. In this figure, each singular value is expressed as a fraction of the corresponding singular value for the data set with the shortest genes, for comparison. Singular values are compared here as a measure of the variance in each dimension. (a) Results for nonrandomized data. The x axis is the natural logarithm of the length of the genes in each data set. The y axis is the magnitude of each singular value relative to the corresponding singular value in the data set with the shortest genes. The singular values for dimensions 5–41 decay in a fashion similar to that seen for randomized data in b. (b) Results for randomized data. The nucleotides of each gene were randomly shuffled before the analysis.
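The length-binned singular value comparison described in this caption can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' pipeline. The caption does not say which per-gene features were decomposed, so the hypothetical `kmer_profile` uses trinucleotide frequencies. Column centering, dividing singular values by the square root of the number of genes (standing in for "scaled by the number of genes"), and reading the ≈10% length range as ±5% around a center length are also assumptions. The helper names `length_bin`, `shuffle_nucleotides`, and `relative_singular_values` are illustrative.

```python
import random
from itertools import product

import numpy as np

# Hypothetical feature set: overlapping trinucleotide (3-mer) frequencies.
# The caption does not state which per-gene features were used; this is an assumption.
K = 3
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_profile(seq):
    """Return the 3-mer frequency vector of one gene (hypothetical feature)."""
    counts = np.zeros(len(KMERS))
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K])
        if idx is not None:  # skip windows containing N or other non-ACGT symbols
            counts[idx] += 1
    total = counts.sum()
    return counts / total if total else counts


def shuffle_nucleotides(seq, rng):
    """Randomize base order within a gene, keeping its length and composition (panel b)."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def length_bin(genes, center, half_width=0.05):
    """Keep genes within +/- half_width * center of a center length.

    Assumption: the ~10% length range is read as a window spanning about 10%.
    """
    return [g for g in genes if abs(len(g) - center) <= half_width * center]


def scaled_singular_values(genes):
    """Singular values of the centered gene-by-feature matrix, divided by sqrt(n genes)."""
    X = np.array([kmer_profile(g) for g in genes])
    X -= X.mean(axis=0)  # center each feature column (assumption)
    s = np.linalg.svd(X, compute_uv=False)  # returned in descending order
    return s / np.sqrt(len(genes))  # assumed form of "scaled by the number of genes"


def relative_singular_values(bins, n_dims=41):
    """Express each bin's first n_dims singular values as a fraction of the
    corresponding values for bins[0], which should hold the shortest genes."""
    ref = scaled_singular_values(bins[0])[:n_dims]
    return [scaled_singular_values(b)[:n_dims] / ref for b in bins]
```

For the randomized case (panel b), `shuffle_nucleotides` would be applied to every gene before binning. Shuffling preserves each gene's length and base composition, so any remaining decay with length reflects sampling noise alone. Plotting each returned row against `np.log` of the bin's mean gene length gives axes analogous to those described for panel (a); dimensions 5–41 correspond to the slice `[4:41]`.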