Abstract
DNA alterations have been observed in astrocytoma for decades. A copy-number genotype predictive of a survival phenotype was only discovered by using the generalized singular value decomposition (GSVD) formulated as a comparative spectral decomposition. Here, we use the GSVD to compare whole-genome sequencing (WGS) profiles of patient-matched astrocytoma and normal DNA. First, the GSVD uncovers a genome-wide pattern of copy-number alterations, which is bounded by patterns recently uncovered by the GSVDs of microarray-profiled patient-matched glioblastoma (GBM) and, separately, lower-grade astrocytoma and normal genomes. Like the microarray patterns, the WGS pattern is correlated with an approximately one-year median survival time. By filling in gaps in the microarray patterns, the WGS pattern reveals that this biologically consistent genotype encodes for transformation via the Notch together with the Ras and Shh pathways. Second, like the GSVDs of the microarray profiles, the GSVD of the WGS profiles separates the tumor-exclusive pattern from normal copy-number variations and experimental inconsistencies. These include the WGS technology-specific effects of guanine-cytosine content variations across the genomes that are correlated with experimental batches. Third, by identifying the biologically consistent phenotype among the WGS-profiled tumors, the GBM pattern proves to be a technology-independent predictor of survival and response to chemotherapy and radiation, statistically better than the patient's age and tumor's grade, the best other indicators, and MGMT promoter methylation and IDH1 mutation. We conclude that by using the complex structure of the data, comparative spectral decompositions underlie a mathematically universal description of the genotype-phenotype relations in cancer that other methods miss.
INTRODUCTION
Recurring DNA alterations have been recognized as a hallmark of cancer for over a century1 and observed in astrocytoma brain cancer for decades, without being translated into clinical use.2 Meanwhile, the prognosis, diagnosis, and treatment of astrocytoma have remained largely unchanged. Temozolomide, the one drug that progressed from trials to standard of care, modestly improves the one-year median survival time of grade IV astrocytoma, i.e., glioblastoma (GBM), by less than three months.3 This is despite advances in genomic profiling technologies and the growing volume of publicly available genomic data.4,5 Only recently, a copy-number genotype predictive of an astrocytoma survival phenotype was discovered and only by using the generalized singular value decomposition (GSVD) to compare patient-matched primary adult GBM and, separately, grades III and II, i.e., lower-grade astrocytoma (LGA) tumor and normal genomes, profiled by Agilent comparative genomic hybridization (CGH) and Affymetrix single nucleotide polymorphism (SNP) microarray platforms, respectively.6,7 Note that primary GBM and LGA are different types of cancers. Their histopathologies overlap, and GBM is distinguished from LGA by the presence of necrosis or microvascular proliferation in the tumor. Their epidemiologies, however, differ, including the distributions of the results of existing tests, i.e., for MGMT promoter methylation and IDH1 mutation, and, therefore, also the distributions of treatments, i.e., chemotherapy and radiation.8
To test the mathematical universality and biological consistency of the tumor-exclusive genotype and phenotype, here we use the GSVD to additionally compare whole-genome sequencing (WGS) read-count profiles of astrocytoma tumor and patient-matched normal DNA9 from the Cancer Genome Atlas (TCGA). We used the same computational workflow to construct the WGS astrocytoma set of patients as we previously used to construct the Agilent GBM and Affymetrix LGA discovery and validation sets (Methods and Fig. S1 in the supplementary material). The resulting tumor and normal datasets have the structure of two matrices of N = 85 matched columns, i.e., patients, and M1 = 2 827 037 and M2 = 2 828 152 rows, i.e., tumor and normal 1K-nucleotide bins10,11 (Dataset S1).
The WGS technology complements the CGH and SNP microarray platforms to represent the main genomic profiling technologies. Note that each technology relies on a specific experimental design and a specialized computational protocol, which is sensitive to perturbations to the data, e.g., due to changes in the experimental batch or the computational preprocessing.12–14 This has contributed to a low reproducibility, <70% between technical replicates of the same sample and <50% between computational assessments of the same raw data, in assigning copy-number variations (CNVs) in normal DNA15 or copy-number alterations (CNAs) in tumor DNA. The WGS set of bins, while different from the Agilent CGH and Affymetrix SNP sets of probes, provides a high-resolution representation of the human genome, like the CGH and SNP sets. The ≈2.8M bins, across the autosome and the X chromosome, include almost all of the 213K CGH and 934K SNP probes. In addition, the bins fill in gaps in the genome which are not covered by either set of probes, mostly in genomic regions of constitutive heterochromatin domains, e.g., the centromeres and telomeres.
The WGS astrocytoma set of patients, while different from the mutually exclusive Agilent GBM and Affymetrix LGA discovery and validation sets of patients, statistically represents the astrocytoma patient population at large, like the GBM and LGA sets representing the GBM and LGA populations, respectively. The representation is in terms of both disease and normal phenotypes, e.g., gender and ethnicity, while reflecting biases against surgical resections in patients >75 years old or of diffuse tumors, which affect mostly GBM or LGA patients, respectively. The 85 WGS astrocytoma patients include ≈61%, 28%, and 11% primary GBM and grade III and II astrocytoma patients, diagnosed at the median ages of 60, 50, and 31 years, and with median survival times of 15, 58, and 63 months, respectively. IDH1 mutation was detected in 15%, 48%, and 86% of the tested GBM and grade III and II astrocytoma patients, respectively. Treatment by chemotherapy was noted for 77% GBM and 55% LGA patients. There are 62% male and 38% female patients. Of the 85 WGS astrocytoma patients, 24, i.e., ≈28%, complement the discovery sets of 251 GBM and 59 LGA patients. Of these 24 patients, 14 complement the validation sets of 184 GBM and 74 LGA patients and include GBM and grade III and II astrocytoma patients.
The WGS astrocytoma tumor and patient-matched normal datasets, while different from the Agilent GBM and Affymetrix LGA datasets, represent a range of approaches to tissue collection from 1993 to 2012 and DNA extraction and genomic characterization, like the Agilent GBM and Affymetrix LGA datasets. Participating in generating the data were 18 TCGA tissue source sites (TSSs), two biospecimen core resources (BCRs), and three genomic characterization centers (GCCs), employing two different types of DNA sequencing instruments. Even while controlling for intratumor heterogeneity, TCGA parameters, e.g., the tumor sample's volume, can span approximately two orders of magnitude.
First, we find that the GSVD identifies the same genotype-phenotype relation as significant in, and exclusive to, the WGS astrocytoma tumor relative to the patient-matched normal profiles, here as in the previous GSVDs of Agilent GBM and, separately, Affymetrix LGA tumor and normal profiles. The identification is invariably blind to, i.e., without a priori information about, the clinical labels of the patients, the experimental labels of the samples, or the genomic coordinates of the bins or probes. This identified relation is invariably robust to perturbations to the minimally preprocessed data and independent of intratumor heterogeneity as it is reflected in the TCGA parameters.
Second, independent of the profiling technology, the GSVD blindly separates the tumor-exclusive genotype-phenotype relation from experimental batch effects. Affecting the WGS data, here we find guanine-cytosine (GC) content variations across the genomes that vary in magnitude between TCGA GCC and TSS batches. Affecting the microarray data, previously we found batches of, e.g., hybridization dates, scanners, and plates. Additional separation is from normal relations that are conserved in the tumor, e.g., the X chromosome genotype and the gender phenotype. Note that depending on the technology, this relation is represented in the data as a male-specific deletion or a female-specific amplification of the X chromosome relative to the autosome or the normal male genome, respectively.
Third, the tumor-exclusive genotype invariably predicts the phenotype of astrocytoma survival and response to chemotherapy and radiation statistically better than and independent of any other indicator, test, and treatment, here, for the WGS astrocytoma set of patients, as it did previously for the mutually exclusive Agilent GBM and Affymetrix LGA discovery and validation sets of patients.
We, therefore, conclude that the tumor-exclusive genotype-phenotype relation is appropriate for the adult astrocytoma population at large and suitable for all genomic profiling technologies, i.e., that the GSVD formulated as a comparative spectral decomposition underlies a mathematically universal description of the genotype-phenotype relations in astrocytoma.
THE GSVD AS A COMPARATIVE SPECTRAL DECOMPOSITION
Given two column-matched but row-independent real matrices $D_i \in \mathbb{R}^{M_i \times N}$, each with full column rank $N \le M_i$, the GSVD is an exact simultaneous factorization16–19
$$D_i = U_i \Sigma_i V^T = \sum_{n=1}^{N} \sigma_{i,n} \, u_{i,n} v_n^T, \quad i = 1, 2, \tag{1}$$
where $U_i \in \mathbb{R}^{M_i \times N}$ are real and column-wise orthonormal and $V^T \in \mathbb{R}^{N \times N}$ is real, invertible, and with normalized rows. The 2N positive generalized singular values are arranged in $\Sigma_i = \mathrm{diag}(\sigma_{i,n})$ in a decreasing order of the ratio $\sigma_{1,n}/\sigma_{2,n}$. The GSVD is unique up to phase factors of ±1 of each triplet of the corresponding column and row basis vectors, i.e., $u_{1,n}$, $u_{2,n}$, and $v_n$, except in degenerate subspaces defined by subsets of pairs of generalized singular values of equal ratios, i.e., $\sigma_{1,n}/\sigma_{2,n} = \sigma_{1,m}/\sigma_{2,m}$ for some $m \ne n$. The GSVD generalizes the SVD from one to two matrices. Like the SVD, the GSVD is a mathematical building block of algorithms, e.g., for solving the problem of constrained least squares in algebra,20 and theories, e.g., for describing oscillations near equilibrium in classical mechanics.21
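As an illustration of the factorization described above, the following is a minimal numerical sketch, not the workflow used in this study. It assumes NumPy and computes a GSVD of two small random matrices by a QR factorization of the stacked matrices followed by an SVD of the upper block, i.e., a cosine-sine decomposition route. The function name `gsvd` and the toy matrix sizes are hypothetical, and the approach assumes both matrices have full column rank.

```python
import numpy as np

def gsvd(d1, d2):
    """Sketch of a GSVD: d_i = u_i @ diag(s_i) @ vt for i = 1, 2.

    Assumes d1 and d2 have the same number of columns and full column rank.
    """
    m1 = d1.shape[0]
    # Thin QR of the stacked matrices; r is N x N and invertible.
    q, r = np.linalg.qr(np.vstack([d1, d2]))
    q1, q2 = q[:m1], q[m1:]
    # Cosine-sine decomposition: q1 = u1 @ diag(c) @ wt and q2 = u2 @ diag(s) @ wt.
    u1, c, wt = np.linalg.svd(q1, full_matrices=False)
    q2w = q2 @ wt.T
    s = np.linalg.norm(q2w, axis=0)   # sines, since c**2 + s**2 = 1
    u2 = q2w / s
    # Shared right factor, rescaled so that every row of vt has unit norm.
    xt = wt @ r
    row_norms = np.linalg.norm(xt, axis=1)
    vt = xt / row_norms[:, None]
    return u1, c * row_norms, u2, s * row_norms, vt

# Toy data only: two row-independent matrices with four matched columns.
rng = np.random.default_rng(0)
d1 = rng.standard_normal((50, 4))
d2 = rng.standard_normal((60, 4))
u1, s1, u2, s2, vt = gsvd(d1, d2)
ratios = s1 / s2   # ordered from most d1-exclusive to most d2-exclusive
```

Because NumPy returns the singular values of the upper block in decreasing order, the ratios of the generalized singular values come out in decreasing order as well, matching the ordering convention described in the text.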
We formulated the GSVD as a comparative spectral decomposition that can simultaneously identify the similarity and dissimilarity between two column-matched but row-independent matrices and, therefore, create a single coherent model from two datasets recording different aspects of interrelated phenomena.22,23 This formulation24–27 is possible because the GSVD is exact, exists, and has uniqueness properties that directly generalize those of the SVD28,29 (Theorem S1). The only assumption is that there exists a one-to-one mapping between the columns of the matrices but not necessarily between their rows. We defined the significance of the row basis vector $v_n$ and the corresponding column basis vector $u_{i,n}$ in the corresponding matrix $D_i$, i.e., the “generalized fraction” $P_{i,n}$, to be proportional to the corresponding squared generalized singular value $\sigma_{i,n}^2$ and the “generalized normalized Shannon entropy” of $D_i$ to be proportional to the arithmetic mean of $P_{i,n} \log P_{i,n}$ (Fig. S2). We defined the significance of $v_n$ and $u_{1,n}$ in $D_1$ relative to that of $v_n$ and $u_{2,n}$ in $D_2$, i.e., the “GSVD angular distance,” to be a function of the ratio $\sigma_{1,n}/\sigma_{2,n}$ that, from the cosine-sine decomposition, is related to an angle (Fig. 1)
$$\theta_n = \arctan(\sigma_{1,n}/\sigma_{2,n}) - \pi/4, \quad -\pi/4 < \theta_n < \pi/4. \tag{2}$$
Note that the angular distances $\theta_n$ are different from the principal angles corresponding to canonical correlations, as the GSVD is different from canonical correlations analysis (CCA).30
A unique row basis vector $v_n$ that is significant in either $D_1$ or $D_2$ and with an angular distance of $\theta_n \approx \pm\pi/4$, which corresponds to a ratio of $\sigma_{1,n}/\sigma_{2,n} \gg 1$ or $\ll 1$, respectively, is mathematically approximately exclusive to either $D_1$ or $D_2$ and for consistency should be interpreted with the corresponding column basis vector $u_{1,n}$ or $u_{2,n}$ to represent phenomena exclusive to either the first or the second dataset. A unique row basis vector $v_n$ that is significant in both $D_1$ and $D_2$ and with an angular distance of $\theta_n \approx 0$, which corresponds to $\sigma_{1,n} \approx \sigma_{2,n}$, is mathematically common to $D_1$ and $D_2$ and should be interpreted with both $u_{1,n}$ and $u_{2,n}$ to represent phenomena common to both datasets.
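The significance measures described above can be sketched in a few lines of pure Python. This is an illustration under stated assumptions rather than the definitions of Fig. S2: it assumes that the generalized fraction is the squared generalized singular value normalized to sum to one, that the entropy is normalized by the logarithm of the number of basis vectors, and that the angular distance is the arctangent of the ratio of the generalized singular values minus π/4. The function names and the toy values are hypothetical.

```python
import math

def generalized_fractions(sigmas):
    """Fractions proportional to the squared generalized singular values (assumed form)."""
    total = sum(s * s for s in sigmas)
    return [s * s / total for s in sigmas]

def normalized_shannon_entropy(fractions):
    """Entropy scaled to [0, 1] by log(N) (assumed normalization); needs N > 1."""
    n = len(fractions)
    return -sum(p * math.log(p) for p in fractions if p > 0) / math.log(n)

def angular_distances(sigma1, sigma2):
    """arctan(sigma1 / sigma2) - pi/4 for each pair of positive singular values."""
    return [math.atan(a / b) - math.pi / 4 for a, b in zip(sigma1, sigma2)]

# Toy generalized singular values, ordered by decreasing ratio.
sigma_tumor = [9.0, 4.0, 2.0, 0.1]
sigma_normal = [0.1, 2.0, 2.0, 8.0]
theta = angular_distances(sigma_tumor, sigma_normal)
fractions_tumor = generalized_fractions(sigma_tumor)
entropy_tumor = normalized_shannon_entropy(fractions_tumor)
```

In this toy example, the first pair has an angular distance close to π/4 and a large fraction in the first matrix, so it would be read as approximately exclusive to the first dataset, whereas the third pair has an angular distance of zero and would be read as common to both.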
Mathematically invariant under the exchange of the two matrices or the reordering of the pairs of matched columns or the rows, the GSVD is also blind to the labels of the matrices, the columns, and the rows. These labels are only used to interpret the row and column basis vectors in terms of the phenomena recorded in the datasets.
ASTROCYTOMA TUMOR-EXCLUSIVE GENOTYPE AND PHENOTYPE
The second most tumor-exclusive row basis vectors uncovered by the previous GSVDs of patient-matched Agilent GBM and, separately, Affymetrix LGA tumor and normal profiles are also the first and third most significant in the GBM and LGA tumor genomes, respectively. By using the clinical labels of the previous discovery sets of patients in survival analyses, these second row basis vectors were shown to separate subsets of patients of an approximately one-year median survival time from the complement subsets of median survival times of three years in GBM and five years in LGA. The corresponding second GBM and LGA tumor column basis vectors, i.e., patterns, were shown to similarly separate subsets of patients of an approximately one-year median survival time from the previous validation sets of patients.
By using the genomic coordinates of the microarray probes in segmentation analyses, the GBM and LGA patterns were shown to describe similar genome-wide patterns of co-occurring DNA CNAs that encode for opportunities for transformation via the Ras and Shh pathways. The GBM pattern, which encompasses the LGA pattern, such that these opportunities are enhanced in GBM relative to LGA, includes most CNAs that were known and several that were unrecognized in GBM prior to its discovery. We found that the GBM pattern predicts GBM survival statistically better than any one CNA that it identifies and that none of the previously known CNAs was correlated with GBM survival. We, therefore, suggested that the astrocytoma survival phenotype is an outcome of its global genotype.
Here, we find that the second tumor column basis vector uncovered by the GSVD of the WGS profiles is the second most significant in and exclusive to the astrocytoma tumor relative to the normal genomes and describes the same genotype (Fig. 2). To compare the corresponding WGS astrocytoma pattern to the Agilent GBM and Affymetrix LGA patterns, we used the genomic coordinates of the WGS bins and classified the 111 genomic segments of at least five Agilent probes in length, previously identified in the Agilent GBM pattern, as amplified, unaltered, or deleted in the WGS astrocytoma pattern in addition to the Affymetrix LGA pattern (Dataset S2). The classification is based upon the differences, in standard deviations, between the relative copy-number means of the segments and the autosome or the chromosomes. We find that the WGS astrocytoma pattern is approximately bounded above by the Agilent GBM and below by the Affymetrix LGA pattern; 83% of the segments that are amplified or deleted in the WGS astrocytoma pattern are a subset and a superset of, and of a lesser or greater magnitude than, those that are amplified or deleted in the Agilent GBM and Affymetrix LGA patterns, respectively.
An approximately one-year median survival time phenotype
By using the clinical labels of the patients, we find that the WGS astrocytoma pattern is correlated with the same survival phenotype as the Agilent GBM and Affymetrix LGA patterns (Fig. S3). Of the 85 patients, 52 are classified as having high weights of the astrocytoma pattern in their tumor profiles based upon the superposition coefficients of the second tumor column basis vector in the column vectors of the tumor dataset. The vector that lists these coefficients is linearly proportional to the second row basis vector. Of the same 85 patients, 54, including 51, i.e., ≈98% of the 52, have high Pearson correlations of their tumor profiles with the pattern. We use the correlation cutoff of 0.15 and compute the coefficient cutoff by scaling 0.15 by the Frobenius norm of the vector that lists the correlations, as was previously established for the Agilent GBM discovery set of patients and validated for the Agilent GBM validation and Affymetrix LGA discovery and validation sets of patients.
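The correlation-based classification described above can be sketched as follows. The example is pure Python, uses toy profiles rather than study data, and assumes that "high" means a Pearson correlation strictly greater than the 0.15 cutoff. The function names and patient identifiers are hypothetical, and the coefficient-based cutoff is not reproduced here because its exact scaling is not spelled out in this section.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def classify_by_correlation(pattern, profiles, cutoff=0.15):
    """Map each patient to True if the profile correlates with the pattern above the cutoff."""
    return {pid: pearson(pattern, profile) > cutoff for pid, profile in profiles.items()}

# Toy pattern and tumor profiles over four bins (illustrative values only).
pattern = [1.0, 2.0, 3.0, 4.0]
profiles = {
    "patient_a": [2.0, 4.0, 6.0, 8.0],   # follows the pattern
    "patient_b": [4.0, 3.0, 2.0, 1.0],   # opposite to the pattern
    "patient_c": [1.0, 3.0, 2.0, 1.5],   # weakly related to the pattern
}
labels = classify_by_correlation(pattern, profiles)
```

In practice the pattern and each profile would span the same set of genomic bins, and the resulting labels would feed the survival analyses.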
In Kaplan-Meier (KM) survival analyses, the subsets of patients with high superposition coefficients and, separately, Pearson correlations are of an approximately one-year median survival time, statistically significantly shorter than the median survival time of five years of the complement subsets of patients. In Cox proportional hazards models, a high coefficient or, separately, correlation confers ≈8 times the hazard of a low coefficient or correlation, respectively.
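To make the notion of a KM median survival time concrete, the following is a minimal product-limit estimator in pure Python. It is a sketch with hypothetical function names and toy follow-up times, not patient data from this study, and it omits confidence intervals and the log-rank test.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    curve, survival = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve

def median_survival(times, events):
    """Earliest event time at which the estimated survival drops to 0.5 or below."""
    for t, survival in kaplan_meier(times, events):
        if survival <= 0.5:
            return t
    return None  # median not reached within follow-up

# Toy follow-up in months; False marks a censored observation.
months = [6, 10, 12, 14, 20]
died = [True, True, False, True, True]
curve = kaplan_meier(months, died)
median_months = median_survival(months, died)
```

Censored patients contribute to the number at risk until their last follow-up but never count as events, which is why the estimate differs from a simple median of the observed times.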
A genotype encoding for transformation via the Notch together with the Ras and Shh pathways
By filling in gaps in the genome which are not covered by either the Agilent or the Affymetrix probes, the WGS astrocytoma pattern adds to the description of the genotype that corresponds to the one-year survival phenotype. We find amplifications previously unrecognized in astrocytoma which encode for increased cell communication via the canonical Notch pathway in support of transformation via the Ras and Shh and the hominin-specific Notch pathway (Fig. 3).
The largest of the 111 segments, which spans ≈79M nucleotides on chromosome 1 across the bands 1p31.1-q23.3, is classified as unaltered in the WGS pattern, the same as in the microarray patterns. The segment contains the two largest gaps between the microarray probes on chromosome 1. The largest, a 23M-nucleotide gap (1p11.2-q21), includes the centromere. Circular binary segmentation (CBS)31 of the WGS pattern identified a 21M-nucleotide segment (1p11.2-q12) within the gap, which is classified as amplified. At 739K nucleotides from the 5′ end of the gene NOTCH2 (1p12-p11.2), the amplification is within its promoter region.32 Similarly, a 140K-nucleotide gap (9q34.3), which includes the 9q telomere, overlaps 79K of a 104K-nucleotide amplified segment in the promoter region of NOTCH1 at 1.6M nucleotides from its 5′ end. These amplifications within the promoter regions, rather than of the genes, encode for overexpression of wild-type NOTCH1/2.33,34 Three genes in the core Notch pathway are on two of the 111 segments, which are approximately coextensive with 19q and 20p and are amplified in the GBM but not the LGA or astrocytoma patterns. The ligand-encoding JAG1 and DLL3 are involved in sending, and PSENEN in receiving, the Notch signals. These amplifications encode for overactivation of Notch in GBM. Note that the co-deletion of 1p and 19q, which can underactivate Notch, is associated with an oligodendroglioma brain cancer patient's longer survival.
Segmentation of the WGS pattern also identifies a 76K-nucleotide segment within the second largest gap on chromosome 1 (1q21.2). The segment, which is classified as amplified, maps to the neuroblastoma breakpoint family gene NBPF14, so-called because NBPF1 (1p36.13) was discovered in a screen for genes disrupted by a translocation in a neuroblastoma brain cancer patient's normal genome.35 The segment includes 38 repeats of a 1.5K-nucleotide sequence that encodes for a copy of the protein domain of unknown function 1220 (DUF1220).36 At 2.3M nucleotides from the 5′ end of the hominin-specific NOTCH2NL (1q21.1), the amplification is within its promoter region and encodes for its overexpression.
Overactivation of the canonical Notch pathway supports human normal to tumor cell transformation via the Ras and Shh and the hominin-specific Notch pathway. In response to Ras-mediated growth signals, wild-type NOTCH1/2 upregulate the cell cycle-promoting cyclin-dependent kinase (CDK) encoded by CDK4 and block the cell cycle arrest, apoptosis, and senescence-promoting CDK inhibitors p16INK4A and p15INK4B encoded by CDKN2A/B.37–40 Note that in the absence of CDK inhibitors, DNA-damaged cells acquire deformed polyploid nuclei.41,42 In response to Shh-mediated developmental signals, NOTCH1/2 facilitate the clearance of the tumor suppressor Ptch1, the concurrent accumulation of the Shh signal-transducing protein encoded by SMO, and the increased downstream conversion of the proteins encoded by the oncogenes GLI1/3 into cell cycle transcriptional activators.43,44 Note that Notch is critical for an Shh-induced medulloblastoma brain cancer tumor's development.45
In the hominin-specific Notch pathway, NOTCH2NL can act as a ligand-independent NOTCH1/2.46 Note that overexpression of NOTCH2NL and gain of DUF1220 are associated with an increased brain size, both developmentally within the human and evolutionarily within the primate population.47
We also find consistency between the DNA CNAs and mRNA expression, which additionally supports the astrocytoma tumor-exclusive genotype-phenotype relation.48 Of the 29 genes highlighted, 19 are overexpressed or underexpressed in the subset of tumors that have high weights of the WGS astrocytoma pattern in their profiles, with the corresponding Mann-Whitney-Wilcoxon (MWW) P-values <0.05. This subset of tumors corresponds to the subset of patients that have the approximately one-year survival phenotype. Of these 19 genes, 16, i.e., ≈84%, consistently map to amplifications or deletions in the tumor-exclusive genotype (Figs. S4–S7).
BLIND SEPARATION FROM NORMAL AND EXPERIMENTAL SOURCES OF THE COPY-NUMBER VARIATION
By using the experimental labels of the DNA samples, we find that the GSVD blindly, i.e., without a priori information, separates the astrocytoma tumor-exclusive genotype and phenotype from CNVs common to the normal and tumor genomes and from experimental variations specific to the minimally preprocessed WGS profiles. These include the effects of the GC content variations across the tumor and normal genomes that vary in magnitude between experimental batches. The first tumor and 85th normal column basis vectors are the most significant in and exclusive to, and are correlated with the fractional GC content across, the tumor and normal genomes, respectively, with both correlations 0.78 and both MWW P-values (Figs. S8–S10). Both vectors roughly describe frequent spikes of reduced copy numbers superimposed on an invariant baseline in agreement with the polymerase chain reaction (PCR) amplification-dependent WGS technology underestimating the abundance of GC-poor sequences. The corresponding first and 85th row basis vectors are correlated with experimental variations in the GCC of the tumor and TSS of the normal DNA with both hypergeometric and both MWW P-values $<10^{-2}$ (Fig. S11).
The 82nd row basis vector is the second and fifth most significant in the normal and tumor genomes, respectively, and approximately common to both. The vector classifies the patients by gender with both hypergeometric and MWW P-values $<10^{-13}$ (Fig. S12). Both normal and tumor 82nd column basis vectors describe a deletion of the X chromosome with both MWW P-values (Figs. S13–S15). While the deletion is dominant in the normal and tumor genomes of the 53 male patients, it is missing from the astrocytoma pattern, where the X chromosome is classified as unaltered, the same as in the GBM and LGA patterns.
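A hypergeometric P-value of the kind quoted above measures how surprising it is for a group selected by a basis vector to be enriched in one label, e.g., gender or batch. The following pure-Python sketch assumes an upper-tail (enrichment) test; the function name and the toy counts are hypothetical, and the tail and counts used in this study are not reproduced here.

```python
from math import comb

def hypergeom_enrichment_p(population, labeled, selected, overlap):
    """P(X >= overlap) when `selected` items are drawn without replacement
    from `population` items, of which `labeled` carry the label of interest."""
    upper = min(labeled, selected)
    favorable = sum(
        comb(labeled, k) * comb(population - labeled, selected - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / comb(population, selected)

# Toy example: 10 samples, 5 carry a label, and all 5 selected samples carry it.
p_perfect = hypergeom_enrichment_p(10, 5, 5, 5)
# An overlap of zero or more is certain, so the P-value is one.
p_trivial = hypergeom_enrichment_p(10, 5, 5, 0)
```

Exact integer arithmetic with `math.comb` avoids the rounding problems that factorials of large cohort sizes would otherwise cause.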
THE TUMOR-EXCLUSIVE GENOTYPE PREDICTS THE SURVIVAL PHENOTYPE STATISTICALLY BETTER THAN ANY OTHER INDICATOR
Because the Agilent GBM pattern encompasses the WGS astrocytoma and Affymetrix LGA patterns in the number and magnitude of CNAs and because it was derived from the largest discovery set, i.e., of 251 patients, we additionally classified the 85 WGS astrocytoma patients based upon the correlations of the Agilent GBM pattern with their WGS astrocytoma tumor profiles. We find that the Agilent GBM pattern predicts survival statistically better than and independent of the best other indicators, i.e., the patient's age and tumor's grade49 and survival and response to treatments, i.e., chemotherapy and radiation, better than the existing tests, i.e., for MGMT promoter methylation and IDH1 mutation.50,51 In KM analyses and Cox models of the patients, the pattern identifies the biologically consistent survival phenotype with greater median survival time differences, hazard ratios, and concordance indices, i.e., accuracies, and lesser log-rank P-values than either indicator or test (Fig. 4), and, in KM analyses and Cox models of the treated patients, better than either test (Fig. S16). The bivariate hazard ratios of the pattern and either indicator are within the 95% confidence intervals of the corresponding univariate ratios (Table S1). The pattern is also independent of intratumor heterogeneity as it is reflected in the TCGA parameters of the tumor sample's volume, the slide's percent tumor cells and percent tumor nuclei, the portion's weight, and the analyte's and aliquot's DNA concentrations.
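The concordance index used above as a measure of predictive accuracy can be illustrated with a small pure-Python sketch. It assumes Harrell's formulation, in which a pair of patients is comparable when the one with the shorter follow-up had an observed event, and ties in the predictor count as one half. The variant used in this study is not specified in this section, and the function name and toy values are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs in which the patient who
    died earlier was assigned the higher risk score (ties count as 0.5)."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable only if patient i had an observed event before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy data: survival in months, event flags, and a binary high/low pattern class.
months = [5, 10, 15, 20]
died = [True, True, False, True]
high_pattern = [1, 1, 0, 0]
c_index = concordance_index(months, died, high_pattern)
```

A value of 0.5 corresponds to a predictor no better than chance, and a value of 1.0 to a predictor that orders every comparable pair correctly.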
This is consistent with the classifications based upon the WGS astrocytoma pattern, where the median survival time differences are the same and the hazard ratios are within the 95% confidence intervals of those based upon the Agilent GBM pattern. This is also consistent with the classifications of an Affymetrix set of 497 astrocytoma patients and, separately, an Agilent set of 364 GBM patients, from the previous discovery and validation sets of GBM and LGA patients, based upon their Affymetrix and Agilent tumor profiles, respectively, where the pattern is independent of each treatment, indicator, and test (Figs. S17 and S18, Tables S2 and S3, and Datasets S3 and S4).
That the tumor-exclusive genotype-phenotype relation is statistically independent of the current indicators, tests, and treatments of astrocytoma implies that the information contained in the relation is not currently being used in clinical practice. This information includes, e.g., biochemically putative drug targets and combinations of drug targets that are predicted to be correlated with outcome. Using this information in clinical practice can, therefore, be expected to improve the prognostics, diagnostics, and therapeutics of the disease.
DISCUSSION
That the astrocytoma tumor-exclusive genotype-phenotype relation is invariably uncovered by, and only by, the GSVD, independent of the profiling technology and the astrocytoma grade, highlights the role of mathematics in genomic data science and machine learning. Unlike most other analyses, the GSVD uses minimally preprocessed genomic data without feature engineering. This accounts for the robustness of the GSVD to perturbations to the data and is possible because of its scalability to petabyte-sized data. Other analyses often standardize the data based upon assumptions, which may confound the data and contribute to the low reproducibility noted in genomic profiling.
Unlike most other analyses, the GSVD uses the patient-matched normal data to analyze the tumor data, including tumor genomic regions of normal CNVs, e.g., the X chromosome. This makes the GSVD sensitive to robust genotype-phenotype relations in small discovery sets of only, e.g., 251, 59, and 85 patients, and possibly imbalanced validation sets of, e.g., 184 and 74 patients, with large genomic profiles of, e.g., 213K, 934K, and 2.8M probes or bins each. This is possible because the GSVD uses the structure of the tumor and normal datasets, of two column-matched but row-independent matrices, in the blind source separation (BSS)52–64 of the tumor-exclusive from the normal genotype-phenotype relations and from experimental batch effects. Patient-matched normal CNVs are often missing from other analyses of tumor CNAs, even though CNVs overlap ≈12% of the normal human genome,65 where they are $10^2$–$10^4$ times more frequent than point mutations,66 and are associated with both tumor and normal development.67–69 When other analyses use patient-matched normal data, it is to standardize the tumor data. This reduces the structure of the data to that of one matrix, and some of the information regarding the similarity and dissimilarity between the tumor and normal genomes may be lost.
The GSVD as a comparative spectral decomposition22,23 has been extended from two to multiple matrices and, separately, two tensors.24–26 A recent tensor GSVD comparison of ovarian cystadenocarcinoma tumor and patient- and microarray platform-matched normal copy-number profiles uncovered chromosome arm-wide patterns of tumor-exclusive platform-consistent CNAs that predict survival and response to chemotherapy. We conclude that comparative spectral decompositions, such as the GSVD, underlie a mathematically universal description of the genotype-phenotype relations in cancer that other methods miss.
METHODS
See supplementary material for the Methods section.
Ethics approval
Ethics approval was not required to perform this research.
SUPPLEMENTARY MATERIAL
ACKNOWLEDGMENTS
We thank R. A. Horn for insightful discussions of matrix analysis, M. P. Scott and R. A. Weinberg for thoughtful comments on the Shh and Ras signaling pathways, H. A. Hanson for careful reviews of survival analysis, R. L. Jensen and C. A. Palmer for helpful notes on astrocytoma intratumor heterogeneity, C. T. Wittwer for helpful notes on PCR technology, and J. S. Barnholtz-Sloan, K. Devine, J. Bowen, J. M. Gastier-Foster, K. M. Leraas, K. R. Mills Shaw, and J. C. Zenklusen for useful exchanges on TCGA. This work was funded by National Cancer Institute (NCI) U01 Grant No. CA-202144 and by Utah Science, Technology, and Research (USTAR) Initiative support, both to O.A. S.P.P. and O.A. are co-founders of and equity holders in Eigengene, Inc. This does not alter our adherence to the policies of APL Bioengineering on sharing data and materials.
References
- 1. Boveri T., Concerning the Origin of Malignant Tumours ( Gustav Fischer Verlag, Jena, Germany, 1914) [Google Scholar]; Boveri T. [J. Cell Sci. 121, 1 (2008) (translated and annotated by H. Harris)]. 10.1242/jcs.025742 [DOI] [PubMed] [Google Scholar]
- 2. Weber R. G., Sommer C., Albert F. K., Kiessling M., and Cremer T., Lab Invest. 74, 108 (1996). [PubMed] [Google Scholar]
- 3. Grossman S. A. and Ellsworth S. G., J. Clin. Oncol. 34, e13522 (2016). 10.1200/JCO.2016.34.15_suppl.e13522 [DOI] [Google Scholar]
- 4.TCGA Research Network, Nature 455, 1061 (2008). 10.1038/nature07385 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.TCGA Research Network, N. Engl. J. Med. 372, 2481 (2015). 10.1056/NEJMoa1402121 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Lee C. H., Alpert B. O., Sankaranarayanan P., and Alter O., PLoS One 7, e30098 (2012). 10.1371/journal.pone.0030098 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Aiello K. A. and Alter O., PLoS One 11, e0164546 (2016). 10.1371/journal.pone.0164546 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Ostrom Q. T., Gittleman H., Liao P., Vecchione-Koval T., Wolinsky Y., Kruchko C., and Barnholtz-Sloan J. S., Neuro-Oncology 19, v1 (2017). 10.1093/neuonc/nox158 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Aiello K. A., Ponnapalli S. P., and Alter O., in 2018 AACR Annual Meeting, 14–18 April (2018).
- 10. Klambauer G., Schwarzbauer K., Mayr A., Clevert D. A., Mitterecker A., Bodenhofer U., and Hochreiter S., Nucleic Acids Res. 40, e69 (2012). doi:10.1093/nar/gks003
- 11. Karolchik D., Barber G. P., Casper J., Clawson H., Cline M. S., Diekhans M., Dreszer T. R., Fujita P. A., Guruvadoo L., Haeussler M., Harte R. A., Heitner S., Hinrichs A. S., Learned K., Lee B. T., Li C. H., Raney B. J., Rhead B., Rosenbloom K. R., Sloan C. A., Speir M. L., Zweig A. S., Haussler D., Kuhn R. M., and Kent W. J., Nucleic Acids Res. 42, D764 (2014). doi:10.1093/nar/gkt1168
- 12. Haraksingh R. R., Abyzov A., and Urban A. E., BMC Genomics 18, 321 (2017). doi:10.1186/s12864-017-3658-x
- 13. Shen R. and Seshan V. E., Nucleic Acids Res. 44, e131 (2016). doi:10.1093/nar/gkw520
- 14. Roberts R. J., Carneiro M. O., and Schatz M. C., Genome Biol. 14, 405 (2013). doi:10.1186/gb-2013-14-6-405
- 15. Pinto D., Darvishi K., Shi X., Rajan D., Rigler D., Fitzgerald T., Lionel A. C., Thiruvahindrapuram B., Macdonald J. R., Mills R., Prasad A., Noonan K., Gribble S., Prigmore E., Donahoe P. K., Smith R. S., Park J. H., Hurles M. E., Carter N. P., Lee C., Scherer S. W., and Feuk L., Nat. Biotechnol. 29, 512 (2011). doi:10.1038/nbt.1852
- 16. Van Loan C. F., SIAM J. Numer. Anal. 13, 76 (1976). doi:10.1137/0713009
- 17. Paige C. C. and Saunders M. A., SIAM J. Numer. Anal. 18, 398 (1981). doi:10.1137/0718026
- 18. Friedland S., SIAM J. Matrix Anal. Appl. 27, 434 (2005). doi:10.1137/S0895479804439791
- 19. Horn R. A. and Johnson C. R., Matrix Analysis, 2nd ed. (Cambridge University Press, Cambridge, UK, 2012).
- 20. Golub G. H. and Van Loan C. F., Matrix Computations, 4th ed. (Johns Hopkins University Press, Baltimore, MD, 2012).
- 21. Goldstein H., Classical Mechanics, 2nd ed. (Addison-Wesley, Reading, MA, 1980).
- 22. Alter O., Brown P. O., and Botstein D., Proc. Natl. Acad. Sci. USA 100, 3351 (2003). doi:10.1073/pnas.0530258100
- 23. Alter O., Golub G. H., Brown P. O., and Botstein D., in Miami Nature Biotechnology Winter Symposium on Cell Cycle, Chromosomes and Cancer, 31 January–4 February (2004).
- 24. Ponnapalli S. P., Golub G. H., and Alter O., in Stanford University and Yahoo! Research Workshop on Algorithms for Modern Massive Datasets, 21–24 June (2006).
- 25. Ponnapalli S. P., Saunders M. A., Van Loan C. F., and Alter O., PLoS One 6, e28072 (2011). doi:10.1371/journal.pone.0028072
- 26. Sankaranarayanan P., Schomay T. E., Aiello K. A., and Alter O., PLoS One 10, e0121396 (2015). doi:10.1371/journal.pone.0121396
- 27. Aiello K. A., Maughan C. A., Schomay T. E., Ponnapalli S. P., Hanson H. A., and Alter O., in 2018 AACR Annual Meeting, 14–18 April (2018).
- 28. Trefethen L. N. and Bau D. III, Numerical Linear Algebra (SIAM, Philadelphia, PA, 1997).
- 29. Edelman A., Arias T. A., and Smith S. T., SIAM J. Matrix Anal. Appl. 20, 303 (1998). doi:10.1137/S0895479895290954
- 30. Ewerbring L. M. and Luk F. T., J. Comput. Appl. Math. 27, 37 (1989). doi:10.1016/0377-0427(89)90360-9
- 31. Olshen A. B., Venkatraman E. S., Lucito R., and Wigler M., Biostatistics 5, 557 (2004). doi:10.1093/biostatistics/kxh008
- 32. Lettice L. A., Horikoshi T., Heaney S. J., van Baren M. J., van der Linde H. C., Breedveld G. J., Joosse M., Akarsu N., Oostra B. A., Endo N., Shibata M., Suzuki M., Takahashi E., Shinka T., Nakahori Y., Ayusawa D., Nakabayashi K., Scherer S. W., Heutink P., Hill R. E., and Noji S., Proc. Natl. Acad. Sci. USA 99, 7548 (2002). doi:10.1073/pnas.112212199
- 33. Purow B. W., Haque R. M., Noel M. W., Su Q., Burdick M. J., Lee J., Sundaresan T., Pastorino S., Park J. K., Mikolaenko I., Maric D., Eberhart C. G., and Fine H. A., Cancer Res. 65, 2353 (2005). doi:10.1158/0008-5472.CAN-04-1890
- 34. Sun W., Gaykalova D. A., Ochs M. F., Mambo E., Arnaoutakis D., Liu Y., Loyo M., Agrawal N., Howard J., Li R., Ahn S., Fertig E., Sidransky D., Houghton J., Buddavarapu K., Sanford T., Choudhary A., Darden W., Adai A., Latham G., Bishop J., Sharma R., Westra W. H., Hennessey P., Chung C. H., and Califano J. A., Cancer Res. 74, 1091 (2014). doi:10.1158/0008-5472.CAN-13-1259
- 35. Laureys G., Speleman F., Versteeg R., van der Drift P., Chan A., Leroy J., Francke U., Opdenakker G., and Van Roy N., Oncogene 10, 1087 (1995).
- 36. O'Bleness M., Searles V. B., Dickens C. M., Astling D., Albracht D., Mak A. C., Lai Y. Y., Lin C., Chu C., Graves T., Kwok P. Y., Wilson R. K., and Sikela J. M., BMC Genomics 15, 387 (2014). doi:10.1186/1471-2164-15-387
- 37. Weijzen S., Rizzo P., Braid M., Vaishnav R., Jonkheer S. M., Zlobin A., Osborne B. A., Gottipati S., Aster J. C., Hahn W. C., Rudolf M., Siziopikou K., Kast W. M., and Miele L., Nat. Med. 8, 979 (2002). doi:10.1038/nm754
- 38. Hahn W. C., Counter C. M., Lundberg A. S., Beijersbergen R. L., Brooks M. W., and Weinberg R. A., Nature 400, 464 (1999). doi:10.1038/22780
- 39. Kiaris H., Politi K., Grimm L. M., Szabolcs M., Fisher P., Efstratiadis A., and Artavanis-Tsakonas S., Am. J. Pathol. 165, 695 (2004). doi:10.1016/S0002-9440(10)63333-0
- 40. Carlson M. E., Hsu M., and Conboy I. M., Nature 454, 528 (2008). doi:10.1038/nature07034
- 41. Waldman T., Lengauer C., Kinzler K. W., and Vogelstein B., Nature 381, 713 (1996). doi:10.1038/381713a0
- 42. Irianto J., Xia Y., Pfeifer C. R., Athirasala A., Ji J., Alvey C., Tewari M., Bennett R. R., Harding S. M., Liu A. J., Greenberg R. A., and Discher D. E., Curr. Biol. 27, 210 (2017). doi:10.1016/j.cub.2016.11.049
- 43. Kong J. H., Yang L., Dessaud E., Chuang K., Moore D. M., Rohatgi R., Briscoe J., and Novitch B. G., Dev. Cell 33, 373 (2015). doi:10.1016/j.devcel.2015.03.005
- 44. Rohatgi R. and Scott M. P., Nat. Cell Biol. 9, 1005 (2007). doi:10.1038/ncb435
- 45. Lee E. Y., Ji H., Ouyang Z., Zhou B., Ma W., Vokes S. A., McMahon A. P., Wong W. H., and Scott M. P., Proc. Natl. Acad. Sci. USA 107, 9736 (2010). doi:10.1073/pnas.1004602107
- 46. Fiddes I. T., Lodewijk G. A., Mooring M. M., Bosworth C. M., Ewing A. D., Mantalas G. L., Novak A. M., van den Bout A., Bishara A., Rosenkrantz J. L., Lorig-Roach R., Field A. R., Haeussler M., Russo L., Bhaduri A., Nowakowski T. J., Pollen A. A., Dougherty M. L., Nuttle X., Addor M. C., Zwolinski S., Katzman S., Kreigstein A., Eichler E. E., Salama S. R., Jacobs F. M. J., and Haussler D., Cell 173, 1356 (2018). doi:10.1016/j.cell.2018.03.051
- 47. Popesco M. C., Maclaren E. J., Hopkins J., Dumas L., Cox M., Meltesen L., McGavran L., Wyckoff G. J., and Sikela J. M., Science 313, 1304 (2006). doi:10.1126/science.1127980
- 48. Fischer U., Meltzer P., and Meese E., Hum. Genet. 98, 625 (1996). doi:10.1007/s004390050271
- 49. Netsky M. G., August B., and Fowler W., J. Neurosurg. 7, 261 (1950). doi:10.3171/jns.1950.7.3.0261
- 50. Bady P., Sciuscio D., Diserens A. C., Bloch J., van den Bent M. J., Marosi C., Dietrich P. Y., Weller M., Mariani L., Heppner F. L., Mcdonald D. R., Lacombe D., Stupp R., Delorenzi M., and Hegi M. E., Acta Neuropathol. 124, 547 (2012). doi:10.1007/s00401-012-1016-2
- 51. Brennan C. W., Verhaak R. G., McKenna A., Campos B., Noushmehr H., Salama S. R., Zheng S., Chakravarty D., Sanborn J. Z., Berman S. H., Beroukhim R., Bernard B., Wu C. J., Genovese G., Shmulevich I., Barnholtz-Sloan J. S., Zou L., Vegesna R., Shukla S. A., Ciriello G., Yung W. K., Zhang W., Sougnez C., Mikkelsen T., Aldape K., Bigner D. D., Van Meir E. G., Prados M., Sloan A., Black K. L., Eschbacher J., Finocchiaro G., Friedman W., Andrews D. W., Guha A., Iacocca M., O'Neill B. P., Foltz G., Myers J., Weisenberger D. J., Penny R., Kucherlapati R., Perou C. M., Hayes D. N., Gibbs R., Marra M., Mills G. B., Lander E., Spellman P., Wilson R., Sander C., Weinstein J., Meyerson M., Gabriel S., Laird P. W., Haussler D., Getz G., Chin L., and TCGA Research Network, Cell 155, 462 (2013). doi:10.1016/j.cell.2013.09.034
- 52. Howland P. and Park H., IEEE Trans. Pattern Anal. Mach. Intell. 26, 995 (2004). doi:10.1109/TPAMI.2004.46
- 53. Berger J. A., Hautaniemi S., Mitra S. K., and Astola J., IEEE/ACM Trans. Comput. Biol. Bioinf. 3, 2 (2006). doi:10.1109/TCBB.2006.10
- 54. De Clercq W., Vergult A., Vanrumste B., Van Paesschen W., and Van Huffel S., IEEE Trans. Biomed. Eng. 53, 2583 (2006). doi:10.1109/TBME.2006.879459
- 55. Teschendorff A. E., Journée M., Absil P. A., Sepulchre R., and Caldas C., PLoS Comput. Biol. 3, e161 (2007). doi:10.1371/journal.pcbi.0030161
- 56. Schreiber A. W., Shirley N. J., Burton R. A., and Fincher G. B., BMC Bioinf. 9, 335 (2008). doi:10.1186/1471-2105-9-335
- 57. Rustandi I., Just M. A., and Mitchell T. M., in Proceedings of the MICCAI Workshop on Statistical Modeling and Detection Issues in Intra- and Inter-Subject Functional MRI Data Analysis, 20–24 September (2009).
- 58. Xiao X., Dawson N., Macintyre L., Morris B. J., Pratt J. A., Watson D. G., and Higham D. J., BMC Syst. Biol. 5, 72 (2011). doi:10.1186/1752-0509-5-72
- 59. Tomescu O. A., Mattanovich D., and Thallinger G. G., BMC Syst. Biol. 8, S4 (2014). doi:10.1186/1752-0509-8-S2-S4
- 60. Xiao X., Moreno-Moral A., Rotival M., Bottolo L., and Petretto E., PLoS Genet. 10, e1004006 (2014). doi:10.1371/journal.pgen.1004006
- 61. Adali T., Levin-Schwartz Y., and Calhoun V. D., Proc. IEEE 103, 1478 (2015). doi:10.1109/JPROC.2015.2461624
- 62. Chen X., Wang Z. J., and McKeown M., IEEE Signal Process. Mag. 33, 86 (2016). doi:10.1109/MSP.2016.2521870
- 63. Wang Y., Zhao W., and Zhou X., Sci. Rep. 6, 34335 (2016). doi:10.1038/srep34335
- 64. Chitforoushzadeh Z., Ye Z., Sheng Z., LaRue S., Fry R. C., Lauffenburger D. A., and Janes K. A., Sci. Signal. 9, ra59 (2016). doi:10.1126/scisignal.aad3373
- 65. Redon R., Ishikawa S., Fitch K. R., Feuk L., Perry G. H., Andrews T. D., Fiegler H., Shapero M. H., Carson A. R., Chen W., Cho E. K., Dallaire S., Freeman J. L., González J. R., Gratacòs M., Huang J., Kalaitzopoulos D., Komura D., MacDonald J. R., Marshall C. R., Mei R., Montgomery L., Nishimura K., Okamura K., Shen F., Somerville M. J., Tchinda J., Valsesia A., Woodwark C., Yang F., Zhang J., Zerjal T., Zhang J., Armengol L., Conrad D. F., Estivill X., Tyler-Smith C., Carter N. P., Aburatani H., Lee C., Jones K. W., Scherer S. W., and Hurles M. E., Nature 444, 444 (2006). doi:10.1038/nature05329
- 66. Lupski J. R., Nat. Genet. 39, S43 (2007). doi:10.1038/ng2084
- 67. Diskin S. J., Hou C., Glessner J. T., Attiyeh E. F., Laudenslager M., Bosse K., Cole K., Mossé Y. P., Wood A., Lynch J. E., Pecor K., Diamond M., Winter C., Wang K., Kim C., Geiger E. A., McGrady P. W., Blakemore A. I., London W. B., Shaikh T. H., Bradfield J., Grant S. F., Li H., Devoto M., Rappaport E. R., Hakonarson H., and Maris J. M., Nature 459, 987 (2009). doi:10.1038/nature08035
- 68. Vanneste E., Voet T., Le Caignec C., Ampe M., Konings P., Melotte C., Debrock S., Amyere M., Vikkula M., Schuit F., Fryns J. P., Verbeke G., D'Hooghe T., Moreau Y., and Vermeesch J. R., Nat. Med. 15, 577 (2009). doi:10.1038/nm.1924
- 69. Fischer U., Keller A., Voss M., Backes C., Welter C., and Meese E., PLoS One 7, e37422 (2012). doi:10.1371/journal.pone.0037422