Abstract
Effective discovery of causal disease genes must overcome the statistical challenges of quantitative genetics studies and the practical limitations of human biology experiments. Here we developed diseaseQUEST, an integrative approach that combines data from human genome-wide disease studies with in silico network models of tissue- and cell-type-specific function in model organisms to prioritize candidates within functionally conserved processes and pathways. We used diseaseQUEST to predict candidate genes for 25 different diseases and traits, including cancer, longevity, and neurodegenerative diseases. Focusing on Parkinson’s disease (PD), a diseaseQUEST-directed Caenorhabditis elegans behavioral screen identified several candidate genes, which we experimentally verified and found to be associated with age-dependent motility defects mirroring PD clinical symptoms. Furthermore, knockdown of the top candidate gene, bcat-1, encoding a branched chain amino acid transferase, caused spasm-like ‘curling’ and neurodegeneration in C. elegans, paralleling decreased BCAT1 expression in PD patient brains. diseaseQUEST is modular and generalizable to other model organisms and human diseases of interest.
Understanding the etiology of disease requires comprehensive data-driven methods capable of identifying and experimentally verifying candidate disease genes. Whereas quantitative genetics approaches provide a valuable, relatively unbiased source of candidate genes, they suffer from statistical and biological limitations (for example, lack of power because of sample size, multiple hypothesis testing, or variants with small effect sizes), thereby potentially missing a large fraction of disease-associated genes. Network-based methods have emerged as a useful set of tools to complement quantitative genetic studies, by leveraging the disease signals captured by these studies to further interpret and prioritize candidate disease genes. Tissue-specific networks1,2 have been shown to be important, because they address many of the limitations (for example, coverage and lack of tissue specificity) of previous methods that rely on protein–protein physical-interaction networks3–5. Tissue specificity is especially critical, because tissue-specific gene expression and pathway regulation underlie human physiology, and their dysfunction often results in disease. Another major challenge is the inability to systematically screen and test candidate disease genes in humans, owing to technical and ethical limitations. Model organisms provide a powerful answer to that challenge6–8, but their most effective use requires reconciling human disease genetics with model-organism biology.
To address these issues, we developed disease-associated quantitative unbiased estimation across species and tissues (diseaseQUEST), an integrated computationally driven approach that combines human quantitative genetics with in silico functional network representations of model-organism biology to systematically identify disease-gene candidates. Our approach leverages the disease signals in quantitative human genetics studies (such as genome-wide association studies (GWAS)) as well as the functional pathway signals in cell-type- and tissue-specific networks, integrating large collections of ‘omics’ data in model organisms, to predict and experimentally screen candidate disease genes for their association with relevant phenotypes. Intuitively, these networks summarize functional relationships between genes in specific tissues or cell types, such that a functional relationship represents genes working together, either directly or indirectly, in a biological pathway. The tissue specificity of this approach reflects the roles that tissue and cell-type diversity play in most complex human diseases and is critical for both the accuracy and the interpretation of disease-gene predictions. In essence, diseaseQUEST enables computationally guided phenotype screens that identify the top gene candidates for the disease of interest, prioritizing areas for which the model system used by diseaseQUEST is informative for human disease biology.
We used diseaseQUEST to predict candidate genes for 25 human diseases and traits by using C. elegans as a model system. This application of diseaseQUEST harnesses a semisupervised approach that we developed to generate tissue-specific functional networks, which are combined with human GWAS results to identify new disease-related-gene candidates. We showed that diseaseQUEST can accurately identify disease genes across organ systems and demonstrated its ability to predict the tissue specificity of known longevity pathways by using only human GWAS genes as input.
We took advantage of the experimental tools in C. elegans that allow for high-throughput behavioral testing (thus making it a valuable system to quickly assay disease-associated genes), as well as the worm’s short lifespan (enabling fast screening of age-related disorders), to experimentally assay 45 candidate PD genes across 13,255 individual worms. PD is the most common neurodegenerative movement disorder worldwide, for which 70–95% of cases have unknown origins, thus reflecting the need for innovative approaches to identify disease-modifying genes. Knockdown of most of the diseaseQUEST PD-candidate genes caused motor defects in C. elegans, and neuronal knockdown of our top candidate, bcat-1, caused spasm-like curling and exacerbated α-synuclein-mediated degeneration of dopaminergic neurons. Notably, BCAT1 is normally highly expressed in areas of the brain that are affected by PD, and expression of BCAT1 is significantly lower (false discovery rate (FDR) = 0.0227) in the substantia nigra in PD patients than unaffected individuals, thus suggesting that diseaseQUEST with high-throughput C. elegans behavioral screening can successfully identify and test new disease genes.
RESULTS
Combining tissue-specific model-system biology with human disease studies
diseaseQUEST includes three key components (Fig. 1a) within an integrated computational-experimental framework for discovery and directed experimental screening of disease-gene candidates. The Functional Representation module leverages a semisupervised approach for building tissue- and cell-type-specific model-organism functional networks (in this study, we built C. elegans networks, described below). The Disease Prediction module utilizes these model-organism networks and human quantitative genetic data to make candidate-disease-gene predictions. Finally, the Phenotypic Assay module experimentally tests these predictions in phenotyping screens in the model organism.
A semisupervised regularized Bayesian integration method to build tissue-specific functional networks
To enable diseaseQUEST to effectively leverage the wealth of cell-type information available for many worm genes, we developed a new approach that efficiently extracts cell-lineage-specific signals from the compendium of C. elegans expression data and generates network representations of tissue- and cell-type-specific functional similarity for the Functional Representation module. We applied this semisupervised, ontology-aware regularized Bayesian integration method to 203 cell types and tissues in C. elegans, including not only tissues in the major organ systems but also hermaphrodite- and male-specific tissues, thereby providing networks of resolution down to specific cell types (for example, dopaminergic neurons; hyp 1, a specific hypodermal cell; marginal cells; full list in Supplementary Data 1). Our approach addresses limitations in the knowledge of cell-type-specific gene expression and protein function by using semisupervised learning. The method supplements the limited number of known patterns of cell-type-specific expression or function with high-confidence predictions made from large collections of functional genomic data. This procedure enabled us to generate high-quality networks, even for cell types and tissues with few known cell-type-specific genes (for example, the ASIR neurons or the V2l cells, with only 76 and 47 annotated cell-type-specific genes).
Across all tissues and sex-specific systems, the networks were accurate in predicting known tissue-specific functional associations in ‘hold-out’ evaluations (in which a subset of genes with known functional associations is ‘hidden’ from the system throughout training and is used to evaluate its performance) (Fig. 1b,c). Our semisupervised framework captured tissue-specific function significantly better than a global, non-tissue-specific network representing the whole organism (Fig. 1b, one-sided Wilcoxon rank-sum test, P < 2.087 × 10−14; another example of a global non-tissue-specific approach is WormNet 3.0 (ref. 9), P < 3.66 × 10−15) or networks generated by a fully supervised framework approach1 (Fig. 1c, one-sided Wilcoxon rank-sum test, P < 2.52 × 10−13; to our knowledge, no other tissue-specific worm networks have been described). Notably, individual neuron subsets, such as cholinergic and dopaminergic neurons, were among the top-performing tissue networks determined through the semisupervised approach, and they outperformed the whole nervous-system network (Fig. 1b). We have made all these networks available for download and interactive exploration through a dynamic web interface (Worm Integrated in Specific Contexts (WISP), http://wisp.princeton.edu/).
Predicting candidate human disease genes
The Disease Prediction module of diseaseQUEST then leverages the cell-type- and tissue-specific model-organism networks described above in concert with human disease genes from quantitative genetics studies within a machine-learning framework to predict new candidate genes. Specifically, we identified the closest worm functional orthologs10 of reported disease-associated genes in the GWAS Catalog11 as positive examples, and we used the support-vector machine-learning approach with the network neighborhoods of these genes as input to predict other genes with similar network topology. Intuitively, our approach learns coherent tissue-specific-network signals that are indicative of genes involved in a specific human disease as opposed to other diseases, then uses these patterns to predict new gene candidates.
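The prediction step described above can be illustrated with a minimal sketch. It assumes a toy network stored as a dictionary of edge weights and a plain linear support-vector machine trained by subgradient descent on the hinge loss; the kernel, hyperparameters, and all names here (`neighborhood_features`, `train_linear_svm`, the toy genes) are hypothetical and are not taken from the diseaseQUEST implementation.

```python
def neighborhood_features(network, genes):
    """Represent each gene by its edge weights to every gene in `genes`."""
    return {g: [network.get(g, {}).get(h, 0.0) for h in genes] for g in genes}


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the L2-regularized hinge loss (labels +1/-1)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The regularizer shrinks the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Example is misclassified or inside the margin: move toward it.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def score(w, b, x):
    """Signed distance-like score; larger means more similar to the positives."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Toy tissue network (hypothetical): a and b stand in for orthologs of GWAS
# genes, c and d for genes linked to other diseases, e and f for candidates.
network = {
    "a": {"h": 0.9, "b": 0.8},
    "b": {"h": 0.8, "a": 0.8},
    "c": {"k": 0.9, "d": 0.7},
    "d": {"k": 0.8, "c": 0.7},
    "e": {"h": 0.9},
    "f": {"k": 0.9},
}
genes = ["a", "b", "c", "d", "e", "f", "h", "k"]
feats = neighborhood_features(network, genes)
train_genes = ["a", "b", "c", "d"]
labels = [1, 1, -1, -1]
w, b = train_linear_svm([feats[g] for g in train_genes], labels)
ranked = sorted(["e", "f"], key=lambda g: score(w, b, feats[g]), reverse=True)
```

In this toy setting, candidate `e` shares a network neighbor with the positive examples and is therefore ranked above `f`, which shares a neighbor only with the negatives.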
For 25 diseases and traits (Supplementary Data 2) with a sufficient number of GWAS genes (Supplementary Data 3), we observed strong predictive performance across all major disease categories (fivefold cross-validation, Fig. 2a). The results included accurate predictions for a number of cancers (for example, lung, melanoma, and ovarian cancers), cardiovascular and muscular diseases (for example, hypertension and myocardial infarction), nervous-system diseases (for example, amyotrophic lateral sclerosis (ALS) and PD), and metabolic and autoimmune diseases and traits (for example, longevity, obesity, and celiac disease). We also found that the diseaseQUEST predictions reflected many aspects of known disease biology (Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) enrichments in Supplementary Data 4–7). For example, ALS is characterized by motor neuron degeneration and muscle atrophy, and the diseaseQUEST predictions were enriched in genes associated with locomotion and muscle biology (Supplementary Data 4). Moreover, alternative splicing, MAP kinase signaling, and phosphatidylinositol signaling were among the most highly enriched terms, and these pathways were previously implicated in ALS through mouse models12–14. The disease predictions for schizophrenia, which are enriched in various aspects of RNA biology (alternative splicing and mRNA surveillance) and the ubiquitin–proteasome system (Supplementary Data 5), are similarly supported by findings from human studies15–17. Support for the diseaseQUEST predictions extends to cancer, including ovarian and pancreatic carcinomas, in which the genetic basis for most disease cases is unknown.
For example, the ovarian cancer predictions were enriched in genes that regulate fatty-acid metabolism and mitochondrial function (Supplementary Data 6), and the pancreatic cancer predictions were highly associated with mRNA splicing/spliceosome factors (Supplementary Data 7), all associations that were consistent with prior literature18–20. These results demonstrate that leveraging the data-driven disease signal from only GWAS studies (i.e., without incorporating any prior disease knowledge into the prediction process), diseaseQUEST predictions identify known aspects of various diseases while making predictions for new genes and pathways that can be experimentally tested in model systems.
Recapitulating known aging biology and predicting tissue-specific longevity genes
To further systematically evaluate the Functional Representation and Disease Prediction modules, we examined gene predictions for longevity. This unique opportunity allows for evaluation of causality, not simply association, because thorough experimental studies in C. elegans have identified genes with clear causal associations with longevity. In fact, decades of small-scale experiments and large-scale screens have identified genes involved in the determination of C. elegans adult lifespan, many of which were later shown to influence lifespan in mammals21. As a test of diseaseQUEST’s predictive power, we assessed whether our tissue-specific network–based approach using only human-longevity GWAS genes as input could successfully predict these experimentally identified longevity genes in a data-driven manner. By using the C. elegans network for the intestine, a tissue known to have many roles in lifespan regulation22,23, our method successfully predicted genes known to affect C. elegans adult lifespan (Fig. 2b and Supplementary Data 8, one-sided Wilcoxon rank-sum test, P < 5.619 × 10−7, FDR < 5.7 × 10−5). For example, the top diseaseQUEST predictions successfully identified several lifespan-associated components of the autophagy/TOR machinery (Supplementary Data 8), including let-363/MTOR, hlh-30/TFEB, aak-1/AMPK, atg-7/Atg7, unc-51/Ulk2, lin-45, and ife-2/EIF4E, and intestinal autophagy is indeed specifically required for the increased longevity associated with dietary restriction in C. elegans24. These findings reveal that diseaseQUEST can be successfully used to predict causal disease genes that are amenable to phenotypic screening.
To evaluate the specificity of these predictions, we also calculated the predictive performance of general, non-tissue-specific networks and found that using the intestine network dramatically improved the performance (one-sided Wilcoxon rank-sum tests: intestine network, P < 5.619 × 10−7, versus non-tissue-specific global network, P < 0.001; WormNet 3.0 (ref. 9), P < 0.003; protein–protein interaction network, P < 0.122). Furthermore, when predictive performance on longevity genes was compared across all 203 C. elegans tissue and cell type networks, the intestine and the larger alimentary-system networks were the best performing (Fig. 2c) among the 203 sets of predictions, and many other tissues known to be relevant to longevity were also among the higher-ranked results (Fig. 2c and Supplementary Data 8), thus demonstrating a tight correspondence between the computational models and related biological knowledge. This analysis is especially important because, in addition to demonstrating the accuracy and tissue specificity of diseaseQUEST predictions, it shows that human GWAS genes, although not guaranteed to be causal, as a group, provide an informative signal about disease causality sufficient to discover new candidates.
Applying diseaseQUEST to PD
We next focused on identifying candidate PD genes by using the full computational-experimental diseaseQUEST framework (Supplementary Data 9 and Fig. 3). The PD-candidate genes from the Disease Prediction module appeared to be relevant to human PD biology, because they were significantly enriched in orthologs of known PD genes, according to Human Gene Mutation Database annotations, which were not used in any stage of the diseaseQUEST prediction process (one-sided Wilcoxon rank-sum test, P < 4.151 × 10−4). The predictions were also enriched in orthologs of human genes closest to the 43 significant single-nucleotide polymorphisms reported in a recent 23andMe PD GWAS study25 that was also independent of our analysis (one-sided Wilcoxon rank-sum test, P < 8.577 × 10−6). Furthermore, our predictions were enriched in significantly differentially expressed genes in the substantia nigra of patients with sporadic PD26 (one-sided Wilcoxon rank-sum test, P < 9.009 × 10−3).
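The enrichment tests reported above ask whether prediction scores for a held-out gene set (for example, orthologs of known PD genes) tend to be higher than scores for all other genes. The following is a minimal sketch of a one-sided Wilcoxon rank-sum test using the normal approximation, with average ranks for ties but without a tie correction to the variance; these choices, the function name, and the toy scores are assumptions, and the exact test configuration used in the study is not stated in this section.

```python
import math


def rank_sum_one_sided(scores_in, scores_out):
    """P-value for the alternative 'scores_in tend to exceed scores_out'."""
    pooled = sorted([(s, 1) for s in scores_in] + [(s, 0) for s in scores_out])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values and give each the average rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(scores_in), len(scores_out)
    rank_sum = sum(r for r, (_, member) in zip(ranks, pooled) if member == 1)
    u1 = rank_sum - n1 * (n1 + 1) / 2          # Mann-Whitney U statistic
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail normal probability


# Hypothetical prediction scores for known disease genes versus other genes.
known = [0.9, 0.8, 0.85, 0.7, 0.95]
others = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
p_value = rank_sum_one_sided(known, others)
```

For samples as small as this toy example an exact test would be preferable; the normal approximation is used here only to keep the sketch short.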
To interpret the processes and pathways represented in our top PD predictions, we examined them in the context of the dopaminergic neuron WISP functional network. These predictions formed four major clusters (Fig. 3a and Supplementary Data 10), including two clusters related to movement: cluster A was enriched in genes related to muscle movement, locomotion, and activity level, and cluster B was enriched in terms related to synapse density as well as motor neuron and nervous-system morphology. Clusters C and D were both enriched in metabolic processes, and cluster D also had strong aging/longevity and growth-pathway signals. Overall, the predictions were enriched in cellular components known to be dysregulated in PD27, such as lysosomes and phagosomes (Fig. 3b, Supplementary Fig. 1 and Supplementary Data 11).
Directed PD-candidate screens for age-dependent motility defects
We then used the Phenotypic Assay module to experimentally screen the top predictions for PD-associated phenotypes. Reasoning that age-dependent motility defects could be used to model human PD symptoms, we examined the top-ranked genes for the effects of candidate gene knockdown on swimming behavior with age. To prioritize the top-scoring predictions for experimental follow-up, we considered only worm genes with known human orthologs, and we split these top predictions into three tiers based on known and/or predicted human brain expression and C. elegans neuronal expression (Online Methods and Supplementary Data 12). To avoid developmental defects and to enhance RNA interference (RNAi) in neurons, we knocked down gene expression specifically in neuronal-RNAi-sensitive adults by feeding late larval stage (L4) larvae with bacteria encoding each of the top 45 candidate genes’ double-stranded RNA. We then used CeleST28 to analyze the swimming behavior of young, mid-life, and older worms (days 2, 5, and 8; 13,255 worms across 1,823 videos, Supplementary Data 13 and Supplementary Fig. 2). Swimming slowed with age, and principal component analysis suggested that aging had a major effect on behavior (Supplementary Fig. 3).
However, knockdown of many (11 of 45) of the top PD candidates caused a drastic and significant spasm-like curling phenotype with age (Fig. 4a,b), which also corresponded with ‘stretch’ phenotypes (25 of 45; Fig. 4c), both of which are atypical of normal aging. Notably, knockdown of scav-1, one of the PD GWAS orthologs (which was a ‘positive’ example in our training and was also strongly predicted by our method to be PD related), caused obvious, severe curling (Fig. 4d), and all four of the PD GWAS-positive hits significantly affected stretch (Fig. 4c). Although we originally reasoned that age-dependent defects in motility might be generally analogous to human motor disorders, scav-1’s motility defect suggests that these worm swimming phenotypes can be used to model several aspects of human parkinsonism, including resting tremors, which are also spasm-like.
To assess the specificity of the curling phenotype with regard to the PD predictions, we analyzed the top-scoring genes across a wide spectrum of disease predictions, including cancers and metabolic disorders, for curling. We tested the top predictions across 13 different diseases in 23,662 worms, generating 4,441 snapshots that were analyzed for curling. Even though all genes tested are expressed in adult neurons29 (Supplementary Data 14), none of the non-PD disease-candidate genes caused a curling phenotype (Supplementary Fig. 3). This result demonstrates the specificity of the diseaseQUEST approach in identifying disease-specific genes.
bcat-1 and neurodegeneration
One of the most severe age-related curling defects was caused by adult-specific knockdown of bcat-1 (Fig. 5a,b and Supplementary Fig. 4), a BCAA transferase that is required for development30 but has not been previously linked to PD. BCAT1 catalyzes the first step in the catabolism of BCAAs, which play roles in glutamate metabolism, mTOR signaling, obesity, and diabetes31–33. In C. elegans, bcat-1 knockdown in wild-type adults was previously found to increase the endogenous accumulation of BCAAs (valine, leucine, and isoleucine) and to extend lifespan34, as we also observed. We treated wild-type (N2) animals (whose neurons are refractory to RNAi) with bcat-1 short interfering RNA and found that curling was not induced, in contrast to the phenotype observed in neuronal-RNAi-sensitive worms, thus suggesting that the curling defect is due to bcat-1 downregulation in neurons (Fig. 5c and Supplementary Fig. 5). These findings are consistent with a role of bcat-1 in neuron-related disorders.
The possible role of BCAA metabolism in PD is intriguing. Although this role was not previously characterized in relation to PD, an analysis of Allen Brain Atlas data revealed that BCAT1 expression is high in PD-susceptible brain regions of healthy individuals (Fig. 5d, Supplementary Fig. 6 and Supplementary Data 15), whereas BCAT1 is significantly diminished in the substantia nigra in patients with sporadic PD26 (FDR = 0.0227; Fig. 5e). Furthermore, the levels of BCAAs in the urine of patients with PD correlate with disease severity35, and high levels of BCAAs may be damaging to neuron function36. Strikingly, adults with maple syrup urine disease (which results in high BCAA levels) experience movement disorders, including parkinsonism37, and exhibit loss of dopaminergic neurons in the substantia nigra and pontine nuclei38. In contrast, decreased BCAAs have been found to improve metabolic health in both mice and humans39,40, although motor and cognitive function were not tested in those studies.
bcat-1 decrease promotes dopaminergic neurodegeneration in a C. elegans model of PD
To further examine the role of bcat-1 in PD phenotypes, we used a well-established C. elegans model of dopaminergic neurodegeneration41. α-synuclein has been linked to PD both genetically and pathologically42, and worms expressing human α-synuclein in dopaminergic neurons exhibit progressive loss of dopamine neuron cell bodies and neurites43–45. Therefore, we tested whether bcat-1 RNAi influenced α-synuclein-mediated dopaminergic neurodegeneration. Knockdown of bcat-1 in α-synuclein-expressing worms increased the loss of dopaminergic cell bodies and neurites, and caused the remaining neurites to become irregularly shaped (Fig. 5f,g and Supplementary Fig. 7). These results suggest that bcat-1 exacerbates the effect of α-synuclein in dopaminergic neurons.
Our results demonstrated an association of bcat-1 with the major features of PD: (i) progressive, age-related motor dysfunction and (ii) degeneration of dopaminergic neurons in the context of α-synuclein toxicity. The presence of these features suggests that our model is specifically relevant to PD. Moreover, our findings suggest that BCAA metabolism may provide an as-yet-unidentified link between seemingly disparate neuropathologies in PD.
DISCUSSION
Here, we demonstrated the effectiveness of our diseaseQUEST framework for integrative, cross-species analysis of disease-associated genes in revealing mechanisms underlying 25 human diseases and traits. Our framework revealed important underlying biological mechanisms that can now be investigated in mammalian systems, such as the role of bcat-1 in PD.
Although we used reported GWAS genes for longevity and PD, as well as the C. elegans model system as a proof of principle, one of the primary advantages of this framework is its modularity. diseaseQUEST can be readily applied to any disease and any model system (for example, mouse, fly, or zebrafish) for which a relevant high-throughput assay can be developed (Supplementary Note). This extensibility is critical, because researchers may prefer different model organisms depending on disease relevance and experimental convenience. For example, an entorhinal cortex–specific network in mice could be combined with Alzheimer’s disease GWAS46,47 to generate candidate AD genes by using a Phenotypic Assay module of novel object recognition48. Alternatively, a pronephron-specific network in zebrafish combined with cardiac arrhythmia GWAS studies49 and a heart-rate assay50 could be used to identify hypertension gene candidates. As network-based approaches to prioritize candidate disease genes continue to improve, the Disease Prediction module can also be updated to use state-of-the-art methods. A notable additional observation from our analysis, especially the longevity study, for which detailed experimental characterization of the process in worms is available, is that although not all GWAS-identified genes are truly causal, as a group they possess strong signal that enables identification of novel disease candidates, including tissue-specific aspects of their biology.
Overall, our results underscore the importance of systematically integrating computational methods with experimental approaches, as well as combining experimental tools in model organisms, such that high-throughput behavioral analyses can be performed along with large-scale studies in human genetics, to further the understanding of complex diseases.
METHODS
Methods, including statements of data availability and any associated accession codes and references, are available in the online version of the paper.
ONLINE METHODS
We integrated 174 genome-level data sets spanning 56,179 expression- and interaction-based measurements from more than 3,578 publications in addition to small-scale expression assays derived from approximately 2,400 publications, thus generating 203 tissue- and cell-type-specific networks for C. elegans. A semisupervised data-integration method based on regularized Bayesian integration was developed to perform data integration. Each of the 203 networks was evaluated for tissue and functional signal. Worm tissue networks relevant to tissue-specific diseases represented in the GWAS Catalog were used to predict candidate disease genes, and top gene predictions for PD were screened via thrashing assays.
Data-compendium assembly.
We downloaded and processed 24,270 physical-interaction results (based on more than 155 publications), 29,173 genetic interaction results (based on 3,258 publications), and 166 worm microarray data sets (consisting of 2,736 microarray experiments). Processed dataset values were discretized into representative bins for efficient storage and learning.
Physical-interaction data were downloaded from BioGRID52, IntAct53, and MINT54. Data from each database were separately discretized into four bins (0, 1, 2, and ≥3), depending on the number of experiments that supported presence of the corresponding interaction. Genetic-interaction data were downloaded from WormBase (WS241)55. For each pair of genes, the Fisher z-transformed Pearson correlation of interaction profiles (presence/absence of genetic interactions across all other genes) was calculated and discretized into one of the following seven bins: (−∞, −0.1), [−0.1, 0), [0, 0.1), [0.1, 0.25), [0.25, 0.5), [0.5, 0.75), or [0.75, ∞).
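The two discretization schemes described above can be sketched as follows. The bin edges are taken from the text; the helper names, the clamping of correlations of exactly ±1 before the Fisher transform, and the handling of constant profiles are assumptions made to keep the example runnable.

```python
import math
from bisect import bisect_right

# Left-closed bin edges for genetic-interaction profile correlations:
# (-inf, -0.1), [-0.1, 0), [0, 0.1), [0.1, 0.25), [0.25, 0.5), [0.5, 0.75), [0.75, inf)
GENETIC_EDGES = [-0.1, 0.0, 0.1, 0.25, 0.5, 0.75]


def count_bin(n_experiments):
    """Physical interactions: bins 0, 1, 2, and >=3 supporting experiments."""
    return min(n_experiments, 3)


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant profile (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r, eps=1e-12):
    """Fisher z-transform, clamping r away from +/-1 to keep atanh finite."""
    r = max(-1.0 + eps, min(1.0 - eps, r))
    return math.atanh(r)


def z_bin(z, edges=GENETIC_EDGES):
    """Index of the left-closed bin that contains z (0 is the lowest bin)."""
    return bisect_right(edges, z)


# Hypothetical presence/absence interaction profiles for two genes.
profile_a = [1, 0, 1, 0]
profile_b = [1, 0, 1, 1]
bin_ab = z_bin(fisher_z(pearson(profile_a, profile_b)))
```

Using `bisect_right` on the lower edges gives intervals that are closed on the left and open on the right, matching the notation in the text.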
Experimentally defined transcription factor (TF)-binding sites were downloaded from JASPAR56, and the (1 kb) upstream region of each gene was scanned for the presence of TF-binding-site motifs with the MEME software suite57. For each pair of genes, the Fisher z-transformed Pearson correlation of TF binding profiles was calculated and discretized into one of the following seven bins: (−∞, −1.5), [−1.5, −0.5), [−0.5, 0.5), [0.5, 1.5), [1.5, 2.5), [2.5, 3.5), or [3.5, ∞).
Gene expression data sets were downloaded from the Gene Expression Omnibus (GEO) data repository58 maintained by NCBI. After duplicate samples were collapsed, genes with values missing in >30% of the samples were removed, and all other missing values were imputed as described in ref. 59. After normalization of expression within each gene per data set, the product of normalized expression scores per pair of genes in each sample was calculated and discretized into one of the following seven bins: (−∞, −1.5), [−1.5, −0.25), [−0.25, 0.25), [0.25, 1.5), [1.5, 2.5), [2.5, 3.5), or [3.5, ∞).
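As an illustration of the expression feature described above, the sketch below z-scores each gene within one data set, multiplies the normalized values of a gene pair sample by sample, and assigns each product to one of the seven bins. The bin edges come from the text; the use of the population standard deviation, the treatment of constant genes, and the function names are assumptions.

```python
import statistics
from bisect import bisect_right

# Left-closed bin edges for products of normalized expression:
# (-inf, -1.5), [-1.5, -0.25), [-0.25, 0.25), [0.25, 1.5), [1.5, 2.5), [2.5, 3.5), [3.5, inf)
EXPRESSION_EDGES = [-1.5, -0.25, 0.25, 1.5, 2.5, 3.5]


def zscore(values):
    """Normalize one gene within one data set (population SD is an assumption)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)   # constant gene carries no signal
    return [(v - mean) / sd for v in values]


def pair_bins(expr_a, expr_b):
    """Bin index of the normalized-expression product in each sample."""
    za, zb = zscore(expr_a), zscore(expr_b)
    return [bisect_right(EXPRESSION_EDGES, a * b) for a, b in zip(za, zb)]


# Hypothetical expression values for two genes across four samples.
same_direction = pair_bins([1, 2, 3, 4], [1, 2, 3, 4])
opposite_direction = pair_bins([1, 2, 3, 4], [4, 3, 2, 1])
```

Gene pairs that vary together yield products in the upper bins in the samples where both deviate strongly from their means, whereas anticorrelated pairs fall into the lowest bin in those samples.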
Semisupervised data integration and network evaluation.
Construction of global functional-interaction gold standard.
The global (tissue-naïve) functional-interaction gold standard was constructed on the basis of coannotation (or absence thereof) of genes to expert-selected biological process terms from GO51 according to whether the term would be verifiable through specific molecular experiments. For each of the 309 selected terms, we obtained all GO annotations with experimental-evidence codes (i.e., EXP, IDA, IPI, IMP, IGI, and IEP).
Gene pairs coannotated to any of the selected terms (after propagation) were considered positive examples of the presence of a functional relationship. Gene pairs lacking coannotation to any term were considered negative examples, except in cases in which the two genes were separately annotated to highly overlapping GO terms (hypergeometric P < 0.05) or coannotated to other higher-level GO terms that might still indicate the possible presence of a functional relationship60. The additional criteria were added to decrease the number of potential false negatives, and any gene pair that met either condition was excluded from the gold standard.
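The labeling rule above can be sketched as follows, assuming the overlap between two GO terms is assessed with an upper-tail hypergeometric test over a fixed gene universe. Only the first exclusion criterion (highly overlapping terms) is implemented; the second (coannotation to higher-level GO terms) is omitted. The function names, toy terms, and universe size are hypothetical.

```python
from math import comb


def hypergeom_sf(k, n_universe, n_term_a, n_term_b):
    """P(overlap >= k) when n_term_b genes are drawn from n_universe genes,
    n_term_a of which belong to term A."""
    total = comb(n_universe, n_term_b)
    upper = min(n_term_a, n_term_b)
    hits = sum(comb(n_term_a, i) * comb(n_universe - n_term_a, n_term_b - i)
               for i in range(k, upper + 1))
    return hits / total


def label_pair(g1, g2, term_genes, n_universe, alpha=0.05):
    """Return 1 (positive), 0 (negative), or None (excluded from the standard)."""
    terms_1 = {t for t, genes in term_genes.items() if g1 in genes}
    terms_2 = {t for t, genes in term_genes.items() if g2 in genes}
    if terms_1 & terms_2:
        return 1                      # coannotated to a selected term
    for a in terms_1:
        for b in terms_2:
            overlap = len(term_genes[a] & term_genes[b])
            p = hypergeom_sf(overlap, n_universe,
                             len(term_genes[a]), len(term_genes[b]))
            if p < alpha:
                return None           # terms overlap too much: possible false negative
    return 0


# Hypothetical GO terms over a universe of 30 genes.
term_genes = {
    "T1": {"g1", "g2", "g3", "g4", "g5"},
    "T2": {"g3", "g4", "g5", "g6", "g7"},
    "T3": {"g10", "g11", "g12"},
}
labels = {
    ("g1", "g2"): label_pair("g1", "g2", term_genes, 30),
    ("g1", "g6"): label_pair("g1", "g6", term_genes, 30),
    ("g1", "g10"): label_pair("g1", "g10", term_genes, 30),
}
```

Here `g1` and `g6` share no term, but their terms overlap significantly, so the pair is excluded rather than counted as a negative.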
Construction of a tissue–gene expression standard.
Gene annotations to tissue and cell type were obtained from curated anatomy associations from WormBase (WS241)55, as well as annotations from the C. elegans Tissue Expression Consortium61 and other small-scale expression analyses, as curated in ref. 62. No microarray or RNA-seq results were included in the tissue–gene gold standard. All annotations were mapped and propagated on the basis of the WormBase anatomy ontology, and only sufficiently well understood tissues (in terms of gene expression) were retained (more than ten direct gene annotations). A ‘tissue-slim’ was also defined to categorize the resulting tissues. These were system-level anatomy terms in the WormBase anatomy ontology (immediate children of ‘organ system’ and ‘sex-specific entity’, under ‘functional system’).
Incorporation of tissue specificity into a functional gold standard.
To construct a tissue-specific functional gold standard for each tissue, we labeled each gene appearing in either positive or negative example gene pairs in the global functional gold standard with any known tissue annotations in the tissue–gene expression standard. An overlay of tissue–expression implies three possible types of edges for each gene pair in each tissue: between two genes both expressed in the tissue, bridging a gene expressed in the tissue and a gene expressed elsewhere, or exterior to genes in that tissue (i.e., neither gene has been annotated to the tissue). Because the goal of tissue-specific functional networks is to predict functional relationships between genes that are coexpressed in the tissue, positive examples in the tissue-specific functional-relationship gold standard included only between edges for positive examples of functional relationships. Negative examples of tissue-specific functional relationships included a combination of the other edge types, i.e., all three edge types among negative functional examples (between, bridging, and exterior), as well as bridging and exterior edges (relative to the current tissue) among positive functional examples from other tissues.
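A minimal sketch of the edge classification and the resulting tissue-specific labels follows. It simplifies the description above by treating every functionally positive pair that is not a between edge as a negative for the current tissue; function and variable names are hypothetical.

```python
def edge_type(g1, g2, tissue_genes):
    """Classify a gene pair relative to one tissue's annotated gene set."""
    n_in_tissue = (g1 in tissue_genes) + (g2 in tissue_genes)
    return ("exterior", "bridging", "between")[n_in_tissue]


def tissue_label(g1, g2, functional_label, tissue_genes):
    """Tissue-specific label: 1 only for functionally related 'between' pairs.

    functional_label is the pair's label in the global gold standard
    (1 = functional relationship, 0 = no functional relationship).
    """
    if functional_label == 1 and edge_type(g1, g2, tissue_genes) == "between":
        return 1
    return 0


# Hypothetical tissue annotation: genes a and b are expressed in the tissue.
tissue_genes = {"a", "b"}
examples = [
    ("a", "b", 1),   # between edge, functional: positive
    ("a", "c", 1),   # bridging edge, functional: negative for this tissue
    ("c", "d", 1),   # exterior edge, functional: negative for this tissue
    ("a", "b", 0),   # between edge, not functional: negative
]
tissue_labels = [tissue_label(g1, g2, f, tissue_genes) for g1, g2, f in examples]
```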
Supplementation of tissue-specific gold standard by using previously unlabeled features.
The tissue-specific functional gold standard was further supplemented by gene pairs that did not meet the stringent requirements of being present in the global functional gold standard as a positive example and being a between edge, where both genes are annotated to the tissue. The two components of our definition for a positive example of a tissue-specific functional interaction were satisfied as follows:
Functional interaction: there is a predicted functional interaction with high probability in the global functional network.
Tissue coexpression: the predicted tissue–gene expression (based on expression compendium)61 of both genes in a gene pair indicates probable expression in the tissue.
Each gene pair is thus assigned a weight representing the predicted probability of being a true-positive example of a tissue-specific functional relationship:
$w_{ij} = \Pr(i \in G_t)\,\Pr(j \in G_t)\,\Pr(FR_{ij} = 1)$, for genes $i$ and $j$, with $G_t$ as the set of genes expressed in tissue $t$, and $FR_{ij} = I(\text{functional relationship between genes } i \text{ and } j)$, where $I$ is an indicator function.
For gene pairs known to be in the gold standard and functionally interacting, it is clear that $\Pr(i \in G_t) = \Pr(j \in G_t) = \Pr(FR_{ij} = 1) = 1 \Rightarrow w_{ij} = 1$; for all other gene pairs, $0 \le w_{ij} \le 1$.
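As a small worked example of the weight defined above, the following sketch multiplies the three probabilities for one gene pair. The function name `pair_weight` and the numeric inputs are illustrative assumptions only.

```python
def pair_weight(p_i_in_tissue: float, p_j_in_tissue: float,
                p_functional: float) -> float:
    """Weight of a gene pair as a tissue-specific positive example.

    Implements w_ij = Pr(i in G_t) * Pr(j in G_t) * Pr(FR_ij = 1).
    """
    for p in (p_i_in_tissue, p_j_in_tissue, p_functional):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_i_in_tissue * p_j_in_tissue * p_functional


if __name__ == "__main__":
    # A gold-standard pair annotated to the tissue keeps full weight.
    print(pair_weight(1.0, 1.0, 1.0))
    # A previously unlabeled pair is down-weighted by each uncertain term.
    print(pair_weight(0.9, 0.8, 0.5))
```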
Data integration considering new features.
Each tissue-specific functional network was learned by our semisupervised regularized Bayesian integration method. More specifically, we trained a naïve Bayesian classifier (while considering weights) for each tissue with a binary class node representing the indicator function for a functional relationship between a pair of genes conditioned on additional nodes representing each of the aforementioned data sets. The global and fully supervised tissue-specific regularized functional integrations were generated as described previously63 (for the fully supervised tissue-specific networks, unweighted tissue-specific gold standards were used in lieu of the global gold standard).
The regularized posterior probability of a tissue-specific functional relationship generated from our semisupervised method for any gene pair i and j was calculated as follows:
where TFRij = I(tissue-specific functional relationship between genes i and j), where I is an indicator function; is the kth data set for which both genes i and j have data, and is the actual experimental value for genes i and j.
The typical term in the naïve Bayes equation has been replaced with a weighted data-set probability function for purposes of regularization:
where
Here, $\eta$ is a pseudocount constant (set to 3 in our integration, as done previously)63, $|D_k|$ is the number of discretization levels for data set $D_k$, and $w_{ij}$ is the previously described gold-standard weight. $U_k$ is the data-set mutual-information criterion for any data set $D_k$, with $I_{\text{pairs} \in \text{negative}}(D_k; D_i)$ as the mutual information between data sets $D_k$ and $D_i$ for any gene pairs that are negative examples of functional interactions (on the basis of the tissue-naïve functional gold standard), and $H(D_k)$ as the entropy of data set $D_k$.
Regularization was necessary because large-scale genomics data sets typically violate the assumption of conditional independence for naïve Bayes classifiers. As in ref.63, we calculated the nonbiological conditional dependency between data sets and weighted them accordingly, to minimize the negative effects of violating the conditional-independence assumption.
After training the Bayesian classifier for each tissue, we used each model to estimate the probability of tissue-specific functional interactions between all pairs of genes represented in the data compendium. Implementation of these integration procedures used the Sleipnir library for functional genomics64, in which the weighted integration procedure has been added and is now publicly available.
Isotonic-regression adjustment of network probabilities.
To further mitigate the effect of violating the conditional independence assumption for naïve Bayes classifiers (which results in posterior probability estimates being pushed toward 0 and 1), we used isotonic regression to calibrate the probabilities output by our method, as described in ref. 65.
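Isotonic calibration of this kind is commonly fit with the pool-adjacent-violators (PAV) algorithm. The sketch below is a generic standard-library illustration under that assumption; it is not taken from the Sleipnir code, the names `fit_isotonic` and `calibrate` are hypothetical, and tied scores are not pooled.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def fit_isotonic(scores: Sequence[float],
                 labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Fit a non-decreasing step function of score to 0/1 labels with PAV.

    Returns (upper_bounds, values): values[i] applies to scores up to
    upper_bounds[i].
    """
    # Each block stores [sum of labels, number of points, largest score].
    blocks: List[list] = []
    for score, label in sorted(zip(scores, labels)):
        blocks.append([float(label), 1, score])
        # Merge backwards while the previous block mean exceeds the current one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    upper_bounds = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return upper_bounds, values


def calibrate(score: float, upper_bounds: List[float],
              values: List[float]) -> float:
    """Look up the calibrated probability for a raw posterior score."""
    idx = bisect_left(upper_bounds, score)
    return values[min(idx, len(values) - 1)]


if __name__ == "__main__":
    bounds, vals = fit_isotonic([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
    print(bounds, vals)
    print(calibrate(0.25, bounds, vals))
```

The fitted values are block means of the labels, so extreme raw posteriors are pulled back toward the observed frequency of positives in their score range.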
Evaluation of tissue-specific functional relationships.
We evaluated the global and all tissue-specific functional networks (with and without semisupervised learning) by using a random gene holdout (one-third of all genes) from the gold standard. Thus, for the global and tissue-specific functional networks trained without using unlabeled edges, all gene pairs for which either gene was present in the holdout were excluded from training. For the semisupervised tissue-specific functional networks, the same group of genes was held out at all stages of training (i.e., from the steps leading to the weighting of previously unlabeled features, including the tissue–gene expression standard for tissue–gene expression predictions and the functional-interaction standard used to generate the functional-interaction predictions). The gene pairs used for evaluation were those for which both genes were present in the holdout. All networks were evaluated on the basis of the area under the receiver operating characteristic curve (AUROC).
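The gene-level holdout and AUROC evaluation can be sketched as follows. This is an illustrative outline under stated assumptions: the helper names are hypothetical, the random seed is arbitrary, and AUROC is computed by direct pairwise comparison, which is adequate only for small examples.

```python
import random
from typing import Iterable, List, Sequence, Set, Tuple

Pair = Tuple[str, str]


def split_holdout(genes: Iterable[str], fraction: float = 1 / 3,
                  seed: int = 0) -> Set[str]:
    """Randomly choose a fraction of genes to hold out from training."""
    ordered = sorted(genes)
    rng = random.Random(seed)
    return set(rng.sample(ordered, round(len(ordered) * fraction)))


def partition_pairs(pairs: Iterable[Pair],
                    holdout: Set[str]) -> Tuple[List[Pair], List[Pair]]:
    """Training pairs contain no held-out gene; test pairs contain two."""
    train, test = [], []
    for a, b in pairs:
        n_held = (a in holdout) + (b in holdout)
        if n_held == 0:
            train.append((a, b))
        elif n_held == 2:
            test.append((a, b))
        # Pairs with exactly one held-out gene are used for neither step.
    return train, test


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    held = {"a", "b"}
    print(partition_pairs([("a", "b"), ("a", "c"), ("c", "d")], held))
    print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```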
Evaluation of tissue-specific functional networks generated by using progressively smaller subsamples (proportions: 0.02, 0.03, 0.04, 0.05, 0.1, 0.15, 0.2, 0.4, 0.6, and 0.8) of the full worm compendium showed that our network-construction method was robust to data-compendium size (Supplementary Fig. 8). Similarly, networks of progressively smaller subsamples of prior knowledge (i.e., tissue gene annotations; proportions: 0.05, 0.1, 0.15, 0.2, 0.4, 0.6, and 0.8) showed that the approach is powerful in situations with limited prior knowledge.
Human GWAS gene prediction.
Reported genes for GWAS represented in the GWAS Catalog11 were aggregated, and functional analogs were identified in worms10. When possible, GWAS diseases were mapped to the Disease Ontology. For each disease, the worm functional analogs were used as positive examples. Orthologs of all other genes reported in the GWAS Catalog (excluding genes reported in the same disease category, on the basis of the Disease Ontology slim) were used as negative examples. Each disease with a biologically relevant tissue network and at least five positive examples in its gold standard was retained. We then used this gold standard along with the relevant tissue network as features to predict additional disease genes, by using our previously validated network-based SVM prediction method66. SVM scores were converted to fold-over-random scores by first calculating probabilities with the Platt method67, then dividing the probability by the prior probability of candidate-gene prediction (based on the number of positives and negatives in the corresponding gold standard).
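The conversion from SVM score to fold-over-random score can be illustrated as below. This sketch assumes that the Platt sigmoid parameters (here called `a` and `b`) have already been fit, and that the prior is the fraction of positives in the gold standard; both the function names and that reading of the prior are assumptions rather than details confirmed by the text.

```python
import math


def platt_probability(svm_score: float, a: float, b: float) -> float:
    """Platt scaling: map an SVM decision value to a probability.

    Uses the sigmoid 1 / (1 + exp(a * score + b)) with pre-fit a and b.
    """
    return 1.0 / (1.0 + math.exp(a * svm_score + b))


def fold_over_random(svm_score: float, a: float, b: float,
                     n_pos: int, n_neg: int) -> float:
    """Divide the calibrated probability by the gold-standard prior."""
    prior = n_pos / (n_pos + n_neg)  # assumed definition of the prior
    return platt_probability(svm_score, a, b) / prior


if __name__ == "__main__":
    # Toy values: 10 positives and 90 negatives give a prior of 0.1.
    print(fold_over_random(0.0, a=-1.0, b=0.0, n_pos=10, n_neg=90))
```

Under this reading, a score above 2 means that a gene is more than twice as likely to be disease associated as a gene drawn at random from the gold standard.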
Clustering the top PD candidates in the dopaminergic neuron network.
We created a dopaminergic neuron subnetwork, in which nodes were all PD candidate-gene predictions with a probability greater than twofold over random of being PD associated, and clustered the corresponding shared k-nearest-neighbors (SKNN) network by using the Louvain community-finding algorithm68. Given any graph, we calculated the SKNN network by transforming each edge weight to the number of shared top k-nearest-neighbors (on the basis of ranking all neighbors by the original weights) and took the subnetwork defined by the top 5% of these edge weights. The Louvain algorithm was then used to cluster the resulting network. We used k = 50 for the clustering presented here but confirmed that the clustering was robust for k between 10 and 100. Furthermore, we subsampled 90% of the nodes and repeated the Louvain algorithm 1,000 times. For each pair of genes, a cluster co-membership score was calculated according to the proportion of times the pair was partitioned to the same cluster. Pairs of genes with co-membership scores ≥0.2 are shown in Figure 4, in which the layout (generated with Gephi69) is based on the edge weights ≥0.65 in the dopaminergic neuron network. The layout was robust to different co-membership scores and edge-weight cutoffs. The enrichment in GO biological process and WormBase phenotype terms in each cluster was calculated by using one-sided Fisher’s exact tests, with Benjamini–Hochberg multiple hypothesis testing correction to calculate the FDR.
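The SKNN transformation can be sketched as follows, before any community detection is applied. This is a minimal illustration that assumes a symmetric weighted graph stored as nested dictionaries with string node identifiers; the function names are hypothetical, and ties at the cutoff are broken arbitrarily.

```python
import math
from typing import Dict, Set, Tuple

Graph = Dict[str, Dict[str, float]]


def top_k_neighbors(weights: Graph, k: int) -> Dict[str, Set[str]]:
    """For each node, keep the k neighbors with the largest edge weights."""
    return {
        node: set(sorted(nbrs, key=nbrs.get, reverse=True)[:k])
        for node, nbrs in weights.items()
    }


def sknn_edges(weights: Graph, k: int,
               keep_fraction: float = 0.05) -> Dict[Tuple[str, str], int]:
    """Re-weight each edge by shared top-k neighbors; keep the top fraction."""
    knn = top_k_neighbors(weights, k)
    shared = {}
    for u, nbrs in weights.items():
        for v in nbrs:
            if u < v:  # visit each undirected edge once
                shared[(u, v)] = len(knn[u] & knn.get(v, set()))
    ranked = sorted(shared.items(), key=lambda item: item[1], reverse=True)
    n_keep = max(1, math.ceil(len(ranked) * keep_fraction))
    return dict(ranked[:n_keep])


if __name__ == "__main__":
    toy = {
        "a": {"b": 0.9, "c": 0.8, "d": 0.1},
        "b": {"a": 0.9, "c": 0.7, "d": 0.2},
        "c": {"a": 0.8, "b": 0.7, "d": 0.3},
        "d": {"a": 0.1, "b": 0.2, "c": 0.3},
    }
    print(sknn_edges(toy, k=2, keep_fraction=1.0))
```

In the toy graph, genes a and d share a weak direct edge but have the same two nearest neighbors, so their SKNN weight is the highest; this is the kind of indirect similarity the transformation is designed to expose.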
Selection of PD genes for further experimental validation.
After ranking of the gene predictions for PD, the list of genes was filtered for any genes with known human orthologs. Any genes with a chance greater than twofold over random of being a PD-associated gene were split into three tiers (in which each gene–ortholog pair would appear in only the highest matching tier; for example, if gene a–ortholog a was in Tier 1, even if it met the criteria for Tier 2 or 3, it would not be included in those tiers):
Tier 1.
The worm gene is annotated to be neuron expressed by WormBase, and at least one of its human orthologs is annotated to be brain expressed by HPRD.
Tier 2.
The worm gene is expressed in a neuron-specific RNA-seq library29, and at least one of its human orthologs is annotated to be brain expressed by HPRD.
Tier 3.
The worm gene is either annotated to be neuron expressed by WormBase or is expressed in the neuron-specific RNA-seq library, and at least one of its human orthologs is expressed in many brain expression samples (as determined by the Gene Expression Barcode70).
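The tiering rules above can be expressed as a short decision function. This is an illustrative sketch only: the function name and the Boolean inputs are hypothetical, and each flag is assumed to have been computed beforehand for a single gene–ortholog pair.

```python
from typing import Optional


def assign_tier(fold_over_random: float,
                wormbase_neuron: bool,
                neuron_rnaseq: bool,
                hprd_brain: bool,
                barcode_brain: bool) -> Optional[int]:
    """Return the highest matching tier (1 is highest) or None.

    fold_over_random: PD prediction score for the worm gene.
    wormbase_neuron:  worm gene annotated as neuron expressed in WormBase.
    neuron_rnaseq:    worm gene expressed in the neuron-specific RNA-seq library.
    hprd_brain:       human ortholog annotated as brain expressed in HPRD.
    barcode_brain:    human ortholog expressed in many brain samples
                      (Gene Expression Barcode).
    """
    if fold_over_random <= 2:
        return None  # not a candidate: at most twofold over random
    if wormbase_neuron and hprd_brain:
        return 1
    if neuron_rnaseq and hprd_brain:
        return 2
    if (wormbase_neuron or neuron_rnaseq) and barcode_brain:
        return 3
    return None


if __name__ == "__main__":
    # Meets the Tier 1 and Tier 3 criteria; only Tier 1 is reported.
    print(assign_tier(3.5, True, False, True, True))
```

Checking the tiers in order and returning at the first match ensures that each gene–ortholog pair appears only in its highest matching tier.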
Thrashing screen for age-related motor defects.
RNAi clones were obtained from the Ahringer RNAi library. Candidate PD-related genes were tested for thrashing abnormalities at days 2, 5, and 8 of adulthood. Strain LC108 was synchronized from eggs onto HG plates seeded with OP50. At the L4 larval stage, worms were transferred via pipetting onto RNAi-seeded, IPTG-induced HG plates containing carbenicillin, IPTG, and 0.05 mM FUdR. Worms were transferred onto fresh RNAi-seeded HG plates on days 3 and 5 of the assay. Thrashing tests were performed as previously described28. Briefly, approximately four worms were picked at one time into a 10-μL drop of M9 buffer on a microscope slide. 30-s videos were captured with an ocular-fitted iPhone 5 camera attached to a standard dissection microscope via an Arcturus Magnifi mount. Between 50 and 700 worms were imaged on each day for each strain tested. Images were captured with inverted colors, i.e., white worms on a black background, as required by the CeleST processing suite.
C. elegans strains.
C. elegans strains were grown on nematode growth medium (NGM) plates seeded with OP50 Escherichia coli and maintained at 20 °C. The following strains were used in this study: wild-type worms of the N2 Bristol strain, LC108 uIs69 (myo-2p::mCherry, unc-119p::sid-1), TU3311 uIs60 (unc-119p::sid-1, unc-119p::yfp), CQ495 vsIs48 (unc-17p::gfp); uIs69 (myo-2p::mCherry, unc-119p::sid-1), CQ435 vtIs7 (dat-1p::gfp); uIs69 (myo-2p::mCherry, unc-119p::sid-1), CQ492 vtIs7 [dat-1p::gfp]; uIs69 [pCFJ90 (myo-2p::mCherry + unc-119p::sid-1)], and CQ434 baIn11 [dat-1p::α-syn; dat-1p::gfp]; uIs69 [pCFJ90 (myo-2p::mCherry + unc-119p::sid-1)].
CeleST.
Captured videos were analyzed for a variety of motility characteristics via CeleST worm-tracker software28. Individual frames were extracted, and images were converted to grayscale and sharpened via ImageMagick (www.imagemagick.org/script/index.php/). After the user defined the bounding box for each video, CeleST automated the identification of individual worms and their progression throughout each image batch, and flagged periods in which confounding factors (such as worm overlap or disappearance from the frame) led to censoring of the frames. Thereafter, a manual check of each worm was performed, in which the time course of each worm was displayed, and the user confirmed or rejected the software’s judgment for each defined block of time. The output of CeleST provides quantitative analysis of ten separate aspects of worm motility on an individual and collective basis.
RNAi treatment.
For individual RNAi experiments, animals were synchronized from eggs through bleaching and plated on HG plates seeded with OP50. At day 1 of adulthood, RNAi-seeded 100-mm NGM plates containing carbenicillin and IPTG were induced with 0.1 M IPTG 1 h before worm transfer. Adult worms were picked onto RNAi plates and incubated at 20 °C. Worms were transferred onto fresh RNAi plates on days 3 and 5. Approximately 100 worms were imaged for each strain on each day of testing.
Manual curling analysis.
Manual analysis of the curling phenotype was used to complement the CeleST software, which underestimates the percentage of time spent curling. Timing was performed with a standard EXTECH Instruments stopwatch. The percentage of time spent in a curled pose, defined as the sum of the periods in which either the head or the tail makes contact with a noncontiguous body segment, was measured for each individual worm over the span of each video. Because multiple raters were involved in the measurements, subjectivity was minimized by having all raters score a common sample and compare results. Measurements for any single condition were distributed equally among raters to further account for variance in judgment or precision. More than 6,000 worms were individually measured for the assays in Supplementary Figures 4–6.
Microscopy.
Animals treated with RNAi from day 1 through day 8 of adulthood were mounted on 2% agarose pads in M9 and sodium azide. Images were captured on a Nikon Eclipse Ti inverted microscope and processed in Nikon NIS-Elements software. At least 15 worms were imaged per condition in each replicate. For dopaminergic (dat-1p::GFP-labeled) neurons, cell bodies of the six head neurons were counted, and neurite morphology was examined by using the projections extending from the labeled head neurons. For dopaminergic neuron imaging, neuronal RNAi-sensitive worms expressing dat-1p::GFP and dat-1p::α-synuclein were treated with control or bcat-1 RNAi from day 1 of adulthood. Imaging of day 6 adults was performed on a Nikon A1 confocal microscope at 40× magnification, and z stacks were processed in Nikon NIS-Elements software. ADE and CEP cell bodies were counted, as well as neurites projecting anteriorly from CEP cell bodies.
Statistics and reproducibility.
In Figure 4d, the mean ± s.e.m. is shown, and an unpaired two-sided t-test was performed, P = 1.15 × 10⁻¹³. L4440, n = 165 animals. scav-1 RNAi, n = 116 animals. t = 7.812, df = 279, 95% CI = (9.284, 15.54).
In Figure 5a, two-way ANOVA with Sidak’s multiple-comparison test was performed. Control: day 2, n = 492; day 5, n = 345; day 8, n = 573. bcat-1 RNAi: day 2, n = 675; day 5, n = 714; day 8, n = 582. Curling day 8 control versus bcat-1 RNAi, t = 6.829, df = 3,375, 95% CI: (−2.364, −1.139), P = 3.04 × 10⁻¹¹. Stretch day 2 control versus bcat-1 RNAi, t = 3.449, df = 3,375, 95% CI: (−0.054, −0.0098), P = 0.00171. Stretch day 5 control versus bcat-1 RNAi, t = 8.502, df = 3,375, 95% CI: (−0.1115, −0.06256), P < 1 × 10⁻¹⁵. Stretch day 8 control versus bcat-1 RNAi, t = 14.58, df = 3,375, 95% CI: (−0.156, −0.1121), P < 1 × 10⁻¹⁵.
In Figure 5c, two-way repeated-measures ANOVA with Sidak’s multiple-comparison test was performed. Control:unc-119p::sid-1, n = 119 animals; bcat-1 RNAi:unc-119p::sid-1, n = 133 animals; control:wild type, n = 98 animals; bcat-1 RNAi:wild type, n = 103 animals. Multiple comparisons: Control:unc-119p::sid-1 versus bcat-1 RNAi:unc-119p::sid-1, t = 7.46, df = 449, 95% CI: (−19.87, −9.477), P = 2.699 × 10⁻¹². Control:wild type versus bcat-1 RNAi:wild type, t = 0.2002, df = 449, 95% CI: (−5.373, 6.254), P = 0.999. bcat-1 RNAi:unc-119p::sid-1 versus bcat-1 RNAi:wild type, t = 9.585, df = 449, 95% CI: (14.2, 25.02), P < 1 × 10⁻¹⁵.
In Figure 5g, unpaired two-sided t-tests were performed. L4440, n = 45; bcat-1 RNAi, n = 61. Mean ± s.e.m. Top, t = 5.446, df = 104, 95% CI: (−1.34, −0.06248), P = 3.46 × 10⁻⁷. Bottom, t = 5.015, df = 104, 95% CI: (−1.38, −0.05988), P = 2.19 × 10⁻⁶. ****P < 0.0001. The experiment was repeated three times independently and yielded similar results.
In Supplementary Figure 3d, mean ± s.e.m. are shown. Control, n = 351; bcat-1, n = 420; cyb-2.1, n = 287; pxl-1, n = 289; frm-2, n = 279; mre-11, n = 272; sma-4, n = 286; snt-4, n = 305; cdh-4, n = 285; lbp-2, n = 320; ani-3, n = 300; hcp-1, n = 264; BE0003N10.1, n = 229; let-363, n = 284; hil-3, n = 270. n represents the number of animals per condition. One-way ANOVA with Tukey’s multiple-comparison test. Control versus bcat-1 RNAi, P = 4.33 × 10⁻⁸. ****P < 0.0001.
In Supplementary Figure 4, mean ± s.e.m. are shown. Two-way ANOVA with Sidak’s multiple-comparison test was performed. Control: day 2, n = 492; day 5, n = 345; day 8, n = 573. bcat-1 RNAi: day 2, n = 675; day 5, n = 714; day 8, n = 582. Body wave number day 2 control versus bcat-1 RNAi, t = 3.075, df = 3,375, 95% CI: (−0.2648, −0.03323), P = 0.0064.
In Supplementary Figure 5a, two-way repeated-measures ANOVA with Sidak’s multiple-comparison test was performed. Mean ± s.e.m. are shown. Control:unc-119p::sid-1, n = 28 animals; bcat-1 RNAi:unc-119p::sid-1, n = 41 animals; control:wild type, n = 24 animals; bcat-1 RNAi:wild type, n = 30 animals. Multiple comparisons: Control:unc-119p::sid-1 versus bcat-1 RNAi:unc-119p::sid-1, t = 3.156, df = 119, 95% CI: (−18.7, −1.491), P = 0.0121. Control:wild type versus bcat-1 RNAi:wild type, t = 0.7787, df = 119, 95% CI: (−11.98, 6.577), P = 0.9684. bcat-1 RNAi:unc-119p::sid-1 versus bcat-1 RNAi:wild type, t = 3.422, df = 119, 95% CI: (2.272, 18.55), P = 0.0051.
In Supplementary Figure 5b, two-way repeated-measures ANOVA with Sidak’s multiple-comparison test was performed. Mean ± s.e.m. are shown. Control:unc-119p::sid-1, n = 75 animals; bcat-1 RNAi:unc-119p::sid-1, n = 86 animals; control:wild type, n = 73 animals; bcat-1 RNAi:wild type, n = 76 animals. Multiple comparisons: Control:unc-119p::sid-1 versus bcat-1 RNAi:unc-119p::sid-1, t = 4.305, df = 306, 95% CI: (−10.68, −2.546), P = 0.000135. Control:wild type versus bcat-1 RNAi:wild type, t = 0.8621, df = 306, 95% CI: (−5.595, 2.847), P = 0.948. bcat-1 RNAi:unc-119p::sid-1 versus bcat-1 RNAi:wild type, t = 4.576, df = 306, 95% CI: (2.952, 11.06), P = 0.000041.
In Supplementary Figure 7, unpaired two-sided t-tests were performed. Mean ± s.e.m. are shown. L4440, n = 45; bcat-1 RNAi, n = 61. t = 0.4156, df = 104, 95% CI: (−0.3112, 0.2033), P = 0.6785. The experiment was repeated three times independently and yielded similar results.
Code availability.
The semisupervised integration procedure has been integrated into our Sleipnir library for functional genomics64 (https://libsleipnir.bitbucket.io/). The entire codebase can be downloaded from Supplementary Software, and we have also provided a diseaseQUEST Docker image (https://github.com/FunctionLab/diseasequest-docker/; Supplementary Note).
Reporting Summary.
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability.
The authors declare that the data supporting the findings of this study are available within the paper and its supplementary information files. Source data for Figures 4a,c and 5a have been provided in Supplementary Data 13.
Supplementary Material
ACKNOWLEDGMENTS
We thank K. Yao, R. Hong, and J. Zhou for assistance with video analysis, G. Laevsky for assistance with confocal microscopy, the CGC for strains, and Z. Gitai and the laboratories of O.G.T. and C.T.M. for valuable discussion. Strain UA44 was generously provided by G. Caldwell (University of Alabama), and strain BY250 was a generous gift from R. Blakely (Vanderbilt University). V.Y. was supported in part by US NIH grant T32 HG003284. O.G.T. is supported as a senior fellow of the Genetic Networks program of the Canadian Institute for Advanced Research (CIFAR). C.T.M. is supported as the Director of the Glenn Center for Aging Research at Princeton and as an HHMI-Simons Faculty Scholar. This work was supported by the NIH (R01 GM071966 to O.G.T. and Cognitive Aging R01 and DP1 Pioneer Award to C.T.M.).
Footnotes
Note: Any Supplementary Information and Source Data files are available in the online version of the paper.
COMPETING INTERESTS
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Greene CS et al. Understanding multicellular function and disease with human tissue-specific networks. Nat. Genet 47, 569–576 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Krishnan A et al. Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat. Neurosci 19, 1454–1462 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Rossin EJ et al. Proteins encoded in genomic regions associated with immune-mediated disease physically interact and suggest underlying biology. PLoS Genet. 7, e1001273 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Liu Y et al. Network-assisted analysis of GWAS data identifies a functionally-relevant gene module for childhood-onset asthma. Sci. Rep 7, 938 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.International Multiple Sclerosis Genetics Consortium. Network-based multiple sclerosis pathway analysis with GWAS data from 15,000 cases and 30,000 controls. Am. J. Hum. Genet 92, 854–865 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Pendse J et al. A Drosophila functional evaluation of candidates from human genome-wide association studies of type2 diabetes and related metabolic traits identifies tissue-specific roles for dHHEX. BMC Genomics 14, 136 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Bournele D & Beis D Zebrafish models of cardiovascular disease. Heart Fail. Rev 21, 803–813 (2016). [DOI] [PubMed] [Google Scholar]
- 8.Shulman JM et al. Functional screening of Alzheimer pathology genome-wide association signals in Drosophila. Am. J. Hum. Genet 88, 232–238 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Cho A et al. WormNet v3: a network-assisted hypothesis-generating server for Caenorhabditis elegans. Nucleic Acids Res. 42, W76–W82 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Park CY et al. Functional knowledge transfer for high-accuracy prediction of under-studied biological processes. PLoS Comput. Biol 9, e1002957 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Welter D et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Arnold ES et al. ALS-linked TDP-43 mutations produce aberrant RNA splicing and adult-onset motor neuron disease without aggregation or loss of nuclear TDP-43. Proc. Natl. Acad. Sci. USA 110, E736–E745 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Kim EK & Choi E-J Pathological roles of MAPK signaling pathways in human diseases. Biochim. Biophys. Acta 1802, 396–405 (2010).20079433 [Google Scholar]
- 14.Wagey R, Pelech SL, Duronio V & Krieger C Phosphatidylinositol 3-kinase: increased activity and protein level in amyotrophic lateral sclerosis. J. Neurochem 71, 716–722 (1998). [DOI] [PubMed] [Google Scholar]
- 15.Takata A, Matsumoto N & Kato T Genome-wide identification of splicing QTLs in the human brain and their enrichment among schizophrenia-associated loci. Nat. Commun 8, 14519 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Addington AM et al. A novel frameshift mutation in UPF3B identified in brothers affected with childhood onset schizophrenia and autism spectrum disorders. Mol. Psychiatry 16, 238–239 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Rubio MD, Wood K, Haroutunian V & Meador-Woodruff JH Dysfunction of the ubiquitin proteasome and ubiquitin-like systems in schizophrenia. Neuropsychopharmacology 38, 1910–1920 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Pyragius CE, Fuller M, Ricciardelli C & Oehler MK Aberrant lipid metabolism: an emerging diagnostic and therapeutic target in ovarian cancer. Int. J. Mol. Sci 14, 7742–7756 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Wang CW, Hsu WH & Tai CJ Antimetastatic effects of cordycepin mediated by the inhibition of mitochondrial activity and estrogen-related receptor a in human ovarian carcinoma cells. Oncotarget 8, 3049–3058 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Dvinge H, Kim E, Abdel-Wahab O & Bradley RK RNA splicing factors as oncoproteins and tumour suppressors. Nat. Rev. Cancer 16, 413–430 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Kenyon CJ The genetics of ageing. Nature 464, 504–512 (2010). [DOI] [PubMed] [Google Scholar]
- 22.Libina N, Berman JR & Kenyon C Tissue-specific activities of C. elegans DAF-16 in the regulation of lifespan. Cell 115, 489–502 (2003). [DOI] [PubMed] [Google Scholar]
- 23.Zhang P, Judy M, Lee S-J & Kenyon C Direct and indirect gene regulation by a life-extending FOXO protein in C. elegans: roles for GATA factors and lipid gene regulators. Cell Metab. 17, 85–100 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Gelino S et al. Intestinal autophagy improves healthspan and longevity in C. elegans during dietary restriction. PLoS Genet. 12, e1006135 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Pickrell JK et al. Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet 48, 709–717 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Moran LB et al. Whole genome expression profiling of the medial and lateral substantia nigra in Parkinson’s disease. Neurogenetics 7, 1–11 (2006). [DOI] [PubMed] [Google Scholar]
- 27.Levine B & Kroemer G Autophagy in the pathogenesis of disease. Cell 132, 27–42 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Restif C et al. CeleST: computer vision software for quantitative analysis of C. elegans swim behavior reveals novel features of locomotion. PLoS Comput. Biol 10, e1003702 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Kaletsky R et al. The C. elegans adult neuronal IIS/FOXO transcriptome reveals adult phenotype regulators. Nature 529, 92–96 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Maeda I, Kohara Y, Yamamoto M & Sugimoto A Large-scale analysis of gene function in Caenorhabditis elegans by high-throughput RNAi. Curr. Biol 11, 171–176 (2001). [DOI] [PubMed] [Google Scholar]
- 31.Sakai R, Cohen DM, Henry JF, Burrin DG & Reeds PJ Leucine-nitrogen metabolism in the brain of conscious rats: its role as a nitrogen carrier in glutamate synthesis in glial and neuronal metabolic compartments. J. Neurochem 88, 612–622 (2004). [DOI] [PubMed] [Google Scholar]
- 32.Newgard CB et al. A branched-chain amino acid-related metabolic signature that differentiates obese and lean humans and contributes to insulin resistance. Cell Metab. 9, 311–326 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Lynch CJ & Adams SH Branched-chain amino acids in metabolic signalling and insulin resistance. Nat. Rev. Endocrinol 10, 723–736 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Mansfeld J et al. Branched-chain amino acid catabolism is a conserved regulator of physiological ageing. Nat. Commun 6, 10043 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Luan H et al. Comprehensive urinary metabolomic profiling and identification of potential noninvasive marker for idiopathic Parkinson’s disease. Sci. Rep 5, 13888 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Manuel M & Heckman CJ Stronger is not always better: could a bodybuilding dietary supplement lead to ALS? Exp. Neurol 228, 5–8 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Carecchio M et al. Movement disorders in adult surviving patients with maple syrup urine disease. Mov. Disord 26, 1324–1328 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Kiil R & Rokkones T Late manifesting variant of branched-chain ketoaciduria (maple syrup urine disease). Acta Paediatr. 53, 356–364 (1964). [DOI] [PubMed] [Google Scholar]
- 39.Scaini G et al. Chronic administration of branched-chain amino acids impairs spatial memory and increases brain-derived neurotrophic factor in a rat model. J. Inherit. Metab. Dis 36, 721–730 (2013). [DOI] [PubMed] [Google Scholar]
- 40.Fontana L et al. Decreased consumption of branched-chain amino acids improves metabolic health. Cell Rep. 16, 520–530 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Harrington AJ, Yacoubian TA, Slone SR, Caldwell KA & Caldwell GA Functional analysis of VPS41-mediated neuroprotection in Caenorhabditis elegans and mammalian models of Parkinson’s disease. J. Neurosci 32, 2142–2153 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Goedert M, Spillantini MG, Del Tredici K & Braak H 100 years of Lewy pathology. Nat. Rev. Neurol 9, 13–24 (2013). [DOI] [PubMed] [Google Scholar]
- 43.Lakso M et al. Dopaminergic neuronal loss and motor deficits in Caenorhabditis elegans overexpressing human alpha-synuclein. J. Neurochem 86, 165–172 (2003). [DOI] [PubMed] [Google Scholar]
- 44.Cao S, Gelwix CC, Caldwell KA & Caldwell GA Torsin-mediated protection from cellular stress in the dopaminergic neurons of Caenorhabditis elegans. J. Neurosci 25, 3801–3812 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Kuwahara T et al. Familial Parkinson mutant alpha-synuclein causes dopamine neuron dysfunction in transgenic Caenorhabditis elegans. J. Biol. Chem 281, 334–340 (2006). [DOI] [PubMed] [Google Scholar]
- 46. Beecham GW et al. Genome-wide association meta-analysis of neuropathologic features of Alzheimer’s disease and related dementias. PLoS Genet. 10, e1004606 (2014).
- 47. Lambert JC et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 45, 1452–1458 (2013).
- 48. Wilson DIG et al. Lateral entorhinal cortex is critical for novel object-context recognition. Hippocampus 23, 352–366 (2013).
- 49. Christophersen IE et al. Large-scale analyses of common and rare variants identify 12 new loci associated with atrial fibrillation. Nat. Genet. 49, 946–952 (2017).
- 50. Kithcart A & MacRae CA. Using zebrafish for high-throughput screening of novel cardiovascular drugs. JACC Basic Transl. Sci. 2, 1–12 (2017).
- 51. Gene Ontology Consortium. Gene Ontology Consortium: going forward. Nucleic Acids Res. 43, D1049–D1056 (2015).
- 52. Chatr-Aryamontri A et al. The BioGRID interaction database: 2015 update. Nucleic Acids Res. 43, D470–D478 (2015).
- 53. Orchard S et al. The MIntAct project: IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 42, D358–D363 (2014).
- 54. Licata L et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 40, D857–D861 (2012).
- 55. Harris TW et al. WormBase 2014: new views of curated biology. Nucleic Acids Res. 42, D789–D793 (2014).
- 56. Mathelier A et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 42, D142–D147 (2014).
- 57. Bailey TL et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 37, W202–W208 (2009).
- 58. Barrett T et al. NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 41, D991–D995 (2013).
- 59. Troyanskaya O et al. Missing value estimation methods for DNA microarrays. Bioinformatics 17, 520–525 (2001).
- 60. Myers CL, Barrett DR, Hibbs MA, Huttenhower C & Troyanskaya OG. Finding function: evaluation methods for functional genomic data. BMC Genomics 7, 187 (2006).
- 61. Hunt-Newbury R et al. High-throughput in vivo analysis of gene expression in Caenorhabditis elegans. PLoS Biol. 5, e237 (2007).
- 62. Chikina MD, Huttenhower C, Murphy CT & Troyanskaya OG. Global prediction of tissue-specific gene expression and context-dependent gene networks in Caenorhabditis elegans. PLoS Comput. Biol. 5, e1000417 (2009).
- 63. Huttenhower C et al. Exploring the human genome with functional maps. Genome Res. 19, 1093–1106 (2009).
- 64. Huttenhower C, Schroeder M, Chikina MD & Troyanskaya OG. The Sleipnir library for computational functional genomics. Bioinformatics 24, 1559–1561 (2008).
- 65. Niculescu-Mizil A & Caruana R. Predicting good probabilities with supervised learning. In ICML '05 Proc. 22nd Intl. Conf. Mach. Learn. 625–632 (ACM Press, Bonn, Germany, 2005).
- 66. Guan Y, Ackert-Bicknell CL, Kell B, Troyanskaya OG & Hibbs MA. Functional genomics complements quantitative genetics in identifying disease-gene associations. PLoS Comput. Biol. 6, e1000991 (2010).
- 67. Platt JC. Probabilities for SV Machines. In Advances in Large Margin Classifiers (eds. Smola AJ et al.) 61–74 (Massachusetts Institute of Technology, Cambridge, MA, USA, 2000).
- 68. Blondel VD, Guillaume J-L, Lambiotte R & Lefebvre E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, P10008 (2008).
- 69. Bastian M, Heymann S & Jacomy M. Gephi: an open source software for exploring and manipulating networks. In Int. AAAI Conf. Weblogs Soc. Media (Association for the Advancement of Artificial Intelligence, Menlo Park, CA, USA, 2009).
- 70. McCall MN, Uppal K, Jaffee HA, Zilliox MJ & Irizarry RA. The Gene Expression Barcode: leveraging public data repositories to begin cataloging the human and murine transcriptomes. Nucleic Acids Res. 39, D1011–D1015 (2011).
Data Availability Statement
The authors declare that the data supporting the findings of this study are available within the paper and its supplementary information files. Source data for Figures 4a,c and 5a have been provided in Supplementary Data 13.