Abstract
Rice is a staple food for one-half the world's population and a model for other monocotyledonous species. Thus, efficient approaches for identifying key genes controlling simple or complex traits in rice have important biological, agricultural, and economic consequences. Here, we report on the construction of RiceNet, an experimentally tested genome-scale gene network for a monocotyledonous species. Many different datasets, derived from five different organisms including plants, animals, yeast, and humans, were evaluated, and 24 of the most useful were integrated into a statistical framework that allowed for the prediction of functional linkages between pairs of genes. Genes could be linked to traits by using guilt-by-association, predicting gene attributes on the basis of network neighbors. We applied RiceNet to an important agronomic trait, the biotic stress response. Using network guilt-by-association followed by focused protein–protein interaction assays, we identified and validated, in planta, two positive regulators, LOC_Os01g70580 (now Regulator of XA21; ROX1) and LOC_Os02g21510 (ROX2), and one negative regulator, LOC_Os06g12530 (ROX3). These proteins control resistance mediated by rice XA21, a pattern recognition receptor. We also showed that RiceNet can accurately predict gene function in another major monocotyledonous crop species, maize. RiceNet thus enables the identification of genes regulating important crop traits, facilitating engineering of pathways critical to crop productivity.
Keywords: systems biology, plant genetics, gene-trait associations
Rice (Oryza sativa) is the most important staple food crop. As one of the best studied grasses, rice also has an accumulated wealth of knowledge that makes it an attractive candidate as a reference for other important staple crops and emerging biofuel grasses (1). Its compact genome size (≈430 Mb), well-established methods for genetic transformation (2), availability of high-density genetic maps and whole-genome microarrays (reviewed in ref. 3), finished genome sequence (4), and close relationships with other cereals, all make rice an ideal model system in which to study plant physiology, development, agronomics, and genomics of grasses (5–7). Furthermore, in recent years, several laboratories have successfully developed gene-indexed mutants for targeted loss-of-function or gain-of-function analysis of many rice genes (reviewed in ref. 3).
These advances have led to the accumulation of sufficient public data to construct systems-level models of rice gene interactions. In principle, such models should allow for the prediction and systematic discovery of genes and associated pathways that control phenotypes of economic importance, such as tolerance to environmental stress and resistance to disease. We have developed such a network modeling platform, called a probabilistic functional gene network (8, 9), variations of which have been successfully applied to predict novel gene functions in e.g., yeast (10), worm (11), mice (12, 13), humans (14–17), and Arabidopsis thaliana (18). In principle, Arabidopsis gene networks should be useful even for rice genes, as many processes are conserved between dicotyledonous species, including Arabidopsis, and monocotyledonous species, including rice. However, monocots and dicots diverged >160–200 million years ago; thus, many gene networks differ significantly between these two main groups of flowering plants (19–25). Therefore, a complete understanding of monocot gene networks will depend on a full characterization of these pathways in an experimentally tractable monocotyledonous species such as rice. Here, we present an experimentally validated genome-scale functional gene network of a monocotyledonous species, a network of rice genes, named RiceNet, reconstructed from quantitative integration of available genomics and proteomics datasets.
Construction of a genome-wide network for rice is challenging for several reasons. First, whereas A. thaliana has ≈27,000 protein coding genes (The Arabidopsis Information Resource, release 9; ref. 26), rice has 41,203 nontransposable element (TE)-related protein coding genes [The Institute for Genomic Research (TIGR) rice annotation release 5; ref. 27]. This increased genome complexity results in a combinatorial explosion for the number of hypotheses for pairwise relations between genes, complicating discovery of true functional associations. Second, the current reference knowledge and raw genomic data available for models are much sparser for rice than for Arabidopsis, reducing predictive power of models. Third, the experimental validation of predicted gene function is more difficult in rice than Arabidopsis because of a longer reproductive cycle, larger plant sizes, greenhouse requirements, fewer available gene knockout strains, and less efficient transformation procedures (3). Despite these hurdles, we reconstructed a network covering ≈50% of the 41,203 rice genes. This network builds on a published midsized network of 100 rice stress response proteins that was constructed through protein interaction mapping (28). We demonstrated that RiceNet associations are highly predictive for diverse biological processes in rice. We further predicted and experimentally validated three previously unknown regulators of resistance mediated by XA21, a rice pattern recognition receptor that is a key determinant of the innate immune response (29). RiceNet also showed significant predictive power for identifying genes that function in the stress response of another major crop, maize. These results indicate that RiceNet can accurately predict gene function in monocotyledonous species.
Results
RiceNet: A Genome-Scale Gene Network for a Monocotyledonous Species.
We aimed to construct a gene network of rice spanning as many as feasible of the 41,203 non-TE–related protein coding genes annotated by the TIGR Rice Genome Annotation Release 5 (27). Only limited numbers of genome-scale datasets are available for rice. Such limitations, in turn, reduce the scale of networks that can be reconstructed (e.g., a rice gene network based on only mRNA coexpression in all available rice gene microarrays covered only ≈10% of the genome; ref. 30). This shortcoming can be partially overcome by transferring datasets from other organisms via the use of gene orthology relationships (31). The key to using such data lies in the judicious weighting of datasets from other species to maximize reconstruction of conserved rice gene systems, while not degrading reconstruction of rice-specific gene systems from datasets collected in rice. The value of this approach has been shown in reconstruction of gene networks for Caenorhabditis elegans (11) and Arabidopsis (18). Thus, in addition to rice (Oryza sativa) datasets, we also identified evolutionarily conserved gene–gene linkages between rice genes by using datasets from Saccharomyces cerevisiae, C. elegans, Homo sapiens, and A. thaliana.
A total of 24 different types of data (spanning many individual datasets) were quantitatively integrated into a single gene network as described in full in the SI Appendix. The datasets used to infer functional linkages spanned many types of gene–gene relationships, including both direct measurements of physical and genetic interactions, as well as inferred interactions from genome sequences, literature mining, and protein structures. In all, the 24 data types, from five different organisms, included transcript coexpression links based on several hundred DNA microarray datasets (e.g., SI Appendix, Table S1 for rice), genome-scale protein–protein physical interactions mapped by yeast two hybrid or affinity purification followed by mass spectrometry identification of protein complexes, linkages between genes automatically mined from PubMed articles, protein–protein interactions from curated databases, linkages between proteins with similar domain co-occurrence profiles, genetic interactions, linkages based on genes’ phylogenetic profiles or tendencies for bacterial orthologs to occur as neighbors in many bacterial genomes, and protein–protein interactions inferred from protein tertiary structures (Fig. 1A and SI Appendix, Table S2). Clearly, differing lines of evidence provide widely differing degrees of support for functional linkages. For each pair of rice genes, the evidence from all datasets was integrated into a single numerical log likelihood score (LLS) denoting the likelihood for those genes to function together as supported by sharing Gene Ontology biological process (GO-BP) terms annotated by the TIGR Rice Genome Annotation Release 5 (27) (SI Appendix). Integration improved genome coverage and linkage accuracy beyond all of the individual datasets (Fig. 1A). 
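As an illustration of how a dataset can be scored against reference annotations, the sketch below computes a log likelihood score from benchmark counts. It assumes the log-odds form commonly used for probabilistic functional gene networks (8, 9); whether RiceNet uses exactly this form, and how scores from different datasets are combined, is specified in SI Appendix. The function name and all counts are hypothetical.

```python
import math

def log_likelihood_score(pos_in_data, neg_in_data, pos_total, neg_total):
    """LLS = ln( [P(L|E) / P(not L|E)] / [P(L) / P(not L)] ).

    pos_in_data / neg_in_data: gene pairs supported by the dataset that do /
        do not share a reference annotation (e.g., a GO-BP term).
    pos_total / neg_total: the same counts over all annotated gene pairs.
    Assumed form; see the lead-in paragraph.
    """
    if min(pos_in_data, neg_in_data, pos_total, neg_total) <= 0:
        raise ValueError("all counts must be positive")
    odds_given_evidence = pos_in_data / neg_in_data
    prior_odds = pos_total / neg_total
    return math.log(odds_given_evidence / prior_odds)

# Hypothetical counts: 300 of 1,000 benchmarked links in a dataset share a
# term, against prior odds of 1:100 among all annotated gene pairs.
example_lls = log_likelihood_score(300, 700, 10_000, 1_000_000)
```

Under this form, a score of 0 means the dataset links genes no better than chance, and larger positive scores indicate stronger support for a functional linkage.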
The final, integrated set of gene–gene linkages, named RiceNet (www.functionalnet.org/ricenet), contains a total of 588,221 links connecting 18,377 non-TE–related rice protein-coding genes (44.6% of 41,203 genes) (SI Appendix, Fig. S1). RiceNet also covers 16,678 (63.7%) of the 26,178 rice genes known, thus far, to be expressed.
Fig. 1.
Summary of construction and computational assessment of RiceNet, a genome-scale functional gene network for O. sativa. (A) Pairwise gene linkages derived from 24 diverse functional genomics and proteomics data types, each spanning many individual experiments and representing in all >60 million experimental or computational observations, were integrated into a composite gene network with higher accuracy and genome coverage than any individual dataset. The integrated network (RiceNet) contains 588,221 functional linkages among 18,377 (45%) of the 41,203 non-TE–related protein-coding rice genes. The plot x axis indicates the log-scale percentage of the 41,203 protein-coding genes covered by functional linkages derived from the indicated datasets (plotted curves); the y axis indicates the accuracy of functional linkages derived from the datasets, measured as the cumulative likelihood for linked genes to share GO-BP term annotations, tested using 0.632 bootstrapping and plotted for successive bins of 1,000 linkages each (symbols). Datasets are named as XX-YY, where XX indicates species of data origin (AT, A. thaliana; CE, C. elegans; HS, H. sapiens; OS, O. sativa; SC, S. cerevisiae) and YY indicates data type (CC, cocitation; CX, mRNA coexpression; DC, domain co-occurrence; GN, gene neighbor; GT, genetic interaction; LC, literature curated protein interactions; MS, affinity purification/mass spectrometry; PG, phylogenetic profiles; TS, tertiary structure; YH, yeast two hybrid). (B) RiceNet includes many linkages beyond those found by simple orthology from the Arabidopsis gene network AraNet (18), as shown by a Venn diagram of the gene linkages. RiceNet covers many more rice genes (18,377 versus 12,225 genes) than the AraNet-derived network with the same number of links. Measurements of linkage accuracy show that links supported by both networks are more accurate (true positive rate; TP = 22.1%) than those supported by only one network. The higher accuracy of RiceNet-only links (TP = 15.5%) than that of AraNet-transferred links (TP = 6.5%) indicates the value of optimizing the functional network for rice. Linkage accuracy was measured by using a completely independent set of reference linkages derived from the KEGG biological pathway database only. (C) Pathway predictability is generally high, as measured by the areas under cross-validated ROC curves for correctly prioritizing known genes for 834 GO-BP annotations (with more than three associated genes) using network guilt-by-association via Gaussian smoothing (17). In bar-and-whiskers plots, the central horizontal line in the box indicates the median AUC, the boundaries of the box indicate the first and third quartiles, whiskers indicate the 10th and 90th percentiles of the AUC distribution, and plus signs indicate outliers. RiceNet is superior to the AraNet-derived network and significantly outperforms randomized tests. (D) RiceNet links genes with similar cell-type specific expression patterns. Genes connected by RiceNet were significantly more coexpressed (by nearly threefold) across 40 individual rice cell types (35) than were genes linked in randomized networks (repeating the calculation for 100 randomized networks and plotting the distribution of the 100 resulting odds ratios), and somewhat more than for linkages transferred by orthology from AraNet (P value < 1 × 10⁻¹⁷; Wilcoxon signed rank sum test).
RiceNet Is More Accurate and Extensive than a Network Generated by Orthology from Arabidopsis.
It is an open question how well gene networks derived from better-characterized dicots such as Arabidopsis might faithfully reconstruct pathways and systems in a monocot. For example, an alternate approach to constructing a rice gene network might be simply to transfer linkages from orthologous gene pairs from the existing Arabidopsis gene network, AraNet (18). This approach does not require modeling using rice annotations or any of the rice-derived experimental data. To assess the accuracy of such a network, we first defined an AraNet-derived network with the same number of functional links as RiceNet. The AraNet-derived network covers 12,225 rice genes with 588,221 links, whereas RiceNet covers 18,377 genes (6,152 more genes) with the same number of links (Fig. 1B). We tested the accuracy of the AraNet-derived network versus RiceNet using linkages from the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (32), which is based on manual curation. KEGG is thus considered generally accurate and largely independent from both RiceNet and AraNet, and shares only 2.2% of the 662,936 GO-BP–derived linkages used to guide the integration of RiceNet (SI Appendix). To err on the conservative side, we excluded KEGG linkages shared with GO-BP so as to obtain a set of 89,140 KEGG linkages that were fully independent from the linkages used to guide RiceNet construction. As expected, gene links supported by both networks are more accurate than network links predicted by only one approach. However, RiceNet-specific linkages are ≈2.5 times more accurate than those derived solely by orthology from AraNet (15.5% versus 6.5% true positive rate; Fig. 1B). In terms of genome coverage, RiceNet covers 4,839 additional genes (≈12% of rice genome). Thus, reconstructing a gene network specifically for rice genes improves both accuracy and coverage of the network.
RiceNet Reflects Well-Defined Biological Pathways and Processes.
We assessed the quality of RiceNet for modeling biological processes by several additional computational analyses. First, we used topological analysis to assess whether RiceNet contains modular structures consistent with well-defined biological pathways and processes. We found that RiceNet shows a 100-fold higher clustering coefficient (33) than a randomized network (SI Appendix, Fig. S2A). This higher extent of clustering is an expected characteristic of functional modules. Similarly, we observed very nonrandom path lengths connecting gene pairs in RiceNet (SI Appendix, Fig. S2B), indicating tightly interconnected regional structures (representing functional modules) separated by longer chains of functional associations. Both topological properties suggest RiceNet is organized into gene modules separated in the network.
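The comparison of clustering in a network against a randomized counterpart can be sketched as follows. This is a minimal illustration, not the procedure used for SI Appendix, Fig. S2A: the randomization shown (degree-preserving edge swaps) is an assumption, since the text does not state how the randomized network was generated, and the toy gene names are hypothetical.

```python
import random
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (gene_a, gene_b) edges."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def average_clustering(adjacency):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for neighbors in adjacency.values():
        k = len(neighbors)
        if k < 2:
            continue
        linked = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
        total += 2.0 * linked / (k * (k - 1))
    return total / len(adjacency)

def degree_preserving_shuffle(edges, swaps, seed=0):
    """Randomize by endpoint swaps: (a,b),(c,d) -> (a,d),(c,b) (assumed method)."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

# Toy network: two fully connected four-gene modules joined by one link.
module1 = ["g1", "g2", "g3", "g4"]
module2 = ["g5", "g6", "g7", "g8"]
toy_edges = list(combinations(module1, 2)) + list(combinations(module2, 2)) + [("g4", "g5")]
observed = average_clustering(build_adjacency(toy_edges))
randomized = average_clustering(
    build_adjacency(degree_preserving_shuffle(toy_edges, swaps=200, seed=1)))
```

The ratio of `observed` to `randomized` plays the role of the fold difference in clustering reported above; on a genome-scale network the randomization would typically be repeated many times.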
Second, we determined whether the RiceNet-predicted gene modules reflect known biological pathways in rice. In this “guilt-by-association” approach, we prioritized candidate genes for each biological process based on network connections to known genes in those processes, assessing predictive accuracy by using cross-validation and receiver operating characteristic (ROC) analysis. Our previous study demonstrated that superior prediction performance can be achieved by using methods that consider not only direct network neighbors but also indirect ones (34). Therefore, we prioritized candidate genes by using both direct and indirect network neighbors via Gaussian smoothing (17). Prediction power can be summarized from a ROC analysis as the area under the ROC curve (AUC), which ranges from near 0.5 for random expectation to 1 for perfect predictions. For a total of 834 GO-BPs with more than three annotated genes, 642 processes (77% of total tested GO-BPs) showed AUC > 0.7, which indicates that RiceNet is highly predictive of gene function (Fig. 1C), far in excess of chance expectation [e.g., randomized gene sets of the same sizes show a median AUC of 0.5 and only 25 terms (3%) show AUC > 0.7]. These results strongly suggest that RiceNet is predictive for diverse types of biological pathways in rice. Moreover, RiceNet proved far more predictive of rice gene function than AraNet, as tested for GO-BP terms (P < 1 × 10⁻⁶⁵; Wilcoxon signed rank sum test).
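The logic of guilt-by-association with cross-validated AUC can be illustrated with a deliberately simplified sketch. It scores each gene by the summed LLS of its direct links to known pathway genes and uses leave-one-out cross-validation; this direct-neighbor scoring is a stand-in chosen for brevity and is not the Gaussian smoothing method (17) used for the results above. The function names, gene names, and link scores are hypothetical.

```python
def gba_score(network, gene, seeds):
    """Sum of link scores from `gene` to seed genes (direct neighbors only)."""
    return sum(lls for neighbor, lls in network.get(gene, {}).items()
               if neighbor in seeds and neighbor != gene)

def leave_one_out_auc(network, seeds, all_genes):
    """AUC for ranking held-out seed genes above non-seed genes.

    Each seed gene is scored with itself removed from the seed set, so a gene
    never contributes evidence toward its own prediction.
    """
    seeds = set(seeds)
    positives = [gba_score(network, g, seeds - {g}) for g in seeds]
    negatives = [gba_score(network, g, seeds) for g in all_genes if g not in seeds]
    # AUC = probability that a positive outscores a negative (ties count 0.5).
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Toy weighted network: gene -> {neighbor: LLS}.
toy_network = {
    "A": {"B": 3.0, "C": 2.5, "D": 1.0},
    "B": {"A": 3.0, "C": 2.0},
    "C": {"A": 2.5, "B": 2.0},
    "D": {"A": 1.0, "E": 1.5},
    "E": {"D": 1.5},
    "F": {},
}
toy_genes = sorted(toy_network)
auc_connected = leave_one_out_auc(toy_network, {"A", "B", "C"}, toy_genes)
auc_scattered = leave_one_out_auc(toy_network, {"A", "E"}, toy_genes)
```

A gene set whose members are tightly linked to one another (here A, B, and C) yields a high AUC, whereas a set scattered across the network does not, which is the behavior the cross-validated ROC analysis quantifies for each GO-BP term.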
Third, we examined cell type-specific mRNA expression of rice genes. Rice, like other multicellular organisms, is composed of many distinct cell types. Each cell type expresses a unique set of genes that together produce the characteristics of that cell type. In contrast, RiceNet is composed of just one integrated network spanning a large fraction of the set of rice genes. Although cell-type specificity is not explicitly modeled in RiceNet, we tested whether it nonetheless can be used to model cell-type specific functions. Assuming that many biological processes are carried out through functionally specialized cell types, functional associations are expected to be enriched among genes expressed in the same cell-types. We measured the likelihood of genes connected in RiceNet or the AraNet-derived network to share cell-type specific expression (SI Appendix). For this analysis, we used the rice transcriptome atlas database, which profiles transcript expression across 40 rice cell types (35). We found that RiceNet links genes expressed in the same cell-types at nearly three times the rate of a randomized network (Fig. 1D). AraNet-derived links show somewhat lower enrichment for cell-type specificity (P < 1 × 10⁻¹⁷; Wilcoxon signed rank sum test), indicating that optimizing the network for rice contributed to cell-type specificity, and may help explain the higher accuracy of RiceNet (Fig. 1B).
Two-Step Network-Guided Discovery of ROX1, ROX2, and ROX3, Three Regulators of XA21-Mediated Immunity.
Our pathway analysis described above demonstrates that genes for similar biological processes can be successfully associated in RiceNet. We next specifically tested the feasibility of identifying previously unknown genes governing biotic stress response pathways by using RiceNet. We previously reported the construction of a stress response interactome consisting of 100 proteins; 46 of these proteins are predicted to be involved in the biotic stress response (28). Fifteen of these interactome components are of particular interest because they have been confirmed to play a key role in the rice defense response by using loss-of-function and gain-of-function analyses (28). We therefore used these 15 genes to query RiceNet in an attempt to identify novel candidate genes governing XA21-mediated immunity (Fig. 2 and SI Appendix, Table S3). Note that Xa21 itself is not present in RiceNet, because it is not present in the Oryza sativa genome (it was isolated from O. longistaminata) (36). For this reason, Xa21 was not used as part of the query set.
Fig. 2.
Extensive interactions among RiceNet-predicted proteins with other proteins predicted to be involved in XA21-mediated immunity. Thirteen of the 14 candidate genes predicted by RiceNet (SI Appendix, Table S3) to function in XA21-mediated immunity are indicated by yellow circles. Pairwise protein–protein interactions between these 13 proteins and 24 components (green nodes) of the previously reported biotic stress response subinteractome (28) are designated by orange edges. The 14th RiceNet-predicted protein is not shown because it did not bind with any of the 24 tested interactome members using yeast two-hybrid assays as indicated in the text. Other members of the interactome (white nodes) and their connections (black edges) are included for context. Diamonds indicate the 15 proteins validated phenotypically, using loss-of-function and gain-of-function analyses, to modulate rice defense responses. These 15 proteins served as the RiceNet query as described in the text. All constructs carry full-length cDNAs except for the Xa21 construct as previously described (36).
When the 15 genes with validated phenotypes were used to query RiceNet, we identified 802 rice genes with a predicted function in XA21-mediated immunity. To further prioritize within this set, we selected only genes connected to ≥1 query gene by LLS ≥2.0, connected to ≥2 query genes by LLS ≥1.5, or that had not been demonstrated to confer a function in plant immune responses based on literature searches (SI Appendix, Table S4). These criteria narrowed the list to 14 genes. We then experimentally assayed the interactions of the proteins encoded by these genes with an available set of 24 proteins from the biotic stress response subinteractome, including XA21 itself (28) (SI Appendix, Table S3 and SI Appendix, Table S5). Thirteen of the 14 candidate genes interacted by yeast two hybrid assays with at least one component of the XA21 interactome (Fig. 2), confirming the value of this prioritization strategy.
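A threshold-based filter of this kind can be sketched as below. The sketch encodes one possible reading of the criteria, namely that a candidate must have either at least one strong link (LLS ≥ 2.0) or at least two moderate links (LLS ≥ 1.5) to query genes, and must also lack prior evidence for a role in plant immunity. That grouping of the conditions is an assumption and should be checked against SI Appendix; the function name and gene identifiers are hypothetical.

```python
def prioritize(candidate_links, known_immune_genes,
               strong_lls=2.0, moderate_lls=1.5, min_moderate_links=2):
    """Filter candidates by link strength to query genes and by novelty.

    candidate_links: {gene: [LLS of each link from that gene to a query gene]}.
    known_immune_genes: genes already reported to act in plant immunity.
    The (strong OR multiple moderate) AND novel grouping is an assumed reading.
    """
    selected = []
    for gene, scores in candidate_links.items():
        has_strong = any(s >= strong_lls for s in scores)
        has_multiple = sum(s >= moderate_lls for s in scores) >= min_moderate_links
        if (has_strong or has_multiple) and gene not in known_immune_genes:
            selected.append(gene)
    return sorted(selected)

# Hypothetical candidates and link scores.
toy_candidates = {
    "geneA": [2.3],        # one strong link
    "geneB": [1.6, 1.7],   # two moderate links
    "geneC": [1.6],        # one moderate link only
    "geneD": [2.5, 1.9],   # strong, but already known in immunity
}
shortlist = prioritize(toy_candidates, known_immune_genes={"geneD"})
```

Keeping the thresholds as parameters makes it straightforward to see how the size of the shortlist changes as the cutoffs are tightened or relaxed before committing candidates to interaction assays.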
We selected five of these candidates for characterization in planta based either on their direct interactions with XA21 (LOC_Os01g70580, LOC_Os01g70790, LOC_Os02g21510, and LOC_Os03g20460) or on their interactions with proteins harboring sequence motifs related to the inflammatory response in animals (LOC_Os06g12530) (SI Appendix, Table S4). We generated overexpression (ox) and RNAi constructs for each gene (except for LOC_Os06g12530 for which only an overexpression construct was generated) and introduced these constructs into a homozygous Kitaake-XA21 rice line using a hygromycin selectable marker. We then assayed for resistance to the bacterial pathogen Xanthomonas oryzae pv. oryzae (Xoo), measuring lengths of water-soaked lesions 14–21 d after inoculation.
We did not observe any obvious differences between the control Kitaake-XA21 and transgenic lines either overexpressing or silenced for LOC_Os01g70790 or LOC_Os03g20460 (SI Appendix, Table S6). However, transgenic plants silenced for two of the genes (LOC_Os01g70580 RNAi, LOC_Os02g21510 RNAi) showed clear enhanced susceptibility to Xoo as compared with the Kitaake-XA21 control. Transgenic plants overexpressing LOC_Os06g12530 showed enhanced susceptibility to Xoo compared with the Kitaake-XA21 control (SI Appendix, Fig. S3 and Fig. 3). These genes were designated Rox1 (Regulator of XA21-mediated immunity 1), Rox2, and Rox3, respectively. The phenotypes are heritable for two generations, the RNA levels correlate with the overexpression or knockdown, and the transgene cosegregates with the altered phenotypes in progeny analyses (SI Appendix). Based on the phenotypes, Rox1 and Rox2 are positive regulators (silencing leads to enhanced susceptibility) and Rox3 is a negative regulator (overexpression leads to enhanced susceptibility). Thus, we were able to validate three of five RiceNet predictions in planta (60% success rate).
Fig. 3.
Three new regulators of XA21 (ROX)-mediated immunity predicted by RiceNet are validated by Xoo infection assays of transgenic rice plants. The response to Xoo was assayed in infected leaves of 5-wk-old rice plants. A symbol (-) indicates lack of the transgene, and a symbol (+) indicates the presence of the transgene. (A) LOC_Os01g70580 (ROX1) is a positive regulator of XA21-mediated immunity. Kitaake-XA21 (Kit-XA21) plants with RNAi-mediated suppression of LOC_Os01g70580 (Rox1) were generated and assayed for resistance. Lesion lengths were measured in leaves of 5-wk-old rice plants from T2 progeny of the XA21-LOC_Os01g70580 (ROX1) RNAi 2-2, 2-7, 2-10, and 2-11 lines 14 d after Xoo inoculation. (B) LOC_Os02g21510 (ROX2) is a second positive regulator of XA21-mediated immunity. Kitaake-XA21 plants with RNAi-mediated suppression of LOC_Os02g21510 (Rox2) were generated and lesion lengths were measured as above in leaves of T1 progeny from XA21-LOC_Os02g21510 (Rox2) RNAi 3 and 4 lines and T2 progeny from the XA21-LOC_Os02g21510 (Rox2) RNAi 3-1 line. (C) LOC_Os06g12530 (Rox3) is a negative regulator of XA21-mediated immunity. Kitaake-XA21 plants overexpressing (ox) LOC_Os06g12530 (Rox3) were generated and response to Xoo was measured as above by using leaves from T1 progeny from the LOC_Os06g12530 (Rox3) overexpression lines, 1 and 3, and T2 progeny from the LOC_Os06g12530 (Rox3) overexpression 1-15 line. Each bar represents the average and SD from at least three tested leaves. Kitaake-XA21 and Kitaake (Kit) were used as controls. Photographs of lesion length were taken at 14 d after Xoo inoculation. Additional genetic tests and expression quantification of candidate genes in transgenic lines are described in SI Appendix and SI Appendix, Fig. S3.
Sequence homology sheds some light on more specific roles for these three ROX proteins. First, the positive regulator Rox1 is annotated as a thiamine pyrophosphokinase (TPK). TPK catalyzes the transfer of a pyrophosphate group from ATP to vitamin B1 (thiamine) to form the coenzyme thiamine pyrophosphate (TPP). It has been reported that treatment with thiamine and TPP induces resistance to rice pathogens including Xoo (37). These results further support a role for Rox1 in the rice defense response.
The positive regulator Rox2 is a member of the NOL1/NOL2/sun gene family. Human homologs have been implicated in Williams-Beuren syndrome, a developmental disorder associated with haploinsufficiency of multiple genes at 7q11.23 (38). NOL1/NOL2/sun gene family members have not been shown to function in innate immunity.
Finally, the negative regulator Rox3 is annotated as a nuclear migration protein, nudC. nudC plays a key role in cell division through the regulation of cytoplasmic dynein and also in the regulation of the inflammatory response (39). These results indicate that sequence homology alone would not have suggested roles in immunity, emphasizing the value of RiceNet for identifying new genes relevant to the biotic response in the absence of strong a priori knowledge.
RiceNet Predictability Extends to Another Monocotyledonous Crop Species, Maize.
Given that RiceNet efficiently predicts gene function in rice, we asked whether this predictability extends to another monocotyledonous crop species, maize (Zea mays). Because RiceNet is more accurate for rice gene function than a network generated by orthology from Arabidopsis (Fig. 1B), we also hypothesized that RiceNet would be more efficient than AraNet at gene prediction in another monocotyledonous species. We therefore compared the predictive power of an AraNet-derived maize gene network (AT-ZM) and a RiceNet-derived maize gene network (OS-ZM), using maize GO-BP annotations from AgBase (40) (SI Appendix).
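Deriving a network for one species from another by orthology can be sketched as a mapping of scored links through an ortholog table. The example below is a minimal illustration under stated assumptions: the rule for resolving several source links that map to the same target pair (keep the highest score) is a choice made here for simplicity, and the actual construction of OS-ZM and AT-ZM is described in SI Appendix. All gene identifiers and scores are hypothetical.

```python
def transfer_links(source_links, orthologs):
    """Map scored links from a source species onto target-species genes.

    source_links: {(gene_a, gene_b): score} in the source species.
    orthologs: {source_gene: [target_gene, ...]}.
    If several source links map to the same target pair, the highest score
    is kept (an assumed rule; other combination schemes are possible).
    """
    target_links = {}
    for (a, b), score in source_links.items():
        for target_a in orthologs.get(a, []):
            for target_b in orthologs.get(b, []):
                if target_a == target_b:
                    continue  # both source genes map to one target gene
                pair = tuple(sorted((target_a, target_b)))
                if score > target_links.get(pair, float("-inf")):
                    target_links[pair] = score
    return target_links

# Hypothetical rice links and rice-to-maize ortholog assignments.
rice_links = {("Os1", "Os2"): 2.4, ("Os2", "Os3"): 1.8, ("Os3", "Os4"): 3.0}
rice_to_maize = {"Os1": ["Zm1"], "Os2": ["Zm2a", "Zm2b"], "Os3": ["Zm2a"]}
maize_links = transfer_links(rice_links, rice_to_maize)
```

Links involving a source gene with no ortholog in the target species (Os4 in this toy example) are simply lost, which is one reason an orthology-derived network covers fewer genes than a network built directly for the species of interest.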
RiceNet appears predictive for many maize gene pathways. We identified 32 GO-BP terms that have three or more annotated maize genes based on experimental or literature evidence and used these 32 GO-BP terms with highly reliable annotations for the analysis. Using cross-validated ROC analysis as in ref. 41, we tested the two networks (OS-ZM and AT-ZM) for correctly prioritizing genes in these GO-BPs, using the Gaussian smoothing guilt-by-association method (17). Both networks showed good predictability for maize GO-BP terms (Fig. 4 and SI Appendix, Table S7). For 30 terms predictable (AUC > 0.5) by either network, OS-ZM performed significantly better than AT-ZM (P value = 1.42 × 10⁻²; Wilcoxon signed rank sum test). This predictability was independent of the number of associated genes (SI Appendix, Fig. S4). Notably, many abiotic stress responses in maize are highly predictable with OS-ZM. Thus, RiceNet is useful for studies of gene-trait associations in both rice and maize, and this work suggests that RiceNet will also be useful for predicting gene function in other important monocotyledonous species such as wheat and switchgrass, for which species-specific networks have not yet been constructed.
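A paired comparison of per-term AUC values from two networks can be illustrated with a small signed-rank test. The sketch below is a plain-Python version that uses the normal approximation without continuity or tie correction, so its P values are approximate; it is not the implementation used for the analysis above, and the AUC values in the usage lines are hypothetical rather than taken from SI Appendix, Table S7.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped and tied absolute differences receive
    average ranks. The P value uses the normal approximation with no
    continuity or tie correction (a simplification).
    Returns (W_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w_plus, p_value

# Hypothetical per-term AUCs for the same GO-BP terms under two networks.
auc_os_zm = [0.91, 0.84, 0.77, 0.95, 0.68, 0.88]
auc_at_zm = [0.80, 0.79, 0.81, 0.70, 0.60, 0.75]
w_plus, p_value = wilcoxon_signed_rank(auc_os_zm, auc_at_zm)
```

Because the test is paired, each GO-BP term acts as its own control: only the sign and rank of the per-term AUC difference matter, not the absolute AUC level of either network.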
Fig. 4.
RiceNet predicts maize biological processes. Thirty-two maize GO-BP gene sets, each with three or more associated genes identified through noncomputational approaches, were tested for predictability by the AraNet-derived and RiceNet-derived maize gene networks, indicated by AT-ZM and OS-ZM, respectively. The predictability of each process was measured by using cross-validation as the area under a ROC curve (AUC) with Gaussian smoothing methods (17). A total of 30 maize GO-BP terms are predictable (AUC > 0.5) by either AT-ZM or OS-ZM. For 20 of the 30, OS-ZM outperformed AT-ZM. The distribution of AUC scores by OS-ZM was significantly higher than that by AT-ZM (P value = 1.42 × 10⁻²; Wilcoxon signed rank sum test).
Discussion
Here, we report construction of RiceNet, a genome-scale gene network for rice. We demonstrate its predictive power for diverse biological processes and its usefulness in identifying genes governing rice innate immunity. RiceNet represents an experimentally validated genome-scale gene network for a crop species. Using RiceNet, we systematically selected candidate genes predicted to be involved in the rice defense response by using a two-step network-guided prediction approach—first by guilt-by-association, and then by protein–protein interaction tests. From rough computational estimates, we expect RiceNet to often offer correct candidate genes when the predictive AUC is high (e.g., >0.75), with the number of correct candidates increasing as AUC increases (SI Appendix, Fig. S5). The additional prioritization offered by protein interaction screening of candidates appeared to elevate the validation rate even further, and in our transgenic tests of five of the candidate genes selected by using this two-step strategy, we confirmed three genes as regulators of XA21-mediated immunity. Given the large genomes of most crop species (generally 30,000–50,000 genes) and their long reproductive cycles (often several months), this two-step network-guided prioritization should facilitate identification of key genes controlling important traits and future engineering of agronomically useful varieties.
Several other features of RiceNet are particularly notable, including that a network optimized for rice outperformed an alternative rice gene network constructed by orthology from the dicot Arabidopsis (18) (Fig. 1 B and D). We expect that other crop species will have similar requirements, benefiting from species-specific datasets and optimization. Given the impressive advances in genomic and proteomic technology for food crop species (42, 43) and candidate bioenergy crops (44) and other models (45), a sharp rise in available datasets appears likely for these species over the next few years. For rice, additional rice-specific experimental data including protein–protein interactions and gene expression data spanning new experimental conditions appear most valuable for improving the accuracy and coverage of RiceNet into the future.
We have shown that RiceNet is useful for predicting gene function in maize, another monocotyledonous crop species. The high predictability for maize genes by a RiceNet-derived maize gene network (OS-ZM) and its superior performance compared with an AraNet-derived maize gene network (AT-ZM) indicates that monocot-specific modeling improves gene prediction for other monocotyledonous crop species. This finding is particularly important in light of the lack of species-specific gene networks for maize and other monocotyledonous crop species.
Economically important crop traits range from simple traits emerging from strong selective pressure during domestication to highly genetically complex traits (46). Manipulation of only a few key genes related to the traits may not generate the desired phenotypes. Therefore, approaches for broadly defining relevant genes, coupled with rapid and inexpensive interaction assays to more finely prioritize genes, offer an attractive and potentially rapid route for focusing crop engineering efforts on the small sets of genes that are deemed most likely to affect the traits of interest.
Methods
Reference and benchmark sets, raw datasets, and computational methods of construction and analysis of RiceNet are described in full in SI Appendix.
Construction of MaizeNet by orthology-based links from AraNet and RiceNet is described in full in SI Appendix.
Genetic analysis of ROX1, ROX2, and ROX3 transgenic plants is described in full in SI Appendix.
A user-interactive web tool for RiceNet-based selection of candidate genes is publicly available at http://www.functionalnet.org/ricenet.
Acknowledgments
This work was supported by the US Department of Energy, Office of Science, Office of Biological and Environmental Research, through Contract DE-AC02-05CH11231 between Lawrence Berkeley National Laboratory and the US Department of Energy. This work was also supported by the National Research Foundation of Korea funded by the Korean government Ministry of Education, Science, and Technology Grants 2010-0017649 and 2010-0001818 and POSCO TJ Park Science fellowship (to I.L.); the National Science Foundation, National Institutes of Health (NIH), Welch Foundation Grant F1515, and Packard Foundation (to E.M.M.); and NIH Grant GM 55962 (to P.C.R.).
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1110384108/-/DCSupplemental.
References
- 1. Paterson AH, Bowers JE, Peterson DG, Estill JC, Chapman BA. Structure and evolution of cereal genomes. Curr Opin Genet Dev. 2003;13:644–650. doi: 10.1016/j.gde.2003.10.002.
- 2. Hiei Y, Ohta S, Komari T, Kumashiro T. Efficient transformation of rice (Oryza sativa L.) mediated by Agrobacterium and sequence analysis of the boundaries of the T-DNA. Plant J. 1994;6:271–282. doi: 10.1046/j.1365-313x.1994.6020271.x.
- 3. Jung KH, An G, Ronald PC. Towards a better bowl of rice: Assigning function to tens of thousands of rice genes. Nat Rev Genet. 2008;9:91–101. doi: 10.1038/nrg2286.
- 4. Matsumoto T, et al.; International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature. 2005;436:793–800. doi: 10.1038/nature03895.
- 5. Gale MD, Devos KM. Comparative genetics in the grasses. Proc Natl Acad Sci USA. 1998;95:1971–1974. doi: 10.1073/pnas.95.5.1971.
- 6. Goff SA. Rice as a model for cereal genomics. Curr Opin Plant Biol. 1999;2:86–89. doi: 10.1016/S1369-5266(99)80018-1.
- 7. Shimamoto K, Kyozuka J. Rice as a model for comparative genomics of plants. Annu Rev Plant Biol. 2002;53:399–419. doi: 10.1146/annurev.arplant.53.092401.134447.
- 8. Lee I. Probabilistic functional gene societies. Prog Biophys Mol Biol. 2011;106:435–442. doi: 10.1016/j.pbiomolbio.2011.01.003.
- 9. Lee I, Date SV, Adai AT, Marcotte EM. A probabilistic functional network of yeast genes. Science. 2004;306:1555–1558. doi: 10.1126/science.1099511.
- 10. Li Z, et al. Rational extension of the ribosome biogenesis pathway using network-guided genetics. PLoS Biol. 2009;7:e1000213. doi: 10.1371/journal.pbio.1000213.
- 11. Lee I, et al. A single gene network accurately predicts phenotypic effects of gene perturbation in Caenorhabditis elegans. Nat Genet. 2008;40:181–188. doi: 10.1038/ng.2007.70.
- 12. Kim WK, Krumpelman C, Marcotte EM. Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy. Genome Biol. 2008;9(Suppl 1):S5. doi: 10.1186/gb-2008-9-s1-s5.
- 13. Peña-Castillo L, et al. A critical assessment of Mus musculus gene function prediction using integrated genomic evidence. Genome Biol. 2008;9(Suppl 1):S2. doi: 10.1186/gb-2008-9-s1-s2.
- 14. Huttenhower C, et al. Exploring the human genome with functional maps. Genome Res. 2009;19:1093–1106. doi: 10.1101/gr.082214.108.
- 15. Lee I, Blom UM, Wang PI, Shim JE, Marcotte EM. Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res. 2011;21:1109–1121. doi: 10.1101/gr.118992.110.
- 16. Linghu B, Snitkin ES, Hu Z, Xia Y, Delisi C. Genome-wide prioritization of disease genes and identification of disease-disease associations from an integrated human functional linkage network. Genome Biol. 2009;10:R91. doi: 10.1186/gb-2009-10-9-r91.
- 17. Mostafavi S, Ray D, Warde-Farley D, Grouios C, Morris Q. GeneMANIA: A real-time multiple association network integration algorithm for predicting gene function. Genome Biol. 2008;9(Suppl 1):S4. doi: 10.1186/gb-2008-9-s1-s4.
- 18. Lee I, Ambaru B, Thakkar P, Marcotte EM, Rhee SY. Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana. Nat Biotechnol. 2010;28:149–156. doi: 10.1038/nbt.1603.
- 19. Flavell R. Role of model plant species. Methods Mol Biol. 2009;513:1–18. doi: 10.1007/978-1-59745-427-8_1.
- 20. Chern M, Canlas PE, Fitzgerald HA, Ronald PC. Rice NRR, a negative regulator of disease resistance, interacts with Arabidopsis NPR1 and rice NH1. Plant J. 2005;43:623–635. doi: 10.1111/j.1365-313X.2005.02485.x.
- 21. Chern M, Fitzgerald HA, Canlas PE, Navarre DA, Ronald PC. Overexpression of a rice NPR1 homolog leads to constitutive activation of defense response and hypersensitivity to light. Mol Plant Microbe Interact. 2005;18:511–520. doi: 10.1094/MPMI-18-0511.
- 22. Devos KM, Beales J, Nagamura Y, Sasaki T. Arabidopsis-rice: Will colinearity allow gene prediction across the eudicot-monocot divide? Genome Res. 1999;9:825–829. doi: 10.1101/gr.9.9.825.
- 23. Fitzgerald HA, Canlas PE, Chern MS, Ronald PC. Alteration of TGA factor activity in rice results in enhanced tolerance to Xanthomonas oryzae pv. oryzae. Plant J. 2005;43:335–347. doi: 10.1111/j.1365-313X.2005.02457.x.
- 24. Fitzgerald HA, Chern MS, Navarre R, Ronald PC. Overexpression of (At)NPR1 in rice leads to a BTH- and environment-induced lesion-mimic/cell death phenotype. Mol Plant Microbe Interact. 2004;17:140–151. doi: 10.1094/MPMI.2004.17.2.140.
- 25. Xu K, et al. Sub1A is an ethylene-response-factor-like gene that confers submergence tolerance to rice. Nature. 2006;442:705–708. doi: 10.1038/nature04920.
- 26. Swarbreck D, et al. The Arabidopsis Information Resource (TAIR): Gene structure and function annotation. Nucleic Acids Res. 2008;36(Database issue):D1009–D1014. doi: 10.1093/nar/gkm965.
- 27. Ouyang S, et al. The TIGR Rice Genome Annotation Resource: Improvements and new features. Nucleic Acids Res. 2007;35(Database issue):D883–D887. doi: 10.1093/nar/gkl976.
- 28. Seo YS, et al. Towards establishment of a rice stress response interactome. PLoS Genet. 2011;7:e1002020. doi: 10.1371/journal.pgen.1002020.
- 29. Ronald PC, Beutler B. Plant and animal sensors of conserved microbial signatures. Science. 2010;330:1061–1064. doi: 10.1126/science.1189468.
- 30. Ficklin SP, Luo F, Feltus FA. The association of multiple interacting genes with specific phenotypes in rice using gene coexpression networks. Plant Physiol. 2010;154:13–24. doi: 10.1104/pp.110.159459.
- 31. Remm M, Storm CE, Sonnhammer EL. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol. 2001;314:1041–1052. doi: 10.1006/jmbi.2000.5197.
- 32. Kanehisa M, Goto S, Kawashima S, Nakaya A. The KEGG databases at GenomeNet. Nucleic Acids Res. 2002;30:42–46. doi: 10.1093/nar/30.1.42.
- 33. Watts DJ, Strogatz SH. Collective dynamics of ‘small-world’ networks. Nature. 1998;393:440–442. doi: 10.1038/30918.
- 34. Wang PI, Marcotte EM. It's the machine that matters: Predicting gene function and phenotype from protein networks. J Proteomics. 2010;73:2277–2289. doi: 10.1016/j.jprot.2010.07.005.
- 35. Jiao Y, et al. A transcriptome atlas of rice cell types uncovers cellular, functional and developmental hierarchies. Nat Genet. 2009;41:258–263. doi: 10.1038/ng.282.
- 36. Song WY, et al. A receptor kinase-like protein encoded by the rice disease resistance gene, Xa21. Science. 1995;270:1804–1806. doi: 10.1126/science.270.5243.1804.
- 37. Ahn IP, Kim S, Lee YH. Vitamin B1 functions as an activator of plant disease resistance. Plant Physiol. 2005;138:1505–1515. doi: 10.1104/pp.104.058693.
- 38. Merla G, Ucla C, Guipponi M, Reymond A. Identification of additional transcripts in the Williams-Beuren syndrome critical region. Hum Genet. 2002;110:429–438. doi: 10.1007/s00439-002-0710-x.
- 39. Riera J, Lazo PS. The mammalian NudC-like genes: A family with functions other than regulating nuclear distribution. Cell Mol Life Sci. 2009;66:2383–2390. doi: 10.1007/s00018-009-0025-3.
- 40. McCarthy FM, et al. AgBase: Supporting functional modeling in agricultural organisms. Nucleic Acids Res. 2011;39(Database issue):D497–D506. doi: 10.1093/nar/gkq1115.
- 41. Lehner B, Lee I. Network-guided genetic screening: Building, testing and using gene networks to predict gene function. Brief Funct Genomics Proteomics. 2008;7:217–227. doi: 10.1093/bfgp/eln020.
- 42. Schnable PS, et al. The B73 maize genome: Complexity, diversity, and dynamics. Science. 2009;326:1112–1115. doi: 10.1126/science.1178534.
- 43. Schmutz J, et al. Genome sequence of the palaeopolyploid soybean. Nature. 2010;463:178–183. doi: 10.1038/nature08670.
- 44. Yuan JS, Tiller KH, Al-Ahmad H, Stewart NR, Stewart CN Jr. Plants to power: Bioenergy to fuel the future. Trends Plant Sci. 2008;13:421–429. doi: 10.1016/j.tplants.2008.06.001.
- 45. Vogel JP, et al.; International Brachypodium Initiative. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature. 2010;463:763–768. doi: 10.1038/nature08747.
- 46. Holland JB. Genetic architecture of complex traits in plants. Curr Opin Plant Biol. 2007;10:156–161. doi: 10.1016/j.pbi.2007.01.003.