Plant Physiology. 2018 Mar 12;177(1):422–433. doi: 10.1104/pp.18.00144

Predicted Arabidopsis Interactome Resource and Gene Set Linkage Analysis: A Transcriptomic Analysis Resource

Heng Yao a,b, Xiaoxuan Wang a, Pengcheng Chen a, Ling Hai a, Kang Jin a, Lixia Yao c, Chuanzao Mao b,2, Xin Chen a,b,2
PMCID: PMC5933134  PMID: 29530937

The Predicted Arabidopsis Interactome Resource (PAIR) is a database of high-confidence functional associations between Arabidopsis genes that integrates functional association evidence and enables a network-driven approach for interpreting transcriptomic changes that may functionally impact biological processes.

Abstract

An advanced functional understanding of omics data is important for elucidating the design logic of physiological processes in plants and effectively controlling desired traits in plants. We present the latest versions of the Predicted Arabidopsis Interactome Resource (PAIR) and of the gene set linkage analysis (GSLA) tool, which enable the interpretation of an observed transcriptomic change (differentially expressed genes [DEGs]) in Arabidopsis (Arabidopsis thaliana) with respect to its functional impact for biological processes. PAIR version 5.0 integrates functional association data between genes in multiple forms and infers 335,301 putative functional interactions. GSLA relies on this high-confidence inferred functional association network to expand our perception of the functional impacts of an observed transcriptomic change. GSLA then interprets the biological significance of the observed DEGs using established biological concepts (annotation terms), describing not only the DEGs themselves but also their potential functional impacts. This unique analytical capability can help researchers gain deeper insights into their experimental results and highlight prospective directions for further investigation. We demonstrate the utility of GSLA with two case studies in which GSLA uncovered how molecular events may have caused physiological changes through their collective functional influence on biological processes. Furthermore, we showed that typical annotation-enrichment tools were unable to produce similar insights to PAIR/GSLA. The PAIR version 5.0-inferred interactome and GSLA Web tool both can be accessed at http://public.synergylab.cn/pair/.


Today, omics approaches are broadly and regularly used in plant research. The variety, complexity, and quantity of omics data have increased rapidly, bringing both new opportunities and new analytical challenges (Berger et al., 2013; Gomez-Cabrero et al., 2014; Saha et al., 2014). One advantage of omics profiles is that they comprehensively describe the physiological status of samples at the molecular level. However, an important question is whether we can elucidate the underlying design logic of physiological processes in plants from these molecular-level descriptions to obtain effective control over desired plant traits.

Existing approaches to obtain high-level biological sense from omics data mostly rely on enrichment analysis. These approaches evaluate whether the changed genes are enriched or clustered in certain biological concepts (e.g. Gene Ontology [GO] terms). This strategy summarizes the observed differentially expressed genes (DEGs) in terms of established biological concepts. In many cases, a high-level overview of the DEGs can be produced to explain why such changes happen. However, in practical use, enrichment approaches frequently report that no annotation term is enriched or that only conceptually general terms (such as GO:0007165, signal transduction) are found. Such results provide little value in elucidating the design logic of physiological processes in plants and formulating further hypotheses to achieve effective control over desired traits. One explanation for this lack of analytical power could be that innovative research tends to explore previously uncharted areas of life mechanisms, where no established concepts accurately describe the observed DEGs. Consequently, enrichment approaches are unable to find suitable concepts to summarize these changes. On the other hand, even when no established concepts accurately describe the observed DEGs, we may still use established concepts to accurately describe their functional impacts. For example, observed changes may collectively impact GO:0043067 (regulation of programmed cell death [PCD]) and GO:0008020 (G-protein-coupled photoreceptor activity), even when the DEGs themselves are not enriched in these terms (for details, see “Discussion”). One way to implement this analysis strategy is to use a functional association network of genes to expand our perception of the potential functional impacts resulting from the DEGs. If the DEGs have frequent functional associations with genes of a biological function, the observed DEGs will be expected to interfere with this biological function. We call this strategy gene set linkage analysis (GSLA).

To use GSLA, a functional gene association network is required. Multiple online resources have been established to provide information on physical molecular interactions and functional gene associations. These resources have significantly accelerated research in plant systems biology. Among them, IntAct (Hermjakob et al., 2004), BioGRID (Stark et al., 2006), BIND (Bader et al., 2003), ATTED-II (Obayashi et al., 2018), and PlaNet and TAIR (Lamesch et al., 2012) focus on manually curated, experimentally reported interactions, while the works of Geisler-Lee et al. (2007) and De Bodt et al. (2009) as well as AtPID (Cui et al., 2008), STRING (Szklarczyk et al., 2015), AraNet (Lee et al., 2010a), GeneMANIA (Warde-Farley et al., 2010), and AMFSD (Heyndrickx and Vandepoele, 2012) focus on inferred molecular interactions. However, these resources typically strike a poor balance between coverage and accuracy with respect to the Arabidopsis (Arabidopsis thaliana) interactome. The more accurate resources cover very low fractions of the true interactome, while the more inclusive resources contain high proportions of false-positive interactions (Rao et al., 2014; Rhee and Mutwil, 2014). In 2009, we introduced the Predicted Arabidopsis Interactome Resource (PAIR) version 3.0 (v3.0), featuring a functional gene association network with balanced sensitivity and specificity. PAIR integrates several types of functional association evidence to infer high-confidence functional gene associations. PAIR v3.0 was later shown to be capable of supporting both the identification of regulatory genes from insignificant gene expression changes and the prediction of protein functions (Lin et al., 2009, 2011; Lee et al., 2010b).

Since the first publication of PAIR in 2009, a significant amount of functional association data have been accumulated, allowing more accurate and comprehensive inference of the functional gene associations in Arabidopsis and providing an ideal basis for GSLA to interpret the collective functional impacts of an observed transcriptome change. In this work, we present PAIR v5.0, which integrates six types of evidence of functional gene associations. The functional gene associations in PAIR v5.0 were inferred from public data available before the year 2015 and were evaluated against newly published data afterward. PAIR v5.0 includes 335,301 functional gene associations, which are expected to represent ~27% of all Arabidopsis protein-protein interactions, with ~38% of the PAIR v5.0 gene associations being protein-protein interactions. Compared with PAIR v3.0, the number of functional gene associations has been increased from 145,494 to 335,301, and the evidence data used to infer these associations have been updated to their latest version and more rigorously curated. The current version of PAIR (v5.0) is a functional gene association network composed of strong functional associations with more balanced coverage and accuracy, which complements the existing horizon of functional association networks and seamlessly supports GSLA. The GSLA Web tool based on PAIR v5.0 is accessible through the newly developed PAIR Web site: http://public.synergylab.cn/pair/.

Below, we describe the preparation and evaluation of PAIR v5.0 and two case studies demonstrating the use of the PAIR/GSLA system to interpret observed DEGs in terms of their functional impacts at the biological process level. In these case studies, this unique analytical capability facilitated the discovery of the potential phenotype of a mutation and important regulatory genes that show insignificant expression changes as well as suggested distinctions in high-level mechanisms leading to the same phenotype.

RESULTS

Inference of Functional Associations between Protein-Coding Genes

We used a support vector machine (SVM) with a radial basis function kernel (Winters-Hilt et al., 2006; Chang and Lin, 2013) to infer significant functional associations from six types of evidence. A diagram of the inference workflow is shown in Figure 1. Details of the procedures are provided in Supplemental Materials S1.

Figure 1.

Preparation of the training data set for the inference of functional associations between Arabidopsis genes. High-confidence physical protein-protein interactions were collected from four databases as positive examples. Evidence for six types of functional associations was collected from 12 public databases. Thirty features were computed from these data and evaluated for their ability to discriminate protein interactions from random gene pairs. Sixteen high-quality features with area under the curve (AUC) > 0.6 were used to represent each gene pair. Random gene pairs other than positive examples were used as negative examples. The number of negative examples was 100 times the number of positive examples.

After the evaluation of many types of functional association data, six types of data were selected for the inference of functional associations between genes. These choices were based on the abundance of these data and their unbiased coverage of all genes. The selected types of functional association data included gene coexpression, shared functional annotation, domain interaction, gene colocalization, phylogenetic profile, and homologous interactions in other species (interologs). We retrieved 22,240 expression profiles, 173,378 annotations, 1,791,283 domain interactions, 517,560 subcellular gene localizations, 16,233 phylogenetic profiles, and 17,907 homologous interactions in other species from 12 public databases. All data for the inference of functional associations between genes were retrieved before December 22, 2014, which allowed unbiased evaluation of the accuracy of the inferred interactome using data newly published after this date.

From these six types of functional association evidence, 30 feature values were computed to describe the functional associations between genes. The discriminatory power of these features to separate known protein-protein interactions from random gene pairs was evaluated by the AUC in a receiver operating characteristic (ROC) test. Among them, 16 features showed AUCs over 0.6, indicating satisfactory discrimination of known protein-protein interactions (Supplemental Fig. S1; for details, see Supplemental Materials S1, section 1.2). These 16 features were used for the subsequent SVM inference of functional associations between genes.
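
The feature screen described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the feature names and score lists are hypothetical toy data, and the AUC is computed with the rank-sum (Mann-Whitney) identity, which is one standard way to obtain the area under an ROC curve.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as 0.5), computed from average ranks."""
    scored = sorted([(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores])
    rank_sum_pos = 0.0
    i, n = 0, len(scored)
    while i < n:
        j = i
        while j < n and scored[j][0] == scored[i][0]:
            j += 1
        # Items i..j-1 are tied and share the average of ranks i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(label for _, label in scored[i:j])
        i = j
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def select_features(features, cutoff=0.6):
    """Keep features whose AUC exceeds the cutoff (0.6 in the text)."""
    selected = {}
    for name, (pos_scores, neg_scores) in features.items():
        auc = roc_auc(pos_scores, neg_scores)
        if auc > cutoff:
            selected[name] = auc
    return selected


# Hypothetical toy scores: (known interacting pairs, random gene pairs).
toy_features = {
    "coexpression": ([0.9, 0.8, 0.7, 0.4], [0.5, 0.3, 0.2, 0.1]),
    "random_like": ([0.1, 0.9, 0.5, 0.3], [0.2, 0.8, 0.6, 0.4]),
}
selected = select_features(toy_features)
print(selected)
```

In this toy input only the first feature separates the two groups well enough to pass the 0.6 cutoff; the real screen was applied to 30 features computed from the six evidence types.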

Protein-protein interactions are one type of strong functional association. Therefore, we used protein interactions as positive examples to train an SVM classifier to infer strong functional associations with strengths that are similar to those of protein interactions. Because we used protein interactions as positive examples, this approach appears to produce a prediction of protein interactions. However, because all interactions are inferred from functional association evidence, false-positive interactions that do not correspond to protein interactions may still represent functional gene associations of similar strengths. The resulting interactome should therefore be regarded as a functional gene association network. In our previous work inferring a functional gene association network for humans, this approach was demonstrated to produce a more homogeneous functional gene association network when compared with the simple collection of protein-protein interactions (Zhou et al., 2013).

As detailed in Supplemental Materials S1, 260,013 experimentally reported physical protein interactions were collected from four open databases: IntAct (Hermjakob et al., 2004), BioGRID (Stark et al., 2006), BIND (Bader et al., 2003), and TAIR (Lamesch et al., 2012). Among these interactions, we identified 6,257 high-confidence protein interactions that were reported by at least two studies or by at least two different low-throughput methods. These high-confidence protein interactions were selected as positive examples to train the SVM classifier. We used randomly generated gene pairs (positive samples excluded) as negative examples. The final SVM classifier showed 25.79% ± 2.18% sensitivity and 99.95% ± 0.0099% specificity in a 5-fold cross validation. A total of 329,044 gene pairs were classified as functionally associated by this classifier. According to the size of this inferred association network and its coverage of newly reported protein interactions, we estimated the size of the Arabidopsis protein interactome to be 5.088 × 10^5 interactions (for details, see Supplemental Materials S1, section 2.3). This protein interactome size corresponds to one protein interaction out of 739 random gene pairs, which is close to the protein interaction probability observed in yeast, one per 775 (Yu et al., 2008). This inferred Arabidopsis functional gene association network was supplemented by the high-confidence protein interactions we used to train the SVM model, which resulted in 335,301 functional gene associations (PAIR v5.0). This data set is expected to represent 27.02% of the Arabidopsis protein interactome, with 38% of its functional gene associations representing protein interactions (for details, see Supplemental Materials S1, section 2.2).
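
The construction of negative examples can be illustrated with a short sketch. It assumes the 100:1 negative-to-positive ratio given in the Figure 1 legend; the gene identifiers, the function name, and the rejection-sampling approach are illustrative choices rather than the procedure documented in Supplemental Materials S1.

```python
import random


def sample_negative_pairs(genes, positive_pairs, ratio=100, seed=0):
    """Draw random unordered gene pairs, excluding known positives,
    until there are `ratio` negatives per positive example.
    Assumes every positive pair is made of genes from `genes`."""
    rng = random.Random(seed)
    positives = {frozenset(pair) for pair in positive_pairs}
    target = ratio * len(positives)
    max_pairs = len(genes) * (len(genes) - 1) // 2 - len(positives)
    if target > max_pairs:
        raise ValueError("not enough distinct gene pairs for this ratio")
    negatives = set()
    while len(negatives) < target:
        a, b = rng.sample(genes, 2)  # two distinct genes
        pair = frozenset((a, b))
        if pair not in positives:    # positive samples excluded
            negatives.add(pair)
    return negatives


# Hypothetical toy input: 50 genes and 3 known interactions.
genes = [f"G{i}" for i in range(50)]
positive_pairs = [("G0", "G1"), ("G2", "G3"), ("G4", "G5")]
negatives = sample_negative_pairs(genes, positive_pairs)
print(len(negatives))
```

Because true interactions are rare among random pairs (about one in 739 by the estimate above), only a small fraction of such negatives are expected to be mislabeled.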

Evaluation of Inferred Functional Gene Associations

To assess the quality of PAIR v5.0 as a functional gene association network, we evaluated how well it connects functionally associated genes. We compared PAIR v5.0 with seven other interactome data sets to evaluate their effectiveness in gene function prediction via the guilt-by-association strategy. The seven interactomes are those reported by Geisler-Lee et al. (2007) and De Bodt et al. (2009) as well as STRING (Szklarczyk et al., 2015), AtPID (Cui et al., 2008), AraNet (Lee et al., 2010a), GeneMANIA (Warde-Farley et al., 2010), and AMFSD (Heyndrickx and Vandepoele, 2012). Because PAIR v5.0 was developed based on data available before December 22, 2014, we collected 1,313 genes from the GO database (up to December 12, 2016) for which new annotations were added after December 22, 2014. These genes have a total of 19,178 annotations, 4,930 of which are newly reported. We evaluated how comprehensively these 4,930 new annotations may be predicted by the frequent annotations of their network neighbors and what proportion of the predicted annotations were consistent with the known 19,178 annotations of these genes. PANTHER (Mi et al., 2017) was used to identify the frequent annotations of a gene’s network neighbors.

Altering the significance cutoff in PANTHER may result in more or fewer annotations being identified as frequent annotations. A loose cutoff will result in the successful identification of more new annotations with more false positives, while a strict cutoff will result in the identification of fewer new annotations. We used the precision-recall curve to show the overall quality of the network in new annotation prediction, which is irrespective of the cutoff values. For each cutoff value, precision measures the proportion of the identified frequent annotations that is correct (whether they are consistent with the known 19,178 annotations), while recall measures the proportion of the 4,930 new annotations that is recovered. The higher the AUC in a precision-recall curve, the better an interactome connects functionally associated genes.
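
To make the two measures concrete, the following sketch computes precision-recall points over a sweep of significance cutoffs and integrates them with the trapezoidal rule. The input format (a P value per predicted gene-term annotation), the function names, and the toy data are assumptions for illustration; the actual evaluation used PANTHER output.

```python
def precision_recall_points(predictions, known, new):
    """predictions: {(gene, term): p_value} for annotations proposed from
    network neighbors; known: all known (gene, term) annotations;
    new: the newly reported subset used to measure recall."""
    points = []
    for cutoff in sorted(set(predictions.values())):
        called = {key for key, p in predictions.items() if p <= cutoff}
        precision = len(called & known) / len(called)  # consistent with known
        recall = len(called & new) / len(new)          # new annotations recovered
        points.append((recall, precision))
    return points


def pr_auc(points):
    """Trapezoidal area under the (recall, precision) points."""
    pts = sorted(points)
    area = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


# Hypothetical toy data: two genes, four known annotations, two of them new.
known = {("g1", "A"), ("g1", "B"), ("g2", "C"), ("g2", "D")}
new = {("g1", "B"), ("g2", "C")}
predictions = {("g1", "B"): 0.001, ("g1", "A"): 0.01,
               ("g2", "X"): 0.02, ("g2", "C"): 0.04}

points = precision_recall_points(predictions, known, new)
print(points, pr_auc(points))
```

Note that, following the definitions in the text, precision is scored against all known annotations while recall is scored only against the new ones, so the two measures use different reference sets.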

As shown in Figure 2, PAIR v5.0 has the highest AUC among all eight interactomes. Notably, the curve of PAIR v5.0 reaches the high-recall region, whereas the curves of Geisler-Lee et al. (2007), De Bodt et al. (2009), and AtPID do not. The curves of STRING, AMFSD, AraNet, and GeneMANIA also reach the high-recall region. However, these curves always stay in the low-precision region, which indicates that their interactomes may include large proportions of weak functional gene associations. The frequent annotations of a gene’s network neighbors, therefore, may include many weakly associated functions, which in turn lowers the prediction accuracy in the guilt-by-association gene function prediction experiment. In general, PAIR v5.0 showed a balance between coverage and accuracy.

Figure 2.

Precision and recall curves for guilt-by-association prediction of gene functions using eight interactomes. Precision measures the proportion of the identified frequent annotations that is correct (whether they are consistent with all known annotations), while recall measures the proportion of the new annotations that is recovered.

GSLA

The PAIR v5.0 functional gene association network was used in GSLA to interpret the functional impacts of observed DEGs. The GSLA tool evaluates whether a set of changed genes shows more frequent functional associations with genes that belong to a biological process or function. GSLA relies on a pair of tests to determine the statistical significance of the functional associations between two gene sets (i.e. whether the DEGs may have significant functional impact on a biological process or function; Fig. 3). The first test (Q1) assumes that the inter-gene-set gene association density between functionally associated gene sets is higher than the background gene association density between random gene sets. The second test (Q2) assumes that the observed high density between functionally associated gene sets can be observed only in the biologically correct functional gene association network: that is, that the density observed in PAIR v5.0 (which is used in GSLA) is higher than the densities observed in random functional gene association networks consisting of the same genes, with each gene having the same number of neighbors. In a biological sense, Q1 tests the strength of the functional associations between two gene sets while Q2 verifies that the observed strong functional association is the result of the biologically correct network topology (i.e. our knowledge of the molecular mechanisms) rather than the result of the compositions of these two gene sets. Some genes, known as hubs, have considerably more neighbors than other genes. Gene sets that include many hubs, therefore, are more likely to connect to other gene sets. Q2 is used to remove the confounding factor of gene set composition and to ensure the biological significance of the functional associations detected between gene sets. Q1 and Q2 are related but different tests that complement each other to increase the sensitivity and specificity of GSLA.
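
The logic of the two tests can be sketched as a pair of permutation tests. This is an illustrative reading of Q1 and Q2, not the GSLA implementation: the empirical P values, the double-edge-swap randomization used to preserve each gene's number of neighbors, the function names, and the toy network are all assumptions.

```python
import random


def density(set_a, set_b, edges):
    """Inter-gene-set association density: links between A and B
    divided by the number of possible A-B pairs."""
    links = sum(1 for u, v in edges
                if (u in set_a and v in set_b) or (v in set_a and u in set_b))
    return links / (len(set_a) * len(set_b))


def rewire(edges, n_swaps, rng):
    """Degree-preserving randomization by double edge swaps:
    (a,b),(c,d) -> (a,d),(c,b), skipping self-loops and duplicates."""
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
    return edge_set


def gsla_tests(degs, term_genes, edges, n_perm=200, seed=0):
    """Return (observed density, empirical P for Q1, empirical P for Q2)."""
    rng = random.Random(seed)
    genes = sorted({g for edge in edges for g in edge})
    observed = density(degs, term_genes, edges)
    q1_hits = q2_hits = 0
    for _ in range(n_perm):
        # Q1: random gene sets of the same sizes in the real network.
        rand_a = set(rng.sample(genes, len(degs)))
        rand_b = set(rng.sample(genes, len(term_genes)))
        q1_hits += density(rand_a, rand_b, edges) >= observed
        # Q2: the same gene sets in a randomized network with equal degrees.
        shuffled = rewire(edges, 10 * len(edges), rng)
        q2_hits += density(degs, term_genes, shuffled) >= observed
    return observed, (q1_hits + 1) / (n_perm + 1), (q2_hits + 1) / (n_perm + 1)


# Hypothetical toy network: DEGs fully linked to the term genes,
# plus a sparse ring of 24 unrelated background genes.
degs = {"g0", "g1", "g2"}
term_genes = {"g3", "g4", "g5"}
edges = {tuple(sorted((a, b))) for a in degs for b in term_genes}
edges |= {tuple(sorted((f"g{i}", f"g{i + 1 if i < 29 else 6}")))
          for i in range(6, 30)}

print(gsla_tests(degs, term_genes, edges))
```

Running both tests on the same gene sets reflects the complementary roles described above: the first asks whether the link density is unusual for gene sets of this size, and the second asks whether it survives when hub effects are held fixed.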

Figure 3.

Two hypotheses used by GSLA to assess the functional association between two biologically meaningful gene sets. Q1 assumes that the inter-gene-set gene association density between two functionally associated biologically meaningful gene sets is higher than the density connecting two random gene sets. Q2 assumes that the high density between functionally associated gene sets can be observed only in the biologically correct interactome and not in random interactomes.

The GSLA tool can be accessed at the PAIR v5.0 Web site (http://public.synergylab.cn/pair/). Users can upload a gene list consisting of DEGs as input to GSLA and provide an e-mail address to receive the results. The result file will show the GO terms that are expected to be functionally associated with the input gene list, together with the functional gene associations between the GO term genes and the input genes.

DISCUSSION

PAIR v5.0 Is a Functional Gene Association Network with Balanced Coverage and Accuracy

PAIR v5.0 was developed as a functional association network of Arabidopsis genes with balanced coverage and accuracy. It integrates six types of evidence data from public databases and uses an SVM model to infer highly reliable functional associations. Protein-protein interactions are one type of strong functional association. PAIR v5.0, like other functional association networks, is expected to include a significant proportion of functional gene associations that represent protein-protein interactions. Using the Arabidopsis protein interactome size estimated above and the newly reported protein-protein interactions (reported between December 12, 2014, and December 22, 2016), we estimated the proportion of new protein interactions represented in the major Arabidopsis interactomes and what proportion of gene pairs in the major Arabidopsis interactomes may represent protein interactions.

As shown in Supplemental Table S7, PAIR v5.0 showed a reasonably high fraction of Arabidopsis protein interactions with good accuracy. For comparison, STRING is a popular interactome resource. STRING is expected to represent 98.98% of Arabidopsis protein interactions, which is 3.67 times that of PAIR (27%). Therefore, STRING is a very useful resource for reconfirming a protein interaction hypothesis. However, the size of the STRING interactome is 51 times that of PAIR (16,831,382 versus 335,301), which leads to an estimated accuracy for STRING gene pairs representing protein interactions of 2.992%. In contrast, 38% of PAIR gene pairs may represent protein interactions. Protein interactions are a form of strong functional gene association. These numbers indicate that PAIR is more enriched with strong functional gene associations; therefore, it produces less noise in the network-based guilt-by-association gene function prediction experiment (Fig. 2). In this regard, PAIR complements the existing horizon of functional association networks, providing a functional gene association network with balanced coverage and accuracy (Supplemental Table S7), which nicely supports the ability of GSLA to interpret the functional impacts of the observed DEGs.
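
The STRING accuracy figure quoted above can be reproduced with simple arithmetic. The sketch below assumes that accuracy is estimated as the number of true protein interactions a network is expected to contain (coverage times the estimated interactome size) divided by the number of gene pairs in that network; this formula is our reading of the numbers, and the function name is illustrative.

```python
def pair_precision(coverage, interactome_size, network_pairs):
    """Expected fraction of a network's gene pairs that are protein
    interactions: (coverage * interactome size) / network size."""
    return coverage * interactome_size / network_pairs


INTERACTOME_SIZE = 5.088e5   # estimated Arabidopsis protein interactome
STRING_PAIRS = 16_831_382    # gene pairs in STRING, as quoted in the text
STRING_COVERAGE = 0.9898     # fraction of protein interactions in STRING

string_precision = pair_precision(STRING_COVERAGE, INTERACTOME_SIZE, STRING_PAIRS)
coverage_ratio = 98.98 / 27.0  # STRING coverage relative to PAIR (27%)

print(f"STRING precision: {string_precision:.3%}")
print(f"coverage ratio:   {coverage_ratio:.2f}")
```

Under this assumption the result matches the 2.992% and 3.67-fold figures given in the text, which illustrates the trade-off being discussed: near-complete coverage is obtained at the cost of a network in which only a few percent of pairs are expected to be protein interactions.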

On the other hand, it should be noted that we used a simple cross-validation scheme to evaluate the prediction accuracy (for details, see Supplemental Materials S1, section 2). This may lead to an overestimated prediction accuracy (Park and Marcotte, 2012; Hamp and Rost, 2015). Therefore, our estimates that PAIR v5.0 represents 27% of Arabidopsis protein interactions and that 38% of PAIR gene pairs represent protein interactions may be overly optimistic. However, given the margins and as shown in Supplemental Table S7, PAIR is still a unique resource for providing a functional gene association network with balanced coverage and accuracy. This conclusion is best demonstrated in the network-based guilt-by-association gene function prediction experiment (Fig. 2) and the case studies presented below, in which other interactomes do not work with GSLA to produce similarly insightful interpretations for the observed DEGs (data not shown).

Based on PAIR v5.0, we present the GSLA tool for transcriptome data interpretation. Most available transcriptome data interpretation tools are based on enrichment or clustering analysis, which describes the observed change rather than the functional impacts that the change may lead to. We demonstrate the usefulness of this latter analytical capability of GSLA with two case studies. In both case studies, GSLA was compared with five other widely used transcriptome data interpretation tools, including GSEA (Subramanian et al., 2005), GO Term Finder (Boyle et al., 2004), the DAVID GO cluster tool (Huang et al., 2007), and MapMan (Thimm et al., 2004). We also counted the number of GO terms identified by each tool (MapMan not included, because MapMan reports pathways instead of GO terms) and the proportion of these terms that are specific (depth 6+ in the GO term tree). The results are shown in Supplemental Table S8, and more details can be seen in Supplemental Materials S1, section 5.3. In general, GSLA identified the relevant and specific terms without reporting notably more terms than other tools.

PAIR/GSLA Predicts the Mechanism of Abscisic Acid Hypersensitivity in roa1 Mutants

The Arabidopsis roa1 mutant has been shown to be hypersensitive to abscisic acid (ABA) in the seedling stage. When treated with ABA (5 μM), wild-type plants can germinate normally, but roa1 mutants exhibit clear seedling growth defects, including reduced primary root elongation and yellow leaves (Zhan et al., 2015). In this study, 203 DEGs were reported to show at least 2-fold changes in the roa1 mutants before and after germination (Supplemental Table S9). The roa1 mutants not subjected to ABA treatment did not exhibit apparent growth and developmental defects under normal growth conditions (Fig. 4A). We used the PAIR/GSLA system to reanalyze the potential functional impacts of the 203 DEGs. Other DEG annotation tools also were used for comparison (Fig. 4B; Supplemental Tables S14–S17 and S35; details of the functional categories in Fig. 4 are given in Supplemental Materials S1, section 5.3).

Figure 4.

PAIR/GSLA predicts the mechanism of ABA hypersensitivity in roa1 mutants. A, The roa1 mutants and wild-type (WT) plants show no phenotypic difference when not treated with ABA. This image was reproduced from Zhan et al. (2015). B, The DEGs of roa1 versus the wild type in the control group were analyzed for their biological process-level significances using five tools. GSLA reported all functional categories that were reported by other tools and the additional functional category of PCD and hypersensitivity. C, Functional categories and concept specificities of the terms reported by five gene-set annotation tools. D, Functional categories and GO terms of the results of GSLA.

In general, both GSLA and other transcriptome change annotation tools reported abiotic and biotic stress terms and some light harvest-related terms. These terms were consistent with the observed phenotype. However, the other annotation tools typically reported conceptually very general terms that are common in plant stress responses and are not related to ABA hypersensitivity, the main phenotype of this study. In contrast, the terms reported by GSLA could be grouped into several functional categories (Fig. 4, C and D; term descriptions are provided in Supplemental Table S7). These terms included not only the biological functions reported by the other tools but also a distinctive functional category, PCD and hypersensitivity response, which is consistent with the main phenotype of this study (i.e. ABA hypersensitivity). It is well known that the hypersensitivity response in plants is a defense strategy against pathogens: a fast reaction that initiates PCD to eradicate the wounded or infected cells to stop further damage (Kliebenstein and Rowe, 2008). ABA inhibits the hypersensitivity response by increasing the threshold of the PCD signal (Unger et al., 2005). On the other hand, PCD is a necessary process in plant germination (Supplemental Figs. S4 and S5). The suppression of PCD leads to the aberrant degeneration of aleurone cells and to suppressed germination (Oracz and Karpínski, 2016). Therefore, the PCD and hypersensitivity response terms reported by GSLA are consistent with the ABA hypersensitivity phenotype and suggest a mechanism through which roa1 acquires its phenotype (for details, see Supplemental Materials S1, section 5).

It was also noted that GSLA reported a significant functional association between the down-regulated genes and the term GO:0003700 (sequence-specific DNA binding transcription factor activity; Supplemental Table S19, marked in blue). Remarkably, this functional association is supported by two interactions involving the gene At3g24170, which was not found to be significantly up- or down-regulated by Zhan et al. (2015). At3g24170 was later found to be one of the two most severely misspliced genes that contributed to the ABA hypersensitivity phenotype in roa1 mutants (Zhan et al., 2015).

Furthermore, in the above analysis, GSLA can identify the PCD and hypersensitivity response processes by comparing the transcriptome difference between the wild type and the roa1 mutant even before ABA treatment, while other tools may only report PCD and hypersensitivity response processes in analyses of transcriptome differences after ABA treatment (i.e. when the PCD pathway was actually modulated). This difference demonstrates the unique ability of GSLA to anticipate the potential functional impacts of currently observed DEGs (defective PCD and, therefore, possible hypersensitivity to ABA), without having to see an ABA-treated phenotype. In practical research, it is often important to understand the potential functional impacts of currently observed DEGs to formulate hypotheses and design further experiments.

PAIR/GSLA Reveals Different Mechanisms Leading to Similar Phenotypes

Woodson et al. (2015) reported the molecular mechanism of the deetiolation defect of Arabidopsis fc mutants when exposed to light after 4 d of germination without light. A similar phenotype with failure of deetiolation had been studied previously in flu mutants (Woodson et al., 2011; Domínguez and Cejudo, 2014). To elucidate the molecular mechanism of greening block in fc mutants, Woodson et al. (2015) performed a controlled experiment. Seeds of flu mutants and fc mutants were germinated in the dark for 4 d and then exposed to light. Both the flu and fc mutants showed a similar greening-block phenotype during deetiolation. The authors measured the transcriptome changes during deetiolation with RNA sequencing.

Upon comparison of the DEGs in the fc and flu mutants during deetiolation to the DEGs in the wild-type plants, 41 genes were found to be differentially expressed only in the fc mutants (Supplemental Table S12) and 77 genes were found to be differentially expressed only in the flu mutants (Supplemental Table S13). These two sets of genes were analyzed by GSLA (Supplemental Tables S28 and S33; details of the functional categories are given in Supplemental Materials S1, section 5.3) and other annotation tools (Supplemental Tables S24–S26, S29–S31, S37, and S38). As shown in Table I, GSLA reported biological functions that covered the same categories of biological functions reported by other tools, such as light intensity-related terms for the fc-specific DEGs (Table I, term GO:0009644; GO:0009642 in the stress/stimulus response category). In addition, GSLA reported PCD-related terms for the flu-specific DEGs (Table I), which were not reported by the other tools. To determine the respective mechanisms leading to photosynthetic cell death in the fc and flu mutants, Woodson et al. showed that mutation (dysfunction) of the EXE1 gene rescued the deetiolation phenotype in the flu mutants but not in the fc mutants. EXE1 is necessary for the initiation of the PCD pathway (Wagner et al., 2004). Therefore, the photosynthetic cell deaths in fc mutants are independent of the EXE1 gene and the PCD pathway (Woodson et al., 2015).

Table I. GO terms identified by GSLA in the analysis of DEGs between fc and flu mutants during deetiolation.

Mutant | Functional Category | GO Term Identifier
fc mutants | Basic cellular process | GO:2001147, GO:0043295, GO:1900750, GO:0072341, GO:1901681, GO:0003677, GO:0003700, GO:0006457, GO:0010468, GO:0016757, GO:0006355, GO:0006351, GO:0003676, GO:0016740
fc mutants | Organics metabolism | GO:2001227, GO:2001147, GO:0097243, GO:1900750, GO:0043295, GO:1901959, GO:1901957, GO:0080043
fc mutants | Stress/stimulus response | GO:0010286, GO:0009644, GO:0009642, GO:0009408, GO:0000302, GO:0009266, GO:0033554, GO:0006950
flu mutants | Basic cellular process | GO:0016459, GO:0006928, GO:0090150, GO:0006612
flu mutants | Defense/immune response | GO:0010440, GO:0031347, GO:0050776, GO:0031348, GO:0050794, GO:0045088
flu mutants | Organics metabolism | GO:0034620, GO:0016772, GO:0016301, GO:0003984, GO:0097472, GO:0004693, GO:0071901, GO:0016211, GO:0008113
flu mutants | Hypersensitivity/PCD | GO:0043069, GO:0060548, GO:0012501, GO:0010363, GO:0009626, GO:0043067
flu mutants | Signal transduction | GO:0004721, GO:0006984, GO:0023014
flu mutants | Stress/stimulus response | GO:0051716, GO:0080135, GO:0030968, GO:0080134, GO:0048585
flu mutants | Phytohormone | GO:0009867, GO:0009862, GO:0071395
flu mutants | Seedling development | GO:0010084, GO:0097472, GO:0004693, GO:0045786

In addition, the DEGs between the wild type and the fc mutants at the beginning of deetiolation (i.e. after 4 d of germination in darkness but before exposure to light) were analyzed using GSLA and other tools (Supplemental Tables S19–S22 and S36). Among other tools, the annotation term clustering tool DAVID reported the most relevant terms. DAVID reported the terms related to redox reaction and the photosynthesis system, and these terms were generally related to the mechanism of greening block in fc mutants. However, these terms lack specificity and are of limited value in suggesting further research directions. In contrast, GSLA reported terms that covered all functional categories of the terms reported by other tools along with additional terms related to nutrition starvation/transport and signal transduction (Fig. 5, C and D; details of the functional categories are given in Supplemental Materials S1, section 5.3). It is interesting that one term under the signal transduction category, G-protein-coupled photoreceptor activity, suggested interference with the light stimulation response in fc mutants. In Arabidopsis, phytochrome (PΦB) is a core component of the photoreceptor signaling pathway that regulates the shifting from skotomorphogenesis to photomorphogenesis (i.e. the initiation of light development). PΦB changes its molecular conformation in response to red/far-red light shifting, which alters its distribution between the nucleus and cytoplasm and changes the expression of its downstream genes (Fig. 5A; for details, see Supplemental Materials S1, section 5; Li et al., 2011; Taiz et al., 2015). Therefore, it is expected that interference with PΦB synthesis may block the photoreceptor signaling pathway and delay the initiation of light development. The fc mutants have dysfunctional ferrochelatases, which are related to the tetrapyrrole metabolism pathway that converts protoporphyrin IX to heme and finally to PΦB (Fig. 5B; for details, see Supplemental Materials S1, section 5; Woodson et al., 2011; Taiz et al., 2015). In addition to the G-protein-coupled photoreceptor activity term, GSLA also reported several iron- and nitrogen-related terms that are consistent with the disruption of the tetrapyrrole metabolism pathway. This evidence suggested that the disruption of the G-protein photoreceptor signaling pathway may be a mechanism leading to the deetiolation defect in fc mutants.

Figure 5.

PAIR/GSLA reveals different mechanisms leading to similar phenotypes. A, The heme metabolism pathway plays an important role in the light development of Arabidopsis. This pathway produces essential components for both the light-synthesis machinery (especially the light-harvesting complex [LHC]) and the photoreceptor. The photoreceptor will change conformation after absorbing red/far-red light, which leads to ion fluxes and alterations in gene expression that initiate light development. B, In wild-type plants, protoporphyrin IX (Proto) is converted to Pchlide, then to Chlide, and then used for the synthesis of chlorophyll or PΦB, which are essential components of the photoreceptor. In flu mutants (the green branch), defective FLU leads to the accumulation of Pchlide and, consequently, to a reactive oxygen species (ROS) burst. PCD is then triggered by the ROS burst. In fc mutants (the red branch), defective FC delays the conversion from Proto to PΦB and may deplete the supply of photoreceptor. An insufficient supply of photoreceptor may lead to the selective degeneration of defective chloroplasts. C, Functional categories and concept specificities of the terms reported by five gene-set annotation tools. GSLA reported all functional categories that were reported by other tools and the additional functional category of signal transduction, including GO:0008020 (G-protein-coupled photoreceptor activity). D, Functional categories and GO terms of the results of GSLA.

It is interesting that the above insights obtained from the reanalysis of the transcriptome differences between the fc and flu mutants are supported by several follow-up studies. A series of studies published in the years 2016 and 2017 depicted a complex regulatory network in which tetrapyrrole metabolism, photoreceptor, and G-protein signaling pathways interconnect to react to change in light intensity and to initiate photomorphogenesis, especially the development and quality control of the light-synthesis system involving chloroplast degeneration (Martín et al., 2016; Woodson, 2016; Hirosawa et al., 2017; Wang et al., 2017). Altogether, these results demonstrated the capability of the PAIR/GSLA system to reveal different mechanisms leading to similar phenotypes at the biological process level. Furthermore, these mechanisms are relevant to physiology and may suggest further directions of investigation.

Using the PAIR/GSLA System to Interpret Transcriptome Changes

Based on a carefully prepared functional gene association network, the PAIR/GSLA system has demonstrated, in the above case studies, an improved capability compared with that of several well-established tools for interpreting the functional impacts of observed DEGs. It can be argued that species evolve complex genetic networks to provide robust physiological functions to adapt to environmental events, yet each individual retains a certain level of variation. While individual gene variations may distract us from the strategic mechanisms that plants have evolved to adapt to the environment, the ability of PAIR/GSLA to make interpretations at the physiological function level may facilitate our quest to understand the logic behind plant-environment interactions and provide high-level insights into the molecular mechanisms and how these mechanisms might be manipulated to generate plants with desired traits.

In addition, PAIR/GSLA uses gene sets to represent the concepts that are used for transcriptome data interpretation. Gene sets may have variable sizes and may represent concepts of different granularity. Gene sets also can originate from annotation, such as GO terms, or from observation, such as observed signature expression changes that characterize a specific physiological status. Although it is not fully discussed in this work, the second option may allow more flexible use of the PAIR/GSLA system in studying physiological functions that cannot be described accurately by existing annotation terms. It is also possible to modify the annotation-derived gene sets to create new gene sets that represent more specific concepts.

Finally, there is an essential difference between the PAIR/GSLA system and other transcriptome data interpretation tools. PAIR/GSLA evaluates whether the functional association between the observed DEGs and a specific biological function (i.e. its corresponding genes) is significant, while most other approaches evaluate whether a biological function (i.e. its corresponding genes) is enriched or clustered in the observed DEGs. It is possible that the observed DEGs may interfere significantly with a biological function without the biological function itself being enriched in the observed DEGs. For example, in the roa1 mutant case study, the RBM-25 gene mutation triggered transcriptomic differences between the roa1 mutants and the wild type. These differences led to the ABA hypersensitivity phenotype, which is closely related to the PCD pathway. However, PCD-related genes were not enriched in the observed DEGs. Additionally, there may not be an accurate and specific term to describe the transcriptome change. The observed DEGs may interfere with several biological functions, not all of which are significant enough to be enriched in the observed DEGs. If a concept exists that encompasses these biological functions with minimal inclusion of other functions (i.e. the existence of a specific concept), genes associated with this concept may be enriched in the observed DEGs. However, this overall encompassing concept does not always exist because, in practical research, the novelty of a study often lies in the discovery that several previously unconnected biological functions work coordinately to respond to certain environmental events. In the fc mutant case study, the selective degeneration of chloroplast is what actually occurred in the plant. However, there is no term called selective degeneration of chloroplast in the GO database. Blocking the PΦB signal pathway, seedling defection, and impeding the synthesis of iron/nitrogen compounds are all involved in the selective degeneration of chloroplasts (Woodson et al., 2015; Martín et al., 2016; Woodson, 2016; Hirosawa et al., 2017; Wang et al., 2017). The related terms also are not always enriched in the observed DEGs. Because of the lack of enrichment, other tools often reported very general terms or no terms at all. For example, in the fc mutant case study, other tools reported very general terms such as metal-binding or no related terms at all. In contrast, the PAIR/GSLA system reported functionally associated but not enriched terms (G-protein-coupled photoreceptor activity, iron ion transport, nitrate transport, trichoblast differentiation, and root epidermal cell differentiation), which may provide useful insights for designing additional experiments.
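The distinction between enrichment and linkage can be made concrete with a toy example. The sketch below is only an illustration under stated assumptions: the gene names, the edge list, and the helper functions `build_neighbors`, `overlap_count`, and `linkage_count` are hypothetical and are not part of the PAIR/GSLA implementation. It shows a DEG set that shares no gene with a term's gene set (so an overlap-based enrichment test has nothing to detect) while still being connected to that gene set by several network edges.

```python
# Toy illustration only: gene names and edges are hypothetical, not PAIR data.

def build_neighbors(edges):
    """Turn an undirected edge list into a gene -> set-of-neighbors map."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors

def overlap_count(degs, term_genes):
    """What an enrichment-style test looks at: genes shared by both sets."""
    return len(degs & term_genes)

def linkage_count(neighbors, degs, term_genes):
    """What a linkage-style test looks at: (DEG, term gene) pairs joined by an edge.
    Assumes the two sets are disjoint; edges inside a shared overlap would be
    counted twice by this simple sum."""
    return sum(len(neighbors.get(g, set()) & term_genes) for g in degs)

degs = {"g1", "g2", "g3"}
term_genes = {"t1", "t2", "t3"}
edges = [("g1", "t1"), ("g1", "t2"), ("g2", "t2"),
         ("g3", "t3"), ("g3", "x1"), ("t1", "t2")]

neighbors = build_neighbors(edges)
shared = overlap_count(degs, term_genes)             # 0 shared genes
links = linkage_count(neighbors, degs, term_genes)   # 4 connecting edges
print(shared, links)
```

Whether four links is significant depends on the background network, which is the question the Q1 and Q2 tests described in "Omics Data Analysis" are meant to address.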

MATERIALS AND METHODS

Evidence Data for the Prediction of Arabidopsis Interactions

We retrieved and integrated experimentally reported physical protein interaction data for Arabidopsis (Arabidopsis thaliana) from IntAct (Hermjakob et al., 2004), BioGRID (Stark et al., 2006), BIND (Bader et al., 2003), and TAIR (Lamesch et al., 2012; Supplemental Table S1). The original data were curated by the evidence provided in each database to determine whether each reported physical protein interaction was experimentally identified rather than predicted. These physical protein interactions were then used as positive examples to train an SVM model. In addition, six types of indirect evidence were used for the prediction of interactions, each suggesting a specific type of functional association (Rhodes et al., 2005; Shoemaker and Panchenko, 2007). Thirty feature values were computed to numerically represent these six types of evidence via different mathematical characterizations (Supplemental Table S3). We tested each feature for its relevance by measuring the AUC of its ROC to discriminate known protein interactions from random protein pairs. A cutoff of 0.6 was used. After filtration, 16 features passed the AUC criterion (Supplemental Fig. S1). More details are given in Supplemental Materials S1, section 1.
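The feature filtration step can be sketched as follows. This is a minimal illustration, not the authors' code: the feature names and scores are invented, the function names `auc` and `select_features` are hypothetical, and keeping features with AUC greater than or equal to 0.6 (rather than strictly greater) is an assumption. The AUC is computed in its rank-based form, that is, the probability that a known interaction receives a higher feature value than a random pair.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the probability that a random
    positive outscores a random negative (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def select_features(features, cutoff=0.6):
    """features maps a feature name to a pair:
    (values for known interactions, values for random protein pairs).
    Returns the features whose AUC reaches the cutoff (assumed inclusive)."""
    kept = {}
    for name, (pos, neg) in features.items():
        score = auc(pos, neg)
        if score >= cutoff:
            kept[name] = score
    return kept

# Hypothetical feature values, for illustration only.
features = {
    "coexpression": ([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]),
    "noise":        ([0.1, 0.6, 0.5], [0.2, 0.7, 0.4]),
}
kept = select_features(features)
print(sorted(kept))
```

The pairwise loop is quadratic in the number of examples; for data sets of realistic size a sort-based rank computation would be the more practical choice.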

Modeling and Interaction Prediction

The SVM algorithm was used to build the interaction prediction model. We chose the SVM algorithm because of its advantage of high prediction precision when modeling high-dimension data sets (Brown et al., 2000; Burbidge et al., 2001) and its efficient implementation (Winters-Hilt et al., 2006; LIBSVM version 3.0 [http://www.csie.ntu.edu.tw/~cjlin/libsvm/]). We implemented a 5-fold cross-validation scheme to optimize the model. The final prediction model inferred a total of 329,044 functional associations. More details are given in Supplemental Materials S1, section 2.
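The 5-fold cross-validation scheme can be sketched generically. The code below is an assumption-laden illustration: it does not call LIBSVM, and the `fit_threshold`/`predict_threshold` pair is a deliberately trivial stand-in classifier (a midpoint threshold on one feature), not an SVM. In practice, the `fit` and `predict` arguments would wrap the actual SVM training and prediction calls, and the mean held-out accuracy would be compared across candidate model settings.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(fit, predict, X, y, k=5, seed=0):
    """Mean held-out accuracy over k folds for any fit/predict pair."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k, seed):
        test = set(held_out)
        train_X = [x for i, x in enumerate(X) if i not in test]
        train_y = [t for i, t in enumerate(y) if i not in test]
        model = fit(train_X, train_y)
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k

# Stand-in classifier (NOT an SVM): threshold halfway between class means.
def fit_threshold(X, y):
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict_threshold(threshold, x):
    return 1 if x > threshold else 0

# Hypothetical one-feature data: 1 = known interaction, 0 = random pair.
X = [0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 1.0]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
mean_accuracy = cross_validate(fit_threshold, predict_threshold, X, y, k=5)
print(mean_accuracy)
```

Each sample is held out exactly once, so the reported accuracy is always measured on examples the model did not see during fitting.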

Network Evaluation

The predicted interaction network was evaluated by comparing its quality as a functional association network with seven other interactomes. The quality of the functional association network (i.e. the ability to group functionally associated genes together) was evaluated by predicting a gene’s function (GO terms with evidence codes EXP, IDA, IMP, IGI, TAS, IPI, IEP, ISS, ISO, ISA, ISM, IGC, and RCA) using the PANTHER term enrichment tool (Mi et al., 2017) to identify enriched annotation terms in this gene’s first-degree network neighbors (Wang et al., 2007). More details are given in Supplemental Materials S1, section 3.
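The idea of predicting a gene's function from its first-degree neighbors can be sketched with a plain hypergeometric enrichment test. This is a stand-in under stated assumptions: the evaluation described above used the PANTHER term enrichment tool, whereas the code below uses a hand-written hypergeometric tail probability, a hypothetical `predict_terms` helper, an assumed significance level of 0.05, no multiple-testing correction, and invented gene and term names.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of at least k annotated genes among n neighbors
    drawn from a population of N genes, K of which carry the term."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

def predict_terms(gene, neighbors, annotations, all_genes, alpha=0.05):
    """Return {term: p-value} for terms enriched among the first-degree
    neighbors of `gene`. The query gene is excluded from the background so
    that its own annotation cannot influence the prediction."""
    population = set(all_genes) - {gene}
    nbrs = neighbors[gene] & population
    candidate_terms = set().union(*(annotations.get(g, set()) for g in nbrs))
    enriched = {}
    for term in candidate_terms:
        K = sum(1 for g in population if term in annotations.get(g, set()))
        k = sum(1 for g in nbrs if term in annotations.get(g, set()))
        p = hypergeom_sf(k, len(population), K, len(nbrs))
        if p < alpha:
            enriched[term] = p
    return enriched

# Hypothetical 20-gene network neighborhood and annotations.
all_genes = [f"g{i}" for i in range(20)]
neighbors = {"g0": {"g1", "g2", "g3", "g4"}}
annotations = {"g1": {"T"}, "g2": {"T"}, "g3": {"T"}, "g10": {"T"},
               "g4": {"U"}}
annotations.update({f"g{i}": {"U"} for i in range(11, 19)})

predicted = predict_terms("g0", neighbors, annotations, all_genes)
print(predicted)
```

A network that groups functionally associated genes well should recover a held-out gene's known terms more often under this kind of test than a network with less coherent neighborhoods.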

Omics Data Analysis

The GSLA tool was developed to interpret the collective functional impacts of DEGs according to their network synergy. GSLA evaluates whether the DEG set has significant functional associations with other gene sets representing biological functions. GSLA relies on two hypotheses (Q1 and Q2) to evaluate whether two gene sets have significant functional associations. Q1 tests the strength of the functional association, while Q2 verifies whether the observed strong association is the result of the biologically correct network topology (i.e. our knowledge of the molecular mechanisms) rather than the result of the compositions of these two gene sets. Some genes, known as hubs, have considerably more neighbors than other genes. Gene sets consisting of many hub genes are more likely to connect to other gene sets. Q2 is used to remove the confounding factor of gene set composition. Q1 and Q2 are related but different tests that complement each other to increase the sensitivity and specificity of GSLA (Supplemental Fig. S3). More details are given in Supplemental Materials S1, section 4.
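The roles of the two tests can be illustrated with a small sketch. The exact statistics used by GSLA are given in the supplemental materials and are not reproduced here; everything below is an assumption made for illustration. The Q1-like quantity is taken to be the density of links between the two gene sets relative to the background density of the network, and the Q2-like quantity is an empirical P value against degree-preserving randomized networks (double edge swaps), which keeps every gene's number of neighbors fixed so that hub-rich gene sets are not favored. All function names and the toy network are hypothetical.

```python
import random

def cross_links(edges, set_a, set_b):
    """Count edges joining a gene in set_a to a gene in set_b."""
    return sum(1 for u, v in edges
               if (u in set_a and v in set_b) or (u in set_b and v in set_a))

def q1_density_ratio(edges, set_a, set_b):
    """Assumed Q1-like measure: inter-set link density over background density.
    Assumes set_a and set_b are disjoint."""
    nodes = {g for e in edges for g in e}
    background = len(edges) / (len(nodes) * (len(nodes) - 1) / 2)
    return cross_links(edges, set_a, set_b) / (len(set_a) * len(set_b)) / background

def rewire(edges, n_swaps, rng):
    """Degree-preserving randomization: a-b, c-d becomes a-d, c-b when the
    swap creates no self-loop and no duplicate edge."""
    edges = list(edges)
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:          # try both orientations of the second edge
            c, d = d, c
        if len({a, b, c, d}) < 4:
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def q2_pvalue(edges, set_a, set_b, n_random=200, seed=0):
    """Assumed Q2-like test: fraction of randomized networks with at least
    as many inter-set links as observed (with add-one smoothing)."""
    rng = random.Random(seed)
    observed = cross_links(edges, set_a, set_b)
    hits = sum(cross_links(rewire(edges, 10 * len(edges), rng), set_a, set_b) >= observed
               for _ in range(n_random))
    return (hits + 1) / (n_random + 1)

def degree_counts(edges):
    counts = {}
    for u, v in edges:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return counts

# Hypothetical network: sets A and B are densely linked; C is a separate ring.
edges = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2"), ("a3", "b3"),
         ("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c4", "c1"),
         ("a1", "c1"), ("b3", "c3")]
set_a, set_b = {"a1", "a2", "a3"}, {"b1", "b2", "b3"}
print(q1_density_ratio(edges, set_a, set_b), q2_pvalue(edges, set_a, set_b))
```

Because the randomized networks keep each gene's degree, a small Q2-like P value indicates that the links between the two sets reflect the specific wiring of the network rather than merely the presence of highly connected genes in either set.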

Accession Numbers

Sequence data from this article can be found in the GenBank/EMBL data libraries under accession numbers At1g60200 (roa1), At5g26030 (fc-1), At2g30390 (fc-2) and At3g14110 (flu).

Supplemental Data

The following supplemental materials are available.

Footnotes

1

This work was supported by the Zhejiang Provincial Natural Science Foundation of China (grant no. LR13C020001), the National Key Research and Development Program of China (grant no. 2016YFD0100700), and the National Science Foundation of China (grant no. 31571356).

[OPEN]

Articles can be viewed without a subscription.

References

  1. Bader GD, Betel D, Hogue CWV (2003) BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res 31: 248–250
  2. Berger B, Peng J, Singh M (2013) Computational solutions for omics data. Nat Rev Genet 14: 333–346
  3. Boyle EI, Weng S, Gollub J, Jin H, Botstein D, Michael J, Sherlock G (2004) GO::TermFinder: open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics 20: 3710–3715
  4. Brown MP, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M Jr, Haussler D (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci USA 97: 262–267
  5. Burbidge R, Trotter M, Buxton B, Holden S (2001) Drug design by machine learning: support vector machines for pharmaceutical data analysis. Comput Chem 26: 5–14
  6. Chang CC, Lin CJ (2013) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2: 1–39
  7. Cui J, Li P, Li G, Xu F, Zhao C, Li Y, Yang Z, Wang G, Yu Q, Li Y, et al. (2008) AtPID: Arabidopsis thaliana Protein Interactome Database. An integrative platform for plant systems biology. Nucleic Acids Res 36: D999–D1008
  8. De Bodt S, Proost S, Vandepoele K, Rouzé P, Van de Peer Y (2009) Predicting protein-protein interactions in Arabidopsis thaliana through integration of orthology, gene ontology and co-expression. BMC Genomics 10: 288
  9. Domínguez F, Cejudo FJ (2014) Programmed cell death (PCD): an essential process of cereal seed development and germination. Front Plant Sci 5: 366
  10. Geisler-Lee J, O’Toole N, Ammar R, Provart NJ, Millar AH, Geisler M (2007) A predicted interactome for Arabidopsis. Plant Physiol 145: 317–329
  11. Gomez-Cabrero D, Abugessaisa I, Maier D, Teschendorff A, Merkenschlager M, Gisel A, Ballestar E, Bongcam-Rudloff E, Conesa A, Tegnér J (2014) Data integration in the era of omics: current and future challenges. BMC Syst Biol (Suppl 2) 8: I1
  12. Hamp T, Rost B (2015) More challenges for machine-learning protein interactions. Bioinformatics 31: 1521–1525
  13. Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A, et al. (2004) IntAct: an open source molecular interaction database. Nucleic Acids Res 32: D452–D455
  14. Heyndrickx KS, Vandepoele K (2012) Systematic identification of functional plant modules through the integration of complementary data sources. Plant Physiol 159: 884–901
  15. Hirosawa Y, Ito-Inaba Y, Inaba T (2017) Ubiquitin-proteasome-dependent regulation of bidirectional communication between plastids and the nucleus. Front Plant Sci 8: 310
  16. Huang DW, Sherman BT, Tan Q, Collins JR, Alvord WG, Roayaei J, Stephens R, Baseler MW, Lane HC, Lempicki RA (2007) The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol 8: R183
  17. Kliebenstein DJ, Rowe HC (2008) Ecological costs of biotrophic versus necrotrophic pathogen resistance, the hypersensitive response and signal transduction. Plant Sci 174: 551–556
  18. Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, Muller R, Dreher K, Alexander DL, Garcia-Hernandez M, et al. (2012) The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res 40: D1202–D1210
  19. Lee I, Ambaru B, Thakkar P, Marcotte EM, Rhee SY (2010a) Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana. Nat Biotechnol 28: 149–156
  20. Lee K, Thorneycroft D, Achuthan P, Hermjakob H, Ideker T (2010b) Mapping plant interactomes using literature curated and predicted protein-protein interaction data sets. Plant Cell 22: 997–1005
  21. Li J, Li G, Wang H, Deng XW (2011) Phytochrome signaling mechanisms. The Arabidopsis Book 9: e0148
  22. Lin M, Hu B, Chen L, Sun P, Fan Y, Wu P, Chen X (2009) Computational identification of potential molecular interactions in Arabidopsis. Plant Physiol 151: 34–46
  23. Lin M, Zhou X, Shen X, Mao C, Chen X (2011) The Predicted Arabidopsis Interactome Resource and network topology-based systems biology analyses. Plant Cell 23: 911–922
  24. Martín G, Leivar P, Ludevid D, Tepperman JM, Quail PH, Monte E (2016) Phytochrome and retrograde signalling pathways converge to antagonistically regulate a light-induced transcriptional network. Nat Commun 7: 11431
  25. Mi H, Huang X, Muruganujan A, Tang H, Mills C, Kang D, Thomas PD (2017) PANTHER version 11: expanded annotation data from Gene Ontology and reactome pathways, and data analysis tool enhancements. Nucleic Acids Res 45: D183–D189
  26. Obayashi T, Aoki Y, Tadaka S, Kagaya Y, Kinoshita K (2018) ATTED-II in 2018: a plant coexpression database based on investigation of statistical property of the mutual rank index. Plant Cell Physiol 59: e3
  27. Oracz K, Karpínski S (2016) Phytohormones signaling pathways and ROS involvement in seed germination. Front Plant Sci 7: 864
  28. Park Y, Marcotte EM (2012) Flaws in evaluation schemes for pair-input computational predictions. Nat Methods 9: 1134–1136
  29. Rao VS, Srinivas K, Sujini GN, Kumar GNS (2014) Protein-protein interaction detection: methods and analysis. Int J Proteomics 2014: 147648
  30. Rhee SY, Mutwil M (2014) Towards revealing the functions of all genes in plants. Trends Plant Sci 19: 212–221
  31. Rhodes DR, Tomlins SA, Varambally S, Mahavisno V, Barrette T, Kalyana-Sundaram S, Ghosh D, Pandey A, Chinnaiyan AM (2005) Probabilistic model of the human protein-protein interaction network. Nat Biotechnol 23: 951–959
  32. Saha R, Chowdhury A, Maranas CD (2014) Recent advances in the reconstruction of metabolic models and integration of omics data. Curr Opin Biotechnol 29: 39–45
  33. Shoemaker B, Panchenko A (2007) Deciphering protein-protein interactions. Part II. Computational methods to predict protein and domain interaction partners. PLoS Comput Biol 3: e43
  34. Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M (2006) BioGRID: a general repository for interaction datasets. Nucleic Acids Res 34: D535–D539
  35. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102: 15545–15550
  36. Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al. (2015) STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447–D452
  37. Taiz L, Zeiger E, Møller IM, Murphy A (2015) Plant Physiology and Development. Sinauer Associates, Sunderland, MA
  38. Thimm O, Bläsing O, Gibon Y, Nagel A, Meyer S, Krüger P, Selbig J, Müller LA, Rhee SY, Stitt M (2004) MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. Plant J 37: 914–939
  39. Unger C, Kleta S, Jandl G, Tiedemann AV (2005) Suppression of the defence-related oxidative burst in bean leaf tissue and bean suspension cells by the necrotrophic pathogen Botrytis cinerea. J Phytopathol 153: 15–26
  40. Wagner D, Przybyla D, Op den Camp R, Kim C, Landgraf F, Lee KP, Würsch M, Laloi C, Nater M, Hideg E, et al. (2004) The genetic basis of singlet oxygen-induced stress responses of Arabidopsis thaliana. Science 306: 1183–1185
  41. Wang JZ, Du Z, Payattakool R, Yu PS, Chen CF (2007) A new method to measure the semantic similarity of GO terms. Bioinformatics 23: 1274–1281
  42. Wang Y, Wu Y, Yu B, Yin Z, Xia Y (2017) EXTRA-LARGE G PROTEINs interact with E3 ligases PUB4 and PUB2 and function in cytokinin and developmental processes. Plant Physiol 173: 1235–1246
  43. Warde-Farley D, Donaldson SL, Comes O, Zuberi K, Badrawi R, Chao P, Franz M, Grouios C, Kazi F, Lopes CT, et al. (2010) The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res 38: W214–W220
  44. Winters-Hilt S, Yelundur A, McChesney C, Landry M (2006) Support vector machine implementations for classification & clustering. BMC Bioinformatics (Suppl 2) 7: S4
  45. Woodson JD (2016) Chloroplast quality control: balancing energy production and stress. New Phytol 212: 36–41
  46. Woodson JD, Joens MS, Sinson AB, Gilkerson J, Salomé PA, Weigel D, Fitzpatrick JA, Chory J (2015) Ubiquitin facilitates a quality-control pathway that removes damaged chloroplasts. Science 350: 450–454
  47. Woodson JD, Perez-Ruiz JM, Chory J (2011) Heme synthesis by plastid ferrochelatase I regulates nuclear gene expression in plants. Curr Biol 21: 897–903
  48. Yu H, Braun P, Yildirim MA, Lemmens I, Venkatesan K, Sahalie J, Hirozane-Kishikawa T, Gebreab F, Li N, Simonis N, et al. (2008) High-quality binary protein interaction map of the yeast interactome network. Science 322: 104–110
  49. Zhan X, Qian B, Cao F, Wu W, Yang L, Guan Q, Gu X, Wang P, Okusolubo TA, Dunn SL, et al. (2015) An Arabidopsis PWI and RRM motif-containing protein is critical for pre-mRNA splicing and ABA responses. Nat Commun 6: 8139
  50. Zhou X, Chen P, Wei Q, Shen X, Chen X (2013) Human interactome resource and gene set linkage analysis for the functional interpretation of biologically meaningful gene sets. Bioinformatics 29: 2024–2031

Articles from Plant Physiology are provided here courtesy of Oxford University Press
