Significance
Carbon fixation and accumulation as lignocellulosic biomass is of global ecological and industrial importance and most significantly occurs in the form of wood development in trees. Traits of importance in biomass accumulation are highly complex and, aside from environmental factors, are affected by many pathways and thousands of genes. We have applied a network-based data integration method for a systems genetics analysis of genes, processes, and pathways underlying biomass and bioenergy-related traits using segregating Eucalyptus hybrid tree populations. We could link biologically meaningful sets of genes to complex traits and at the same time reveal the molecular basis of trait variation. Such a holistic view of the biology of wood formation will contribute to genetic improvement and engineering of plant biomass.
Keywords: systems genetics, lignocellulosic biomass, cell wall, bioenergy, network-based data integration
Abstract
As a consequence of their remarkable adaptability, fast growth, and superior wood properties, eucalypt tree plantations have emerged as key renewable feedstocks (over 20 million ha globally) for the production of pulp, paper, bioenergy, and other lignocellulosic products. However, most biomass properties such as growth, wood density, and wood chemistry are complex traits that are hard to improve in long-lived perennials. Systems genetics, a process of harnessing multiple levels of component trait information (e.g., transcript, protein, and metabolite variation) in populations that vary in complex traits, has proven effective for dissecting the genetics and biology of such traits. We have applied a network-based data integration (NBDI) method for a systems-level analysis of genes, processes and pathways underlying biomass and bioenergy-related traits using a segregating Eucalyptus hybrid population. We show that the integrative approach can link biologically meaningful sets of genes to complex traits and at the same time reveal the molecular basis of trait variation. Gene sets identified for related woody biomass traits were found to share regulatory loci, cluster in network neighborhoods, and exhibit enrichment for molecular functions such as xylan metabolism and cell wall development. These findings offer a framework for identifying the molecular underpinnings of complex biomass and bioprocessing-related traits. A more thorough understanding of the molecular basis of plant biomass traits should provide additional opportunities for the establishment of a sustainable bio-based economy.
Wood (secondary xylem tissue) from trees represents a significant proportion of global carbon sequestration (1, 2), while also providing raw materials, be it in the form of timber, paper, or other biomaterials and value-added derivatives, such as cellulose (3) and lignin (4). Fast-growing trees such as poplars and eucalypts with short rotation times, small genome sizes (500–600 Mbp), large genetic diversity, and established breeding populations are widely cultivated as woody biomass crops and are well suited for biotechnological improvement (5–8). A tree’s amenability to bioprocessing for pulp, paper, cellulose, and other bioproducts is dependent on the aggregate of its wood properties, which are a function of cellular architecture and the chemistry and ultrastructure of the secondary cell walls (SCWs) of wood fiber cells that compose the bulk of woody biomass (9). These properties are determined by overlapping developmental programs and pathways that have to be coordinated during secondary xylem development (xylogenesis) (10). Some of these pathways, SCW polysaccharide and lignin biosynthesis in particular, represent a strong, irreversible carbon sink and directly or indirectly use core metabolites (glucose/UDP-glucose and fructose) that are also used for growth and physiological/cellular homeostasis (energy metabolism, production of amino acids, etc.). These metabolic interdependencies and the multitude of biological processes involved result in woody biomass traits having complex genetic architectures, especially in highly outbred organisms such as forest trees (11).
Functional genetics approaches based mainly on single-gene perturbations have been informative in revealing components of the SCW producing system in model plants (12). However, they have not necessarily yielded insight into the complex interactions of the genes that naturally affect cell wall chemistry and ultrastructure in ways that do not interfere with normal plant growth, form, and biomass accumulation. The characterization of natural genetic perturbations segregating in phenotypically wild-type individuals offers an attractive alternative tool to study properties that emerge from permissible genetic variation (11, 13). With the availability of high-throughput genotyping and high-resolution linkage maps in Populus (14, 15) and Eucalyptus (16–19), genetic approaches such as quantitative trait locus (QTL) (14, 20, 21) and LD-based association mapping (22–25) are becoming feasible to study woody biomass traits in long-lived perennial plants. While such association mapping approaches facilitate the delimitation of genome positions harboring causal variation (13), they provide little information as to how these genes and their variants act together in biological pathways to influence trait variation.
Complementing genetic information with molecular phenotypes (e.g., transcript levels) can contribute to a better mechanistic understanding of trait variation (24, 26, 27). Expression QTL (eQTL) analysis (28) allows the identification of genomic loci associated with variation in molecular phenotypes. In contrast to QTL or association analyses, eQTL analysis also identifies the genes that are affected by this variation and thus potentially contribute to the complex trait. However, because complex traits are subject to the combined effect of multiple genetic loci, each with a small effect on the trait, there is generally low power to detect statistically significant associations when relying on single-gene analysis. In addition, single gene associations do not show how genes and/or pathways interact to explain a complex trait (29).
To cope with these limitations of single-gene association analysis, more integrated systems genetics approaches have been proposed (29, 30). Methods that use network models to represent a priori molecular knowledge on the organism/trait of interest (31) have been particularly successful in performing association analysis in clonal systems (32–35). In the context of outbred populations, network-based methods have been applied for gene prioritization (36) or to increase the reliability of eQTL association mapping itself (37) but not yet for integrative association analysis.
Here we applied a systems genetics approach to study the genomic loci and pathways affecting wood formation in Eucalyptus. We generated coupled genetics/genomics (linkage map and immature xylem transcriptome) data for 156 individuals segregating from an F2 pseudobackcross between a Eucalyptus grandis × Eucalyptus urophylla F1 interspecific hybrid tree and an unrelated E. urophylla tree and profiled traits representative of tree growth (diameter at breast height and bark thickness), wood properties (wood basic density and cell wall composition), and bioprocessing metrics (sugar release). Data were integrated using a unique network-based data integration (NBDI) approach that allows combining genotyping, expression profiling, and prior network information to prioritize genes and molecular mechanisms associated with complex wood formation traits.
Results
Network-Based Gene–Trait Association.
One hundred fifty-six E. grandis × E. urophylla F2 interspecific backcross trees were profiled for transcript abundance in immature xylem and for woody biomass traits that relate to growth, wood density, cell wall composition, and sugar extractability (Table 1 and Materials and Methods). Genotyping, expression profiling, inferred eQTL associations, and prior network information were simultaneously used to prioritize genes and molecular mechanisms associated with each of the measured traits. To this end we developed an integration approach that makes use of a gene interaction network model in which nodes are genes and edges represent two types of information. Edges based on prior information reflect relations between genes and gene products, taken from the Kyoto Encyclopedia of Genes and Genomes (KEGG) (SI Materials and Methods). Edges based on eQTL associations indicate that the connected genes share the same eQTL and thus are likely functionally related and coregulated.
Table 1.
Trait | NBDI enrichment P value | Nontransformed enrichment P value |
DBH (over bark) | 6.38 × 10^−20 | 2.81 × 10^−11 |
DBH (under bark) | 5.55 × 10^−15 | 1.38 × 10^−10 |
Bark thickness | 4.61 × 10^−19 | 1.38 × 10^−10 |
Wood density | 6.52 × 10^−10 | 5.49 × 10^−12 |
Lignin content | 2.83 × 10^−1 | 9.28 × 10^−1 |
Total C5 sugar in walls | 1.02 × 10^−1 | 2.83 × 10^−1 |
Total C6 sugar in walls | 1.02 × 10^−1 | 9.75 × 10^−1 |
Glucose released | 5.49 × 10^−12 | 3.03 × 10^−4 |
Percent of max glucose released | 3.29 × 10^−14 | 1.03 × 10^−4 |
Xylose released | 1.88 × 10^−13 | 1.02 × 10^−1 |
Percent of max xylose released | 2.21 × 10^−3 | 7.19 × 10^−1 |
Glucose + xylose released | 2.81 × 10^−11 | 3.03 × 10^−4 |
Percent of max sugar released | 3.29 × 10^−14 | 2.21 × 10^−3 |
For each trait, 300 genes were selected. The enrichment score corresponds to the P value of a hypergeometric enrichment test.
This network model is then used to propagate on a per sample basis the expression signals of genes to a local network neighborhood. This propagation transforms the original gene expression data to network-diffused gene expression data (Fig. 1 and Materials and Methods). In the network-diffused expression matrix, each data point can be interpreted as the original expression signal of a gene in a sample, modulated by the expression of the genes that are close neighbors in the network (i.e., that are likely found in the same pathways or to share eQTLs, etc.). Modulation implies that if nodes in the local neighborhood of a gene are also expressed, the expression signal of the gene is confirmed and its relevance is increased; otherwise, its relevance is decreased. Using diffused gene expression is comparable to module-based analysis, where before further analysis, groups of genes with correlating gene expression (modules) are identified, under the assumption that these genes participate in the same biological processes or belong to the same pathway. The network neighborhood used to modulate the expression of a gene can be viewed as an implicit module of functionally related genes, with the main advantage of our approach that each individual gene can be prioritized or deprioritized for a particular trait based on the integrated information from network neighbors.
Once the gene expression values are diffused through the network to obtain the transformed expression values (referred to as NBDI-transformed values), genes relevant for each trait under study are identified by correlation analysis. Correlating NBDI-transformed values to each of the traits allows for ranking, per trait, of genes of potential relevance to that trait (for gene selections for each trait, see Dataset S1). To benchmark the performance of the NBDI approach, we compiled a literature-based set of reference genes with known biological roles in SCW biosynthesis (the best characterized and a central biological process in wood formation; Dataset S2) and calculated the extent to which gene sets associated with the traits were enriched for these reference genes. To illustrate the added value of using NBDI-transformed values, we also performed the same associations but using the nontransformed gene expression values when correlating gene expression variation to trait variation. The difference in prioritization performance between our approach and that of using nontransformed expression values is illustrated as a function of the number of reference genes that is prioritized for either all traits together (Fig. 2) or for each individual trait (Fig. S1). Compared with using nontransformed expression data, NBDI-transformed data result in higher enrichment of reference genes, regardless of how many genes per trait were selected (Fig. 2). This higher enrichment can be explained by the fact that for most traits the NBDI transformation results in assigning a higher prioritization rank to reference genes than when using the nontransformed values (Fig. S2). In addition, with NBDI-transformed genes we were able to prioritize genes that associated to several traits as well as genes that were trait specific (Fig. S3). In the remainder of the analyses, we selected the 300 genes (Dataset S1) of which the NBDI-transformed expression correlated best with the trait. 
With this threshold a significant enrichment for reference genes (Table 1) was obtained, while still yielding gene selections that were distinct enough to explain trait differences. Applying these criteria resulted in a selection of 1,529 nonredundant genes (Dataset S1) that were linked to at least one of the traits, of which 102 were found in the reference set (P_NBDI = 1.38 × 10^−21).
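The enrichment P values reported here come from a hypergeometric test. As a minimal sketch of how such a test can be computed, the function below evaluates the upper tail of the hypergeometric distribution. The function name, argument names, and the toy numbers in the usage line are illustrative assumptions; the size of the gene universe and of the reference set used in the study are not stated in this section.

```python
from math import comb

def hypergeom_enrichment_p(universe, reference, selected, overlap):
    """Upper-tail hypergeometric P value.

    Probability of finding at least `overlap` reference genes when
    `selected` genes are drawn without replacement from `universe`
    genes, of which `reference` belong to the reference set.
    """
    upper = min(reference, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(reference, k) * comb(universe - reference, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Toy numbers (hypothetical): 20 genes, 5 reference genes, 4 selected, 2 hits.
p_toy = hypergeom_enrichment_p(universe=20, reference=5, selected=4, overlap=2)
```

Exact integer arithmetic is used for the binomial coefficients, so very small P values are limited only by the final conversion to a float.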
Genes and Pathways of Relevance to the Traits Under Study.
The NBDI approach combines genetic and prior information with gene expression variation to prioritize, per trait, relevant genes/pathways influenced by genetic variation in the population. This combined information is captured in two complementary views: an eQTL view (Fig. 3) and a network view (Fig. 4).
First, because of its network model, NBDI implicitly imposes that genes prioritized for a trait should also share an eQTL at one or more loci in the genome. Most eQTLs for genes prioritized by the NBDI approach for a particular trait should therefore cluster together in hot spots rather than being randomly scattered along the genome. Fig. 3 shows this is indeed the case.
Second, as the NBDI approach favors genes that are connected in the interaction network, at least some of the genes associated with a trait, when projected on a gene interaction network, should cluster together and constitute a molecular subnetwork underlying the trait, providing a complementary network view. Because this network of curated gene interactions (derived from KEGG) is sparser than the network used for the NBDI analysis, which also includes eQTL overlap relations, by definition only a subset of the gene selections and relations can be visualized in subnetworks. To visualize eQTL relations for genes that could be connected through KEGG, eQTL overlap relations were overlaid on the identified subnetworks. For each trait, the largest connected component of the obtained network is extracted, containing the prioritized genes that can be connected in the network through direct edges. Representative subnetworks of the broad trait classes (Fig. 3) are presented in Fig. 4. The networks corresponding to wood density, diameter at breast height (DBH), glucose released, and lignin content contain 20, 39, 28, and 48 genes, respectively, from the original 300 selected genes per trait and were highly significant (the probability of obtaining a connected component with the observed size purely by chance was smaller than 10^−10, 10^−23, 10^−17, and 10^−32, respectively).
As shown in Figs. 3 and 4, the traits under study depend, at least partially, on similar processes and shared eQTLs. This is expected given that all traits reflect wood-related properties and will be, to different extents, phenotypically related. However, trait-specific differences can be identified. When the traits are clustered based on the eQTLs shared by the genes in their gene selections (SI Materials and Methods), four broad groups of traits can be identified (Fig. 3): (1) wood density, (2) sugar release (bioprocessing) metrics, (3) growth traits, and (4) C5 and C6 sugar in cell walls together with lignin content and the percent of maximum xylose release. Below, the biological functions of the genes associated with the shared eQTL peaks found for traits and trait groups are discussed in more detail (Fig. S4 and Dataset S3).
The biological functions of the genes associated with traits in groups 2 (sugar release) and 3 (growth traits) are characterized by shared cell wall related processes (cell wall organization and bioprocessing, hemicellulose metabolic processes, xylan biosynthesis, glucuronoxylan biosynthesis, and lignin biosynthesis) involving genes with eQTL located at peaks 43, 48, 49, and 50, and by processes that are shared by all traits at eQTL peaks 10 and 11 (but for which no functional overrepresentation could be assigned) (Fig. S4). All growth-related traits (bark thickness and DBH) and all bioprocessing (sugar release) metrics (except percent maximum xylose release) belong to this group.
Growth-related traits (group 3) seem to be dominated, in addition to the cell wall related processes mentioned above, by anthocyanin related processes (eQTL peak 33). This is also illustrated in the DBH subnetwork that is representative for this trait group (Fig. 4). The DBH subnetwork contains a considerable number of lignin-related genes. Most bona fide lignin biosynthesis genes that are highly expressed in developing xylem (8, 38) (Dataset S3) were indeed associated with growth (DBH and bark thickness traits). In addition to these lignin-related genes, the DBH network contains several genes involved in hormone signaling (IAA9 and ARF7) and flavonoid biosynthesis (ANS, BAN, and A11). The relationship between lignin and flavonoid biosynthesis has been shown in Arabidopsis, where silencing of hydroxycinnamoyl-CoA shikimate/quinate hydroxycinnamoyl transferase (HCT) resulted in decreased plant growth and redirection of the metabolic flux into flavonoid production through chalcone synthase (39).
In contrast, sugar release traits (trait group 2) and wood density (trait group 1) differ from the growth-related traits (group 3) in their larger involvement of photosynthesis-related processes (related to eQTL peak 46; Dataset S3). Because secondary xylem tissue is the main carbohydrate sink in woody plants, its composition is expected to be affected by the availability of fixed carbon resulting from photosynthesis (40). A representative subnetwork of this group of sugar release traits (percent glucose released; Fig. 4) is much less dominated by prior interactions (they might be less known for the processes involved) but clearly shows the presence of two clusters of genetic associations involving, among others, genes encoding TUBULIN ALPHA-2 CHAIN (TUA2), COBALAMIN-INDEPENDENT METHIONINE SYNTHASE/METHIONINE SYNTHESIS 1 (ATCIMS/MS1), and IRREGULAR XYLEM 8/GALACTURONOSYLTRANSFERASE 12 (IRX8/GAUT12), homologs of which have previously been identified as associated with hemicellulose and lignin related properties in Populus (24). IRX8/GAUT12 is known to be involved in xylan structure (41), and modulation of its expression in poplar results in improved sugar release efficiency (42). More indirect arguments can be made for TUA2 and ATCIMS/MS1. Tubulins affect cortical microtubule arrangement and thus cellulose microfibril angle (MFA) (43, 44). TUA2 specifically is known to be a target of SND1, the master regulator of secondary cell wall deposition (45, 46), and is one of two alpha-tubulins that are significantly up-regulated in Eucalyptus during tension wood formation, during which MFA is one of the main physical changes in the wood (47). Given that MFA is thought to affect wood ultrastructure and stiffness (48, 49), it is interesting to find this tubulin associated with bioprocessing-related traits such as sugar release in this study.
Several associations were identified (mainly with sugar release efficiency) for genes involved in cysteine and methionine metabolism, including ATCIMS/MS1. The roles of these genes in biomass formation are becoming increasingly revealed, being linked directly to either lignin (50, 51) or hormone-mediated growth regulation (52). In addition, SCW polysaccharide biosynthesis genes known to be expressed in developing xylem (8) were mainly associated with variation in wood density and glucose release efficiency (Dataset S3). In the latter case, the majority were xylan modification genes, affecting patterns of acetylation, and glucuronic acid and methyl-glucuronic acid decoration of the xylan backbone (53, 54) (Dataset S3).
Trait group 4 contains traits related to the total cell wall sugar content and, surprisingly, also lignin content and percent maximum xylose released. These traits are characterized by the relatively smaller effect of the major eQTL peaks that dominate most of the other biomass traits. The lignin subnetwork (Fig. 4) in general lacks most of the genes related to lignin biosynthesis itself. Indeed, few associations between the variation in expression of SCW biosynthetic genes (cellulose, xylan, and lignin pathway genes) and the final C5 and C6 sugar and lignin content of the cell wall were apparent (Dataset S3). Several genes highly associated with variation in lignin content code for enzymes involved in carbon metabolism, including phosphoenolpyruvate (phosphoenolpyruvate carboxykinase and enolase), pyruvate (plastidial pyruvate kinase 3 and malate dehydrogenase), and acetyl-CoA (pyruvate dehydrogenase E1 α subunit) metabolism, as well as pathways producing UDP-glucose and fructose (UDP-glucose pyrophosphorylase and sucrose synthase; Dataset S3). Genes involved in mitochondrial energy metabolism were also associated with lignin content. Also very distinctive for this group of traits are the genes that relate to abiotic stress (eQTL peak 34) and RNA modification (a process associated with a very distinctive eQTL peak 48).
As an additional external validation, we overlaid previously identified QTLs (blue circles in Fig. 3) for the same complex traits (20) with the obtained eQTL frequency peaks (Fig. 3). These results show that at least some of these trait QTLs are in close proximity to eQTL frequency peaks (especially peaks 28, 29, 48, and 49, located on chromosomes 6 and 10), providing additional evidence that the gene selection is relevant for the trait under study. Several eQTL peaks cannot be directly mapped to trait QTLs. These might represent polymorphisms that only have detectable effects on molecular subcomponents of a trait but cannot be directly associated with the phenotype itself. Given that complex traits are affected by different molecular traits in epistatic and nonlinear ways, a direct link between molecular traits and phenotypic traits is not always expected. Alternatively, the effects may be too numerous and too small to detect at the level of trait QTLs given the relatively small size of the experimental population (n = 156).
SI Materials and Methods
Trait QTL Mapping, eQTL Mapping, and eQTL Classification.
A framework genetic linkage map consisting of 130 DArT markers was previously constructed for this population (18). QTL and eQTL mapping was conducted using QTL Cartographer (61). A walking speed of 1 cM was used in composite interval mapping with forward regression and backward elimination (P ≤ 0.1). Permutation-based likelihood ratio score thresholds were calculated to globally approximate α ≤ 0.05 experiment-wise (62). eQTL located farther from their linked gene than half the average size of an eQTL, often on a different chromosome, were classified as trans-eQTL. Only trans-eQTL were used in this study.
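The trans-eQTL rule described above can be sketched as a simple distance check. This is a hedged illustration only: the function name, the use of map positions in cM for both the eQTL peak and the gene, and the label "cis" for eQTL that do not meet the trans criterion are assumptions that the text does not spell out.

```python
def classify_eqtl(eqtl_chrom, eqtl_pos_cm, gene_chrom, gene_pos_cm,
                  mean_eqtl_size_cm):
    """Label an eQTL as 'trans' or 'cis' relative to its linked gene.

    An eQTL on another chromosome, or farther from the gene than half
    the average eQTL size, is treated as trans (assumed reading).
    """
    if eqtl_chrom != gene_chrom:
        return "trans"
    if abs(eqtl_pos_cm - gene_pos_cm) > mean_eqtl_size_cm / 2:
        return "trans"
    return "cis"

# Hypothetical example: same chromosome, 30 cM apart, mean eQTL size 20 cM.
label = classify_eqtl("chr6", 70.0, "chr6", 40.0, mean_eqtl_size_cm=20.0)
```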
eQTL Overlap Analysis.
To construct the eQTL network, an all-versus-all eQTL overlap analysis was performed. Pairs of eQTL were classified as having no overlap (when the two eQTL regions did not overlap at all), partial overlap (when the regions overlapped but the two eQTL peaks were not both inside the overlapping region), or full overlap (when both eQTL peaks were inside the overlapping region; Fig. S5). A hybrid eQTL overlap score was calculated for each pair of eQTL. The overlap score (OS) is called a “hybrid score” because it uses a decimorgan (dM) scale for cases where there is partial overlap and a morgan (M) scale for cases where there is full overlap. This hybrid scale was implemented because no single-unit score behaved well on its own: a centimorgan (cM)-based overlap score seemed to place most of the weight on the distance between peaks; an M-based overlap score did not differentiate enough between small and large overlaps; and a dM-based overlap score, which otherwise seemed to be a good compromise, scored cases of full overlap too low when one eQTL was embedded within another eQTL of a different size.
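The three overlap classes can be illustrated with a short sketch. It covers only the classification step; the hybrid overlap score itself is not reproduced because its formula is not given in this section. The `EQTL` record, its field names, and the assumption that both eQTL lie on the same chromosome are hypothetical, and "partial" is read here as any overlap in which the two peaks are not both inside the shared region.

```python
from typing import NamedTuple

class EQTL(NamedTuple):
    start: float  # left boundary of the eQTL region (cM)
    end: float    # right boundary of the eQTL region (cM)
    peak: float   # position of strongest association (cM)

def classify_overlap(a: EQTL, b: EQTL) -> str:
    """Return 'none', 'partial', or 'full' for two eQTL on one chromosome."""
    lo = max(a.start, b.start)   # start of the shared region
    hi = min(a.end, b.end)       # end of the shared region
    if lo > hi:
        return "none"
    both_peaks_inside = lo <= a.peak <= hi and lo <= b.peak <= hi
    return "full" if both_peaks_inside else "partial"

# Hypothetical eQTL regions.
example = classify_overlap(EQTL(0, 10, 5), EQTL(4, 12, 6))
```

Regions that merely touch at a boundary are counted as overlapping in this sketch; a stricter definition would use `lo >= hi` for the "none" case.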
Gene–Gene Connectivity Network.
The gene–gene connectivity network consists of prior information and eQTL associations. Prior knowledge on gene–gene interactions (Fig. 1A) was derived from KEGG Arabidopsis thaliana pathways (63) by mapping Arabidopsis thaliana to E. grandis gene identifiers. This interaction network contains 5,288 genes and 158,411 gene interactions. eQTL-derived interactions were obtained as follows: The available eQTL data consisting of a list of gene locus associations were first converted to an eQTL overlap network. For 12,988 genes, in total, 17,930 eQTL were detected. Because each eQTL refers to a genetic location that represents a considerable region of a chromosome, we used the previously described eQTL overlap criterion to assess the degree to which two eQTL overlap using an overlap score ranging from 0 to 1. By truncating the obtained overlap scores (overlap > 0.2), pairs of genes were identified that are likely influenced by the same genetic variability (eQTL locus). These pairs are represented as functional gene–gene interactions in the gene–gene connectivity network (orange links in Fig. 1A).
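As a minimal sketch of the thresholding step, the function below turns pairwise overlap scores into undirected gene–gene edges and merges them with prior edges. The 0.2 cutoff comes from the text; the dictionary input format, the function names, and the gene labels in the usage lines are assumptions made for illustration.

```python
def build_eqtl_edges(overlap_scores, threshold=0.2):
    """Keep gene pairs whose eQTL overlap score exceeds the threshold.

    overlap_scores maps (gene_a, gene_b) tuples to a score in [0, 1].
    Edges are returned as frozensets so that (a, b) and (b, a) coincide.
    """
    edges = set()
    for (gene_a, gene_b), score in overlap_scores.items():
        if gene_a != gene_b and score > threshold:
            edges.add(frozenset((gene_a, gene_b)))
    return edges

def merge_networks(prior_edges, eqtl_edges):
    """Union of prior (e.g. KEGG-derived) and eQTL-derived edges."""
    return {frozenset(e) for e in prior_edges} | set(eqtl_edges)

# Hypothetical scores for three genes.
scores = {("g1", "g2"): 0.35, ("g1", "g3"): 0.2, ("g2", "g3"): 0.05}
eqtl_edges = build_eqtl_edges(scores)
network = merge_networks([("g2", "g3")], eqtl_edges)
```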
Gene Connectivity Calculation.
Once the gene–gene connectivity network is constructed, a metric that captures how well a gene node is connected to the other gene nodes in the network was calculated. The assumption is that such a metric will be representative for how similar or relevant a gene is to another gene. Because of its good performance in previous analyses (36), we used the Laplacian exponential diffusion (LED) kernel calculated on graph nodes as the metric. It is calculated from the weighted Laplacian matrix L as follows (60):

$$L = D - A, \qquad D(i,i) = \sum_{j=1}^{n} A(i,j), \qquad K_{LED} = \exp(-\alpha L).$$
Here n is the number of entities in the global network, D is the diagonal degree matrix, and A(i,j) represents entry j on row i of the global network’s adjacency matrix A. KLED(i,j) contains, at time t = α, the quantity found in node i when a unit quantity starts diffusing from node j at t = 0. The exp-operator indicates the matrix exponential. All experiments in this work were carried out using 0.001 as the α-parameter. Before further analysis, the kernel values are normalized using
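As a rough illustration of the diffusion kernel defined in this section, the sketch below builds the Laplacian from a small adjacency matrix and evaluates the matrix exponential with a truncated Taylor series. The series is a stand-in chosen to keep the example dependency-free; for networks of realistic size a dedicated routine such as `scipy.linalg.expm` would be the usual choice. Function names are hypothetical, and the normalization step is left out because its expression is not given in the text above.

```python
def mat_mul(a, b):
    """Plain-Python product of two square matrices (lists of rows)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, terms=30):
    """Matrix exponential via a truncated Taylor series (small matrices only)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result

def led_kernel(adjacency, alpha=0.001):
    """Laplacian exponential diffusion kernel exp(-alpha * (D - A))."""
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    scaled = [[-alpha * ((degree[i] if i == j else 0.0) - adjacency[i][j])
               for j in range(n)] for i in range(n)]
    return mat_exp(scaled)

# Two genes joined by one edge; alpha is enlarged so the effect is visible.
kernel = led_kernel([[0.0, 1.0], [1.0, 0.0]], alpha=0.5)
```

For this two-node graph the kernel has a closed form, with diagonal entries (1 + e^(−2α))/2 and off-diagonal entries (1 − e^(−2α))/2, which gives a quick sanity check on the numerical result.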
Diffusion of Gene Expression Through the Connectivity Network.
First, the available gene expression data are filtered to contain only genes that are present in the gene–gene connectivity network. In total, 15,097 genes were retained. The appropriate part of the gene–gene connectivity matrix that represents the gene–gene connectivity subnetwork was extracted because it can contain genes that are not present in the gene expression dataset (Fig. 1B). Next, the diagonal of the connectivity matrix was increased by 1, ensuring that the original gene expression signal will be at least partially present in the final diffused data. By multiplying the gene expression matrix with the filtered gene connectivity matrix (Fig. 1C), the network-diffused gene expression was obtained. This diffused value is further referred to as the NBDI-transformed expression value.
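The diffusion step can be sketched as a single matrix product. The example assumes an expression matrix with samples as rows and genes as columns, ordered the same way as the rows and columns of the connectivity matrix; this orientation and the function name are assumptions, since the text does not fix them.

```python
def diffuse_expression(expression, connectivity):
    """Network-diffused expression: E x (K + I).

    expression:   list of samples, each a list of per-gene values.
    connectivity: square gene-by-gene connectivity (kernel) matrix.
    Adding 1 to the diagonal keeps part of each gene's own signal.
    """
    n = len(connectivity)
    k_plus_i = [[connectivity[i][j] + (1.0 if i == j else 0.0)
                 for j in range(n)] for i in range(n)]
    return [[sum(sample[g] * k_plus_i[g][j] for g in range(n))
             for j in range(n)]
            for sample in expression]

# One sample, two connected genes; only the first gene is expressed.
toy_kernel = [[0.9, 0.1], [0.1, 0.9]]
diffused = diffuse_expression([[2.0, 0.0]], toy_kernel)
```

In the toy case the second gene receives a small share of its neighbor's signal (0.2), while the first gene keeps most of its own (3.8).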
In Fig. S6, the effect of network-based expression diffusion is illustrated for a number of genes that are known to participate in the biological processes underlying the traits under study. Whether the diffused expression of a gene correlates highly with the original gene expression depends on the connectivity of the gene in the network and the expression values of the genes found in the network neighborhood of that gene. The less a gene is connected to other genes in the gene–gene connectivity network, the less the raw gene expression will be influenced by the expression values of neighboring genes.
NBDI-Transformed Versus Nontransformed Values.
The difference in prioritization performance between the two approaches for each individual trait reveals that for all traits, except wood density, NBDI-transformed values outperform nontransformed values in terms of enrichment in reference genes among the prioritized genes (Fig. S1), possibly reflecting that wood density is affected by a much larger number of genes than the reference set alone. The benefit of the NBDI transformation is especially clear for the growth (such as DBH, an indicator of wood volume) and sugar-release traits. For wood density, the NBDI-transformed values performed on par with using nontransformed values (for 300 selected genes per trait; Table 1). For some traits (lignin content, total C5 sugar in walls, and total C6 sugar in walls) the prioritized genes are not significantly enriched in genes from our reference set, irrespective of whether nontransformed or NBDI-transformed values were used (Fig. S1 and Table 1). This lower enrichment could be due to the reference set lacking a sufficient number of genes that relate to these traits or, alternatively, might indicate that the observed variation in these traits is caused by factors that cannot directly be related to gene expression variation present in the tested population. The prioritization ranks of the reference genes are also affected by the NBDI transformation (Fig. S2). Genes for which the expression signal clearly associates with a trait will be top ranked irrespective of whether the data are NBDI-transformed or not. However, for most traits the NBDI transformation results in assigning a higher rank to reference genes than when using the nontransformed values, explaining the higher enrichment scores mentioned above.
Due to the high correlations among related biomass and bioprocessing traits, we anticipated that gene selections obtained for the traits would overlap. Statistics that relate to the number and identity of genes prioritized using either transformed or nontransformed expression values show that when 300 genes per trait are selected, 41% of those genes are nonredundant when selected by NBDI vs. 49% when using nontransformed expression values (Fig. S3). This percentage of overlap indicates that the method not only focuses on identifying genes associated with all traits simultaneously but also has the ability to identify trait-specific gene sets.
Linking Expression to Quantitative Traits.
To relate gene expression signals to quantitative trait values, we used the (Pearson) correlation between a gene’s expression (NBDI-transformed or not, depending on the application) and the trait values for each sample. To select the genes that are relevant to a trait, the genes with the highest absolute value of the correlation were selected.
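A minimal sketch of this ranking step follows: compute the Pearson correlation between each gene's expression and the trait, then keep the genes with the largest absolute correlation. The function names and the toy data are hypothetical; the default of 300 genes per trait is taken from the Results.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant vector: correlation undefined, treated as 0
    return sxy / sqrt(sxx * syy)

def top_genes(expression_by_gene, trait, k=300):
    """Genes ranked by |Pearson r| with the trait, best first, top k kept."""
    scores = {g: pearson(v, trait) for g, v in expression_by_gene.items()}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:k]

# Toy data: four samples, three genes.
trait_values = [1.0, 2.0, 3.0, 4.0]
toy_expression = {
    "geneA": [2.0, 4.0, 6.0, 8.0],   # strong positive correlation
    "geneB": [4.0, 3.0, 2.0, 1.5],   # strong negative correlation
    "geneC": [1.0, 3.0, 2.0, 2.0],   # weak correlation
}
selected = top_genes(toy_expression, trait_values, k=2)
```

Because the ranking uses the absolute value, negatively correlated genes such as `geneB` are selected alongside positively correlated ones.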
eQTL Peak Detection, Enrichment, and Clustering.
To identify genetic loci that are likely involved in determining trait variation, we first merged all of the gene selections for all traits under study to obtain a combined gene selection. For each gene in this combined selection, for all of the eQTL with which that gene was associated, we collected the genetic positions where the association between genetic variation and gene expression was the strongest. Next, we constructed an eQTL density map by dividing the genome into bins of 5 cM and counting per bin the number of genes that have an eQTL in that bin. To alleviate boundary effects, adjacent bins have a 50% overlap. Finally, we identified the local maxima in the eQTL density profile: a maximum peak is identified if the number of genes in a bin is larger than its neighboring values, and at least three genes are in the bin. To investigate whether the genes in a peak (and by extension, the genetic location of a peak) represent specific biological functions, we performed a per-peak gene ontology (GO) analysis, using the genes present in the gene expression dataset as the reference set.
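The density map and peak detection can be sketched as follows, using the 5-cM bins, 50% overlap, and three-gene minimum stated above. The sketch handles one chromosome at a time and follows the strict "larger than its neighbors" rule literally, so a plateau of equal counts in adjacent bins is not reported as a peak; both choices, and the function names, are assumptions.

```python
from math import ceil

def eqtl_density(gene_positions, chrom_length_cm, bin_width=5.0, overlap=0.5):
    """Count genes with an eQTL peak in each overlapping bin.

    gene_positions: list of (gene, position_cm) pairs on one chromosome.
    Returns (bin_starts, counts); bins advance by bin_width * (1 - overlap).
    """
    step = bin_width * (1.0 - overlap)
    starts = [i * step for i in range(ceil(chrom_length_cm / step))]
    counts = []
    for start in starts:
        genes = {g for g, pos in gene_positions
                 if start <= pos < start + bin_width}
        counts.append(len(genes))
    return starts, counts

def find_peaks(counts, min_genes=3):
    """Indices of bins that exceed both neighbors and hold >= min_genes."""
    peaks = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else -1
        right = counts[i + 1] if i + 1 < len(counts) else -1
        if c >= min_genes and c > left and c > right:
            peaks.append(i)
    return peaks

# Hypothetical eQTL peak positions for four genes on a 15-cM chromosome.
toy_positions = [("g1", 3.0), ("g2", 4.0), ("g3", 6.0), ("g4", 7.0)]
bin_starts, bin_counts = eqtl_density(toy_positions, chrom_length_cm=15.0)
peak_bins = find_peaks(bin_counts)
```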
After processing the combined gene selection, for each trait individually we constructed a similar eQTL density map, this time using only the genes that associate to that particular trait. Next, an identical peak detection operation was performed. Each identified peak was finally associated to the closest peak identified for the combined gene selection, ensuring a uniform peak numbering across traits.
Traits can be clustered using the eQTL counts in the bins by treating the counts as feature vectors for each trait. The trait clustering (Fig. 3) was obtained using hierarchical clustering with the Euclidean distance metric and the single linkage criterion (different linkage criteria gave comparable results).
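As an illustration of the clustering step, the sketch below runs a naive single-linkage agglomeration on per-trait bin counts using the Euclidean distance. The count vectors are invented for the example and the function names are hypothetical. In practice a library routine such as `scipy.cluster.hierarchy.linkage` with `method="single"` would normally be used instead of this hand-written loop.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def single_linkage(vectors):
    """Naive agglomerative clustering; returns the merge history.

    Each merge is (members_a, members_b, distance), where the distance between
    two clusters is the smallest pairwise Euclidean distance (single linkage).
    """
    clusters = [[name] for name in vectors]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical eQTL counts per bin for four traits (values are made up).
bin_counts = {
    "DBH": [4, 0, 3, 1],
    "bark_thickness": [4, 1, 3, 1],
    "glucose_content": [0, 5, 0, 2],
    "xylose_content": [0, 3, 0, 2],
}
merges = single_linkage(bin_counts)
```

With these invented counts the two growth-related traits merge first and the two sugar-content traits second, and the two groups are joined only at a much larger distance.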
Constructing Trait-Specific Subnetworks.
To construct a trait-specific subnetwork derived from prior knowledge, we first mapped the genes selected for a trait on the high-quality, curated gene interaction network derived from the Arabidopsis gene–gene interactions (KEGG), i.e., without using the eQTL overlap relations.
High-quality gene prioritizations are expected to cluster together in the interaction network used to diffuse the gene expression, whereas a random gene prioritization is more likely to produce only small clusters of interconnected genes. To assess the probability that a connected component as large as the one connecting the prioritized genes could be obtained by chance, a P value was assigned to each extracted subnetwork. To this end, a background distribution of the size of the largest connected component was constructed by applying the network construction process described above to 10,000 random gene selections. A Poisson distribution was then fitted to this background distribution and used to estimate the probability that the observed size of the largest connected component of a trait-specific network would be obtained purely by chance.
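The significance test described above can be sketched as follows, under several stated assumptions: the toy network is a simple chain of 60 hypothetical genes, the Poisson rate is fitted by maximum likelihood (the sample mean), the P value is taken as the upper tail P(X >= observed), and only 1,000 random selections are drawn to keep the example fast. None of these details are confirmed by the text; the function names are illustrative.

```python
import math
import random

def largest_component(adjacency, genes):
    """Size of the largest connected component induced by `genes`."""
    genes = set(genes)
    seen, best = set(), 0
    for g in genes:
        if g in seen:
            continue
        stack, size = [g], 0
        seen.add(g)
        while stack:  # depth-first traversal restricted to the selected genes
            node = stack.pop()
            size += 1
            for nb in adjacency.get(node, ()):
                if nb in genes and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def component_p_value(adjacency, selection, n_random=1000, seed=0):
    """Observed component size, fitted Poisson rate, and upper-tail P value."""
    rng = random.Random(seed)
    universe = sorted(adjacency)
    sizes = [largest_component(adjacency, rng.sample(universe, len(selection)))
             for _ in range(n_random)]
    lam = sum(sizes) / len(sizes)  # maximum-likelihood estimate of the rate
    observed = largest_component(adjacency, selection)
    return observed, lam, poisson_tail(observed, lam)

# Toy network: a chain g0 - g1 - ... - g59 (hypothetical genes).
names = ["g%d" % i for i in range(60)]
adjacency = {n: set() for n in names}
for a, b in zip(names, names[1:]):
    adjacency[a].add(b)
    adjacency[b].add(a)

selection = names[:10]  # ten adjacent genes form one connected component
observed, lam, p_value = component_p_value(adjacency, selection)
```

A selection of ten adjacent genes yields a component far larger than random selections of the same size typically produce, so the fitted Poisson tail probability is small.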
For visualization purposes, eQTL overlap relations between the mapped genes are added to the obtained subnetwork afterward.
Discussion
The observed variability of woody biomass traits in this study is explained by the variation of combinations of genes or sets of closely interacting pathways influenced by genetic variation segregating in this particular interspecific backcross population. Because of this, linking a quantitative trait to the expression of individual genes might fail or be incomplete if the trait under study is influenced by variation in the expression of large numbers of genes that in turn can be influenced by the expression of other genes, etc. If this is the case, then any method that captures only the marginal effect of a gene on a trait might render only a partial view of the genes that are involved in the underlying biological processes. To cope with this statistical issue we have developed a network-based data integration approach (NBDI) that combines genotyping, expression profiling, and prior network information to prioritize genes and molecular mechanisms associated with measured traits.
This NBDI approach is based on a network model in which connections between genes reflect interactions derived from either prior molecular interaction information or from eQTL information. In the latter case it is assumed that if two genes share an eQTL, they are connected in the network because of a shared coregulation mechanism. Even though incidental overlap of eQTLs is possible, for instance, through the action of separate polymorphisms in tightly linked but unrelated genes, we assumed that the majority of the overlapping trans-eQTLs can be treated as evidence of a shared regulatory polymorphism, as reflected by the shared functional annotations observed for the associated genes. Gene expression signals are then propagated through the network model to obtain an integrated signal that is used to explain the variation in the external traits.
We applied the NBDI approach to study the genomic loci and pathways affecting wood formation in Eucalyptus. The experimental setup used [with high linkage disequilibrium (LD) and large effect QTLs segregating in a single family] is complementary to low-LD studies (with high resolution but typically small effect associations) in populations of unrelated individuals (e.g., refs. 24, 55).
Our integrative systems genetics approach allowed us to prioritize genes contributing to woody biomass traits and to identify the putative regulatory loci with which these genes and traits are predominantly associated. Based on this analysis, a clear distinction could be made between growth and sugar release (bioprocessing) related traits and traits related to the total cell wall sugar content. Unexpectedly, we noticed little association between the variation of expression of SCW biosynthetic genes (cellulose, xylan, and lignin pathway genes) and the final C5 and C6 sugar and lignin content of the cell wall. Rather, most bona fide lignin biosynthesis genes were associated with growth-related traits (DBH and bark thickness), and most SCW polysaccharide biosynthesis genes were associated with variation in wood density and glucose release efficiency. Several genes highly associated with variation in lignin content code for enzymes involved in carbon metabolism and in mitochondrial energy metabolism. As a result, we hypothesize that variation in the expression of SCW biosynthetic genes has an effect on the growth and ultrastructure and resultant processability of the secondary cell wall, whereas the quantity of sequestered carbon in the cell wall (in the form of polysaccharides and phenolics) is more related to variation in primary carbon metabolism pathways and hence precursor availability. This hypothesis further supports the strong link between physiological/cellular homeostasis and secondary processes such as SCW polysaccharide and lignin biosynthesis that represent a strong, irreversible carbon sink in woody plants.
Materials and Methods
Experimental Population, Transcriptome, and Complex Trait Analysis.
The F2 backcross population was generated from a cross between an E. grandis × E. urophylla F1 interspecific hybrid parent (GUSAP1, Sappi Forest Research, South Africa) and an unrelated E. urophylla parent (USAP1) (18). At 3 y old, immature xylem tissue was harvested from 156 individuals as previously described (56). Samples were collected over a 7.5-h period between 0900 and 1630 hours for 3 d. Total RNA was isolated (57) and used for RNA-Seq expression profiling (30 million; Illumina PE50, BGI Hong Kong). Gene expression values (FPKM) were calculated per gene model using TopHat version 1.3 and Cufflinks version 1.0.3 (bias correction and quartile normalization were enabled for the FPKM calculation) (58, 59). Diameter (cm) at breast height (DBH) of the main stem was assessed as described previously (20). Bark thickness was calculated as the difference between over-bark and under-bark DBH measurements. A wood disk taken at breast height (1.35 m) was used to determine wood basic density using the water displacement method (www.tappi.org/content/SARG/T258.pdf). Chemical wood properties were assessed using different analytical methods, including pyrolysis molecular beam mass spectrometry (pyMBMS).
Trait QTL mapping, eQTL mapping, and eQTL classification are described in SI Materials and Methods.
NBDI Association Analysis.
First, a hybrid gene interaction network was constructed using curated gene interactions downloaded from KEGG and eQTL overlap relations (Fig. 1A). For the latter, we tested each pair of genes for overlapping eQTL intervals (Fig. S5) and, where the intervals overlapped, added a connection to the hybrid network. Once the network was constructed, a graph node kernel (the Laplacian exponential diffusion kernel; 60) was calculated to quantify how well each node in the network connects to all other nodes (Fig. 1B). The resulting connectivity matrix was then multiplied with the gene expression matrix to obtain the diffused or transformed gene expression matrix (Fig. 1C). Genes in the network connectivity matrix that were not present in the gene expression matrix were removed and vice versa. The transformed gene expression was finally linked to the measured traits by calculating the absolute value of the Pearson correlation between the transformed gene expression and the measured traits. After ranking, the top 300 genes exhibiting the highest correlation were selected for further analysis. For details of the eQTL overlap procedure, network construction, connectivity calculation, and association analysis, see SI Materials and Methods and Figs. S5 and S6.
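The diffusion step can be illustrated with a small sketch. It assumes the standard form of the Laplacian exponential diffusion kernel, K = exp(-αL) with L = D - A. The value of α, the toy network, the expression values, and all function names are hypothetical. The matrix exponential is approximated with a truncated Taylor series, which is adequate only for tiny graphs; a routine such as `scipy.linalg.expm` would be the usual choice at scale.

```python
def laplacian(nodes, edges):
    """Graph Laplacian L = D - A for an undirected graph without duplicate edges."""
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    L = [[0.0] * size for _ in range(size)]
    for a, b in edges:
        i, j = index[a], index[b]
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def diffusion_kernel(L, alpha=0.5, terms=40):
    """K = exp(-alpha * L) via a truncated Taylor series (small graphs only)."""
    size = len(L)
    M = [[-alpha * v for v in row] for row in L]
    K = [[float(i == j) for j in range(size)] for i in range(size)]
    term = [row[:] for row in K]  # holds M^k / k!, starting from the identity
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, M)]
        K = [[x + y for x, y in zip(kr, tr)] for kr, tr in zip(K, term)]
    return K

# Toy hybrid network: edges may stand for curated interactions or eQTL overlaps.
nodes = ["g1", "g2", "g3", "g4"]        # g4 has no connections
edges = [("g1", "g2"), ("g2", "g3")]
K = diffusion_kernel(laplacian(nodes, edges))

# Expression matrix: rows are genes (same order as nodes), columns are samples.
expression = [[10.0, 0.0], [0.0, 0.0], [0.0, 0.0], [5.0, 5.0]]
diffused = matmul(K, expression)
```

In this toy case part of the signal of `g1` spreads to its neighbours `g2` and `g3`, whereas the unconnected gene `g4` keeps its original values. The rows of the diffused matrix could then be correlated with trait values in the same way as untransformed expression.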
Acknowledgments
This work was supported by the Department of Science and Technology (Strategic Grant for the Eucalyptus Genomics Platform) and National Research Foundation of South Africa (Bioinformatics and Functional Genomics Programme, Grants 86936 and 97911 to A.A.M.), Sappi South Africa and the Technology and Human Resources for Industry Programme (Grant 80118) through the Forest Molecular Genetics Programme at the University of Pretoria (to A.A.M.), Ghent University Multidisciplinary Research Partnership from nucleotides to networks (Project 01MR0410W to Y.V.d.P. and K.M.), the European Union (FP7/2007-2013) under ERC Advanced Grant Agreement 322739–DOUBLEUP (to Y.V.d.P.), the Fonds Wetenschappelijk Onderzoek - Vlaanderen (Projects 3G042813, G.0A53.15N, and SBO-NEMOA to K.M.), and the BioEnergy Science Center, a US Department of Energy Bioenergy Research Center supported by the Office of Biological and Environmental Research in the Department of Energy Office of Science (G.A.T.). Finally, the authors acknowledge Sappi Forest Research for the plant materials and growth and wood property data used in the study.
Footnotes
The authors declare no conflict of interest.
Data deposition: The sequences reported in this paper have been deposited in the NCBI Sequence Read Archive (accession no. SUB2087452).
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1620119114/-/DCSupplemental.
References
- 1. Bonan GB. Forests and climate change: Forcings, feedbacks, and the climate benefits of forests. Science. 2008;320(5882):1444–1449. doi: 10.1126/science.1155121.
- 2. Crowther TW, et al. Mapping tree density at a global scale. Nature. 2015;525(7568):201–205. doi: 10.1038/nature14967.
- 3. Mizrachi E, Mansfield SD, Myburg AA. Cellulose factories: Advancing bioenergy production from forest trees. New Phytol. 2012;194(1):54–62. doi: 10.1111/j.1469-8137.2011.03971.x.
- 4. Ragauskas AJ, et al. Lignin valorization: Improving lignin processing in the biorefinery. Science. 2014;344(6185):1246843. doi: 10.1126/science.1246843.
- 5. Hinchee M, et al. Short-rotation woody crops for bioenergy and biofuels applications. In Vitro Cell Dev Biol Plant. 2009;45(6):619–629. doi: 10.1007/s11627-009-9235-5.
- 6. Sederoff R, Myburg A, Kirst M. Genomics, domestication, and evolution of forest trees. Cold Spring Harb Symp Quant Biol. 2009;74:303–317. doi: 10.1101/sqb.2009.74.040.
- 7. Séguin A. How could forest trees play an important role as feedstock for bioenergy production? Curr Opin Environ Sustain. 2011;3(1-2):90–94.
- 8. Myburg AA, et al. The genome of Eucalyptus grandis. Nature. 2014;510(7505):356–362. doi: 10.1038/nature13308.
- 9. Mansfield SD. Solutions for dissolution--engineering cell walls for deconstruction. Curr Opin Biotechnol. 2009;20(3):286–294. doi: 10.1016/j.copbio.2009.05.001.
- 10. Hussey SG, Mizrachi E, Creux NM, Myburg AA. Navigating the transcriptional roadmap regulating plant secondary cell wall deposition. Front Plant Sci. 2013;4:325. doi: 10.3389/fpls.2013.00325.
- 11. Mizrachi E, Myburg AA. Systems genetics of wood formation. Curr Opin Plant Biol. 2016;30:94–100. doi: 10.1016/j.pbi.2016.02.007.
- 12. Vanholme R, et al. A systems biology view of responses to lignin biosynthesis perturbations in Arabidopsis. Plant Cell. 2012;24(9):3506–3529. doi: 10.1105/tpc.112.102574.
- 13. Ingvarsson PK, Street NR. Association genetics of complex traits in plants. New Phytol. 2011;189(4):909–922. doi: 10.1111/j.1469-8137.2010.03593.x.
- 14. Muchero W, et al. High-resolution genetic mapping of allelic variants associated with cell wall chemistry in Populus. BMC Genomics. 2015;16(1):24. doi: 10.1186/s12864-015-1215-z.
- 15. Geraldes A, et al. A 34K SNP genotyping array for Populus trichocarpa: Design, application to the study of natural populations and transferability to other Populus species. Mol Ecol Resour. 2013;13(2):306–323. doi: 10.1111/1755-0998.12056.
- 16. Bartholomé J, et al. High-resolution genetic maps of Eucalyptus improve Eucalyptus grandis genome assembly. New Phytol. 2015;206(4):1283–1296. doi: 10.1111/nph.13150.
- 17. Hudson CJ, et al. A reference linkage map for Eucalyptus. BMC Genomics. 2012;13(1):240. doi: 10.1186/1471-2164-13-240.
- 18. Kullan ARK, et al. High-density genetic linkage maps with over 2,400 sequence-anchored DArT markers for genetic dissection in an F2 pseudo-backcross of Eucalyptus grandis × E. urophylla. Tree Genet Genomes. 2011;8(1):163–175.
- 19. Silva-Junior OB, Faria DA, Grattapaglia D. A flexible multi-species genome-wide 60K SNP chip developed from pooled resequencing of 240 Eucalyptus tree genomes across 12 species. New Phytol. 2015;206(4):1527–1540. doi: 10.1111/nph.13322.
- 20. Kullan AR, et al. Genetic dissection of growth, wood basic density and gene expression in interspecific backcrosses of Eucalyptus grandis and E. urophylla. BMC Genet. 2012;13:60. doi: 10.1186/1471-2156-13-60.
- 21. Thumma BR, et al. Quantitative trait locus (QTL) analysis of wood quality traits in Eucalyptus nitens. Tree Genet Genomes. 2010;6(2):305–317.
- 22. McKown AD, et al. Genome-wide association implicates numerous genes underlying ecological trait variation in natural populations of Populus trichocarpa. New Phytol. 2014;203(2):535–553. doi: 10.1111/nph.12815.
- 23. Evans LM, et al. Population genomics of Populus trichocarpa identifies signatures of selection and adaptive trait associations. Nat Genet. 2014;46(10):1089–1096. doi: 10.1038/ng.3075.
- 24. Porth I, et al. Network analysis reveals the relationship among wood properties, gene expression levels and genotypes of natural Populus trichocarpa accessions. New Phytol. 2013;200(3):727–742. doi: 10.1111/nph.12419.
- 25. Wegrzyn JL, et al. Association genetics of traits controlling lignin and cellulose biosynthesis in black cottonwood (Populus trichocarpa, Salicaceae) secondary xylem. New Phytol. 2010;188(2):515–532. doi: 10.1111/j.1469-8137.2010.03415.x.
- 26. Du Q, et al. Genetic architecture of growth traits in Populus revealed by integrated quantitative trait locus (QTL) analysis and association studies. New Phytol. 2016;209(3):1067–1082. doi: 10.1111/nph.13695.
- 27. Thavamanikumar S, Southerton S, Thumma B. RNA-Seq using two populations reveals genes and alleles controlling wood traits and growth in Eucalyptus nitens. PLoS One. 2014;9(6):e101104. doi: 10.1371/journal.pone.0101104.
- 28. Jansen RC, Nap JP. Genetical genomics: The added value from segregation. Trends Genet. 2001;17(7):388–391. doi: 10.1016/s0168-9525(01)02310-1.
- 29. Feltus FA. Systems genetics: A paradigm to improve discovery of candidate genes and mechanisms underlying complex traits. Plant Sci. 2014;223:45–48. doi: 10.1016/j.plantsci.2014.03.003.
- 30. Baute J, et al. Combined large-scale phenotyping and transcriptomics in maize reveals a robust growth regulatory network. Plant Physiol. 2016;170(3):1848–1867. doi: 10.1104/pp.15.01883.
- 31. Proost S, Mutwil M. Tools of the trade: Studying molecular networks in plants. Curr Opin Plant Biol. 2016;30:143–150. doi: 10.1016/j.pbi.2016.02.010.
- 32. De Maeyer D, Weytjens B, De Raedt L, Marchal K. Network-based analysis of eQTL data to prioritize driver mutations. Genome Biol Evol. 2016;8(3):481–494. doi: 10.1093/gbe/evw010.
- 33. Verbeke LP, et al. Pathway relevance ranking for tumor samples through network-based data integration. PLoS One. 2015;10(7):e0133503. doi: 10.1371/journal.pone.0133503.
- 34. Ding L, Wendl MC, McMichael JF, Raphael BJ. Expanding the computational toolbox for mining cancer genomes. Nat Rev Genet. 2014;15(8):556–570. doi: 10.1038/nrg3767.
- 35. Shi K, Gao L, Wang B. Discovering potential cancer driver genes by an integrated network-based approach. Mol Biosyst. 2016;12(9):2921–2931. doi: 10.1039/c6mb00274a.
- 36. Verbeke LP, Cloots L, Demeester P, Fostier J, Marchal K. EPSILON: An eQTL prioritization framework using similarity measures derived from local networks. Bioinformatics. 2013;29(10):1308–1316. doi: 10.1093/bioinformatics/btt142.
- 37. Jia P, Zhao Z. Network-assisted analysis to prioritize GWAS results: Principles, methods and perspectives. Hum Genet. 2014;133(2):125–138. doi: 10.1007/s00439-013-1377-1.
- 38. Carocha V, et al. Genome-wide analysis of the lignin toolbox of Eucalyptus grandis. New Phytol. 2015;206(4):1297–1313. doi: 10.1111/nph.13313.
- 39. Hoffmann L, et al. Silencing of hydroxycinnamoyl-coenzyme A shikimate/quinate hydroxycinnamoyltransferase affects phenylpropanoid biosynthesis. Plant Cell. 2004;16(6):1446–1465. doi: 10.1105/tpc.020297.
- 40. Fatichi S, Leuzinger S, Körner C. Moving beyond photosynthesis: From carbon source to sink-driven vegetation modeling. New Phytol. 2014;201(4):1086–1095. doi: 10.1111/nph.12614.
- 41. Peña MJ, et al. Arabidopsis irregular xylem8 and irregular xylem9: Implications for the complexity of glucuronoxylan biosynthesis. Plant Cell. 2007;19(2):549–563. doi: 10.1105/tpc.106.049320.
- 42. Biswal AK, et al. Downregulation of GAUT12 in Populus deltoides by RNA silencing results in reduced recalcitrance, increased growth and reduced xylan and pectin in a woody biofuel feedstock. Biotechnol Biofuels. 2015;8:41. doi: 10.1186/s13068-015-0218-y.
- 43. Spokevicius AV, et al. β-tubulin affects cellulose microfibril orientation in plant secondary fibre cell walls. Plant J. 2007;51(4):717–726. doi: 10.1111/j.1365-313X.2007.03176.x.
- 44. Qiu D, et al. Gene expression in Eucalyptus branch wood with marked variation in cellulose microfibril orientation and lacking G-layers. New Phytol. 2008;179(1):94–103. doi: 10.1111/j.1469-8137.2008.02439.x.
- 45. Ko J-H, Yang SH, Park AH, Lerouxel O, Han K-H. ANAC012, a member of the plant-specific NAC transcription factor family, negatively regulates xylary fiber development in Arabidopsis thaliana. Plant J. 2007;50(6):1035–1048. doi: 10.1111/j.1365-313X.2007.03109.x.
- 46. Hussey SG, et al. SND2, a NAC transcription factor gene, regulates genes involved in secondary cell wall development in Arabidopsis fibres and increases fibre cell area in Eucalyptus. BMC Plant Biol. 2011;11:173. doi: 10.1186/1471-2229-11-173.
- 47. Mizrachi E, et al. Investigating the molecular underpinnings underlying morphology and changes in carbon partitioning during tension wood formation in Eucalyptus. New Phytol. 2015;206(4):1351–1363. doi: 10.1111/nph.13152.
- 48. Evans R, Ilic J. Rapid prediction of wood stiffness from microfibril angle and density. Forest Prod J. 2001;51(3):53–57.
- 49. Mansfield SD, et al. Revisiting the transition between juvenile and mature wood: A comparison of fibre length, microfibril angle and relative wood density in lodgepole pine. Holzforschung. 2009;63(4):449–456.
- 50. Shen B, Li C, Tarczynski MC. High free-methionine and decreased lignin content result from a mutation in the Arabidopsis S-adenosyl-L-methionine synthetase 3 gene. Plant J. 2002;29(3):371–380. doi: 10.1046/j.1365-313x.2002.01221.x.
- 51. Li X, Weng J-K, Chapple C. Improvement of biomass through lignin modification. Plant J. 2008;54(4):569–581. doi: 10.1111/j.1365-313X.2008.03457.x.
- 52. Mao D, et al. FERONIA receptor kinase interacts with S-adenosylmethionine synthetase and suppresses S-adenosylmethionine production and ethylene biosynthesis in Arabidopsis. Plant Cell Environ. 2015;38(12):2566–2574. doi: 10.1111/pce.12570.
- 53. Rennie EA, Scheller HV. Xylan biosynthesis. Curr Opin Biotechnol. 2014;26:100–107. doi: 10.1016/j.copbio.2013.11.013.
- 54. Busse-Wicher M, Grantham NJ, Lyczakowski JJ, Nikolovski N, Dupree P. Xylan decoration patterns and the plant secondary cell wall molecular architecture. Biochem Soc Trans. 2016;44(1):74–78. doi: 10.1042/BST20150183.
- 55. Porth I, et al. Genome-wide association mapping for wood characteristics in Populus identifies an array of candidate single nucleotide polymorphisms. New Phytol. 2013;200(3):710–726. doi: 10.1111/nph.12422.
- 56. Ranik M, Creux NM, Myburg AA. Within-tree transcriptome profiling in wood-forming tissues of a fast-growing Eucalyptus tree. Tree Physiol. 2006;26(3):365–375. doi: 10.1093/treephys/26.3.365.
- 57. Chang S, Puryear J, Cairney J. A simple and efficient method for isolating RNA from pine trees. Plant Mol Biol Report. 1993;11(2):113–116.
- 58. Kim D, et al. TopHat2: Accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):R36. doi: 10.1186/gb-2013-14-4-r36.
- 59. Trapnell C, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7(3):562–578. doi: 10.1038/nprot.2012.016.
- 60. Fouss F, Francoisse K, Yen L, Pirotte A, Saerens M. An experimental investigation of kernels on graphs for collaborative recommendation and semisupervised classification. Neural Netw. 2012;31:53–72. doi: 10.1016/j.neunet.2012.03.001.
- 61. Basten CJ, Weir BS, Zeng Z-B. QTL Cartographer, version 1.17. Department of Statistics, North Carolina State University; Raleigh, NC: 2004. p. 188.
- 62. Doerge RW, Churchill GA. Permutation tests for multiple loci affecting a quantitative character. Genetics. 1996;142(1):285–294. doi: 10.1093/genetics/142.1.285.
- 63. Okuda S, et al. KEGG Atlas mapping for global analysis of metabolic pathways. Nucleic Acids Res. 2008;36(Web Server issue):W423-6. doi: 10.1093/nar/gkn282.