Abstract
A significant challenge in our understanding of biological systems is the high number of genes with unknown function in many genomes. The fungal genus Aspergillus contains important pathogens of humans, model organisms, and microbial cell factories. Aspergillus niger is used to produce organic acids and proteins, and is a promising source of new bioactive secondary metabolites. Out of the 14,165 open reading frames predicted in the A. niger genome, only 2% have been experimentally verified and over 6,000 are hypothetical. Here, we show that gene co-expression network analysis can be used to overcome this limitation. A meta-analysis of 155 transcriptomics experiments generated co-expression networks for 9,579 genes (∼65%) of the A. niger genome. By populating this dataset with over 1,200 gene functional experiments from the genus Aspergillus and performing gene ontology enrichment, we could infer biological processes for 9,263 A. niger genes, including 2,970 hypothetical genes. Experimental validation of selected co-expression sub-networks uncovered four transcription factors involved in secondary metabolite synthesis, which were used to activate production of multiple natural products. This study constitutes a significant step towards systems-level understanding of A. niger, and the datasets can be used to fuel discoveries in model systems, fungal pathogens, and biotechnology.
INTRODUCTION
The genus Aspergillus (phylum Ascomycota) comprises nearly 350 species of saprophytic and ubiquitous fungi, and includes important pathogens of humans (Aspergillus fumigatus), model organisms (Aspergillus nidulans) and microbial cell factories (Aspergillus oryzae, Aspergillus niger). Aspergillus niger has been exploited for over a century by biotechnologists for the production of organic acids, proteins and enzymes (1). It is the major worldwide producer of citric acid with an estimated value of $2.6 billion in 2014, which is predicted to rise to $3.6 billion by 2020 (2). As a prolific secretor of proteins, A. niger is used to produce various enzymes at a bulk scale (1). The first A. niger genome was sequenced in 2007 and contained an estimated 14,165 coding genes, ∼6,000 of which were hypothetical (3). More recent sequencing of additional A. niger strains and other genomes from the genus Aspergillus (4–9), combined with refinement of online genome analysis portals (10–12) and comparative genomic studies amongst the Aspergilli (5,13–15), has not significantly increased the percentage of A. niger genes that have functional predictions. While the exact number of ‘hypothetical’ genes varies between databases and A. niger genomes, recent estimates suggest that between 40 and 50% of the genes still remain hypothetical (1,16). Furthermore, only 2% of its genes (n = 247) have a verified function in the Aspergillus Genome Database (AspGD (17)). Even for the gold-standard model organism Saccharomyces cerevisiae, 21% of its predicted genes have dubious functional predictions (18), despite its high genetic tractability and a research community of >1,800 research labs worldwide. Indeed, gene functional predictions for A. nidulans, A. fumigatus, A. oryzae and other Aspergilli typically cover 40–50% of the genome (16,17).
Such high frequency of unknown and hypothetical genes severely limits the power of systems-level analyses. One approach to overcome this limitation involves the generation and interrogation of gene expression networks based on transcriptomic datasets (19–21). The hypothesis underlying this approach is that genes which are robustly co-expressed under diverse conditions are likely to function in the same or closely related biological processes or pathways (22). As one example, accurate delineation of fungal secondary metabolite biosynthetic gene clusters has been achieved by robustly defining contiguous gene co-expression during both in vitro (23) and infectious (24) growth, with co-expression analysis pipelines now publicly available to non-coders (25).
In this study, we conducted a meta-analysis of 155 publicly available transcriptomics analyses for A. niger, and used these data to generate a genome-level co-expression network and sub-networks for >9,500 genes. To aid user interpretations of gene biological process, gene sub-networks were analysed for enriched gene ontology (GO) terms, and integrated with information gleaned from 1,200 validated genes from the genus Aspergillus. Interrogation of selected co-expression sub-networks for verified genes and randomly selected hypothetical genes confirmed high quality datasets that enable rapid and facile predictions of biological processes. This co-expression resource has been integrated in the functional genomic database FungiDB (10) for use by the research community.
In order to validate that novel predictions of gene biological function were accurate, we functionally characterized genes which we hypothesized played a role in A. niger natural product synthesis based on co-expression datasets. Experimental validation included generation of null and overexpression mutants of transcription factors present in these sub-networks, controlled bioreactor cultivations, and activation of secondary metabolite gene expression and metabolite biosynthesis. In addition to demonstrating that this study enables novel predictions of A. niger gene function, these data suggest novel mechanisms for activating cryptic secondary metabolism can be used in natural product discovery programs, which is urgent due to the emergence of multi-resistant bacteria and fungi (26,27). As A. niger has been shown to be a superior expression host for medicinal drugs in g/l scale (28), such discoveries have significant translational potential. Taken together, the co-expression resources and experimental validation developed in this study enable high quality gene functional predictions in A. niger.
MATERIALS AND METHODS
Strains and molecular techniques
Aspergillus niger strains used in this study are summarized in Supplementary Table S1. Media compositions, transformation of A. niger, strain purification and fungal chromosomal DNA isolation were described earlier (29). Standard PCR and cloning procedures were used for the generation of all constructs (30) and all cloned fragments were confirmed by DNA sequencing. Correct integrations of constructs in A. niger were verified by Southern analysis (30). In the case of overexpressing TF1, TF2 and HD, the respective open reading frames were cloned into the Tet-on vector pVG2.2 (31) and the resulting plasmids integrated as single or multiple copies at the pyrG locus (for details see Supplementary Table S1). Deletion constructs were made by PCR amplification of the 5′- and 3′-flanks of the respective open reading frames (at least 0.9 kb long). N402 genomic DNA served as template DNA. The histidine selection marker (32) was used for selecting single deletion strains, whereas the pyrG marker was used for the establishment of the strain deleted in both TF1 and TF2. Details on cloning protocols, primers used and Southern blot results can be requested from the authors.
Gene network analysis and quality control
As of 2 March 2016, 283 microarray data sets (platform: GPL6758) of A. niger covering 155 different cultivation conditions were publicly available at the GEO database (33); the processing and normalization of these arrays have been published (34). In brief, array data in the form of CEL-files (35) were processed using the Affymetrix analysis package (35) (version 1.42.1) from Bioconductor (36) and expression data were calculated for genes under each condition with an MAS5 background correction. Pairwise correlations of gene expression between all A. niger genes were generated by calculating the Spearman's rank correlation coefficient (37) using R. To assess a cut-off indicating biological relevance, Spearman correlations were first calculated using a pseudo random data set whereby normalized transcript values for each individual gene were randomized amongst the 283 arrays and 155 experimental conditions. Using this pseudo random data set, the Spearman's rank coefficient was calculated pairwise for all predicted A. niger genes, giving a total of 104,958,315 comparisons/calculated Spearman's rank coefficients, of which 52,476,536 were positively correlated. Of all coefficients, only two were greater than |0.4| and none above |0.5|. Consequently, |0.5| was taken as a threshold for co-expression. Sub-networks were calculated at an individual gene level using Python. All genes that were co-expressed with individual query ORFs are reported at FungiDB (10) and summarized in Supplementary File 1 using a |≥0.5| and |≥0.7| Spearman cut-off.
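The thresholding procedure described above can be illustrated with a minimal Python sketch. It is not the R and Python code used in this study: the function names, the toy expression profiles, and the handling of constant profiles are assumptions made for illustration only. The sketch ranks each expression profile (average ranks for ties), computes Spearman's ρ as the Pearson correlation of the ranks, builds a randomized data set by shuffling each gene's values across arrays, and keeps gene pairs whose absolute ρ reaches a chosen cut-off.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return cov / (vx * vy) ** 0.5


def shuffled_copy(expr, seed=0):
    """Randomize each gene's values independently across arrays (null data set)."""
    rng = random.Random(seed)
    null = {}
    for gene, values in expr.items():
        shuffled = list(values)
        rng.shuffle(shuffled)
        null[gene] = shuffled
    return null


def coexpressed_pairs(expr, cutoff=0.5):
    """Return {(gene1, gene2): rho} for all pairs with |rho| >= cutoff."""
    pairs = {}
    for g1, g2 in combinations(sorted(expr), 2):
        rho = spearman(expr[g1], expr[g2])
        if abs(rho) >= cutoff:
            pairs[(g1, g2)] = rho
    return pairs


# Toy profiles (hypothetical gene names and values, six arrays each).
expr = {
    "geneA": [1, 2, 3, 4, 5, 6],
    "geneB": [2, 4, 5, 8, 9, 12],
    "geneC": [6, 5, 4, 3, 2, 1],
}
observed = coexpressed_pairs(expr, cutoff=0.5)
null_pairs = coexpressed_pairs(shuffled_copy(expr), cutoff=0.5)
print(len(observed), "pairs pass in the observed data;",
      len(null_pairs), "pass after randomization")
```

With only six arrays, a toy example of this size will often produce high correlations by chance after shuffling; the separation between randomized and observed data reported above relies on the 283 arrays available.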
To expedite investigation of the sub-networks and their common biological process, gene ontology enrichment (GOE) was implemented using Python version 2.7.13. GO terms and their hierarchical structure were downloaded from AspGD (17). Enriched GO Biological Process terms for all genes residing in a query sub-network were calculated relative to the A. niger genome and statistical significance was defined using Fisher's exact test (P-value < 0.05). For all datasets now available at FungiDB, GO enrichment is conducted using Fisher's exact test and Bonferroni corrections (10). For informant ORFs, experimentally verified A. niger genes were retrieved from AspGD (17). Additionally, A. niger orthologs for any gene with wet lab verification in A. fumigatus, A. nidulans or A. oryzae were identified using the ENSEMBL BLAST tool with default settings (12). Finally, 81 secondary metabolite core enzymes (39,40) were also defined as informant ORFs. We generated informant ORF (‘prioritized ORF’) sub-networks, which report significant co-expression of query genes exclusively with one or more informant ORFs; these are reported in Supplementary Table S2.
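As an illustration of the enrichment step, the following Python sketch computes a Fisher's exact test P-value for over-representation of a GO term in a sub-network relative to the genome. Several points are assumptions rather than details taken from the text: the test is taken to be one-sided (enrichment only), the optional Bonferroni factor is taken to be the number of terms tested, and all gene and term names are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test via the hypergeometric upper tail.

    N: genes in the genome; K: genome genes annotated with the term;
    n: genes in the sub-network; k: sub-network genes annotated with the term.
    Returns P(X >= k).
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def enriched_terms(subnetwork, term_to_genes, genome, alpha=0.05, bonferroni=False):
    """Return {term: p} for terms over-represented in the sub-network."""
    genome = set(genome)
    subnetwork = set(subnetwork) & genome
    n_tests = len(term_to_genes)
    results = {}
    for term, genes in term_to_genes.items():
        annotated = set(genes) & genome
        k = len(annotated & subnetwork)
        if k == 0:
            continue  # a term absent from the sub-network cannot be enriched
        p = fisher_enrichment_p(k, len(subnetwork), len(annotated), len(genome))
        if bonferroni:
            # Assumption: correct for the number of terms tested.
            p = min(1.0, p * n_tests)
        if p < alpha:
            results[term] = p
    return results


# Toy genome of 20 hypothetical genes and two hypothetical terms.
genome = ["g%d" % i for i in range(20)]
term_to_genes = {
    "term_1": ["g0", "g1", "g2", "g3"],
    "term_2": ["g0"] + ["g%d" % i for i in range(10, 19)],
}
print(enriched_terms(["g0", "g1", "g2", "g3"], term_to_genes, genome))
```

Exact integer arithmetic with `math.comb` avoids the floating-point underflow that can occur when multiplying many small probabilities for genome-sized gene sets.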
Reporter gene expression
Protocols for luciferase-based measurement of gene expression in microtiter format based on Tet-on (31) or anafp (34) promoter systems have been published. In case of strain BBA17.6, unable to form spores, the strain was inoculated on complete medium and allowed to grow for 7 days at 30°C. Biomass was harvested using physiological salt solution and used for inoculation. All data shown are derived from biological duplicates each measured in technical quadruplicates if not otherwise indicated. Raw datasets can be requested from the authors.
Bioreactor cultivation
Medium composition and the protocol for glucose-limited batch cultivation of A. niger in 5 l bioreactors have been described (31). In the case of strains overexpressing TF1 (strain MJK10.22) and TF2 (strain MJK11.17), the Tet-on system was induced with a final concentration of 10 μg/ml doxycycline when the culture reached 1 g/kg dry biomass. Samples for transcriptional and metabolome profiling were taken ∼6 h (∼72 h) after induction, i.e. during exponential (post-exponential) growth phase. For control (MJK17.25) and deletion strains (MJK14.7, MJK16.5, MJK18.1), doxycycline was added twice; once after the culture reached 1 g/kg dry biomass and 24 h before samples were taken from post-exponential growth phase for transcriptomics and metabolomics analyses. To avoid modifications in gene expression due to degradation of doxycycline in growth media, 10 μg/ml doxycycline was added every 10–12 h (five times in total) during growth of conditional expression mutants. Growth and physiology profiles for all seven strains cultivated in biological duplicates are summarized in Supplementary File 4.
Transcriptional profiling
Total RNA extraction, RNA quality control, and RNA sequencing were performed at GenomeScan (Leiden, the Netherlands). Quality analysis of raw data was done as previously described (41). In brief, ∼13 million reads of 150 bp were obtained from paired-end mode for each sample. Read data were trimmed and quality controlled with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). STAR (42) was used to map the reads to the A. niger CBS 513.88 genome (http://fungi.ensembl.org/). On average, the unique alignment rate was ∼95%. Data normalization was performed with DESeq2 (43). Differential gene expression was evaluated in DESeq2 with the Wald test at a Benjamini and Hochberg false discovery rate (FDR) threshold of 0.05 (44). Raw and processed data are summarized in Supplementary Table S3 and have been deposited at the GEO database (33) under the accession number GSE119311.
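The Benjamini and Hochberg adjustment applied to the Wald test P-values can be sketched in a few lines of Python. DESeq2 performs this adjustment internally; the function below is only a stand-alone illustration of the procedure, and the example P-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that the adjusted
    # values stay monotone (each is the minimum of p * m / rank seen so far).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
print(adj, significant)
```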
Metabolome profiling
Metabolites were extracted from biomass corresponding to 2.5 mg biomass dry weight by Metabolomic Discoveries GmbH (Potsdam, Germany) and identified based on Metabolomic Discoveries' database entries of authentic standards. Liquid chromatography (LC) separation was performed using hydrophilic interaction chromatography with an iHILIC®-Fusion column (150 × 2.1 mm, 5 μm, 200 Å; HILICON, Umeå, Sweden), operated by an Agilent 1290 UPLC system (Agilent, Santa Clara, USA). The LC mobile phase was (i) 10 mM ammonium acetate (Sigma-Aldrich, USA) in water (Thermo, USA) with 5% and 95% acetonitrile (Thermo, USA) (pH 6) and (ii) acetonitrile with 5% 10 mM ammonium acetate in 95% water. The LC gradient ran linearly from 95% to 65% acetonitrile over 8.5 min, followed by a linear gradient from 65% to 5% acetonitrile over 1 min, a 2.5 min wash with 5% and 3 min re-equilibration with 95% acetonitrile. The flow rate was 400 μl/min and injection volume was 1 μl. Mass spectrometry was performed using a high-resolution 6540 QTOF/MS Detector (Agilent, Santa Clara, USA) with a mass accuracy of <2 ppm. Spectra were recorded in a mass range from 50 m/z to 1700 m/z at 2 GHz in extended dynamic range in both positive and negative ionization mode. The measured metabolite concentrations were normalized to the internal standard. Significant concentration changes of metabolites in different samples were analyzed by appropriate statistical test procedures (ANOVA, paired t-test) using R. When the adjusted P value based on Benjamini and Hochberg FDR (44) was lower than 0.05 and the absolute log2 fold change was greater than 1, the abundance of the metabolite was considered significantly different.
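The significance criterion for metabolites can be written as a short Python sketch. The function names and numerical values are hypothetical, and computing the fold change from the means of replicate measurements is an assumption; the statistical tests themselves (ANOVA, paired t-test) were run in R and are not reproduced here.

```python
from math import log2
from statistics import mean


def normalize_to_internal_standard(peak_area, standard_area):
    """Express a metabolite signal relative to the internal standard."""
    return peak_area / standard_area


def log2_fold_change(treated, control):
    """log2 ratio of mean normalized abundance (assumption: means of replicates)."""
    return log2(mean(treated) / mean(control))


def is_significant(adjusted_p, lfc, alpha=0.05, min_abs_lfc=1.0):
    """Adjusted P below alpha and absolute log2 fold change above the minimum."""
    return adjusted_p < alpha and abs(lfc) > min_abs_lfc


# Hypothetical duplicate measurements for one metabolite.
treated = [normalize_to_internal_standard(a, 100.0) for a in (420.0, 380.0)]
control = [normalize_to_internal_standard(a, 100.0) for a in (110.0, 90.0)]
lfc = log2_fold_change(treated, control)
print(lfc, is_significant(0.01, lfc))
```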
RESULTS
The transcriptomic landscape of A. niger inferred from a gene expression meta-analysis
We normalized and interrogated gene expression across 155 published transcriptomic analyses for the A. niger laboratory wildtype strain N402 (ATCC 64974) and its descendants, comprising 283 Affymetrix microarray experiments in total (34). Experimental parameters include a diverse range of cultivation conditions (agar plate, bioreactor, shake flask), developmental and morphological stages (germination, mycelial growth, sporulation), deletion and disruption mutants, stress conditions (antifungals, secretion stress, pH), different carbon and nitrogen sources, starvation, and co-cultivation with bacteria. These experimental conditions represent diverse niches inhabited by A. niger as well as industrial cultivation conditions, in addition to (a)biotic and genetic perturbations that result in global changes in gene expression.
In order to demonstrate that accurate values of transcript abundance were derived from this meta-analysis, we plotted average gene expression values for each gene throughout the 155 conditions as a function of chromosomal locus (Figure 1A). From these data, we categorized low, medium, and highly expressed loci. Subsequently, we generated a DNA cassette expressing a luciferase reporter gene under control of the inducible Tet-on promoter (31) and targeted it to the 5′-upstream region of two low-expression loci and one high-expression locus (Figure 1B). The pyrG locus present on chromosome III, routinely used for gene-targeted integration in A. niger, served as locus control for Tet-on driven medium expression of luciferase (31). Luciferase levels measured at these loci were confirmed to be low, medium and high in relative terms in microtiter cultivations of the different A. niger strains (Figure 1C) and in controlled batch cultivation at bioreactor scale (Figure 1D). These data demonstrate that low, medium, or high expression at these loci is applicable to both high-throughput assays (microtiter) and more labour- and resource-intensive bioreactor cultivations. We conclude that the microarray data accurately reflect A. niger gene expression values. Note that this transcriptomic landscape is a significant addition to the A. niger molecular toolkit, as it facilitates rational control of gene dosage (time of induction and absolute expression level) by targeted locus-specific integration of a gene of interest.
Construction of high quality co-expression networks for A. niger
Experimentally validated gene expression data from the A. niger transcriptional meta-analysis were used to generate a gene co-expression network based on Spearman's rank correlation coefficient (37). In order to define a minimum Spearman's rank correlation coefficient (ρ) for which we could be confident in extracting biologically meaningful co-expression, we conducted a preliminary quality control experiment, where transcript values for each individual gene were randomized amongst the 155 experimental conditions. This gave a dataset with identically distributed but randomized expression patterns. Next, we calculated every possible transcriptional correlation between genes in the A. niger genome, resulting in over 100 million ρ-values. This identified 52 million positive and 48 million negative correlations (Figure 2). From this dataset, only two ρ-values were above |0.4|, and none were above |0.5|. Consequently, we took ρ ≥ |0.5| as a minimum cut-off for biologically meaningful co-expression relationships. Calculations of Spearman correlations using the non-randomized microarray data resulted in over 4.5 million correlations which passed the minimum ρ ≥ |0.5| cut-off. From these datasets, co-expression sub-networks for every gene in the global network were generated for both positively and negatively correlated genes (Figure 2, Supplementary File 1). We classified them into two groups: ‘stringent’ (ρ ≥ |0.5|, 9,579 gene networks) and ‘highly stringent’ (ρ ≥ |0.7|, 6,305 gene networks) and calculated enriched GO terms for each gene sub-network relative to the A. niger genome.
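The derivation of gene-level sub-networks at the two stringency levels can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: it assumes the pairwise ρ-values are already available as a dictionary, and the gene names and ρ-values are hypothetical.

```python
from collections import defaultdict


def build_subnetworks(pair_rho, cutoff):
    """Group co-expressed partners per gene, split by sign of the correlation.

    pair_rho: {(gene_a, gene_b): rho} for each unordered gene pair.
    Returns {gene: {"positive": set, "negative": set}} for genes that have
    at least one partner with |rho| >= cutoff.
    """
    nets = defaultdict(lambda: {"positive": set(), "negative": set()})
    for (gene_a, gene_b), rho in pair_rho.items():
        if abs(rho) < cutoff:
            continue
        sign = "positive" if rho > 0 else "negative"
        nets[gene_a][sign].add(gene_b)
        nets[gene_b][sign].add(gene_a)
    return dict(nets)


# Hypothetical pairwise correlations for three genes.
pair_rho = {("g1", "g2"): 0.82, ("g1", "g3"): -0.55, ("g2", "g3"): 0.31}
stringent = build_subnetworks(pair_rho, cutoff=0.5)
highly_stringent = build_subnetworks(pair_rho, cutoff=0.7)
print(len(stringent), len(highly_stringent))
```

Because a gene only receives a sub-network when at least one partner passes the cut-off, raising the cut-off can only reduce the number of gene networks, in line with the smaller number reported at the highly stringent level.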
Integration of co-expression networks with community-wide experimental evidence of gene function
The Aspergillus community has functionally characterized over a thousand genes in different species of the genus Aspergillus, which we reasoned can be used to aid a priori predictions of hypothetical genes or not yet verified genes in A. niger. In order to integrate such experimental data with the co-expression network, we mined the Aspergillus genome database AspGD (17) to generate a near-complete list of ORFs that have been functionally characterized in Aspergilli. All experimentally validated ORFs for A. niger (n = 247), A. nidulans (n = 639), A. fumigatus (n = 218) and A. oryzae (n = 81) were included in this dataset. Given the strong potential of A. niger as a platform for discovery and production of new bioactive molecules, we also included 81 putative polyketide synthase (PKS) or nonribosomal peptide synthetase (NRPS) encoding genes of A. niger that reside in 78 predicted secondary metabolite clusters (39,40), giving in total 1,266 prioritized ORFs. For every gene in the A. niger genome, we calculated co-expression interactions specifically with these 1,266 rationally prioritized ORFs. A total of 9,263 (ρ ≥ |0.5|) and 5,178 (ρ ≥ |0.7|) candidate genes had one or more correlations with prioritized ORFs (Supplementary Table S2). These datasets thus constitute the most comprehensive co-expression resource for a filamentous fungus and are accessible at FungiDB (10).
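A brief Python sketch shows how the prioritized ORF set is tallied and how a query gene's sub-network can be restricted to it. The category counts are taken from the text above; the gene identifiers, ρ-values and function name are hypothetical and serve illustration only.

```python
# Counts of informant (prioritized) ORFs per source, as given in the text.
INFORMANT_COUNTS = {
    "A. niger verified": 247,
    "A. nidulans verified": 639,
    "A. fumigatus verified": 218,
    "A. oryzae verified": 81,
    "A. niger PKS/NRPS core genes": 81,
}
TOTAL_PRIORITIZED = sum(INFORMANT_COUNTS.values())


def prioritized_subnetwork(subnetwork, informants):
    """Keep only the co-expressed partners that are informant ORFs.

    subnetwork: {partner_gene: rho} for one query gene.
    informants: set of informant ORF identifiers.
    """
    return {gene: rho for gene, rho in subnetwork.items() if gene in informants}


# Hypothetical sub-network of one query gene and a hypothetical informant set.
query_subnetwork = {"orf_a": 0.74, "orf_b": -0.58, "orf_c": 0.51}
informants = {"orf_a", "orf_c", "orf_z"}
print(TOTAL_PRIORITIZED, prioritized_subnetwork(query_subnetwork, informants))
```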
Co-expression resources enable facile predictions of gene biological function
In order to test whether biologically meaningful interpretations of gene and network function can be extracted from these resources, we interrogated both stringent and highly stringent datasets for genes where the biological processes, molecular function, and subcellular localization of encoded proteins have been studied in fungi and which represent the broad range of utilities and challenges posed by fungi. From the perspective of industrial biotechnology, we interrogated networks for the gene encoding the ATPase BipA, which is required for high secretion yield of industrially useful enzymes by acting as chaperone to mediate protein folding in the endoplasmic reticulum (45). With regards to potential drug target discovery, we analyzed gene expression networks for Erg11 (Cyp51), which is the molecular target for azoles (46). For assessment of virulence in both plant and human infecting fungi, we interrogated networks for the NRPS SidD, which is necessary for the biosynthesis of the siderophore triacetyl fusarinine C, and ultimately iron acquisition during infection (47). Assessment of all control sub-networks at GO and individual gene-level revealed striking co-expression of genes encoding proteins involved in respective metabolic pathways, associated biological processes, subcellular organelles, protein complexes, known regulatory transcription factors/GTPases/chaperones, and cognate transporters, amongst others (Figure 3). 
The lowest Spearman correlation coefficient of |0.5| clearly results in biologically meaningful gene co-expression as exemplified by the delineation of diverse yet related processes, including: (i) orchestration of retrograde/anterograde vesicle trafficking via COPI/COPII/secretion associated proteins (BipA) (48); (ii) coordination of ergosterol biosynthesis by sterol regulatory binding element regulators SrbA/SrbB and association of this pathway with respiration at the mitochondrial membrane (Erg11) (49); and (iii) the linking of respective ergosterol and ornithine primary and secondary metabolic pathways during siderophore biosynthesis via the interdependent metabolite mevalonate (SidD) (50). With regard to co-expressed genes as a function of chromosomal location, a common feature of filamentous fungal genomes is that genes necessary for the biosynthesis of secondary metabolite products occur in physically linked contiguous clusters. SidD resides in a six-gene cluster with SidJ, SidF, SidH, SitT and MirD, all of which were represented in the high stringency network that contained a total of only 13 genes (Figure 3).
Based on enriched GO terms from gene sub-networks and co-expression with experimentally verified ORFs, we could further rapidly infer biological processes for a total of 2,970 (ρ ≥ |0.5|) and 1,016 (ρ ≥ |0.7|) hypothetical genes that were positively and/or negatively associated using this analysis as exemplarily shown for eight hypothetical genes in Supplementary Table S4. Additionally, we interrogated entire families of functionally related genes that have been well characterized in the Aspergilli, including phosphatases, chromatin remodelers, and transcription factors and were able to assign novel biological processes for all these predicted genes as exemplarily shown for nine genes in Supplementary Table S5.
Additionally, in order to confirm that hypotheses regarding gene function and co-expression networks were independent of the array platform and/or strain utilised, we compared data derived from the microarray used in our study (specifically the Affymetrix technology for A. niger isolate CBS 513.88), with a separate tri-species platform for strain ATCC 1015 (51). This latter microarray has been used to concomitantly compare batch cultivations of A. niger, A. oryzae and A. nidulans on glucose and xylose media, which identified 23 genes to be a conserved response to xylose utilisation, including the xylose transcriptional regulator XlnR (An15g05810, (51)). Despite differences in strain, microarray platform, and experimental design, interrogation of the XlnR sub-network from our study revealed strong concordance with the conserved xylose response genes reported by Andersen et al. (51), including those encoding an l-arabitol dehydrogenase (An01g10920), aldose 1-epimerase (An02g09090), glycoside hydrolase (An12g01850), xylitol dehydrogenase (An12g00030), sugar transporter (An03g01620) and short-chain dehydrogenase (An04g03530). We therefore conclude that sub-networks generated in our study can be used to define A. niger co-expression relationships and infer gene function across strain backgrounds.
Taken together, these quality control experiments strongly suggest that the co-expression resources developed in this study can be used for high confidence hypothesis generation at a variety of conceptual levels, including biological process, metabolic pathway, protein complexes, and individual genes.
Co-expression resources accurately predict transcription factors of the ribosomally synthesized natural product AnAFP of A. niger
In order to provide experimental confirmation of predictions of biological processes gleaned from this co-expression resource, we interrogated all datasets associated with the gene encoding the A. niger antifungal peptide AnAFP. Ribosomally synthesized antifungal peptides of the AFP family are promising molecules for use in medical or agricultural applications to combat human- and plant-pathogenic fungi (52). We and others could show that expression of their cognate genes is under tight temporal and spatial regulation in their native hosts and precedes asexual sporulation (34,53–55).
The gene encoding AnAFP (An07g01320) is co-expressed with 986 genes (ρ ≥ |0.5|; 605 positively correlated / 381 negatively correlated (34)). GO enrichment analyses of positively correlated sub-networks uncovered that anafp gene expression parallels fungal secondary metabolism, carbon limitation and autophagy (34). In total, 23 predicted transcription factors are co-expressed at a stringent level, among which were the transcription factors VelC (An04g07320) and StuA (An05g00480; Figure 4), both of which are key regulators of asexual development and secondary metabolism in Aspergilli, whereby VelC is known as an activator, and StuA can act as either activator or repressor (56–58). In order to confirm a regulatory function of these transcription factors on anafp expression, we used a reporter strain in which the anafp ORF has been replaced with a luciferase gene. Deletion of stuA or velC in this background revealed a strong increase or decreased/delayed activation of the anafp promoter, respectively (Figure 4). Interestingly, the transcription factor binding site for VelC is unknown, whereas the binding sequence for StuA is absent from the predicted promoter region of anafp. These data thus indicate that the resources generated in this study enable accurate predictions of (in)direct regulatory proteins even in the absence of DNA binding sites.
Co-expression resources accurately predict transcription factors of non-ribosomally synthesized natural products of A. niger
The transcriptional activation of secondary metabolite (SM) gene clusters in different filamentous fungi is one current focus of the fungal research community (16) as >60% of currently approved clinical drugs are derived from natural products (59). A. niger stands out due to the exceptionally high number of predicted SM gene clusters in its genome (n = 78), harboring 81 core enzymes in total, such as NRPS and PKS (39). However, only a dozen SMs have been identified from A. niger so far (60). Our survey of the expression data of all gene clusters under the 155 cultivation conditions uncovered that the majority of SM core genes (53) are expressed in at least one condition (Supplementary File 2). The majority of expressed core genes are also co-expressed with their cluster members (Supplementary File 3). Notably, not all cluster members are co-expressed with contiguous transcription factors. Indeed, only ∼30% of the gene clusters display co-expression with contiguous transcription factors (Supplementary File 3). We thus questioned which transcription factors are regulating these SM gene clusters, and used the co-expression dataset to assign biological processes to genes predicted to encode transcription factors. Given the important role of chromatin remodelers in activation and silencing of secondary metabolite clusters, we also interrogated genes predicted to encode histone deacetylases (61). This identified two ORFs encoding putative transcription factors, An07g07370 (TF1) and An12g07690 (TF2), and a histone deacetylase (An09g06520, HD) that are positively and negatively (TF1, TF2) or only negatively (HD) co-expressed with numerous core SM genes (Supplementary Table S5). Notably, none of the three genes resides in a contiguous SM gene cluster; instead, all belong to a large SM sub-network consisting of 152 genes including 26 SM core genes (Supplementary Table S6), whereby gene expression of TF1 and TF2 correlates very strongly (ρ = 0.87).
Interrogation of enriched GO terms for both TF1 and TF2 gene sub-networks revealed enrichment of fatty acid metabolism, autophagy, mitochondria degradation (positively correlated) and maturation of rRNA and tRNA, ribosomal assembly and amino acid metabolism (negatively correlated). This analysis thus allowed us to select genes for in vivo functional studies based on a non-intuitive selection procedure. None of TF1, TF2 and HD has been experimentally characterized in fungi so far. In order to confirm a regulatory function of these putative regulators on SM core gene expression, we generated (i) single deletion strains for TF1, TF2 and HD, respectively, (ii) a double deletion strain for TF1 and TF2, and (iii) individual conditional overexpression mutants for TF1, TF2 and HD using the Tet-on system (31). The strongest effect on the metabolome profile of A. niger was observed during overexpression of TF1 and TF2 (Figure 5). SMs up-regulated under these conditions included, but were not limited to, aurasperones, citreoviridin D, terrein, aspernigrin A, nigerazine A and B, pyranonigrin A and D, flavasperone, fonsecin, O-demethylfonsecin, flaviolin and funalenone; among the SMs down-regulated under these conditions were asperpyrone, L-agaridoxin and nummularine F (Figure 5, Supplementary File 4, Supplementary Table S7). These physiological changes were paralleled by up/down-regulation of thousands of genes as determined by RNA-Seq analyses (Supplementary Table S3), whereby controlled overexpression of TF1 (TF2) modulated expression of 45 (43) SM core genes especially during post-exponential growth phase of A. niger (Figure 5 and Supplementary File 4). This strongly suggests that both transcription factors are likely global regulators modulating gene expression dynamics during late growth stages of A. niger either directly or indirectly.
DISCUSSION
In this study, we performed a transcriptomic meta-analysis to generate a high-quality gene co-expression network, and used this to predict biological processes for 9,579 (∼65%) of all A. niger genes including 2,970 hypothetical genes (ρ ≥ |0.5|). The compendium of resources developed in this work consists of (i) gene-specific sub-networks with two stringent Spearman cut-offs that ensure high confidence in biologically meaningful interpretations; (ii) statistically enriched GO terms for each co-expression network, and (iii) a refined list of co-expression relationships which incorporate over 1,200 experimentally characterized ORFs to aid predictions of gene biological process based on experimental evidence.
To demonstrate the utility of these resources, we first interrogated the datasets at the gene level, showing that the transcription factors VelC and StuA, which are critical components of ascomycete development and secondary metabolism, also regulate expression of the A. niger antifungal peptide AnAFP. Our data provide further evidence of the coupling between development, biosynthesis of secondary metabolites, and secreted antifungal peptides (34). The datasets generated in this study can also be used to identify global regulators at the level of biological processes, as demonstrated for fungal secondary metabolism and the two transcription factors TF1 and TF2 (which we name MjkA and MjkB, respectively). Previously, co-expression of contiguous genes has been used to determine the boundaries of secondary metabolite clusters in A. nidulans and other fungi (23–25). Our study can be viewed as complementary to such analyses, as MjkA and MjkB are physically located outside the boundaries of any putative secondary metabolite cluster and would therefore not be identified using previous approaches (23,25). Generation of loss- and gain-of-function mutants demonstrated that MjkA and MjkB likely regulate, directly or indirectly, dozens of secondary metabolite loci at the transcript and metabolite level. Interestingly, MjkA is a Myb-like transcription factor that is highly conserved in Aspergilli, with orthologues present in several plant genomes. Myb transcription factors have recently been demonstrated to regulate plant natural product biosynthesis (62), and our co-expression data and wet lab experiments suggest that titratable control of MjkA is a promising strategy for the activation of ascomycete secondary metabolism during drug discovery programs.
With regard to applying our co-expression approach to predict gene biological processes in other fungi, interrogation of the GEO database (33) demonstrates that several hundred global gene expression experiments are available for industrial cell factories (e.g. A. oryzae, Trichoderma reesei) and human- or plant-infecting fungi (e.g. A. fumigatus, Cryptococcus neoformans, Candida albicans, Magnaporthe oryzae), indicating that our approach can be broadly applied to industrially, medically, and agriculturally relevant fungi. As the financial cost of gene expression profiling continues to decline, this study paves the way for prediction of gene biological function using co-expression network analyses throughout the fungal kingdom.
DATA AVAILABILITY
All datasets generated and/or analyzed during this study are available at FungiDB (http://fungidb.org/fungidb/). Spearman's correlation coefficients for gene co-expression sub-networks will also be made available by the corresponding author upon request. RNA-seq data have been deposited at the Gene Expression Omnibus under accession number GSE119311.
ACKNOWLEDGEMENTS
The authors wish to acknowledge all members of the FungiDB project for providing the bioinformatics infrastructure to integrate the omics datasets of this study. Carina Feldle is acknowledged for her assistance during bioreactor cultivations.
Authors' contributions: P.S. performed the transcriptome meta-analyses, constructed the co-expression network and executed in silico quality analyses. M.J.K. analyzed sub-networks related to secondary metabolism. M.J.K., S.J. and N.P. generated deletion and overexpression strains and characterized them. M.J.K. and T.S. performed bioreactor cultivations. M.J.K. analyzed transcriptome and metabolome data derived from deletion and overexpression strains. B.B. and B.G. contributed to molecular analyses; B.N. and S.L. contributed to bioinformatics analyses. T.C. contributed to bioinformatics analyses and co-wrote the final text. V.M. initiated this study, coordinated the project and co-wrote the final text. All authors read and approved the final manuscript.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
European Commission (funding by the Marie Curie International Training Network QuantFung, FP7-People-2013-ITN) [607332]. Funding for open access charge: Technische Universität Berlin.
Conflict of interest statement. None declared.
REFERENCES
- 1. Cairns T.C., Nai C., Meyer V. How a fungus shapes biotechnology: 100 years of Aspergillus niger research. Fungal Biol. Biotechnol. 2018; 5:13.
- 2. Show P.L., Oladele K.O., Siew Q.Y., Aziz Zakry F.A., Lan J.C.-W., Ling T.C. Overview of citric acid production from Aspergillus niger. Front. Life Sci. 2015; 8:271–283.
- 3. Pel H.J., Winde J.H., Archer D.B., Dyer P.S., Hofmann G., Schaap P.J., Turner G., de Vries R.P., Albang R., Albermann K. et al. Genome sequencing and analysis of the versatile cell factory Aspergillus niger CBS 513.88. Nat. Biotechnol. 2007; 25:221–231.
- 4. Gong W., Cheng Z., Zhang H., Liu L., Gao P., Wang L. Draft genome sequence of Aspergillus niger strain An76. Genome Announc. 2016; 4:e01700–e01715.
- 5. de Vries R.P., Riley R., Wiebenga A., Aguilar-Osorio G., Amillis S., Uchima C.A., Anderluh G., Asadollahi M., Askin M., Barry K. et al. Comparative genomics reveals high biological diversity and specific adaptations in the industrially and medically important fungal genus Aspergillus. Genome Biol. 2017; 18:28.
- 6. Wang B., Lv Y., Li X., Lin Y., Deng H., Pan L. Profiling of secondary metabolite gene clusters regulated by LaeA in Aspergillus niger FGSC A1279 based on genome sequencing and transcriptome analysis. Res. Microbiol. 2018; 169:67–77.
- 7. Singh N.K., Blachowicz A., Romsdahl J., Wang C., Torok T., Venkateswaran K. Draft genome sequences of several fungal strains selected for exposure to microgravity at the international space station. Genome Announc. 2017; 5:e01602–e01616.
- 8. Yin C., Wang B., He P., Lin Y., Pan L. Genomic analysis of the aconidial and high-performance protein producer, industrially relevant Aspergillus niger SH2 strain. Gene. 2014; 541:107–114.
- 9. Paul S., Ludeña Y., Villena G.K., Yu F., Sherman D.H., Gutiérrez-Correa M. High-quality draft genome sequence of a biofilm forming lignocellulolytic Aspergillus niger strain ATCC 10864. Stand Genomic Sci. 2017; 12:9.
- 10. Stajich J.E., Harris T., Brunk B.P., Brestelli J., Fischer S., Harb O.S., Kissinger J.C., Li W., Nayak V., Pinney D.F. et al. FungiDB: an integrated functional genomics database for fungi. Nucleic Acids Res. 2012; 40:D675–D681.
- 11. Grigoriev I.V., Nikitin R., Haridas S., Kuo A., Ohm R., Otillar R., Riley R., Salamov A., Zhao X., Korzeniewski F. et al. MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic Acids Res. 2014; 42:D699–D704.
- 12. Zerbino D.R., Achuthan P., Akanni W., Amode M.R., Barrell D., Bhai J., Billis K., Cummins C., Gall A., Girón C.G. et al. Ensembl 2018. Nucleic Acids Res. 2018; 46:D754–D761.
- 13. Yin X., Shin H.D., Li J., Du G., Liu L., Chen J. Comparative genomics and transcriptome analysis of Aspergillus niger and metabolic engineering for citrate production. Sci. Rep. 2017; 7:41040.
- 14. Andersen M.R., Salazar M.P., Schaap P.J., Van De Vondervoort P.J.I., Culley D., Thykaer J., Frisvad J.C., Nielsen K.F., Albang R., Albermann K. et al. Comparative genomics of citric-acid-producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88. Genome Res. 2011; 21:885–897.
- 15. Vesth T.C., Nybo J.L., Theobald S., Frisvad J.C., Larsen T.O., Nielsen K.F., Hoof J.B., Brandl J., Salamov A., Riley R. et al. Investigation of inter- and intraspecies variation through genome sequencing of Aspergillus section Nigri. Nat. Genet. 2018; doi:10.1038/s41588-018-0246-1.
- 16. Meyer V., Andersen M.R., Brakhage A.A., Braus G.H., Caddick M.X., Cairns C.T., de Vries R.P., Haarmann T., Hansen K., Hertz-Fowler K. et al. Current challenges of research on filamentous fungi in relation to human welfare and a sustainable bio-economy: a white paper. Fungal Biol. Biotechnol. 2016; 3:1–17.
- 17. Cerqueira G.C., Arnaud M.B., Inglis D.O., Skrzypek M.S., Binkley G., Simison M., Miyasato S.R., Binkley J., Orvis J., Shah P. et al. The Aspergillus Genome Database: Multispecies curation and incorporation of RNA-Seq data to improve structural gene annotations. Nucleic Acids Res. 2014; 42:D705–D710.
- 18. Cherry J.M., Hong E.L., Amundsen C., Balakrishnan R., Binkley G., Chan E.T., Christie K.R., Costanzo M.C., Dwight S.S., Engel S.R. et al. Saccharomyces Genome Database: The genomics resource of budding yeast. Nucleic Acids Res. 2012; 40:D700–D705.
- 19. Gov E., Arga K.Y. Differential co-expression analysis reveals a novel prognostic gene module in ovarian cancer. Sci. Rep. 2017; 7:4996.
- 20. van Dam S., Võsa U., van der Graaf A., Franke L., de Magalhães J.P. Gene co-expression analysis for functional classification and gene–disease predictions. Brief. Bioinform. 2017; 19:575–592.
- 21. Hsu C.-L., Juan H.-F., Huang H.-C. Functional analysis and characterization of differential coexpression networks. Sci. Rep. 2015; 5:13295.
- 22. Dutkowski J., Kramer M., Surma M.A., Balakrishnan R., Cherry J.M., Krogan N.J., Ideker T. A gene ontology inferred from molecular networks. Nat. Biotechnol. 2013; 31:38–45.
- 23. Andersen M.R., Nielsen J.B., Klitgaard A., Petersen L.M., Zachariasen M., Hansen T.J., Blicher L.H., Gotfredsen C.H., Larsen T.O., Nielsen K.F. et al. Accurate prediction of secondary metabolite gene clusters in filamentous fungi. Proc. Natl. Acad. Sci. U.S.A. 2013; 110:E99–E107.
- 24. Cairns T., Meyer V. In silico prediction and characterization of secondary metabolite biosynthetic gene clusters in the wheat pathogen Zymoseptoria tritici. BMC Genomics. 2017; 18:631.
- 25. Vesth T.C., Brandl J., Andersen M.R. FunGeneClusterS: Predicting fungal gene clusters from genome and transcriptome data. Synth. Syst. Biotechnol. 2016; 1:122–129.
- 26. Cooper M.A., Shlaes D. Fix the antibiotics pipeline. Nature. 2011; 472:32.
- 27. Macheleidt J., Mattern D.J., Fischer J., Netzker T., Weber J., Schroeckh V., Valiante V., Brakhage A.A. Regulation and Role of Fungal Secondary Metabolites. Annu. Rev. Genet. 2016; 50:371–392.
- 28. Boecker S., Grätz S., Kerwat D., Adam L., Schirmer D., Richter L., Petras D., Süssmuth R.D., Meyer V. Aspergillus niger is a superior expression host for the production of bioactive fungal cyclodepsipeptides. Fungal Biol. Biotechnol. 2018; 5:4.
- 29. Meyer V., Ram A.F.J., Punt P.J. Genetics, genetic manipulation, and approaches to strain improvement of filamentous fungi. Manual of Industrial Microbiology and Biotechnology. 2010; 3rd edn. NY: Wiley; 318–329.
- 30. Green M.R., Sambrook J. Molecular Cloning: A Laboratory Manual. 2012; 1–3: NY: Cold Spring Harbor Laboratory Press; 1–2028.
- 31. Meyer V., Wanka F., van Gent J., Arentshorst M., van den Hondel C.A., Ram A.F. Fungal gene expression on demand: an inducible, tunable, and metabolism-independent expression system for Aspergillus niger. Appl. Environ. Microbiol. 2011; 77:2975–2983.
- 32. Fiedler M.R.M., Gensheimer T., Kubisch C., Meyer V. HisB as novel selection marker for gene targeting approaches in Aspergillus niger. BMC Microbiol. 2017; 17:57.
- 33. Barrett T., Wilhite S.E., Ledoux P., Evangelista C., Kim I.F., Tomashevsky M., Marshall K.A., Phillippy K.H., Sherman P.M., Holko M. et al. NCBI GEO: Archive for functional genomics data sets - Update. Nucleic Acids Res. 2013; 41:D991–D995.
- 34. Paege N., Jung S., Schäpe P., Müller-Hagen D., Ouedraogo J.P., Heiderich C., Jedamzick J., Nitsche B.M., van den Hondel C.A., Ram A.F. et al. A transcriptome meta-analysis proposes novel biological roles for the antifungal protein Anafp in Aspergillus niger. PLoS One. 2016; 11:e0165755.
- 35. Gautier L., Cope L., Bolstad B.M., Irizarry R.A. affy—analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 2004; 20:307–315.
- 36. Gentleman R.C., Carey V.J., Bates D.M., Bolstad B., Dettling M., Dudoit S., Ellis B., Gautier L., Ge Y., Gentry J. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004; 5:R80–R80.
- 37. Spearman C. The proof and measurement of association between two things. By C. Spearman, 1904. Am. J. Psychol. 1987; 100:441–471.
- 39. Inglis D.O., Binkley J., Skrzypek M.S., Arnaud M.B., Cerqueira G.C., Shah P., Wymore F., Wortman J.R., Sherlock G. Comprehensive annotation of secondary metabolite biosynthetic genes and gene clusters of Aspergillus nidulans, A. fumigatus, A. niger and A. oryzae. BMC Microbiol. 2013; 13:91.
- 40. Sanchez J.F., Wang C.C.C. The chemical identification and analysis of Aspergillus nidulans secondary metabolites. Fungal Secondary Metabolism - Methods and Protocols. 2012; 97–109.
- 41. Park J., Hulsman M., Arentshorst M., Breeman M., Alazi E., Lagendijk E.L., Rocha M.C., Malavazi I., Nitsche B.M., van den Hondel C.A. et al. Transcriptomic and molecular genetic analysis of the cell wall salvage response of Aspergillus niger to the absence of galactofuranose synthesis. Cell Microbiol. 2016; 18:1268–1284.
- 42. Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics. 2013; 29:15–21.
- 43. Love M.I., Huber W., Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014; 15:550.
- 44. Benjamini Y., Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B. 1995; 57:289–300.
- 45. Punt P.J., van Gemeren I.A., Drint-Kuijvenhoven J., Hessing J.G., van Muijlwijk-Harteveld G.M., Beijersbergen A., Verrips C.T., van den Hondel C.A. Analysis of the role of the gene bipA, encoding the major endoplasmic reticulum chaperone protein in the secretion of homologous and heterologous proteins in black Aspergilli. Appl. Microbiol. Biotechnol. 1998; 50:447–454.
- 46. Hargrove T.Y., Wawrzak Z., Lamb D.C., Guengerich F.P., Lepesheva G.I. Structure-functional characterization of cytochrome P450 Sterol 14-alpha-Demethylase (CYP51B) from Aspergillus fumigatus and molecular basis for the development of antifungal drugs. J. Biol. Chem. 2015; 290:23916–23934.
- 47. Scharf D.H., Heinekamp T., Brakhage A.A. Human and plant fungal Pathogens: The role of secondary metabolites. PLOS Pathog. 2014; 10:e1003859.
- 48. Hoang H.D., Maruyama J.I., Kitamoto K. Modulating endoplasmic reticulum-Golgi cargo receptors for improving secretion of carrier-fused heterologous proteins in the filamentous fungus Aspergillus oryzae. Appl. Environ. Microbiol. 2015; 81:533–543.
- 49. Dhingra S., Cramer R.A. Regulation of sterol biosynthesis in the human fungal pathogen Aspergillus fumigatus: Opportunities for therapeutic development. Front. Microbiol. 2017; 8:92.
- 50. Yasmin S., Alcazar-Fuoli L., Grundlinger M., Puempel T., Cairns T., Blatzer M., Lopez J.F., Grimalt J.O., Bignell E., Haas H. et al. Mevalonate governs interdependency of ergosterol and siderophore biosyntheses in the fungal pathogen Aspergillus fumigatus. Proc. Natl. Acad. Sci. U.S.A. 2012; 109:E497–E504.
- 51. Andersen M.R., Vongsangnak W., Panagiotou G., Salazar M.P., Lehmann L., Nielsen J. A trispecies Aspergillus microarray: comparative transcriptomics of three Aspergillus species. Proc. Natl. Acad. Sci. U.S.A. 2008; 105:4387–4392.
- 52. Meyer V. A small protein that fights fungi: AFP as a new promising antifungal agent of biotechnological value. Appl. Microbiol. Biotechnol. 2008; 78:17–28.
- 53. Meyer V., Wedde M., Stahl U. Transcriptional regulation of the antifungal protein in Aspergillus giganteus. Mol. Genet. Genomics. 2001; 266:747–757.
- 54. Hegedüs N., Sigl C., Zadra I., Pócsi I., Marx F. The paf gene product modulates asexual development in Penicillium chrysogenum. J. Basic Microbiol. 2011; 51:253–262.
- 55. Meyer V., Jung S. Antifungal peptides of the AFP family revisited: are these cannibal toxins? Microorganisms. 2018; 6:E50.
- 56. Sheppard D.C. The Aspergillus fumigatus StuA protein governs the Up-Regulation of a discrete transcriptional program during the acquisition of developmental competence. Mol. Biol. Cell. 2005; 16:5866–5879.
- 57. Hu P., Wang Y., Zhou J., Pan Y., Liu G. AcstuA, which encodes an APSES transcription regulator, is involved in conidiation, cephalosporin biosynthesis and cell wall integrity of Acremonium chrysogenum. Fungal Genet. Biol. 2015; 83:26–40.
- 58. Clutterbuck A.J. A mutational analysis of conidial development in Aspergillus nidulans. Genetics. 1969; 63:317–327.
- 59. Newman D.J., Cragg G.M. Natural products as sources of new drugs over the 30 years from 1981 to 2010. J. Nat. Products. 2012; 75:311–335.
- 60. Nielsen K.F., Mogensen J.M., Johansen M., Larsen T.O., Frisvad J.C. Review of secondary metabolites and mycotoxins from the Aspergillus niger group. Anal. Bioanalyt. Chem. 2009; 395:1225–1242.
- 61. Bayram Ö., Krappmann S., Ni M., Jin W.B., Helmstaedt K., Valerius O., Braus-Stromeyer S., Kwon N.J., Keller N.P., Yu J.H. et al. VelB/VeA/LaeA complex coordinates light signal with fungal development and secondary metabolism. Science. 2008; 320:1504–1506.
- 62. Liu J., Osbourn A., Ma P. MYB transcription factors as regulators of phenylpropanoid metabolism in plants. Mol. Plant. 2015; 8:689–708.