Author manuscript; available in PMC: 2014 Feb 15.
Published in final edited form as: Genet Res (Camb). 2011 Dec;93(6):387–395. doi: 10.1017/S0016672311000346

Functional Genome Annotation of Drosophila Seminal Fluid Proteins Using Transcriptional Genetic Networks

Julien F Ayroles 1,2, Brooke A Laflamme 3, Eric A Stone 1,2, Mariana F Wolfner 3, Trudy F C Mackay 1,2
PMCID: PMC3925343  NIHMSID: NIHMS553616  PMID: 22189604

Summary

Predicting functional gene annotations remains a significant challenge, even in well-annotated genomes such as yeast and Drosophila. One promising, high-throughput method for gene annotation is to use correlated gene expression patterns to annotate target genes based on the known function of focal genes. The Drosophila melanogaster transcriptome varies genetically among wild-derived inbred lines, with strong genetic correlations among the transcripts. Here, we leveraged the genetic correlations in gene expression among known seminal fluid protein (SFP) genes and the rest of the genetically varying transcriptome to identify 176 novel candidate SFPs (cSFPs). We independently validated the correlation in gene expression between seven of the cSFPs and a known SFP gene, as well as expression in male reproductive tissues. We argue that this method can be extended to other systems for which information on genetic variation in gene expression is available.

1. Introduction

The diminishing cost of high-throughput technologies such as whole genome transcript profiling, high-density genotyping and whole genome re-sequencing has shifted the focus of genomic sciences from data production to data interpretation. Foremost among the challenges in interpretation is functional gene annotation, through experimental validation or computational prediction. Even for the best-annotated genomes, a significant proportion of genes have yet to be functionally characterized (Pena-Castillo & Hughes, 2007; Costello et al., 2009); less than half in Drosophila (Costello et al., 2009).

Most knowledge regarding gene function in eukaryotes comes from mutagenesis, single gene knock-outs and RNAi knock-down experiments performed in yeasts, Drosophila, C. elegans, mouse and Arabidopsis (Winzeler et al., 1999; Alonso et al., 2003; Kamath & Ahringer, 2003; Bellen et al., 2004; Dietzl et al., 2007; Ni et al., 2009; Guan et al., 2010; Spirek et al., 2010). These approaches have provided functions for a large number of genes in many organisms and the basis for making gene function predictions based on gene sequence similarities. However, screening large mutant collections for quantitative phenotypes is highly laborious. Furthermore, unique mutations in the same gene, or the same mutation in multiple genetic backgrounds can give different phenotypes, further complicating the interpretation of such screens (Flint & Mackay, 2009; Mackay et al., 2009; Dowell et al., 2010).

Computational methods for gene annotation complement experimental approaches. Computational methods rely on the detection of particular sequence motifs (e.g., a binding domain) (Hrmova & Fincher, 2009); strong orthology with a gene of known function in a closely related species; or “guilt-by-association” (Bréhélin et al., 2010). The last approach is based on correlative evidence, such as the co-regulation of gene expression or the existence of known protein-protein interactions. In all cases, the functional annotation of a known gene is transferred to its interacting or correlated partner, providing a hypothesis that can be verified experimentally.

Traditionally, guilt-by-association annotation has been used in the context of environmental perturbations (Walker et al., 1999; Reverter et al., 2008; Vandepoele et al., 2009; Klie et al., 2010). A complementary approach is to utilize natural variation in genetically correlated transcriptional networks to identify co-regulated transcripts. Previously, we used genome wide transcript profiles from 40 lines from the Drosophila Genetic Reference Panel (DGRP, Ayroles et al., 2009), a set of inbred lines recently derived from the wild, as a source of genetic variation in gene expression. The genetic variation among these inbred lines greatly exceeds that which can be obtained by mutagenesis screens or standard genetic crosses, while sampling multiple genetically identical individuals from each line reduces environmental variance. The genetically variable transcripts are highly correlated among the lines, forming 241 transcriptional co-expression modules (Ayroles et al., 2009). These co-expression modules were enriched for common Gene Ontology categories, expression in the same tissues, common transcriptional factor binding sites, and associations of gene expression with the same quantitative traits. These observations suggest that genetic correlation of gene expression with a co-expression module may be due to co-regulation and that transcripts genetically correlated with a target gene of known function are plausibly involved in the same biological process or molecular function as the target gene (Luo et al., 2007; Ayroles et al., 2009). Here, we test this hypothesis using seminal fluid proteins (SFPs) as the focal genes.

We chose SFPs as focal genes for two reasons. First, many of the gene products of the secretory tissues of the male reproductive tract that produce the SFPs are well understood in D. melanogaster (Wolfner, 2009). This is especially true for the male accessory glands, which produce proteins known collectively as ACcessory gland Proteins (ACPs). ACPs are transferred to females in the seminal fluid and affect a number of post-mating processes (Wolfner, 2009), including sperm storage and maintenance (Neubaum & Wolfner, 1999; Tram & Wolfner, 1999; Ravi Ram & Wolfner, 2007; 2009), egg production and mating receptivity (Chapman et al., 2003; Liu & Kubli, 2003; Heifetz et al., 2000), female feeding behavior (Carvalho et al., 2006), and sleep patterns (Isaac et al., 2010). Proteomic (Findlay et al., 2008; 2009) and gene expression (Swanson et al., 2001) studies have identified 187 SFPs, most of which are ACPs. Second, we observed strong genetic correlations in expression among the known ACPs (Ayroles et al., 2009), suggesting that new SFPs, and potentially genes important for the production or function of these proteins, could be found by analyzing the correlation structure between genetically variable transcripts.

Using the DGRP expression data (Ayroles et al., 2009), we identified transcripts whose expression patterns correlated with known SFPs. These correlated transcripts are candidates for both previously unknown SFPs and genes that are required for regulation of SFP production. Very little is known about how SFP genes are regulated in the male; this method provides a means to identify candidate regulatory genes for further study. As a proof of principle, the only known transcription factor required for the expression of specific SFP genes (Xue and Noll, 2002) was among the candidate genes we identified. Though proteins encoded by regulatory genes would not necessarily be transferred to females during mating, and are therefore not SFPs per se, we refer to our set of candidate SFPs as cSFPs.

We identified 176 cSFP genes. For validation, we selected seven candidates with varying levels of correlation to known SFP genes and used quantitative real-time PCR to validate the correlation patterns. We also used RT-PCR to test the tissue of expression for these seven genes. We propose that this method can be widely applied to similar datasets, beyond the example of the SFP functional annotation we present.

2. Methods

(i) Gene expression data

The gene expression data are from Ayroles et al. (2009). Whole genome expression was quantified using Affymetrix Drosophila 2.0 arrays for two replicate pools of 3–5 day old mated males and females for each of 40 DGRP lines. We median-centered the perfect match (PM) data and removed probes that were identified as likely single feature polymorphisms. We used the median log2 signal intensity of the remaining PM probes in each probe set as the measure of expression. A total of 14,840 (78.9%) of the 18,767 transcripts on the array were expressed. Because we focus here on highly male biased transcripts, we only used the male gene expression data to identify genetically variable transcripts. We fitted the following model to the expression data: Y = L + e, where Y is the median log2 signal intensity, L the line effect and e the residual. We identified 7,151 transcripts as genetically variable at FDR < 0.01.
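The model Y = L + e is a one-way layout with line as the only factor. The following is a minimal sketch of the corresponding F statistic for a single transcript, assuming a balanced or unbalanced set of replicate values per line. The function name and the toy values are illustrative and do not come from the study; turning F into a p-value and applying an FDR procedure would require an F-distribution routine and a multiple-testing method that the text does not specify.

```python
from statistics import mean


def line_effect_f(expr_by_line):
    """Return (F, df_between, df_within) for the one-way model Y = L + e.

    expr_by_line maps a line name to the replicate expression values
    (median log2 signal intensities) of one transcript in that line.
    """
    groups = list(expr_by_line.values())
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)

    # Variation among line means (the L term of the model).
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation among replicates within a line (the residual e).
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data, invented for illustration: three lines, two replicate pools each.
toy = {"line_A": [8.0, 8.2], "line_B": [9.0, 9.2], "line_C": [10.0, 10.2]}
f_stat, df_between, df_within = line_effect_f(toy)
```

A large F relative to its degrees of freedom indicates that expression differs among lines more than among replicates of the same line, which is what "genetically variable" means in this design.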

The raw microarray data are deposited in the ArrayExpress database (www.ebi.ac.uk/arrayexpress) under accession number E-MEXP-1594. The DGRP stocks are available from the Bloomington Drosophila Stock Center (Bloomington, Indiana).

(ii) Candidate SFPs

Of the 187 known SFP genes, 107 had genetically variable expression levels in the DGRP lines. We computed pairwise Pearson correlations between the 107 genetically variable SFPs and all 7,151 genetically variable transcripts. We then calculated an ‘SFP score’ for each of the 7,151 transcripts by tallying the number of significant correlations (p < 0.01) with known SFPs, dividing by 107 and multiplying by 100. For a given transcript, a score of 100 indicates that it is correlated with all 107 known SFPs, and a score of 0.93 (1/107 × 100) indicates the absence of significant correlation between the focal gene and any of the known SFP genes (i.e., only showing correlation to itself). The thresholds used to compute this score are arbitrary, but this method is both simple and intuitive, and gives similar results to more sophisticated approaches such as the identification of eigengenes (Langfelder, 2007) following the construction of co-expression gene networks. Using the PCA loadings to identify correlated transcripts may improve functional prediction.
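The SFP score can be sketched in a few lines. This is an illustration under stated assumptions rather than the pipeline used in the study: the significance test is expressed as a critical value of |r| (for 40 lines and a two-sided p < 0.01, that value is approximately 0.403; for the five-line toy data below it is approximately 0.959), negative correlations are counted as well as positive ones, and all names and numbers are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of line means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def sfp_score(transcript, known_sfps, r_crit):
    """Percentage of known SFPs whose |r| with `transcript` reaches r_crit.

    r_crit stands in for the p < 0.01 cut-off; it depends on the number
    of lines and must be supplied by the caller (assumption of this sketch).
    """
    hits = sum(1 for sfp in known_sfps
               if abs(pearson_r(transcript, sfp)) >= r_crit)
    return 100.0 * hits / len(known_sfps)


# Toy data: expression of one transcript and four "known SFPs" in five lines.
transcript = [1, 2, 3, 4, 5]
known = [
    [2, 4, 6, 8, 10],   # r = 1
    [5, 4, 3, 2, 1],    # r = -1
    [1, 3, 2, 5, 4],    # r = 0.8, below the five-line critical value
    [2, 1, 2, 1, 2],    # r = 0
]
score = sfp_score(transcript, known, r_crit=0.959)
```

With the real data the denominator would be 107, so a transcript correlated with a single known SFP scores 1/107 × 100, or about 0.93.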

In addition to the correlation structure, we used several criteria to identify transcripts as putative SFPs (proteins that are predominately or exclusively expressed in the male reproductive tract and likely to be transferred to females), or as potential regulatory genes (those that produce proteins unlikely to be transferred to females) but whose expression is also predominately limited to male reproductive tissues. We used FlyAtlas (Chintapalli et al., 2007), a database of tissue specific expression for Drosophila melanogaster, to examine the tissues of expression for each gene with an SFP score of greater than 8. In addition, because SFPs are secreted proteins, we used SignalP software (http://www.cbs.dtu.dk/services/SignalP/) to identify the presence of predicted signal sequences. The program calculates the probability that the input amino acid sequence contains an N-terminal secretion signal. Here, we used the signal peptide probability score given from the SignalP-HMM prediction method. Signal peptides are usually 15–30 amino acids long and contain a stereotypical pattern of charged, hydrophobic, and uncharged residues, though the amino acid sequence itself is not conserved (Emanuelsson et al., 2007). However, not all secreted proteins contain predicted signal sequences (Findlay et al., 2008), and not all proteins with secretion signals are secreted (Emanuelsson et al., 2007). Therefore we do not exclude genes as being seminal fluid proteins or ACP candidates based solely on a low SignalP score.

(iii) Experimental validation of cSFPs

We chose seven genes identified as cSFPs for validation of the guilt-by-association results as well as further characterization. These genes have a range of SFP scores and a few have predicted biochemical functions, though none were predicted to be involved with SFP function. In addition to the seven candidates, we also included a known ACP gene (CG9997; Swanson et al., 2001; Ravi Ram & Wolfner, 2007), and a known ejaculatory duct protein gene (Dup99B; Saudan et al., 2002), both of whose products are transferred to females, as positive controls. We expect cSFP genes, including those expressed in the ejaculatory duct or bulb, to correlate in expression with the known SFP, CG9997. We included CG34422 as a negative control, given its low SFP score and wide expression pattern across tissues, including the male accessory glands, brain, eye, and hindgut. This gene should not show a significant correlation to CG9997 in the quantitative PCR experiment, in contrast to the seven cSFPs.

We independently validated the tissue-biased expression results from FlyAtlas (Chintapalli et al., 2007) for these 10 genes. We reared Canton-S males on standard yeast-glucose medium under uncrowded conditions at ~24°C. We dissected 50–60 testes (T), accessory glands (AG), ejaculatory ducts (ED), ejaculatory bulbs (EB), and male carcasses (C; no reproductive tract). Dissected tissues were placed directly into TRIzol Reagent (Invitrogen) on ice. We collected two biological replicates for each RNA extraction.

We used quantitative real-time PCR to validate the correlation structure between the genes that had been inferred from the microarray experiment. We randomly selected 20 of the 40 DGRP lines used in the microarray study (Ayroles et al., 2009), and isolated total RNA from two biological replicates, each with 8–12 males of each line (3–7 days post-eclosion). We then estimated the correlation of gene expression with the known SFP, CG9997.

(iv) RNA extractions and cDNA synthesis

We extracted total RNA by grinding dissected tissues in 150 μL TRIzol Reagent (Invitrogen), following the manufacturer’s recommendations for RNA isolation, except that 0.5 mL chloroform was used for every 1 mL TRIzol. Total RNA was treated with DNase I (Invitrogen) and converted to cDNA with Superscript II Reverse Transcriptase (Invitrogen) and oligo-dT primers as recommended by the manufacturer. We used 500 ng of total RNA per 20 μL reverse transcription reaction. Negative controls without reverse transcriptase were tested once for all genes and all cDNA samples to exclude potential genomic DNA contamination.

(v) Quantitative Real-Time PCR

We quantified mRNA levels by quantitative RT–PCR in 25 μL reactions with the SYBR green detection method (iQ SYBR Green Supermix, Bio-Rad) according to the protocol from MyiQ Single-Color Real-Time PCR Detection System (Bio-Rad). Each reaction was performed with 2 picograms of total cDNA, using a BioRad MyiQ Single-Color Real-Time PCR Detection system. We used the actin5C gene (Burn et al., 1989) as an internal standard. We used Primer3 (http://frodo.wi.mit.edu/primer3/) to design transcript-specific primers to amplify 85 to 148-bp regions of the genes of interest. CG34422 primers were designed to encompass the common regions of alternative transcripts. The starting template concentration of each transcript was calculated from the standard curve of that primer pair according to the method described by Qiagen (http://www1.qiagen.com/literature/brochures/pcr/qt/1037490_ag_pcr_0206_int_lr.pdf). We used the linear regression model Y = mX + b to quantify transcript abundance, where Y is the critical threshold (Ct) value from the qRT-PCR experiment, m is the slope and b is the intercept of the standard curve, and X is the transcript abundance. We standardized this estimate by dividing by the transcript abundance of actin5C in the same sample.
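The standard-curve calculation can be sketched as follows. The sketch assumes, as is conventional for qPCR standard curves, that X is the log10 of the starting template quantity; the text itself describes X only as the transcript abundance. The function names, the dilution series, and the Ct values are invented for illustration.

```python
def fit_standard_curve(log10_quantities, ct_values):
    """Least-squares fit of Ct = m * X + b; returns (m, b)."""
    n = len(ct_values)
    mx = sum(log10_quantities) / n
    my = sum(ct_values) / n
    sxx = sum((x - mx) ** 2 for x in log10_quantities)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(log10_quantities, ct_values))
    m = sxy / sxx
    return m, my - m * mx


def abundance_from_ct(ct, m, b):
    """Invert the curve, X = (Ct - b) / m, and return the linear-scale quantity."""
    return 10 ** ((ct - b) / m)


def normalized_abundance(ct_target, curve_target, ct_ref, curve_ref):
    """Target abundance divided by reference (e.g. actin5C) abundance."""
    return (abundance_from_ct(ct_target, *curve_target)
            / abundance_from_ct(ct_ref, *curve_ref))


# Toy ten-fold dilution series: relative quantities 1, 10, 100, 1000.
curve = fit_standard_curve([0, 1, 2, 3], [30.0, 26.7, 23.4, 20.1])

# Toy sample: target gene at Ct 23.4, reference gene at Ct 20.1,
# both read off the same hypothetical curve for simplicity.
ratio = normalized_abundance(23.4, curve, 20.1, curve)
```

In practice each primer pair has its own curve, so `curve_target` and `curve_ref` would be fitted separately.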

(vi) Gene Ontology analysis

We used Gene Ontology (GO) analysis to assign functional categories to the candidate SFP genes tested. We computed the genetic correlations between each of the seven new focal genes and the remainder of the genetically variable transcriptome. We then performed a GO enrichment analysis for the genes most strongly correlated to the focal gene (p < 0.001 and |r| > 0.5). The conclusions regarding enrichment were the same when the threshold was made more stringent (p < 0.0001). We performed this analysis using DAVID 6.7 (Huang et al., 2009).
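The logic of a GO enrichment test can be illustrated with a plain hypergeometric upper-tail probability. This is a generic sketch and not the statistic DAVID reports (DAVID uses its own modified Fisher exact test, which is not reproduced here); the function name and the toy counts are hypothetical.

```python
from math import comb


def enrichment_p(n_background, n_annotated, n_selected, n_overlap):
    """P(X >= n_overlap) for a hypergeometric draw.

    n_background: genes in the background set
    n_annotated:  background genes carrying the GO term
    n_selected:   genes correlated with the focal gene
    n_overlap:    selected genes carrying the GO term
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy counts: 10 background genes, 4 carry the term, 3 are selected,
# and 2 of the selected genes carry the term.
p_toy = enrichment_p(10, 4, 3, 2)
```

Here the background would be the genetically variable transcriptome and the selected set would be the transcripts passing the p < 0.001 and |r| > 0.5 filter for a given focal gene.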

3. Results and Discussion

Of the 187 known SFPs, 107 had genetically variable transcripts among the 40 DGRP lines (Ayroles et al., 2009). The 107 known SFPs were highly genetically correlated (Figure 1), reinforcing the idea that gene co-expression may be a reflection of shared function. We attempted to cluster this correlation matrix further into modules using various clustering algorithms, including MMC (Stone & Ayroles, 2009), but did not find strong community structure in the graph resulting from this correlation matrix. In addition, we did not find evidence supporting the idea that genes sharing a similar GO term were more strongly correlated with each other than they were to the rest of the genes.

Figure 1.

Graphical representation of the correlations among known SFPs. Each node represents a gene and each edge the correlation between two genes. The thickness of each edge is scaled proportional to the strength of the correlation between two genes. The absolute value of all correlations depicted is greater than 0.5 (p < 0.001).

We then analyzed the correlation matrix between the 107 known seminal fluid proteins and 7,151 transcripts that were genetically variable in males. We assigned an SFP score to each of the genetically variable transcripts based on the number of significant correlations with known SFPs (Supplementary Table 1). We next asked whether this approach would allow us to recover the known seminal fluid proteins. We ranked the vector of SFP scores from the highest to the lowest and applied the filter that cSFPs should be expressed in male reproductive tissues based on FlyAtlas (Chintapalli et al., 2007) data. We found that 78% of the known SFPs are in the top 500 transcripts.
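The recovery check described above amounts to filtering by tissue, ranking by SFP score, and counting known SFPs in the top of the list. A minimal sketch, assuming the scores are held in a dictionary keyed by transcript ID and that the FlyAtlas filter is available as a set of IDs (both data structures and all toy values are illustrative):

```python
def recovery_in_top(scores, known_sfps, expressed_in_male_repro, top_n):
    """Fraction of known SFPs found among the top_n ranked transcripts.

    scores:                   dict mapping transcript ID to SFP score
    known_sfps:               set of transcript IDs of known SFPs
    expressed_in_male_repro:  set of IDs passing the tissue filter
    """
    eligible = [t for t in scores if t in expressed_in_male_repro]
    ranked = sorted(eligible, key=lambda t: scores[t], reverse=True)
    top = set(ranked[:top_n])
    return len(top & known_sfps) / len(known_sfps)


# Toy example with six transcripts, three of which are "known SFPs".
toy_scores = {"a": 90, "b": 80, "c": 70, "d": 60, "e": 10, "f": 5}
toy_known = {"a", "c", "e"}
toy_expressed = {"a", "b", "c", "e", "f"}
fraction = recovery_in_top(toy_scores, toy_known, toy_expressed, top_n=3)
```

In the toy example two of the three known genes fall in the top three eligible transcripts; with the real data the same calculation uses the 107 genetically variable known SFPs and the top 500 transcripts.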

We identified 176 cSFP genes that have correlated expression patterns to at least seven of the 107 genetically variable known seminal protein genes and are expressed in male reproductive tissues (Supplementary Table 2). A total of 37 of the 176 candidates have no known or predicted functions or GO terms. An additional 13 transcripts correspond to probe sets on the Affymetrix array but not annotated genes, and could correspond to new genes. Independent confirmation of cSFP identification comes from a proteomic screen aimed at identifying male proteins transferred during mating (Findlay et al., 2008; 2009). Two candidate transcripts were confirmed as bona fide SFPs: CG34002 (with an SFP score of 15) and Sfp26Ad (with an SFP score of 41). Sfp26Ad was not annotated as a gene at the time we performed this experiment and corresponded to probeset 637742_at on the Affymetrix array.

We chose seven candidate SFP genes (CG9720, CG11828, CG31413, CG31493, CG31496, CG32985, CG34002), as well as two positive control genes (the ACP gene CG9997 and the ejaculatory duct protein gene Dup99B) and one negative control gene (CG34422, with an SFP score of 0.93) for validation of the microarray correlation results using real-time quantitative PCR in 20 of the DGRP lines. The candidate genes have SFP scores ranging from moderately low (8) to very high (42, the highest SFP score found) (Table 1). The real-time PCR results confirmed the correlation between all seven candidate SFPs and the known ACP gene CG9997 across the twenty lines (Figure 2). As predicted, expression of the negative control CG34422 was not genetically correlated with that of CG9997. In addition, expression of the ejaculatory duct protein gene Dup99B, whose gene product is transferred with the seminal fluid to females, was genetically correlated with CG9997, demonstrating that non-ACP SFPs can also be identified with this method.

Table 1.

Genes selected for experimental validation. The SFP score is the percentage of known SFPs with which the gene had correlated expression. Sprob is the predicted probability of a secretion signal sequence as given by SignalP. Tissue of expression is given from the FlyAtlas compilation and our RT-PCR data from the male reproductive tract and carcass. Ejaculatory duct and bulb are not represented in FlyAtlas.

Category | Gene | Affymetrix ID | SFP Score | Sprob | Tissue (FlyAtlas) | Tissue (RT-PCR)
Candidate SFP | CG9720 | 1624902_at | 35 | 0.997 | AG | AG, ED, EB
Candidate SFP | CG11828 | 1633604_at | 41 | 0 | AG | AG, ED
Candidate SFP | CG31413 | 1635084_at | 42 | 0.987 | AG | AG, ED
Candidate SFP | CG31493 | 1640609_at | 36 | 0 | AG | AG
Candidate SFP | CG31496 | 1628103_at | 8 | 0.721 | AG, LSG | AG, ED, EB
Candidate SFP | CG32985 | 1632491_at | 38 | 0 | AG | AG, ED, EB
Candidate SFP | CG34002 | 1625512_s_at | 15 | 0.991 | AG | AG
ACP Positive Control | CG9997 | 1634224_at | 39 | 0.999 | AG | AG
ED Positive Control | Dup99B | 1639365_at | 29 | 0.98 | AG | AG, ED, EB
ACP Negative Control | CG34422 | 1641329_at | 1 | 0 | All but T, HT, HD | All

Bold font denotes tissues of predominant expression. AG: accessory glands; ED: ejaculatory duct; EB: ejaculatory bulb; T: testis; LSG: larval salivary glands; HT: heart; HD: head. ED and EB are not represented in FlyAtlas.

Figure 2.

Correlation of quantitative RT-PCR estimates of gene expression between candidate SFP genes and positive and negative SFP control genes (y-axis) to a known ACP gene (CG9997, x-axis) among males of 20 inbred lines. All estimates of gene expression are normalized to that of actin5C. The linear regression line is shown, along with the t-test p-value and the estimate of the correlation coefficient, r. (a) CG11828, r = 0.68, p = 0.001. (b) CG31413, r = 0.81, p = 0.000014. (c) CG31493, r = 0.77, p = 0.000063. (d) CG31496, r = 0.51, p = 0.022. (e) CG32985, r = 0.53, p = 0.015. (f) CG34002, r = 0.66, p = 0.0017. (g) CG9720, r = 0.66, p = 0.0016. (h) Dup99B (positive control), r = 0.55, p = 0.012. (i) CG34422 (negative control), r = 0.12, p = 0.61.
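The t-test on a correlation coefficient follows from r and the number of lines alone. As a hedged illustration, the standard test statistic t = r·sqrt((n − 2)/(1 − r²)), with n − 2 degrees of freedom, can be computed from the values reported in the caption; the function name is illustrative, and converting t to a p-value requires a t-distribution routine that is not included here.

```python
from math import sqrt


def correlation_t(r, n):
    """t statistic for testing r = 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1.0 - r * r))


# Values taken from panels (a) and (i) of Figure 2, n = 20 lines.
t_cg11828 = correlation_t(0.68, 20)   # candidate, r = 0.68
t_cg34422 = correlation_t(0.12, 20)   # negative control, r = 0.12
```

With 18 degrees of freedom, the statistic for the candidate is several times larger than that for the negative control, in line with the reported p-values of 0.001 and 0.61.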

Table 1 gives SFP scores, secretion signal peptide probability, and tissue of expression for these seven genes and for the positive and negative controls. Three genes with high SFP scores were not predicted to have secretion signals. These genes’ products may be secreted nevertheless, as has been seen in other cases (Findlay et al., 2008), or they may be non-SFP genes that are important for the regulation of other SFPs.

Among the seven genes, all that were predicted to be expressed in accessory glands (Chintapalli et al., 2007) were confirmed as expressed in that tissue (Table 1, Figure 3). This transcript was only seen in the ejaculatory duct, with possible low expression in the ejaculatory bulb. A possible explanation is that the ejaculatory ducts and bulbs were not examined in the FlyAtlas compendium (Chintapalli et al., 2007), and some ejaculatory duct might have remained partially or completely attached to the accessory glands during the tissue dissections used for FlyAtlas.

Figure 3.

RT-PCR analysis of gene expression for candidate SFP genes and positive and negative SFP control genes in five male tissues: accessory glands (AG), testes (T), ejaculatory bulb (EB), ejaculatory duct (ED), and carcass (C, non-reproductive tissues). The subscripts denote the two biological replicates of each tissue. The number of PCR cycles for each gene was normalized to give non-saturation results. Actin5C was used as a control for cDNA synthesis. Whole male cDNA was used as a positive (+) PCR control, and no DNA template was used as a negative (−) PCR control.

To gain insight into the possible biological processes and molecular functions of the seven candidate genes chosen for validation, we used a GO enrichment analysis implemented in DAVID 6.7 (Huang et al., 2009). For each candidate gene, we analyzed the function of its most correlated transcripts (p < 0.001 and |r| > 0.5). Four of the seven candidate genes (CG11828, CG31413, CG31493 and CG34002) were significantly associated with serine-type endopeptidase inhibitor activity, a predicted function shared by several other SFPs (Wolfner, 2009). However, it is important to note that CG11828, CG31413, and CG31493 do not contain conserved protease domains but do contain other types of predicted conserved domains. No significant GO-class enrichment was observed for CG9720, CG31496 or CG32985.

It is possible that some of the cSFP genes are important for SFP expression and function but may not encode proteins that are transferred to females as part of the seminal fluid. As proof of principle that such genes can be identified by this method, our analysis detected paired (SFP score = 16), which encodes a transcription factor important in accessory gland development and ACP expression (Supplementary Table 2). This Pax gene has a dual function in Drosophila: it acts first as a pair-rule gene in early embryo development (Nüsslein-Volhard & Wieschaus, 1980; Kilchherr et al., 1986) and later is required for viability and male fertility (Bertuccioli et al., 1996; Xue & Noll, 1996; 2000). Accessory gland formation and expression of at least two seminal fluid proteins expressed in the accessory gland (ACP26Aa and SP) both require the function of paired (Xue & Noll, 2000; 2002).

Guilt-by-association methods most frequently rely on clustering algorithms to identify the functional membership of a candidate gene or transcript (Aravind, 2000; Miozzi et al., 2008; Reverter et al., 2008; Klie et al., 2010). In its most common use, guilt-by-association is used to assign functions to any or all unannotated genes that respond to a given treatment or are differentially regulated under disease conditions. Here, we have demonstrated the use of guilt-by-association methods in another context: to identify genes in a specific functional class using correlated genetic variation in gene expression among wild-derived inbred lines. This method removes the need to rely on arbitrary clustering or on gene ontology (GO) terms to assign candidate functions to new genes. Instead, a group of genes that has been annotated and functionally clustered experimentally is used to find correlated transcripts that can then be included in the group. In this case, we used SFPs, a group defined by a biological phenomenon rather than a biochemical function. As in potentially many other cases, for example identifying genes involved in specific behaviors, GO terms do not define our selected group of genes as belonging to a biologically significant group. The cSFP genes we identified have diverse GO functions (ranging from proteases to prohormones). A given cSFP gene could not be predicted as an SFP on the basis of GO membership.

SFP genes are well suited for this study since their expression is specific, or highly biased, to the male reproductive tract, facilitating their confirmation as SFPs; and expression of the known SFPs is genetically variable in the population of lines surveyed. An increasing number of studies are taking advantage of natural genetic variation to better understand the genetic basis of phenotypic variation (Mackay et al., 2009). In the future, the availability of sequence information for the D. melanogaster population used in this study will allow us to associate co-expression with eQTL analysis (Mackay et al., 2009). This additional layer of information will further our understanding of what genetic factors are driving co-expression between SFP genes, and may lead us to rethink what information should be considered when annotating a segment of sequence.

To complement this study, and to generalize the simple analysis presented in this manuscript, we have created a webtool (dgrp.statgen.ncsu.edu) that allows users to input the Affymetrix Drosophila 2.0 ID of any focal gene of interest and retrieve a vector of genes, their ranked correlation with the focal gene, as well as the gene ontology of the correlated transcripts. This tool integrates FlyAtlas information (Chintapalli et al., 2007), allowing users to restrict the computation of correlations to genes expressed in a specific tissue or to genes with strong tissue-biased expression.

Many studies using natural genetic variation to study phenotypic variation also investigate variation in gene expression and gene co-expression (Mackay et al., 2009). However, this information is very rarely translated into hypothetical functional annotations for the unannotated genes involved. We advocate that such datasets be used more routinely, as patterns should emerge across studies and this information will greatly improve our understanding of genes, their function, and regulation. In particular, directed analyses such as the one presented here, in which genes involved in an experimentally-defined group are sought, may help to uncover pleiotropy among previously annotated genes and increase our understanding of how various biological systems function together.

Supplementary Material

Supplementary Table 1.

Genes for which variation in gene expression among the DGRP lines was correlated with at least one known SFP, and corresponding SFP scores.

Supplementary Table 2.

Candidate SFP genes with correlated expression patterns to at least seven known SFPs, and which are expressed in male reproductive tissues.

Acknowledgments

This work was funded by National Institutes of Health grants R01 GM45146 (T. F. C. M.) and R01 HD038921 (M. F. W.). B. A. L. was supported by an NSF Predoctoral Fellowship. We thank G. Findlay and E. Kelleher for comments on the manuscript, and J. Mezey and MBG colleagues for use of the qPCR machine. This is a publication of the W. M. Keck Center for Behavioral Biology.

References

  1. Alonso JM, Stepanova AN, Leisse TJ, Kim CJ, Chen H, Shinn P, Stevenson DK, Zimmerman J, Barajas P, Cheuk R, Gadrinab C, Heller C, Jeske A, Koesema E, Meyers CC, Parker H, Prednis L, Ansari Y, Choy N, Deen H, Geralt M, Hazari N, Hom E, Karnes M, Mulholland C, Ndubaku R, Schmidt I, Guzman P, Aguilar-Henonin L, Schmid M, Weigel D, Carter DE, Marchand T, Risseeuw E, Brogden D, Zeko A, Crosby WL, Berry CC, Ecker JR. Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science. 2003;301:653–657. doi: 10.1126/science.1086391.
  2. Aravind L. Guilt by association: contextual information in genome analysis. Genome Research. 2000;10:1074–1077. doi: 10.1101/gr.10.8.1074.
  3. Ayroles JF, Carbone MA, Stone EA, Jordan KW, Lyman RF, Magwire MM, Rollmann SM, Duncan LH, Lawrence F, Anholt RRH, Mackay TFC. Systems genetics of complex traits in Drosophila melanogaster. Nature Genetics. 2009;41:299–307. doi: 10.1038/ng.332.
  4. Bellen HJ, Levis RW, Liao G, He Y, Carlson JW, Tsang G, Evans-Holm M, Hiesinger PR, Schulze KL, Rubin GM, Hoskins RA, Spradling AC. The BDGP gene disruption project: single transposon insertions associated with 40% of Drosophila genes. Genetics. 2004;167:761–781. doi: 10.1534/genetics.104.026427.
  5. Bertuccioli C, Fasano L, Jun S, Wang S, Sheng G, Desplan C. In vivo requirement for the paired domain and homeodomain of the paired segmentation gene product. Development. 1996;122:2673–2685. doi: 10.1242/dev.122.9.2673.
  6. Bréhélin L, Florent I, Gascuel O, Maréchal E. Assessing functional annotation transfers with inter-species conserved coexpression: application to Plasmodium falciparum. BMC Genomics. 2010;11:35. doi: 10.1186/1471-2164-11-35.
  7. Carvalho GB, Kapahi P, Anderson DJ, Benzer S. Allocrine modulation of feeding behavior by the Sex Peptide of Drosophila. Current Biology. 2006;16:692–696. doi: 10.1016/j.cub.2006.02.064.
  8. Chapman T, Bangham J, Vinti G, Seifried B, Lung O, Wolfner MF, Smith HK, Partridge L. The sex peptide of Drosophila melanogaster: female post-mating responses analyzed by using RNA interference. Proceedings of the National Academy of Sciences of the USA. 2003;100:9923–9928. doi: 10.1073/pnas.1631635100.
  9. Chintapalli VR, Wang J, Dow JA. Using FlyAtlas to identify better Drosophila melanogaster models of human disease. Nature Genetics. 2007;39:715–720. doi: 10.1038/ng2049.
  10. Costello JC, Dalkilic MM, Beason SM, Gehlhausen JR, Patwardhan R, Middha S, Eads BD, Andrews JR. Gene networks in Drosophila melanogaster: integrating experimental data to predict gene function. Genome Biology. 2009;10:R97. doi: 10.1186/gb-2009-10-9-r97.
  11. Dietzl G, Chen D, Schnorrer F, Su KC, Barinova Y, Fellner M, Gasser B, Kinsey K, Oppel S, Scheiblauer S, Couto A, Marra V, Keleman K, Dickson BJ. A genome-wide transgenic RNAi library for conditional gene inactivation in Drosophila. Nature. 2007;448:151–156. doi: 10.1038/nature05954.
  12. Dowell RD, Ryan O, Jansen A, Cheung D, Agarwala S, Danford T, Bernstein DA, Rolfe PA, Heisler LE, Chin B, Nislow C, Giaever G, Phillips PC, Fink GR, Gifford DK, Boone C. Genotype to phenotype: a complex problem. Science. 2010;328:469. doi: 10.1126/science.1189015.
  13. Emanuelsson O, Brunak S, von Heijne G, Nielsen H. Locating proteins in the cell using TargetP, SignalP and related tools. Nature Protocols. 2007;2:953–971. doi: 10.1038/nprot.2007.131.
  14. Findlay GD, Yi X, MacCoss MJ, Swanson WJ. Proteomics reveals novel Drosophila seminal fluid proteins transferred at mating. Public Library of Science Biology. 2008;6:e178. doi: 10.1371/journal.pbio.0060178.
  15. Findlay GD, MacCoss MJ, Swanson WJ. Proteomic discovery of previously unannotated, rapidly evolving seminal fluid genes in Drosophila. Genome Research. 2009;19:886–896. doi: 10.1101/gr.089391.108.
  16. Flint J, Mackay TFC. Genetic architecture of quantitative traits in mice, flies, and humans. Genome Research. 2009;19:723–733. doi: 10.1101/gr.086660.108.
  17. Guan C, Ye C, Yang X, Gao J. A review of current large-scale mouse knockout efforts. Genesis. 2010;48:73–85. doi: 10.1002/dvg.20594.
  18. Heifetz Y, Lung O, Frongillo EA, Jr, Wolfner MF. The Drosophila seminal fluid protein Acp26Aa stimulates release of oocytes by the ovary. Current Biology. 2000;10:99–102. doi: 10.1016/s0960-9822(00)00288-8.
  19. Hrmova M, Fincher GB. Functional genomics and structural biology in the definition of gene function. Methods in Molecular Biology. 2009;513:199–227. doi: 10.1007/978-1-59745-427-8_11.
  20. Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources. Nature Protocols. 2009;4:44–57. doi: 10.1038/nprot.2008.211.
  21. Isaac RE, Li C, Leedale AE, Shirras AD. Drosophila male sex peptide inhibits siesta sleep and promotes locomotor activity in the post-mated female. Proceedings. Biological Sciences. 2010;277:65–70. doi: 10.1098/rspb.2009.1236.
  22. Kamath RS, Ahringer J. Genome-wide RNAi screening in Caenorhabditis elegans. Methods. 2003;30:313–321. doi: 10.1016/s1046-2023(03)00050-1.
  23. Kilchherr E, Schumaker VN, Phillips ML, Curtiss LK. Activation of the first component of human complement, C1, by monoclonal antibodies directed against different domains of subcomponent C1q. Journal of Immunology. 1986;137:255–262.
  24. Klie S, Nikoloski Z, Selbig J. Biological cluster evaluation for gene function prediction. Journal of Computational Biology. 2010 doi: 10.1089/cmb.2009.0129. Epub ahead of print.
  25. Langfelder P, Horvath S. Eigengene networks for studying the relationships between co-expression modules. BMC Systems Biology. 2007;1:54. doi: 10.1186/1752-0509-1-54.
  26. Liu H, Kubli E. Sex-peptide is the molecular basis of the sperm effect in Drosophila melanogaster. Proceedings of the National Academy of Sciences of the USA. 2003;100:9929–9933. doi: 10.1073/pnas.1631700100.
  27. Luo F, Yang Y, Zhong J, Gao H, Khan L, Thompson DK, Zhou J. Constructing gene co-expression networks and predicting functions of unknown genes by random matrix theory. BMC Bioinformatics. 2007;8:299. doi: 10.1186/1471-2105-8-299.
  28. Mackay TFC, Stone EA, Ayroles JF. The genetics of quantitative traits: challenges and prospects. Nature Reviews Genetics. 2009;10:565–577. doi: 10.1038/nrg2612.
  29. Miozzi L, Piro RM, Rosa F, Ala U, Silengo L, Di Cunto F, Provero P. Functional annotation and identification of candidate disease genes by computational analysis of normal tissue gene expression data. Public Library of Science One. 2008;3:e2439. doi: 10.1371/journal.pone.0002439.
  30. Neubaum DM, Wolfner MF. Wise, winsome, or weird? Mechanisms of sperm storage in female animals. Current Topics in Developmental Biology. 1999;41:67–97. doi: 10.1016/s0070-2153(08)60270-7.
  31. Ni JQ, Liu LP, Binari R, Hardy R, Shim HS, Cavallaro A, Booker M, Pfeiffer BD, Markstein M, Wang H, Villalta C, Laverty TR, Perkins LA, Perrimon N. A Drosophila resource of transgenic RNAi lines for neurogenetics. Genetics. 2009;182:1089–1100. doi: 10.1534/genetics.109.103630.
  32. Nüsslein-Volhard C, Wieschaus E. Mutations affecting segment number and polarity in Drosophila. Nature. 1980;287:795–801. doi: 10.1038/287795a0.
  33. Peña-Castillo L, Hughes TR. Why are there still over 1000 uncharacterized yeast genes? Genetics. 2007;176:7–14. doi: 10.1534/genetics.107.074468.
  34. Ravi Ram K, Wolfner MF. Seminal influences: Drosophila ACPs and the molecular interplay between males and females during reproduction. Integrative and Comparative Biology. 2007;47:427–445. doi: 10.1093/icb/icm046.
  35. Ravi Ram K, Wolfner MF. A network of interactions among seminal proteins underlies the long-term postmating response in Drosophila. Proceedings of the National Academy of Sciences of the USA. 2009;106:15384–15389. doi: 10.1073/pnas.0902923106.
  36. Reverter A, Ingham A, Dalrymple BP. Mining tissue specificity, gene connectivity and disease association to reveal a set of genes that modify the action of disease causing genes. BioData Mining. 2008;1:8. doi: 10.1186/1756-0381-1-8.
  37. Saudan P, Hauck K, Soller M, Choffat Y, Ottiger M, Spörri M, Ding Z, Hess D, Gehrig PM, Klauser S, Hunziker P, Kubli E. Ductus ejaculatorius peptide 99B (DUP99B), a novel Drosophila melanogaster sex-peptide pheromone. European Journal of Biochemistry. 2002;269:989–997. doi: 10.1046/j.0014-2956.2001.02733.x.
  38. Spirek M, Benko Z, Carnecka M, Rumpf C, Cipak L, Batova M, Marova I, Nam M, Kim DU, Park HO, Hayles J, Hoe KL, Nurse P, Gregan J. S. pombe genome deletion project: An update. Cell Cycle. 2010;9:2399–2402. doi: 10.4161/cc.9.12.11914.
  39. Stone EA, Ayroles JF. Modulated modularity clustering as an exploratory tool for functional genomic inference. Public Library of Science Genetics. 2009;5:e1000479. doi: 10.1371/journal.pgen.1000479.
  40. Swanson WJ, Clark AG, Waldrip-Dail HM, Wolfner MF, Aquadro CF. Evolutionary EST analysis identifies rapidly evolving male reproductive proteins in Drosophila. Proceedings of the National Academy of Sciences of the USA. 2001;98:7375–7379. doi: 10.1073/pnas.131568198.
  41. Tram U, Wolfner MF. Male seminal fluid proteins are essential for sperm storage in Drosophila melanogaster. Genetics. 1999;153:837–844. doi: 10.1093/genetics/153.2.837.
  42. Vandepoele K, Quimbaya M, Casneuf T, De Veylder L, Van de Peer Y. Unraveling transcriptional control in Arabidopsis using cis-regulatory elements and coexpression networks. Plant Physiology. 2009;150:535–546. doi: 10.1104/pp.109.136028.
  43. Walker MG, Volkmuth W, Sprinzak E, Hodgson D, Klingler T. Prediction of gene function by genome-scale expression analysis: prostate cancer-associated genes. Genome Research. 1999;9:1198–1203. doi: 10.1101/gr.9.12.1198.
  44. Winzeler EA, Shoemaker DD, Astromoff A, Liang H, Anderson K, Andre B, Bangham R, Benito R, Boeke JD, Bussey H, Chu AM, Connelly C, Davis K, Dietrich F, Dow SW, El Bakkoury M, Foury F, Friend SH, Gentalen E, Giaever G, Hegemann JH, Jones T, Laub M, Liao H, Liebundguth N, Lockhart DJ, Lucau-Danila A, Lussier M, M’Rabet N, Menard P, Mittmann M, Pai C, Rebischung C, Revuelta JL, Riles L, Roberts CJ, Ross-MacDonald P, Scherens B, Snyder M, Sookhai-Mahadeo S, Storms RK, Véronneau S, Voet M, Volckaert G, Ward TR, Wysocki R, Yen GS, Yu K, Zimmermann K, Philippsen P, Johnston M, Davis RW. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science. 1999;285:901–906. doi: 10.1126/science.285.5429.901.
  45. Wolfner MF. Battle and ballet: molecular interactions between the sexes in Drosophila. Journal of Heredity. 2009;100:399–410. doi: 10.1093/jhered/esp013.
  46. Xue L, Noll M. The functional conservation of proteins in evolutionary alleles and the dominant role of enhancers in evolution. European Molecular Biology Organization Journal. 1996;15:3722–3731.
  47. Xue L, Noll M. Drosophila female sexual behavior induced by sterile males showing copulation complementation. Proceedings of the National Academy of Sciences of the USA. 2000;97:3272–3275. doi: 10.1073/pnas.060018897.
  48. Xue L, Noll M. Dual role of the Pax gene paired in accessory gland development of Drosophila. Development. 2002;129:339–346. doi: 10.1242/dev.129.2.339.

Associated Data

Supplementary Materials

Table S1. Supplementary Table 1.

Genes for which variation in gene expression among the DGRP lines was correlated with at least one known SFP, and corresponding SFP scores.

Table S2. Supplementary Table 2.

Candidate SFP genes whose expression patterns are correlated with those of at least seven known SFPs and which are expressed in male reproductive tissues.
