Molecular & Cellular Proteomics : MCP. 2022 Aug 17;21(10):100281. doi: 10.1016/j.mcpro.2022.100281

Functional Diversity and Evolution of the Drosophila Sperm Proteome

Martin D Garlovsky 1, Jessica A Sandler 2, Timothy L Karr 2,3
PMCID: PMC9494239  PMID: 35985624

Abstract

Spermatozoa are central to fertilization and the evolutionary fitness of sexually reproducing organisms. As such, a deeper understanding of sperm proteomes (and associated reproductive tissues) has proven critical to the advancement of the fields of sexual selection and reproductive biology. Due to their extraordinary complexity, proteome depth-of-coverage is dependent on advancements in technology and related bioinformatics, both of which have made significant advancements in the decade since the last Drosophila sperm proteome was published. Here, we provide an updated version of the Drosophila melanogaster sperm proteome (DmSP3) using improved separation and detection methods and an updated genome annotation. Combined with previous versions of the sperm proteome, the DmSP3 contains a total of 3176 proteins, and we provide the first label-free quantitation of the sperm proteome for 2125 proteins. The top 20 most abundant proteins included the structural elements α- and β-tubulins and sperm leucyl-aminopeptidases. Both gene content and protein abundance were significantly reduced on the X chromosome, consistent with prior genomic studies of X chromosome evolution. We identified 9 of the 16 Y-linked proteins, including known testis-specific male fertility factors. We also identified almost one-half of known Drosophila ribosomal proteins in the DmSP3. The role of this subset of ribosomal proteins in sperm is unknown. Surprisingly, our expanded sperm proteome also identified 122 seminal fluid proteins (Sfps), proteins originally identified in the accessory glands. We show that a significant fraction of ‘sperm-associated Sfps’ are recalcitrant to concentrated salt and detergent treatments, suggesting this subclass of Sfps are expressed in testes and may have additional functions in sperm, per se. 
Overall, our results add to a growing landscape of both sperm and seminal fluid protein biology and, in particular, provide quantitative evidence at the protein level for prior findings supporting the meiotic sex-chromosome inactivation model for male-specific gene and X chromosome evolution.

Keywords: Spermatozoa, seminal fluid proteins, ribosomes, meiotic sex chromosome inactivation, fertility, evolution, discovery proteomics, human disease, OMIM, Drosophila

Abbreviations: DmSP, Drosophila melanogaster sperm proteome; GO, gene ontology; LFQ, label-free quantitation; Sfp, seminal fluid protein; S-Laps, sperm leucyl-aminopeptidases


Highlights

  • The sperm proteome contains unexpected proteins (e.g., neuronal and yolk proteins).

  • Ribosomal protein “swapping” suggests stage- and development-specific loading into sperm.

  • Label-free quantitation and fractionation revealed seminal fluid proteins in sperm.

  • Low abundance of X-linked proteins supports meiotic sex-chromosome inactivation.

In Brief

Fertilization is the sine qua non of animal and plant reproduction, a process conserved across the tree of life. During fertilization, a specific subset of male-derived proteins, many of unknown function, enter and are present in the developing egg and zygote. We therefore reinterrogated previously published sperm proteomes using improved purification and fractionation methods and identified >3000 proteins in sperm, including a novel exchange of paralogous ribosomal proteins suggesting complex patterns of paralog switching and selectivity during spermatogenesis.


Sperm form, function, and evolution are determined in large measure by the sperm proteome (1). High throughput proteomics using LC-MS has been used to characterize the composition of the sperm proteome in a wide range of animals (1, 2, 3), revealing several common features of sperm as expected for an ancient cell type with a highly conserved function (4, 5). For instance, despite exhibiting exceptional morphological diversity across the tree of life (4, 6), sperm proteomes show enrichment of metabolic processes, mitochondria, axoneme, microtubules, and cytoskeletal components (2, 3, 7, 8).

Drosophila melanogaster provides a powerful genetic and functional genomics model system to understand reproduction and fertility (e.g., (9)). Discovery (aka, shotgun) bottom-up proteomics is used to match empirical- to theoretical-peptide mass spectra and infer protein presence (10). However, although characterizing proteome composition is a critical first step, estimating protein abundance is equally critical for assigning targets of interest, potential functions, and comparative studies both within and between taxa. Indeed, the discovery and subsequent study of sperm leucyl-aminopeptidases (S-Laps) arose from protein quantitation from two-dimensional gels (11, 12). The current study was motivated by recent advances in LC-MS technology, particularly in data acquisition time and improved liquid chromatographic systems leading to enhanced proteome coverage of complex cell and tissue types (10, 13). These advances allow routine and accurate quantitation using both labeled and label-free methods, an essential element for comparative studies of sperm composition and function (14, 15). Additionally, these advances permit direct injection of sample peptides without the need for prefractionation using PAGE, thus avoiding sample loss and increasing proteome coverage. Here, we reinterrogated the D. melanogaster sperm proteome using direct solubilization of sperm followed by on-line fractionation of tryptic peptides.

Our previous efforts identified over 1000 D. melanogaster sperm proteins with prior versions designated DmSP1 (7) and DmSP2 (11). The DmSP3 described in this study significantly increases coverage and refinement of the D. melanogaster sperm proteome, from the 1108 sperm proteins identified in the DmSP2 (11) to more than 3000 proteins in the DmSP3 (Table 1). Table 1 highlights our extended knowledge base not only in terms of absolute numbers of sperm proteins but also discovery of new protein groups. We report a significant increase in both proteome size and content and provide a detailed analysis of relative abundance of sperm proteins for the first time. We confirmed high abundances of S-Laps and provide a wealth of new quantitative information including the surprising findings of substantial levels of ribosomal proteins, seminal fluid proteins (Sfps), and Y-linked proteins.

Table 1.

History of the Drosophila melanogaster sperm proteome (DmSP)

Category | DmSP1 | DmSP2 | DmSP3
Methods/technology | LC-MS2/Maldi | SDS-PAGE | Cell digest
Machine | Thermo LCQ | LTQ Orbitrap | Orbitrap Fusion Lumos
Proteins identified | 341 | 1108 (+767) | 3176 (+2068)
X-linked (a) | Under | ns | Under
Y-linked | 0 | 4 | 9 (+5)
Sfps | CG2918 | 11 (+10) | 122 (+111)
Ribosomal proteins | 0 | 9 | 83 (+74)

DmSP1: (7); DmSP2: (11); DmSP3: this study. Note that the DmSP2 combined the 341 proteins identified in the DmSP1 with the 956 proteins identified in the DmSP2. Likewise, the DmSP3 reported here represents the combined total of all proteins identified in the DmSP2 (n = 1108) with the 2562 proteins identified across all experiments in the current study. Numbers in parentheses denote number of newly identified proteins.

a Under = significant gene underrepresentation compared to expected value (see Experimental procedures); ns = not significant.

The function of sperm beyond a delivery system for the male haploid genetic material to the next generation has gained renewed attention (16, 17, 18). For instance, sperm were often thought to be stripped of most cellular machinery, remaining transcriptionally silent prior to maturation, thus precluding the need for cellular components such as ribosomes. Additionally, previous studies found some Sfps bind to sperm in the female reproductive tract (19) and several Sfps were identified in the DmSP1 and DmSP2 but not quantified (7, 11). Therefore, we conducted two experiments to determine the binding properties of proteins associated with sperm. First, we washed sperm with a strong anionic detergent to disrupt the plasma membrane to strip away both weakly binding and sperm plasma membrane proteins. Second, we washed sperm with high molar salt to weaken ionic bonds and eliminate nonspecific protein binding to sperm (including Sfps). This approach identified over 60 ‘sperm-associated Sfps’ recalcitrant to detergent or salt treatment. Furthermore, while the current article was under review, another study also demonstrated Sfps bind sperm in the seminal vesicles (20). Therefore, we performed additional analyses to compare our results, which together provide strong evidence in support of a subclass of sperm-associated Sfps. Finally, we use the increased proteome coverage and quantitative information in the DmSP3 to provide a detailed analysis of relative abundance of sperm proteins for the first time and re-examine the evolutionary dynamics, gene age, and chromosomal distribution of sperm proteins. The analyses provide stronger support for previous claims and, in particular, cement the subjective prior findings supporting the meiotic sex-chromosome inactivation model for male-specific gene and X chromosome evolution (21, 22, 23, 24).

Experimental Procedures

Fly Stocks and Sample Preparation

We used laboratory WT strain Oregon-R D. melanogaster virgin males, aged 5 to 7 days. All dissections and sperm isolation were performed at room temperature in freshly prepared PBS with or without protease inhibitors (HALT, Thermo Fisher). We anaesthetized flies and removed reproductive tracts with forceps under a stereo dissecting microscope as previously described (11). Briefly, each biological replicate from ten males (20 paired seminal vesicles) was prepared separately over the course of no more than 1 hour by first removing the seminal vesicles from each male reproductive tract (containing testes, seminal vesicles, and accessory glands) into a fresh drop of PBS. Sperm were then carefully removed using fine needles to a 1.5 ml microcentrifuge tube containing 1 ml PBS (on ice). Sperm were then pelleted at 15,000 rpm for 15 min at 4 °C, the PBS was carefully removed, and the pellet was resuspended by addition of 1.0 ml PBS followed by an additional 15 min centrifugation at 15,000 rpm. The washing procedure was repeated 2× and the final pellet immediately solubilized and reduced in 25 μl of 5% SDS/50 mM TEAB containing 50 mM DTT and incubated for 10 to 15 min at 95 °C. Samples were then spun again at 15,000 rpm for 15 min at 20 °C and visually inspected to ensure no visible pellets were present. Supernatants were then removed and stored at −20 °C or immediately processed as described below.

Solubilized sperm proteins were quantified using EZQ Protein Quantitation Kit (Thermo Fisher), and 14 to 16 μg of total protein were alkylated using 40 mM final concentration of freshly prepared iodoacetamide (Pierce) for 30 min in the dark at room temperature. Samples were processed using the S-trap Micro Columns (Protifi) following manufacturer’s S-trap Micro High Recovery Protocol. Briefly, samples (∼30 μl) were acidified to ∼1.2% phosphoric acid by addition of a stock 12% phosphoric acid solution. Proteins were digested by addition of 2 μl of a 1 mg/ml solution of porcine trypsin (MS sequencing grade modified trypsin, Promega) and layered onto the S-trap column containing 180 μl of 90% methanol/100 mM TEAB. Samples were briefly spun to remove excess buffer and washed 4× with S-trap buffer. An additional 0.5 μg of trypsin and 25 μl of 50 mM TEAB were added to the top of each column and incubated for 1 h at 47 °C. Samples were eluted off the S-trap columns using three elution buffers: 50 mM TEAB, 0.2% formic acid in water, and 50% acetonitrile/50% water +0.2% formic acid. Samples were dried down via speed vac and resuspended in 20 to 30 μl of 0.1% formic acid.

Liquid-Chromatography Tandem Mass Spectrometry

All LC-MS analyses were performed at the Biosciences Mass Spectrometry Core Facility (https://cores.research.asu.edu/mass-spec/) at Arizona State University. All data-dependent mass spectra were collected in positive mode using direct-injection into an Orbitrap Fusion Lumos mass spectrometer (Thermo Scientific) coupled with an UltiMate 3000 UHPLC (Thermo Scientific). One microliter of peptides was fractionated using an Easy-Spray LC column (50 cm × 75 μm ID, PepMap C18, 2 μm particles, 100 Å pore size, Thermo Scientific). Electrospray potential was set to 1.6 kV and the ion transfer tube temperature to 300 °C. The mass spectra were collected using the “Universal” method optimized for peptide analysis provided by Thermo Scientific. Full MS scans (375–1500 m/z range) were acquired in profile mode with the Orbitrap set to a resolution of 120,000 (at 200 m/z), cycle time set to 3 s, and mass range set to “Normal”. The RF lens was set to 30% and the AGC set to “Standard”. Maximum ion accumulation time was set to “Auto”. Monoisotopic peak determination was set to “peptide” and included charge states 2 to 7. Dynamic exclusion was set to 60 s with a mass tolerance of 10 ppm and the intensity threshold set to 5.0e3. MS/MS spectra were acquired in centroid mode using a quadrupole isolation window set to 1.6 (m/z). Collision-induced fragmentation energy was set to 35% with an activation time of 10 milliseconds. Peptides were eluted during a 240-min gradient at a flow rate of 0.250 μl/min, containing 2 to 80% acetonitrile/water as follows: 0 to 3 min at 2%, 3 to 75 min at 2 to 15%, 75 to 180 min at 15 to 30%, 180 to 220 min at 30 to 35%, 220 to 225 min at 35 to 80%, 225 to 230 min at 80%, and 230 to 240 min at 80 to 5%.
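The gradient program described above can be written out as a small lookup, which makes it easy to check that the segments join up and to read off the solvent composition at any time point. This is an illustrative sketch only: the segment values are transcribed from the text, while linear interpolation within each segment and the names `GRADIENT`, `percent_acetonitrile`, and `segments_are_contiguous` are assumptions made for the example.

```python
# Gradient segments transcribed from the text:
# (start_min, end_min, start_percent_acetonitrile, end_percent_acetonitrile)
GRADIENT = [
    (0, 3, 2, 2),
    (3, 75, 2, 15),
    (75, 180, 15, 30),
    (180, 220, 30, 35),
    (220, 225, 35, 80),
    (225, 230, 80, 80),
    (230, 240, 80, 5),
]

FLOW_RATE_UL_PER_MIN = 0.250  # flow rate stated in the text


def percent_acetonitrile(t_min):
    """Acetonitrile percentage at time t_min, assuming linear ramps (an assumption)."""
    for start, end, pct_start, pct_end in GRADIENT:
        if start <= t_min <= end:
            frac = (t_min - start) / (end - start)
            return pct_start + frac * (pct_end - pct_start)
    raise ValueError("time outside the 0-240 min gradient")


def segments_are_contiguous(gradient):
    """True if each segment starts at the time and composition where the previous one ended."""
    return all(a[1] == b[0] and a[3] == b[2] for a, b in zip(gradient, gradient[1:]))


# Total mobile phase delivered over the run, in microliters.
total_volume_ul = GRADIENT[-1][1] * FLOW_RATE_UL_PER_MIN
```

Under these assumptions the seven segments are contiguous and the 240-min run delivers 60 μl of mobile phase.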

Label-Free Quantification

We searched raw files against the Uniprot (www.uniprot.org) D. melanogaster database (Dmel_UP000000803.fasta) using Proteome Discoverer 2.4 (Thermo Scientific). Raw files were searched using SequestHT with the following settings: Trypsin (specific) as enzyme, maximum missed cleavage sites 3, min/max peptide length 6/144, precursor ion (MS1) mass tolerance set to 20 ppm, fragment mass tolerance set to 0.5 Da, and a minimum of one peptide identified. Carbamidomethyl (C) was specified as a fixed modification, and dynamic modifications were set to acetyl and Met-loss at the N-terminus, and oxidation of Met. A false-discovery rate threshold of 1.0% was applied using a concatenated target/decoy strategy in Percolator (25). The data were imported into Proteome Discoverer 2.4, and accurate mass and retention time of detected ions (features) were determined using the Minora Feature Detector algorithm. The identified Minora features were then used to determine area-under-the-curve of the selected ion chromatograms of the aligned features across all runs and relative abundances calculated.
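To illustrate the idea behind a target/decoy false-discovery rate threshold, the sketch below estimates q-values from a list of scored peptide-spectrum matches. It is a deliberately simplified stand-in and not Percolator's algorithm, which additionally rescores matches with a semi-supervised classifier; the function names and the toy scores are hypothetical.

```python
def q_values(psms):
    """Estimate q-values from (score, is_decoy) pairs; higher scores are better.

    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Decoy hits above the threshold estimate the number of false target hits.
        fdrs.append(decoys / max(targets, 1))
    # The q-value is the smallest FDR at this score or at any more permissive threshold.
    qs = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()
    return [(s, d, q) for (s, d), q in zip(ranked, qs)]


def accepted_target_scores(psms, threshold=0.01):
    """Scores of target matches passing the FDR threshold (1.0% as in the text)."""
    return [s for s, is_decoy, q in q_values(psms) if not is_decoy and q <= threshold]


# Toy example with made-up scores: two confident targets, then a decoy, then a target.
toy = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
passing = accepted_target_scores(toy)
```

In the toy example only the two matches scoring above the first decoy pass the 1% threshold.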

Gene Ontology Enrichment

We performed gene ontology (GO) enrichment analyses using the website version of DAVID (v6.8) (26). We uploaded gene lists to DAVID (https://david.ncifcrf.gov/tools.jsp) and saved outputs for all three GO categories (biological process (BP), cellular components, molecular functions) and associated statistical values. We used the default D. melanogaster gene list as background in DAVID to identify enriched GO terms associated with the DmSP3 (foreground n = 3176). To identify enriched GO terms associated with specific classes of sperm proteins (foreground n detailed in the results), we used the DmSP3 (n = 3176) as the appropriate background for enrichment tests. We performed network comparisons between the DmSP2 and DmSP3 using the ClueGO plugin v2.5.8 (27) for Cytoscape (v3.9.0) (28) using the default D. melanogaster genome background to generate enriched GO categories using a right-sided hypergeometric test, with p-values adjusted using Benjamini-Hochberg multiple testing correction. Enriched GO categories with false-discovery rate values below 1% are reported. Specific parameter details are found in the figure legends.
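For readers unfamiliar with the statistics named above, the following sketch shows a right-sided hypergeometric enrichment test followed by Benjamini-Hochberg adjustment, using only the Python standard library. It illustrates the general technique and is not the DAVID or ClueGO implementation; the function names are hypothetical.

```python
from math import comb


def hypergeom_right_tail(k, K, n, N):
    """P(X >= k): probability of seeing at least k annotated genes in a foreground
    of n genes drawn from a background of N genes, K of which carry the GO term."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A term would then be reported as enriched when its adjusted p-value falls below the chosen threshold (1% in the text).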

Evolutionary Rates

We calculated the rate of nonsynonymous (dN) to synonymous (dS) nucleotide substitutions (dN/dS) for D. melanogaster genes using an existing pipeline (29). We downloaded amino acid sequences and coding sequences for D. melanogaster (BDGP6.32) and coding sequences for Drosophila sechellia (dsec_r1.3), Drosophila simulans (ASM75419v3), and Drosophila yakuba (dyak_caf1) from Ensembl (30). For each species, we identified the longest isoform of each gene and identified orthologs using reciprocal BLASTn (31), with a minimum 30% identity and 1 × 10^−10 E-value cut-off. We identified reciprocal 1:1 orthologs between all four species by the highest BLAST score and identified open reading frames using BLASTx. We then aligned orthologs using PRANK (32) and masked poorly aligned reads with SWAMP (33) using a minimum sequence length = 150, nonsynonymous substitution threshold = 7, and window size = 15. We retained 11,715 orthologs for analysis after filtering poorly aligned orthologs and those with sequence length <30 bp. We calculated one-ratio estimates (model 0) with an unrooted phylogeny: ((D. simulans, D. sechellia), D. melanogaster, D. yakuba), using the CODEML package in PAML (34), and filtered orthologs with a branch specific dS ≥ 2 or where S∗dS ≤ 1 to avoid mutational saturation. In total, we retained dN/dS estimates for 11,417 genes after filtering, including 2571 (80.95%) proteins in the DmSP3. We tested for differences in evolutionary rates between independent sets of genes using Mann-Whitney U tests.
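The final filtering step can be expressed as a simple predicate. The sketch below is an illustration rather than the published pipeline: it assumes S is the number of synonymous sites reported by CODEML (so that S∗dS approximates the number of synonymous substitutions), and the function name is hypothetical.

```python
def passes_saturation_filter(dS, S):
    """Return True if an ortholog is retained.

    Orthologs are dropped when dS >= 2 or when S * dS <= 1, the two
    thresholds given in the text. S is assumed to be the number of
    synonymous sites (an assumption of this sketch).
    """
    return not (dS >= 2 or S * dS <= 1)


# Share of DmSP3 proteins with a retained dN/dS estimate, from the counts in the text.
share_with_estimate = round(100 * 2571 / 3176, 2)
```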

Experimental Design and Statistical Rationale

We designed experiments to (i) maximize proteome coverage, (ii) measure the relative abundance of individual proteins in the proteome using label-free quantitation, and (iii) examine sample purity by measuring the magnitude of adventitious protein binding and contamination in our samples. We performed three independent experiments using three treatments of purified sperm samples. In experiment 1, we collected three biological replicates of sperm in PBS only. In experiment 2, sperm were collected in either PBS and Halt protease inhibitor (“Halt” treatment), PBS only (“NoHalt” treatment), or PBS containing 0.1% Triton X100 without protease inhibitor (“PBST” treatment). In experiment 3, we collected four biological replicates of sperm prepared using either PBS (“PBS” treatment) or 2.5 M NaCl (“Salt” treatment).

We applied strict thresholds for peptide and protein identification by setting a false-discovery rate threshold at 1.0%, calculated using a reverse-concatenated target/decoy strategy in Percolator. We calculated label-free quantification (LFQ) ion intensities using the Minora feature detector in Proteome Discoverer to determine area-under-the-curve and summed technical replicates prior to analysis. To test for differences in abundance between treatments, we fit protein-wise negative binomial generalized linear models (see Statistical analysis). For experiment 2, the comparison between the Halt and NoHalt treatments was performed to determine whether active proteases were present in purified sperm samples. As only 16 proteins showed differential abundance between the Halt and NoHalt treatments (see supplemental Fig. S1), we pooled these treatments and considered them together as controls. To test the effect of detergent treatment, we subsequently performed differential abundance analysis comparing the PBST treatment to the average of both controls (Halt and NoHalt), excluding these 16 proteins (see supplementary analysis https://martingarlovsky.github.io/DmSP3/). To rank order protein abundances, we calculated a grand mean abundance for each protein across all three experiments. We excluded the PBST treatment samples from our estimates as PBST treatment significantly altered the proteome composition compared to other samples (see Results).

Statistical Analysis

We performed all statistical analyses in R v4.0.3 (35). All code and analyses are available via GitHub (https://martingarlovsky.github.io/DmSP3/).

To test for nonrandom distribution of sperm proteins across the polytene chromosomes, we downloaded the chromosomal location for all genes in the genome from FlyBase.org (36) and calculated the total numbers and proportion of genes on each chromosome. We then summed the observed number of genes found in the sperm proteome on each chromosome and calculated the expected number by multiplying the total number of sperm proteins (n = 3176) by the expected proportion. We calculated Χ2 statistics for each chromosome and the associated p-values with one degree of freedom and used the Benjamini-Hochberg procedure to correct for multiple testing. We excluded analysis of the Y chromosome due to the small number of protein-coding genes. To test for nonrandom distribution of sperm genes across age classes, we downloaded gene age information from http://gentree.ioz.ac.cn/download.php (37) and grouped genes as ancestral (class 0; common to the Drosophila genus; n = 12,013), subgenus Sophophora (classes 1 + 2; n = 416), melanogaster group (class 3; n = 200), melanogaster subgroup (class 4; n = 334) or recent (classes 5 + 6; n = 120). We tested if sperm genes were randomly distributed across age classes compared to the rest of the genome as above, calculating the observed number of genes in each age class across the genome and among sperm proteins and calculating Χ2 statistics comparing the observed versus expected number of genes in each age class, using the Benjamini-Hochberg procedure to correct for multiple testing.
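The observed-versus-expected comparison can be sketched as follows. This is a minimal illustration, not the analysis code from the linked repository: it assumes the statistic for each chromosome is the single-cell form (O − E)²/E evaluated against a Χ2 distribution with one degree of freedom, and the function names and example gene counts are hypothetical.

```python
from math import erfc, sqrt


def expected_count(genes_on_chromosome, genes_in_genome, n_sperm=3176):
    """Expected number of sperm proteins on a chromosome under a random distribution."""
    return n_sperm * genes_on_chromosome / genes_in_genome


def chi_square_1df(observed, expected):
    """Return (statistic, p_value) for one observed versus expected count.

    For one degree of freedom the upper-tail probability of the chi-square
    distribution equals erfc(sqrt(statistic / 2)).
    """
    stat = (observed - expected) ** 2 / expected
    return stat, erfc(sqrt(stat / 2))


# Made-up example: a chromosome carrying 2000 of 14000 genes, with 380 sperm proteins observed.
exp = expected_count(2000, 14000)
stat, p = chi_square_1df(380, exp)
```

The resulting p-values, one per chromosome or age class, would then be corrected with the Benjamini-Hochberg procedure.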

To test for differences in protein abundance between ribosomal proteins compared to the DmSP3 average, independent groups of X-linked, Y-linked, or autosomal-proteins or between ‘high confidence’ Sfps, ‘low confidence/transferred’ Sfps versus remaining sperm proteins, we calculated the grand mean abundance across all three experiments excluding the PBST treatment (see Experimental design and statistical rationale). To define Sfps, we used the database compiled by Wigby et al. (38), who categorized ‘high confidence’ Sfps based on biochemical and bioinformatics data (n = 292) or ‘low confidence/transferred’ Sfps (n = 321) that exhibit expression or functional characteristics suggesting a potential Sfp but which require further investigation (38). We retained proteins identified by two or more unique peptides and found in at least three biological replicates in at least one treatment group (where applicable). We performed Kruskal-Wallis rank-sum tests followed by pairwise Wilcoxon rank-sum tests corrected for multiple testing using the Benjamini-Hochberg procedure. For experiments 2 and 3, we performed differential abundance analyses using edgeR (39). For experiment 2, we retained proteins with values in seven out of nine biological replicates. For experiment 3, we filtered to include proteins identified in at least five replicates (i.e., in at least three out of four biological replicates of one treatment).
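The general filtering rule (two or more unique peptides, and detection in at least three biological replicates of at least one treatment group) can be sketched as a predicate. The data layout and function name below are hypothetical and chosen only for illustration.

```python
def keep_protein(unique_peptides, detections_by_treatment,
                 min_peptides=2, min_replicates=3):
    """Return True if a protein passes the filter described in the text.

    detections_by_treatment maps a treatment name to a list of booleans,
    one per biological replicate, marking whether the protein was detected.
    """
    if unique_peptides < min_peptides:
        return False
    return any(sum(reps) >= min_replicates
               for reps in detections_by_treatment.values())


# Hypothetical protein from experiment 3: seen in three of four PBS replicates.
example = keep_protein(4, {"PBS": [True, True, True, False],
                           "Salt": [False, True, False, False]})
```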

Results

Overview of the DmSP3

In the current study, we identified 2562 proteins across our three experiments before filtering (supplemental Fig. S2), of which 1965 (76.7%) were identified by two or more unique peptides in a single experiment (n = 1412) or in two or more replicates across any experiment (n = 1867). We obtained 106,498, 91,952, and 197,551 peptide-spectrum matches for experiments 1, 2, and 3, respectively. The numbers of peptide-spectrum matches were highly correlated between replicates within each experiment (mean Pearson’s correlation coefficient = 0.841, range 0.657–0.926, all p < 0.001). We measured relative protein abundances of 2125 proteins (82.9%) using LFQ. As expected from our previous study (12), α- and β-tubulins and S-Laps were among the most abundant proteins identified (Table 2). Also present were proteins of unexpected sperm prevalence including ocnus and janus B, a pair of duplicated gene products encoding a testis-specific phosphohistidine phosphatase (40), numerous Sfps, and over 80 ribosomal proteins.

Table 2.

Most abundant proteins in the DmSP3

FBgn | Name | Chromosome arm
FBgn0003884 | α-Tubulin at 84B | 3R
FBgn0003889 | β-Tubulin at 85D | 3R
FBgn0259795 | loopin-1 | 2R
FBgn0003885 | α-Tubulin at 84D | 3R
FBgn0033868 | S-Lap 7 | 2R
FBgn0035915 | S-Lap 1 | 3L
FBgn0052064 | S-Lap 4 | 3L
FBgn0045770 | S-Lap 3 | 3L
FBgn0039071 | big bubble 8 | 3R
FBgn0034132 | S-Lap 8 | 2R
FBgn0041102 | Ocnus | 3R
FBgn0031545 | CG3213 | 2L
FBgn0037862 | Mitochondrial aconitase 2 | 3R
FBgn0038373 | CG4546 | 3R
FBgn0002865 | Male-specific RNA 98Ca | 3R
FBgn0035240 | CG33791 | 3L
FBgn0025111 | Adenine nucleotide translocase 2 | X
FBgn0069354 | Porin2 | 2L
FBgn0052351 | S-Lap 2 | 3L
FBgn0012036 | Aldehyde dehydrogenase | 2L

Top 20 most abundant proteins in the DmSP3 by LFQ (rank ordered).

Overall, we found highly consistent estimates of protein abundances between experiments. Protein abundances were strongly correlated between experiments (Pearson’s correlation = 0.86–0.89, all p < 0.001; supplemental Fig. S3) and median coefficients of variation for each experiment ranged from 0.018 to 0.054. We performed analyses using the entire DmSP3 (n = 3176; supplemental Table S1), combining the 2562 proteins identified in the current study with the 1108 proteins identified in the DmSP2 (7, 11) (Fig. 1A). While the current article was in review, another study published a list of 1409 proteins in sperm dissected from the seminal vesicles (20), the majority of which we also identified in the DmSP3 (1242/1409; 88.1%). The increased proteome coverage we achieved may be due to differences in sample preparation and experimental design (see Discussion).
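As a reminder of how the consistency measure above is defined, the sketch below computes a per-protein coefficient of variation (standard deviation divided by mean) and takes the median across proteins. It is illustrative only: whether the reported values were computed on raw or log-transformed abundances is not stated here, and the function names and example values are hypothetical.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of replicate abundances."""
    return stdev(values) / mean(values)


def median_cv(abundances_by_protein):
    """Median coefficient of variation across proteins.

    abundances_by_protein maps a protein identifier to its replicate abundances.
    """
    return median(coefficient_of_variation(v) for v in abundances_by_protein.values())


# Made-up abundances for three proteins measured in three replicates each.
toy = {"protA": [10.0, 10.0, 10.0], "protB": [1.0, 2.0, 3.0], "protC": [2.0, 4.0, 6.0]}
toy_median_cv = median_cv(toy)
```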

Fig. 1.


Proteins identified in the Drosophila melanogaster sperm proteome. A, overlap between DmSP1, DmSP2, and the current study, together making up the DmSP3 (n = 3176). B, number of D. melanogaster genes found in the DmSP3 with human homologs and disease associated phenotypes from the Online Mendelian Inheritance in Man database (OMIM.org). C, number of D. melanogaster sperm protein genes with none (gray), one (orange), or more than one (blue) associated disease phenotype. DmSP3, Drosophila melanogaster sperm proteome.

GO and Network Analyses

The DmSP3 is considerably larger than the DmSP2 (Fig. 1A), and GO analysis using DAVID identified 24 significantly enriched BP categories (Fig. 2A and supplemental Table S2). As expected, major categories included processes involved in energy transduction (e.g., oxidation-reduction, glycolysis, TCA cycle) and reproduction. Other sperm-specific functions included terms related to microtubule and cilium movement. Surprisingly, the GO term “translation” was a prominent member in this analysis containing 78 cytosolic and mitochondrial ribosomal proteins. To further explore the GO category representation in the DmSP2 and DmSP3, we generated a heat map between the two proteomes in Cytoscape using ClueGO (Fig. 2B). Similar to our previous analysis of the DmSP1 and DmSP2 (11), most of the categories were equal or nearly equal in their shared properties with the one obvious exception being the aforementioned translation BP category as discussed further below.

Fig. 2.


GO functional network enrichment analysis and comparison of the DmSP2 and DmSP3. A, bar graph of the 24 GO Biological Process categories identified in the DmSP3 by DAVID (26). Only functional enrichment groups with Benjamini-Hochberg corrected p-values < 0.01 and passing a 1% false-discovery rate threshold are shown. Note: some GO terms have been combined for clarity; see supplemental Table S2 for complete list of GO terms. B, GO Biological Process network comparison between the DmSP3 (3176 proteins) and DmSP2 (1108 proteins) using the ClueGO plugin for Cytoscape. Color-coded nodes within the network depict the degree of relative compositional enrichment of each dataset. The network comprises 22 groups (each containing at least 30 genes associated with a common GO functional term) with a total of 1431 proteins. Nodes enriched for proteins identified in the current study are highlighted in red when node composition bias exceeds 60%, while gray nodes indicate equal representation. Bold letters indicate one highly enriched category of proteins involved in cytoplasmic translation. DmSP3, Drosophila melanogaster sperm proteome; GO, gene ontology.

Human Disease Homologs

Genes in the DmSP3 are highly conserved, with 81.8% (2598/3176) having human homologs, compared to 48% of all Drosophila genes (41). Fully 37.8% (1202/3176) of DmSP3 genes have a homolog in humans associated with a known disease or syndrome (Online Mendelian Inheritance in Man database; OMIM.org; Fig. 1B). Over one third (34.7%; 417/1202) of disease associated DmSP3 genes have more than one human disease homolog (Fig. 1C). Among the most prevalent disease phenotypes found were susceptibility to autism, primary ciliary dyskinesia, spermatogenic failure, and myofibrillar- and congenital-myopathy (Table 3).

Table 3.

Human disease homologs in the DmSP3

OMIM phenotype | N
Autism, Susceptibility To; AUTS20, AUTSX1, AUTSX2 | 27∗
Ciliary Dyskinesia, Primary; CILD40, CILD3, CILD7 | 25∗
Spermatogenic Failure; SPGF39, SPGF45, SPGF46 | 24∗
Myopathy; CFTD, MFM2, Fatal Infantile Hypertonic, Alpha-B Crystallin-Related | 24∗
Hypertension, Essential | 23
Type 2 Diabetes Mellitus; T2D | 21
Asperger Syndrome, X-Linked, Susceptibility To; ASPGX1, ASPGX2 | 18∗
Cataract, Multiple Types; CTRCT16, CTRCT9 | 16∗
Ichthyosis, Congenital, Autosomal Recessive; ARCI4A, ARCI4B | 16∗
46, XY Sex Reversal 8; SRXY8 | 12
Colorectal Cancer; CRC | 11
Encephalopathy, Familial, With Neuroserpin Inclusion Bodies; FENIB | 11
Ghosal Hematodiaphyseal Dysplasia; GHDD | 10
Plasminogen Activator Inhibitor-1 Deficiency | 10
Vitamin D-Dependent Rickets, Type 3; VDDR3 | 10
Deafness, Autosomal Recessive 91; DFNB91 | 9
Leukemia, Acute Myeloid; AML | 9
Maturity-Onset Diabetes of The Young, Type 8, With Exocrine Dysfunction; MODY8 | 9
Pseudoxanthoma Elasticum; PXE | 9
Cardiomyopathy, Dilated, 1II; CMD1II | 8
Charcot-Marie-Tooth Disease, Axonal, Type 2F; CMT2F | 8
Neuronopathy, Distal Hereditary Motor, Type IIB; HMN2B | 8
Surfactant Metabolism Dysfunction, Pulmonary, 3; SMDP3 | 8

Most common human disease phenotypes from the Online Mendelian Inheritance in Man database (OMIM.org) associated with D. melanogaster genes found in the DmSP3. N = number of D. melanogaster genes associated with each phenotype. Similar disease phenotypes (marked with an asterisk) have been grouped. Complete list of disease associations can be found in supplemental Table S15.

Ribosomal Proteins in the DmSP3

Almost one-half of all D. melanogaster ribosomal proteins listed in FlyBase.org (83/169; 49.1%, including paralogs) were identified in the DmSP3 (supplemental Table S3). We identified the majority of cytoplasmic ribosomal proteins (76/93; 81.7%) but only 7 out of 76 (9.2%) mitochondrial ribosomal proteins. There was no significant difference in ribosomal protein abundance compared to the DmSP3 average (Kruskal-Wallis rank sum test, Χ2 = 0.063, df = 1, p = 0.803; Fig. 3A), suggesting that ribosomal proteins identified in the DmSP3 are integral to the sperm proteome and not artefactual. Furthermore, 72 of the 83 ribosomal proteins we identified were also identified by McCullough et al. (20).

Fig. 3.


Ribosomal proteins in the DmSP3. A, abundance of ribosomal proteins identified in the DmSP3 compared to the remaining sperm proteome (‘other’). Colored points represent the abundance of individual proteins. Black points show the mean and thick and thin bars represent the 33% and 66% confidence intervals, respectively. We compared abundances using a Kruskal-Wallis rank sum test. B, representation of large and small cytoplasmic and mitochondrial ribosomal proteins in the brain, DmSP3, embryo, or oocyte. The dashed line represents the total number of ribosomal proteins in each class and asterisks represent results from comparing the observed to expected number of proteins identified using the Χ2 distribution after multiple testing correction. C, overlap between the total number of ribosomal proteins identified in the DmSP3 and brain tissue. n.s., non-significant; ∗p < 0.05; ∗∗∗p < 0.001. DmSP3, Drosophila melanogaster sperm proteome.

The canonical ribosome contains 80 ribosomal proteins including 13 paralog pairs in Drosophila (FlyBase.org). Although the significance of paralog heterogeneity for ribosome function is currently unknown, paralog switching of ribosomal proteins has been observed in gonads and other tissues (42). We therefore compared ribosomal protein paralogs in the DmSP3 to those previously described in four tissue types including the testis (42). Significant differences were found between all four tissues as only three ribosomal protein paralogs were observed in the DmSP3 (RpL22-like, RpS14b and RpS28b) whereas both ribosomal proteins and paralog ribosomal proteins were found in all tissues with the exception of two (RpL10Aa and RpS14a; supplemental Table S4). Notably, RpL22-like is more abundant in the testis (42) whereas RpL22 was more abundant in the DmSP3. For the remaining paralogs, we identified only one member of each pair: the most abundant paralog found in the testis for seven and the less abundant paralog for two (supplemental Table S4). For RpL10Ab and RpS14b, only one paralog was identified in both the current study and Hopes et al. (42), and we did not identify either paralog of RpS10 (RpS10a or RpS10b). Together, these results suggest a complex landscape of paralog switching in the gonad during spermatogenesis and highlight distinct differences between sperm-ribosomal protein and testis-ribosomal protein populations.

The finding of a large number of ribosomal proteins in the DmSP3 was unexpected given sperm are thought to be transcriptionally quiescent. Therefore, we next compared the representation of ribosomal proteins found in the DmSP3 to three other recent proteomic studies in D. melanogaster which used Orbitrap Fusion Lumos mass spectrometers, representing the female germline and two somatic tissues: embryo (43), unfertilized oocyte (44), and brain (45). All four tissue/cell types identified most cytoplasmic ribosomal proteins, with a slight underrepresentation of large cytoplasmic subunits in the brain (Fig. 3B). The DmSP3 and brain both showed significant underrepresentation of large and small mitochondrial ribosomal proteins, whereas oocyte and embryo showed almost complete representation of all ribosomal subunits (Fig. 3B). Significantly more ribosomal proteins identified in the brain or sperm were shared between tissues (64/98; 65.3%) than expected by chance (Fisher’s exact test, p < 0.001; Fig. 3C).

Chromosomal Distribution of Sperm Proteins

Sperm proteins were underrepresented on the X- (X2 = 12.6, df = 1, p = 0.002) and 3L- (X2 = 11.8, df = 1, p = 0.002) chromosomes (Fig. 4A); a pattern that was previously reported for X-linked genes in the DmSP1 (7) but not replicated in the DmSP2 (11). Protein abundance of X-linked proteins was significantly lower than those on autosomes (Mann-Whitney U test, p = 0.041) or the Y chromosome (Mann-Whitney U test, p < 0.001; Fig. 4B). We identified 9 of the 16 known proteins encoded on the Y chromosome (Table 4). The average abundance of Y-linked sperm proteins was higher than autosomal sperm proteins (Mann-Whitney U test, p < 0.001); six within the top 20% most highly abundant proteins, and all within the top 50% (Fig. 4B and Table 4).
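The observed-versus-expected comparison behind these chromosome tests can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it treats one chromosome against all others as a two-category chi-square test with df = 1, and the counts are hypothetical. The p-value it returns is uncorrected, whereas the values reported above are after multiple testing correction.

```python
import math

def chi_square_one_category(observed, expected, total):
    """Chi-square statistic (df = 1) for one category versus all others.

    observed: number of sperm-protein genes found on the chromosome
    expected: number expected from the chromosome's share of all genes
    total:    total number of sperm-protein genes with a known location
    """
    rest_observed = total - observed
    rest_expected = total - expected
    stat = ((observed - expected) ** 2 / expected
            + (rest_observed - rest_expected) ** 2 / rest_expected)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts, not taken from the paper.
stat, p_value = chi_square_one_category(observed=40, expected=50, total=100)
print(f"X2 = {stat:.2f}, uncorrected p = {p_value:.4f}")
```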

Fig. 4.

Chromosomal distribution of DmSP3 proteins. A, chromosomal distribution of sperm proteins. Numbers below bars are the observed and expected number of genes on each chromosome, respectively, and the dashed line indicates the null expectation. Asterisks represent results from comparing the observed to expected number of genes using the Χ2 distribution after multiple testing correction. B, abundance of sperm proteins found on autosomes (‘A’) and sex chromosomes (‘X’ or ‘Y’). Points, representing individual proteins, are omitted from autosomes for clarity. Asterisks represent results from pairwise Wilcoxon rank-sum test corrected for multiple testing using the Benjamini-Hochberg procedure. n.s., non-significant; ∗p < 0.05; ∗∗p < 0.01; ∗∗∗p < 0.001. DmSP3, Drosophila melanogaster sperm proteome.
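The Benjamini-Hochberg procedure named in the legend above can be sketched in a few lines. This is a generic illustration of the standard step-up adjustment, not the authors' code. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values from several pairwise tests.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

A comparison is then called significant when its adjusted value falls below the chosen threshold, for example 0.05.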

Table 4.

Y-linked sperm proteins in the DmSP3

FBgn | Name | Ranked abundance (%) | Sterile
FBgn0267433 | male fertility factor kl5 | 98.8 | Yes
FBgn0267432 | male fertility factor kl3 | 98.8 | Yes
FBgn0058064 | Aldehyde reductase Y | 95.5 | No
FBgn0001313 | male fertility factor kl2 | 93.6 | Yes
FBgn0046323 | Occludin-Related Y | 92.9 | No
FBgn0267449 | WD40 Y | 86.4 | Yes
FBgn0267592 | Coiled-Coils Y | 86 | Not studied
FBgn0046697 | Ppr-Y | 78.3 | No
FBgn0046698 | Protein phosphatase 1, Y-linked 2 | 65.2 | No

Genes are rank-ordered by mean abundance. Associations with male fertility from gene knockout/knockdown experiments (57, 58) are shown.

Seminal Fluid Proteins Identified in the DmSP3

Sfps have been extensively studied in Drosophila with over 600 putative Sfps identified to date including 292 that are considered ‘high confidence’ (38). A surprisingly high number of Sfps were identified in the DmSP3 (122 ‘high confidence’ Sfps; 156 ‘low confidence/transferred’ Sfps; supplemental Table S1) (38). Another study also recently found a number of Sfps associated with sperm in the seminal vesicles, of which we identified almost all (57/62; 91.9%) (20). We found no significant difference in abundance between Sfps and the remaining DmSP3 (Kruskal-Wallis rank-sum test, X2 = 4.28, df = 2, p = 0.118; Fig. 5A) and 44 ‘high confidence’ Sfps were at, or above, the median abundance of the DmSP3 (supplemental Table S5).

Fig. 5.

Seminal fluid proteins in the DmSP3. A, log2 abundance of proteins found in the DmSP3 classified as ‘high confidence’ Sfps, ‘low confidence or transferred’ Sfps by Wigby et al. (38), or remaining sperm proteins. B, volcano plot for difference between PBST treatment versus the average of both controls (Halt and NoHalt) in experiment 2. Positive values indicate higher abundance in controls. C, MA plot for difference between NaCl treatment versus PBS control. Positive values indicate higher abundance in NaCl treatment. For (B) and (C), points are colored as in (A) denoting ‘high confidence’ (yellow) and ‘low confidence/transferred’ (turquoise) Sfps or remaining sperm proteins (purple) that showed significant differences in abundance based on a |logFC| > 1 and false discovery rate corrected p-value < 0.05. Several Sfps are labeled in (B) that showed differential abundance between treatments. Sfps are labeled in (C) that were among the top 10% most abundant proteins and the three Sfps (bold) that showed significant differences in abundance between treatments (yellow). DmSP3, Drosophila melanogaster sperm proteome; Sfps, seminal fluid proteins.

We therefore examined the binding characteristics of the Sfps by washing purified sperm with a nonionic detergent (Triton X-100) known to disrupt plasma membranes. Following detergent treatment, 1600 proteins were identified, the majority (1063/1600; 66%) identified by two or more unique peptides. We identified 198 proteins that were lower in abundance in PBST compared to controls (supplemental Table S6) and three proteins that were more abundant in PBST samples (Fig. 5B).

Of the 60 ‘high confidence’ Sfps identified by two or more unique peptides in experiment 2, 17 (28.3%) were filtered out prior to analysis (including 14 which were not detected in PBST samples in any replicate), and 29 (48.3%) were found at significantly lower abundance in PBST samples, together suggesting these proteins are weakly bound or found on the sperm plasma membrane. The remaining 14 (23.3%) Sfps showed no significant difference in abundance, suggesting tight association with sperm (Table 5). Additionally, 13 out of 53 (24.5%) ribosomal proteins detected in experiment 2 were significantly lower in abundance after PBST treatment. Proteins lower in abundance after PBST treatment showed GO enrichment of multicellular organism reproduction, mitochondrial transport, transmembrane transport, cytoplasmic translation, and sarcomere organization (BP). Thus, as expected, PBST treatment stripped lipids, membrane, and membrane-bound proteins (including Sfps) from sperm (supplemental Table S7).
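The thresholds used to call differential abundance (|logFC| > 1 and false discovery rate corrected p-value < 0.05, as given in the Fig. 5 legend) can be expressed as a small classifier. This is a sketch for illustration only: the function name, labels, and example rows are hypothetical, and the sign convention follows Fig. 5B, where positive values indicate higher abundance in controls. The final lines recompute the percentages for the 60 ‘high confidence’ Sfps from the counts given in the text.

```python
def classify_protein(log_fc, fdr, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Label one protein; positive log_fc means higher abundance in controls."""
    if fdr < fdr_cutoff and abs(log_fc) > lfc_cutoff:
        return "lower_in_PBST" if log_fc > 0 else "higher_in_PBST"
    return "no_difference"

# Hypothetical (protein, logFC, FDR) rows, for illustration only.
rows = [
    ("protein_A", 2.3, 0.001),
    ("protein_B", -1.4, 0.020),
    ("protein_C", 0.4, 0.600),
    ("protein_D", 1.8, 0.200),
]
labels = {name: classify_protein(lfc, fdr) for name, lfc, fdr in rows}
print(labels)

# Counts reported in the text for the 60 'high confidence' Sfps in experiment 2.
sfp_counts = {"filtered_out": 17, "lower_in_PBST": 29, "no_difference": 14}
total = sum(sfp_counts.values())
percentages = {k: round(100 * v / total, 1) for k, v in sfp_counts.items()}
print(percentages)
```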

Table 5.

Seminal fluid proteins identified in the sperm proteome after PBST treatment

FBgn | Name | Chromosome arm
FBgn0011694 | Ejaculatory bulb protein II | 2R
FBgn0261055 | Seminal fluid protein 26Ad | 2L
FBgn0004181 | Ejaculatory bulb protein | 2R
FBgn0003885 | alpha-Tubulin at 84D | 3R
FBgn0260745 | midline fasciclin | 3R
FBgn0036970 | Serpin 77Bc | 3L
FBgn0036969 | Serpin 77Bb | 3L
FBgn0259975 | Seminal fluid protein 87B | 3R
FBgn0034709 | Secreted Wg-interacting molecule | 2R
FBgn0264815 | Phosphodiesterase 1c | 2L
FBgn0020414 | Imaginal disc growth factor 3 | 2L
FBgn0050104 | Ecto-5′-nucleotidase 2 | 2R
FBgn0052203 | Serpin 75F | 3L
FBgn0003748 | Trehalase | 2R

In experiment 3, we washed sperm samples with high molar salt expected to weaken ionic bonds and eliminate nonspecific protein binding to sperm. We identified 1890 proteins, of which 1273 (65%) were identified by two or more unique peptides. After filtering (see Experimental procedures), we performed differential abundance analysis for 1202 proteins and identified 92 differentially abundant proteins, including 3 Sfps (Sfp33A3, aquarius [CG14061], and CG6628) (Fig. 5C). The remaining 48 ‘high confidence’ Sfps we identified in this experiment did not show significant differential abundance between treatments, with six Sfps in the top 20% most abundant proteins (regucalcin, Acp36DE, CG31872, Transferrin 1, CG17097, and α-Tubulin at 84D) (Fig. 5C).

Gene–Protein Abundance Concordance

To explore the relationship between protein abundance and gene expression for the 68 ‘high confidence’ Sfps tightly binding to sperm following detergent or salt treatment (‘sperm associated Sfps’; supplemental Table S8), we compared gene expression (fragments per kilobase of transcript per million mapped reads) for all proteins identified in the DmSP3 between the accessory glands, carcass, ovary, and testis using data retrieved from FlyAtlas2 (46). The average expression of both sperm-associated Sfps and the remaining Sfps identified in the DmSP3 was highest in the accessory glands, while the remaining DmSP3 proteins were most highly expressed in testis (supplemental Fig. S4A). However, seven sperm-associated Sfps showed higher expression in the testis than accessory glands (supplemental Table S9).

The abundance of proteins in the DmSP3 had the strongest correlation (β) and best fit (R2) in the testis (β = 0.460, R2 = 0.133, p < 0.001, n = 1498) (supplemental Fig. S4B). Protein abundance of sperm-associated Sfps was positively correlated with gene expression in the testis (β = 0.399, R2 = 0.152, p = 0.006, n = 49) but not the accessory glands (p = 0.246), carcass (p = 0.052), or ovary (p = 0.271). The abundance of remaining Sfps identified in the DmSP3 was positively correlated with gene expression in the accessory glands (β = 0.274, R2 = 0.197, p = 0.004, n = 41) and testis (β = 0.281, R2 = 0.147, p = 0.040, n = 29) but not the carcass (p = 0.109) or ovary (p = 0.677) (supplemental Fig. S4C). Therefore, our results suggest that the abundance of sperm-associated Sfps is more tightly coupled to gene expression in the testis than in the accessory glands.
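The kind of regression summarized above can be sketched with ordinary least squares. This is a simplified illustration, not the authors' pipeline: it log2-transforms expression, drops genes with log2(FPKM) < 2 as described in the supplemental Fig. S4 legend, and returns the slope and R2. It omits the z-score scaling mentioned in that legend, treats expression as the predictor (an assumption), and uses hypothetical function names and data.

```python
import math

def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns slope, intercept, R2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

def filter_and_log2(fpkm, abundance, min_log2_fpkm=2.0):
    """Keep genes with log2(FPKM) >= min_log2_fpkm; return (log2 FPKM, abundance)."""
    pairs = [(math.log2(f), a) for f, a in zip(fpkm, abundance)
             if f > 0 and math.log2(f) >= min_log2_fpkm]
    return [p[0] for p in pairs], [p[1] for p in pairs]

# Hypothetical testis expression (FPKM) and log2 protein abundance values.
testis_fpkm = [1.0, 8.0, 16.0, 64.0, 256.0, 0.0]
log2_abundance = [20.5, 22.0, 21.5, 24.0, 25.5, 19.0]

x, y = filter_and_log2(testis_fpkm, log2_abundance)
slope, intercept, r2 = linear_fit(x, y)
print(f"n = {len(x)}, slope = {slope:.3f}, R2 = {r2:.3f}")
```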

Gene Age

A variety of mechanisms drive genomic and protein diversity including gene duplication and retroposition (5, 37), resulting in unique, lineage-specific patterns of gene age (47, 48, 49). Newly evolved genes frequently acquire testis-biased gene expression (50) and it was therefore of interest to query the gene age landscape of the DmSP3. There were fewer ‘recent’ (X2 = 6.58, df = 1, p = 0.026), ‘melanogaster subgroup’ (X2 = 9.69, df = 1, p = 0.009), and ‘Sophophora-group’ (X2 = 5.51, df = 1, p = 0.032) age genes than expected by chance, indicating genes encoding sperm proteins are underrepresented in more recent evolutionary time (Fig. 6A). We identified 13 genes of recent origin, of which five were located on the X chromosome and four of which are Sfps (supplemental Table S10).

Fig. 6.

Sperm evolutionary dynamics. A, gene age distribution of sperm proteins. Numbers below bars are the observed and expected number of genes in each age class, respectively, and the dashed line indicates the null expectation. Asterisks represent results from comparing the observed to expected number of genes using the Χ2 distribution after multiple testing correction. B, mean (± standard error) nonsynonymous (dN) to synonymous (dS) nucleotide substitution rate (dN/dS) estimates for sperm proteins. Asterisks represent results from Mann-Whitney U tests comparing each gene set (OMIM, Sfp, X-linked) to the genome average (‘All’), excluding proteins in that set. Dashed line represents the genome average (mean dN/dS = 0.110, standard error = 0.001, n = 11,417). Numbers below points indicate numbers of genes in each category. Note: groups are not necessarily mutually exclusive, that is, ‘OMIM’ proteins may also be ‘X-linked’, etc. n.s., not significant; ∗p < 0.05; ∗∗p < 0.01; ∗∗∗p < 0.001. Sfps, seminal fluid proteins.

Sperm Evolutionary Rates

Genes in the DmSP3 evolve more slowly than the genome average (Mann-Whitney U test, p < 0.001). This pattern remains when considering X-linked sperm proteins compared to the genome average (p < 0.001), which evolve at a similar rate to the DmSP3 average (p = 0.958; Fig. 6B). Sfps in the DmSP3 evolve faster than the DmSP3 average (p < 0.001), at a similar rate to other Sfps (p = 0.232; supplemental Fig. S5), whereas genes with a human disease homolog (OMIM.org) evolve more slowly than the DmSP3 average (p < 0.001; Fig. 6B).

The top 10%, fastest evolving genes in the DmSP3 (dN/dS [mean ± s.e.] = 0.313 ± 0.009, n = 257, supplemental Table S11) showed GO enrichment for multicellular organism reproduction (BP) and extracellular space (cellular components) (supplemental Table S12). The bottom 10%, slowest evolving genes in the DmSP3 (dN/dS = 0.004 ± 0.0002, n = 258, supplemental Table S13), showed BP enrichment for cytoplasmic translation, centrosome duplication, regulation of cell shape, ribosomal large subunit assembly, tricarboxylic acid cycle, ATP hydrolysis coupled proton transport, cell adhesion, oocyte microtubule cytoskeleton polarization, and endocytosis (supplemental Table S14).

Discussion

In summary, our reanalysis of the D. melanogaster sperm proteome (DmSP3) more than doubled the number of identified proteins, dramatically increased representation of ribosomal proteins, and highlighted several human neurological disease homologs. LFQ identified highly abundant tubulins, S-Laps, Y-linked sperm proteins, and ocnus, a testis-specific protein. LFQ also provided direct evidence for lowered abundances of X-linked sperm proteins. Sperm genes evolve relatively slowly and are underrepresented in recent age classes, consistent with evolutionary constraint acting on the sperm proteome. Finally, we identified a number of Sfps in the DmSP3, which were resistant to detergent or high molar salt treatment, suggesting some Sfps are integral to the sperm proteome.

The increased (>2-fold) depth of proteome coverage is likely due to improved protein extraction, efficiency of trypsinization/peptide recovery, and direct injection methods employed in this study. Traditionally, SDS-PAGE off-line prefractionation has been the method of choice for the analysis of complex proteomes. However, these off-line methods come at a cost: sample loss due to the extra steps involved and the well-known issues of peptide recovery from polyacrylamide gels (51, 52). Although work to alleviate this limitation continues to improve this approach, our results suggest that a combination of high SDS concentrations in the initial solubilization and use of immobilized enzymatic digestion using S-Trap technologies greatly enhanced the yield of usable peptides for bottom-up proteomics. The DmSP3 also contained Yolk protein 2 (Yp2), a protein previously found in sperm (53) but undetected in the DmSP1 or DmSP2. As noted by the authors of that study (53), detection of Yp2 in sperm required large amounts of input protein for detection on immunoblots, suggesting Yp2 was present at very low levels in the testis and sperm (53). Therefore, detection of Yp2 in our study provides additional confidence in the efficacy of our approach.

The sperm proteome is expected to exhibit dynamic gene movement and expression evolution due to its sex-specific expression and essential role for male fertility (5). We found X-linked genes are underrepresented in the sperm proteome, as reported in the DmSP1 (7). Additionally, we show that X-linked sperm proteins were found in significantly lower abundances, consistent with the downstream effects of meiotic sex chromosome inactivation (22, 23, 24) and/or resolution of intralocus sexual conflict (54, 55, 56). In contrast, more than half of Y-linked proteins (9/16) including known fertility factors were present in the DmSP3 (57, 58). Our LFQ analysis revealed all nine Y-linked protein abundances were above the DmSP3 average, with 7/9 in the top 10%. This is the first quantitative assessment of this important class of proteins in sperm and adds direct empirical evidence in support of the long-standing hypothesized structural role in the assembly of the sperm axoneme (59).

We found sperm proteins evolve more slowly than the genome average. Slow rates of molecular evolution could be due to purifying selection or weak selection acting on sperm genes as they are shielded from selection in females (7, 60, 61). Sperm proteins were also underrepresented in recent evolutionary age classes and over 80% had human homologs, supporting the idea that sperm genes are under evolutionary constraint. A recent study found that Sfps are overrepresented in recent age classes (62), indicating different evolutionary forces acting on sperm versus nonsperm components of the ejaculate. Sfps in the DmSP3 evolve at a similar rate to Sfps found elsewhere in the genome and more quickly than the DmSP3 average, suggesting similar evolutionary pressures affecting rates of Sfp evolution across tissues.

The abundance of ribosomal proteins in the DmSP3 was unexpected given that sperm are stripped of most cellular machinery prior to maturation. However, sperm may undergo post-ejaculatory modifications, perform secondary sexual functions, or provision the developing zygote after fertilization, requiring protein synthesis (18, 63, 64, 65). Sperm function beyond delivering a haploid complement of nuclear material for fertilization still remains relatively underexplored (16, 17, 18). The presence of a large repertoire of core ribosomal proteins delivered to the egg during fertilization raises the intriguing possibility that paternally derived ribosomes are active during zygote formation and perhaps beyond.

Another intriguing finding, that sperm had a higher abundance of RpL22 than of its paralog RpL22-like, the opposite of the pattern found in the testis (42), suggests a complex pattern of paralog switching and selectivity during spermatogenesis. While the functional significance of this selectivity is unknown, it is interesting to consider in the context of the known mRNA repertoire in Drosophila sperm delivered to the egg at fertilization (65). Fully 33% of the total sperm mRNA repertoire encoded ribosomal proteins (47/142; ref. (65)), a striking coincidence that warrants further study. We also found similarity in the underrepresentation of mitochondrial ribosomal proteins in both the DmSP3 and brain, providing yet another example of the molecular similarities between these two tissue types (66). Finally, we note that the DmSP3 contains as many as 300 entries with GO annotation terms related to neuronal structure and function, lending additional support to the similarities drawn between the brain and testis.

Possible Testis Origin of Seminal Fluid Proteins

Although some Sfps were previously identified but not quantified in the DmSP2 (11), the unexpectedly high numbers (and in some cases, relative abundances) of Sfps found in the DmSP3 add to an expanding landscape of Sfp biology. As Sfps are thought to be primarily secreted from the paired accessory glands and the ejaculatory bulb in Drosophila (67), our results raise the possibility that some Sfps are integral to the sperm proteome and are secreted from the testes or seminal vesicles, or bind to sperm prior to mixing in the ejaculatory duct. We identified 122 ‘high confidence’ Sfps (38) in the DmSP3, a result that is unlikely to be artefactual given that many Sfps were found in multiple biological replicates and in independent experiments. An independent study conducted in another laboratory using a different Drosophila strain identified an almost identical list of ‘sperm-associated Sfps’ (20), thus providing strong evidence in support of a possible testis origin of some Sfps. Denaturing the sperm plasma membrane using detergent stripped most (75%) Sfps from the sperm proteome, suggesting these Sfps are integral to the sperm plasma membrane or bound to sperm adventitiously in the seminal vesicles prior to mixing in the ejaculatory duct. High molar salt had little effect on the composition of the sperm proteome, indicating some Sfps are bound strongly to sperm.

We identified 68 ‘sperm-associated Sfps’ that were not depleted by detergent or salt treatment. Thus, we suggest that several of the ‘high-confidence’ Sfps that are in the DmSP3 and also highly expressed in the testes (supplemental Table S9) should be classified as sperm proteins. In addition, α-Tubulin at 84D (FBgn0003885) is a major constituent of microtubules and involved in sperm axoneme assembly and therefore likely a sperm protein (see also supplemental Tables S1 and S2 in (20)). Notably, Acp36DE was consistently among the most abundant proteins in our experiments. Previous studies have shown Acp36DE tightly binds to sperm in the female reproductive tract after mating and is essential for efficient sperm storage in the female sperm storage organs (19, 68). The possibility that Sfps bind to sperm in the seminal vesicles prior to mixing in the ejaculatory duct should be investigated further (20). Moreover, the potential for the testes, seminal vesicles, or perhaps even sperm cells, to secrete proteins, including Sfps, requires further investigation.

Finally, the DmSP3 contains over 1200 human disease homologs. The prominence of several neurological diseases (e.g., Primary Ciliary Dyskinesia, susceptibility to autism, encephalopathy, and neuropathy) may be related to the shared functional designs of sperm and neurons, cells of extraordinary axial ratios transmitting biological information over large distances. It will be of great interest to tease out the significance of this subset of neural-related DmSP3 proteins in the context of sperm function and its related reproductive activities and possible relevance for study of human diseases.

Conclusion

Our reanalysis of the D. melanogaster sperm proteome using improved separation and detection methods and an updated genome annotation highlights several key features of sperm function and evolution, including the prominence of proteins integral to sperm development (tubulins and S-Laps), the dynamic nature of sex-linked sperm genes, and constraints on sperm proteome evolution. We also show the prevalence of many ribosomal proteins, despite the expectation that sperm are transcriptionally silent. The parallels in ribosomal protein composition and occurrence of several human neurological disease homologs also lend further support to the functional similarities between sperm and neurons. Finally, we demonstrate that a significant number of Sfps are found in the sperm proteome raising the possibility that Sfps mix with sperm in the seminal vesicles or Sfps may be secreted from the testes, seminal vesicles, or even sperm cells.

Data Availability

Proteomic data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository (69) with the identifier PXD032033. All code and analyses are available from GitHub: https://martingarlovsky.github.io/DmSP3/

Supplemental data

This article contains supplemental data (46).

Conflict of interest

The authors declare that they have no conflicts of interest with the contents of this article.

Acknowledgments

We thank Caitlin McDonough-Goldstein and Maria Vibranovski for helpful discussion, Alison Wright, Daniela Palmer, and Leeban Yusef for advice analyzing evolutionary rates. We are grateful to Eric Sedore and Larne Pekowsky who provided computing services (Syracuse University HTC Campus Grid), the authors whose data was used in this study, and the curators of FlyBase.org for continued maintenance of this essential resource. We are grateful to three anonymous reviewers who provided helpful feedback and comments on the article. This work was funded in part by the NSF award ACI-1341006, the Biodesign Institute and ASU Knowledge Enterprise Mass Spectrometry Research Facility.

Author contributions

M. D. G. and T. L. K. conceptualization; J. A. S. and T. L. K. investigation; M. D. G. and T. L. K. formal analysis; M. D. G. and T. L. K. writing–original draft.

Contributor Information

Martin D. Garlovsky, Email: martin.garlovsky@tu-dresden.de.

Timothy L. Karr, Email: tkarr@asu.edu.

Supplemental Data

Supplemental Fig S1

Volcano plots from pairwise analyses between treatments in experiment two, denoting ‘high confidence’ (yellow) and ‘low confidence/transferred’ (turquoise) Sfps or remaining sperm proteins (purple) that showed significant differences in abundance based on a |logFC| > 1 and false discovery rate corrected p-value < 0.05. Several Sfps are labelled that showed differential abundance between treatments.

Supplemental Fig S2

Overlap between proteins identified in the current study. The central Euler diagram shows the overlap between proteins identified in each experiment. The surrounding diagrams show the overlap in number of proteins identified in each replicate of each treatment, coloured according to experiment as in the central diagram.

Supplemental Fig S3

Correlations in average protein abundance between each experiment in the current study. Mean protein abundance was calculated across all replicates for each experiment, except experiment two which excluded the PBST treatment. Shown are Pearson’s correlations and line of best fit.

Supplemental Fig S4

Gene–protein abundance concordance in the DmSP3. a) Heatmap of mRNA expression of DmSP3 genes (n = 2673) in the accessory glands, carcass, ovary, and testis. Data retrieved from FlyAtlas2 (46) are log2(FPKM) scaled per gene. The 7 ‘high confidence’ Sfps with higher expression in the testis than accessory glands are highlighted on the right. Labels on the right also show ‘sperm associated Sfps’ (red) and other ‘high confidence’ Sfps (blue) identified in the DmSP3. b) Linear regressions of gene expression on protein abundance in the testis (n = 1498), carcass (n = 1165), accessory glands (n = 1001), and ovary (n = 825). c) Linear regressions of gene expression on protein abundance for ‘sperm associated Sfps’ (SAS) and remaining Sfps identified in the DmSP3 in each tissue. b) and c) are linear regressions using z-score log2-transformed values after filtering genes with log2-FPKM < 2. Blue lines are model fits from a linear regression, dashed lines indicate a perfect correlation between gene expression and protein abundance.

Supplemental Fig S5

Nonsynonymous (dN) to synonymous (dS) nucleotide substitution rate (dN/dS) estimates for proteins in the DmSP3 or elsewhere. Asterisks represent results from Mann-Whitney U tests; n.s., non-significant; ∗∗∗, p < 0.001.

Supplemental Tables S1–S15

References

  • 1. Dorus S., Karr T.L. Sperm Biology: An Evolutionary Perspective. 1st Ed. Academic Press; Burlington, MA: 2009. Sperm proteomics and genomics; pp. 435–469.
  • 2. Bayram H.L., Claydon A.J., Brownridge P.J., Hurst J.L., Mileham A., Stockley P., et al. Cross-species proteomics in analysis of mammalian sperm proteins. J. Proteomics. 2016;135:38–50. doi: 10.1016/j.jprot.2015.12.027.
  • 3. Rowe M., Skerget S., Rosenow M.A., Karr T.L. Identification and characterisation of the zebra finch (Taeniopygia guttata) sperm proteome. J. Proteomics. 2018;193:192–204. doi: 10.1016/j.jprot.2018.10.009.
  • 4. Pitnick S., Hosken D.J., Birkhead T.R. Sperm Biology: An Evolutionary Perspective. 1st Ed. Academic Press; Burlington, MA: 2009. Sperm morphological diversity; pp. 69–149.
  • 5. Rettie E.C., Dorus S. Drosophila sperm proteome evolution. Spermatogenesis. 2012;2:213–223. doi: 10.4161/spmg.21748.
  • 6. Kahrl A.F., Snook R.R., Fitzpatrick J.L. Fertilization mode drives sperm length evolution across the animal tree of life. Nat. Ecol. Evol. 2021;5:1153–1164. doi: 10.1038/s41559-021-01488-y.
  • 7. Dorus S., Busby S.A., Gerike U., Shabanowitz J., Hunt D.F., Karr T.L. Genomic and functional evolution of the Drosophila melanogaster sperm proteome. Nat. Genet. 2006;38:1440–1445. doi: 10.1038/ng1915.
  • 8. Karr T.L. Fruit flies and the sperm proteome. Hum. Mol. Genet. 2007;16:R124–R133. doi: 10.1093/hmg/ddm252.
  • 9. Karr T.L. Reproductive proteomics comes of age. Mol. Cell Proteomics. 2019;18(Supplement 1):S1–S5. doi: 10.1074/mcp.E119.001418.
  • 10. Aebersold R., Mann M. Mass-spectrometric exploration of proteome structure and function. Nature. 2016;537:347–355. doi: 10.1038/nature19949.
  • 11. Wasbrough E.R., Dorus S., Hester S., Howard-Murkin J., Lilley K., Wilkin E., et al. The Drosophila melanogaster sperm proteome-II (DmSP-II). J. Proteomics. 2010;73:2171–2185. doi: 10.1016/j.jprot.2010.09.002.
  • 12. Dorus S., Wilkin E.C., Karr T.L. Expansion and functional diversification of a leucyl aminopeptidase family that encodes the major protein constituents of Drosophila sperm. BMC Genomics. 2011;12:177. doi: 10.1186/1471-2164-12-177.
  • 13. Cox J., Mann M. Quantitative, high-resolution proteomics for data-driven systems biology. Annu. Rev. Biochem. 2011;80:273–299. doi: 10.1146/annurev-biochem-061308-093216.
  • 14. Zhao L., Cong X., Zhai L., Hu H., Xu J.Y., Zhao W., et al. Comparative evaluation of label-free quantification strategies. J. Proteomics. 2020;215 doi: 10.1016/j.jprot.2020.103669.
  • 15. Yu F., Haynes S.E., Nesvizhskii A.I. IonQuant enables accurate and sensitive label-free quantification with FDR-controlled match-between-runs. Mol. Cell Proteomics. 2021;20 doi: 10.1016/j.mcpro.2021.100077.
  • 16. Immler S. The sperm factor: Paternal impact beyond genes. Heredity. 2018;121:239–247. doi: 10.1038/s41437-018-0111-0.
  • 17. Krawetz S.A. Paternal contribution: new insights and future challenges. Nat. Rev. Genet. 2005;6:633–642. doi: 10.1038/nrg1654.
  • 18. Karr T.L., Swanson William J., Snook R.R. Sperm Biology: An Evolutionary Perspective. 1st Ed. Academic Press; Burlington, MA: 2009. The evolutionary significance of variation in sperm-egg interactions; pp. 305–365.
  • 19. Neubaum D.M., Wolfner M.F. Mated Drosophila melanogaster females require a seminal fluid protein, Acp36DE, to store sperm efficiently. Genetics. 1999;153:845–857. doi: 10.1093/genetics/153.2.845.
  • 20. McCullough E.L., Whittington E., Singh A., Pitnick S., Wolfner M.F., Dorus S. The life history of Drosophila sperm involves molecular continuity between male and female reproductive tracts. Proc. Natl. Acad. Sci. 2022;119 doi: 10.1073/pnas.2119899119.
  • 21. Barreau C., Benson E., Gudmannsdottir E., Newton F., White-Cooper H. Post-meiotic transcription in Drosophila testes. Development. 2008;135:1897–1902. doi: 10.1242/dev.021949.
  • 22. Vibranovski M.D., Lopes H.F., Karr T.L., Long M. Stage-specific expression profiling of Drosophila spermatogenesis suggests that meiotic sex chromosome inactivation drives genomic relocation of testis-expressed genes. PLoS Genet. 2009;5 doi: 10.1371/journal.pgen.1000731.
  • 23. Vibranovski M.D., Chalopin D.S., Lopes H.F., Long M., Karr T.L. Direct evidence for postmeiotic transcription during Drosophila melanogaster spermatogenesis. Genetics. 2010;186:431–433. doi: 10.1534/genetics.110.118919.
  • 24. Mahadevaraju S., Fear J.M., Akeju M., Galletta B.J., Pinheiro M.M.L.S., Avelino C.C., et al. Dynamic sex chromosome expression in Drosophila male germ cells. Nat. Commun. 2021;12:892. doi: 10.1038/s41467-021-20897-y.
  • 25. Käll L., Canterbury J.D., Weston J., Noble W.S., MacCoss M.J. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat. Methods. 2007;4:923–925. doi: 10.1038/nmeth1113.
  • 26. Huang D.W., Sherman B.T., Lempicki R.A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 2009;4:44–57. doi: 10.1038/nprot.2008.211.
  • 27. Bindea G., Mlecnik B., Hackl H., Charoentong P., Tosolini M., Kirilovsky A., et al. ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics. 2009;25:1091–1093. doi: 10.1093/bioinformatics/btp101.
  • 28. Shannon P., Markiel A., Ozier O., Baliga N.S., Wang J.T., Ramage D., et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–2504. doi: 10.1101/gr.1239303.
  • 29.Wright A.E., Harrison P.W., Zimmer F., Montgomery S.H., Pointer M.A., Mank J.E. Variation in promiscuity and sexual selection drives avian rate of Faster-Z evolution. Mol. Ecol. 2015;24:1218–1235. doi: 10.1111/mec.13113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Yates A.D., Achuthan P., Akanni W., Allen J., Allen J., Alvarez-Jarreta J., et al. Ensembl 2020. Nucleic Acids Res. 2020;48:D682–D688. doi: 10.1093/nar/gkz966. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2. [DOI] [PubMed] [Google Scholar]
  • 32.Löytynoja A., Goldman N. webPRANK: a phylogeny-aware multiple sequence aligner with interactive alignment browser. BMC Bioinformatics. 2010;11:579. doi: 10.1186/1471-2105-11-579. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Harrison P.W., Jordan G.E., Montgomery S.H. SWAMP: sliding window alignment masker for PAML. Evol. Bioinforma Online. 2014;10:197–204. doi: 10.4137/EBO.S18193. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 2007;24:1586–1591. doi: 10.1093/molbev/msm088. [DOI] [PubMed] [Google Scholar]
  • 35.R Core Team. R Foundation for Statistical Computing; Vienna, Austria: 2020. R: A Language and Environment for Statistical Computing [Internet] [Google Scholar]
  • 36.Larkin A., Marygold S.J., Antonazzo G., Attrill H., dos Santos G., Garapati P.V., et al. FlyBase: updates to the Drosophila melanogaster knowledge base. Nucleic Acids Res. 2021;49:D899–D907. doi: 10.1093/nar/gkaa1026. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Zhang Y.E., Vibranovski M.D., Krinsky B.H., Long M. Age-dependent chromosomal distribution of male-biased genes in Drosophila. Genome Res. 2010;20:1526–1533. doi: 10.1101/gr.107334.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Wigby S., Brown N.C., Allen S.E., Misra S., Sitnik J.L., Sepil I., et al. The Drosophila seminal proteome and its role in postcopulatory sexual selection. Philos. Trans. R. Soc. B Biol. Sci. 2020;375 doi: 10.1098/rstb.2020.0072. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Robinson M.D., McCarthy D.J., Smyth G.K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–140. doi: 10.1093/bioinformatics/btp616. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Parsch J., Meiklejohn C.D., Hauschteck-Jungen E., Hunziker P., Hartl D.L. Molecular evolution of the ocnus and janus genes in the Drosophila melanogaster species subgroup. Mol. Biol. Evol. 2001;18:801–811. doi: 10.1093/oxfordjournals.molbev.a003862. [DOI] [PubMed] [Google Scholar]
  • 41.Yamamoto S., Jaiswal M., Charng W.L., Gambin T., Karaca E., Mirzaa G., et al. A Drosophila genetic resource of mutants to study mechanisms underlying human genetic diseases. Cell. 2014;159:200–214. doi: 10.1016/j.cell.2014.09.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Hopes T., Norris K., Agapiou M., McCarthy C.G.P., Lewis P.A., O’Connell M.J., et al. Ribosome heterogeneity in Drosophila melanogaster gonads through paralog-switching. Nucleic Acids Res. 2021;50:2240–2257. doi: 10.1093/nar/gkab606. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Cao W.X., Kabelitz S., Gupta M., Yeung E., Lin S., Rammelt C., et al. Precise temporal regulation of post-transcriptional repressors is required for an orderly Drosophila maternal-to-zygotic transition. Cell Rep. 2020;31 doi: 10.1016/j.celrep.2020.107783. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.McDonough-Goldstein C.E., Pitnick S., Dorus S. Drosophila oocyte proteome composition covaries with female mating status. Sci. Rep. 2021;11:3142. doi: 10.1038/s41598-021-82801-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Li J., Han S., Li H., Udeshi N.D., Svinkina T., Mani D.R., et al. Cell-surface proteomic profiling in the fly brain uncovers wiring regulators. Cell. 2020;180:373–386.e15. doi: 10.1016/j.cell.2019.12.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Leader D.P., Krause S.A., Pandit A., Davies S.A., Dow J.A.T. FlyAtlas 2: a new version of the Drosophila melanogaster expression atlas with RNA-seq, miRNA-seq and sex-specific data. Nucleic Acids Res. 2018;46:D809–D815. doi: 10.1093/nar/gkx976. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Kaessmann H., Vinckenbosch N., Long M. RNA-based gene duplication: mechanistic and evolutionary insights. Nat. Rev. Genet. 2009;10:19–31. doi: 10.1038/nrg2487. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Su Q., He H., Zhou Q. On the origin and evolution of Drosophila new genes during spermatogenesis. Genes. 2021;12:1796. doi: 10.3390/genes12111796. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Long M., VanKuren N.W., Chen S., Vibranovski M.D. New gene evolution: little did we know. Annu. Rev. Genet. 2013;47:307–333. doi: 10.1146/annurev-genet-111212-133301. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Betrán E., Thornton K., Long M. Retroposed new genes out of the X in Drosophila. Genome Res. 2002;12:1854–1859. doi: 10.1101/gr.604902. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Takemori N., Takemori A., Wongkongkathep P., Nshanian M., Loo R.R.O., Lermyte F., et al. Top-down/bottom-up mass spectrometry workflow using dissolvable polyacrylamide gels. Anal Chem. 2017;89:8244–8250. doi: 10.1021/acs.analchem.7b00357. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Takemori A., Butcher D.S., Harman V.M., Brownridge P., Shima K., Higo D., et al. PEPPI-MS: polyacrylamide-gel-based prefractionation for analysis of intact proteoforms and protein complexes by mass spectrometry. J. Proteome Res. 2020;19:3779–3791. doi: 10.1021/acs.jproteome.0c00303. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Majewska M.M., Suszczynska A., Kotwica-Rolinska J., Czerwik T., Paterczyk B., Polanska M.A., et al. Yolk proteins in the male reproductive system of the fruit fly Drosophila melanogaster: spatial and temporal patterns of expression. Insect Biochem. Mol. Biol. 2014;47:23–35. doi: 10.1016/j.ibmb.2014.02.001. [DOI] [PubMed] [Google Scholar]
  • 54.Parisi M., Nuttall R., Naiman D., Bouffard G., Malley J., Andrews J., et al. Paucity of genes on the Drosophila X chromosome showing male-biased expression. Science. 2003;299:697–701. doi: 10.1126/science.1079190. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Meisel R.P., Malone J.H., Clark A.G. Disentangling the relationship between sex-biased gene expression and X-linkage. Genome Res. 2012;22:1255–1265. doi: 10.1101/gr.132100.111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Zhou Q., Bachtrog D. Sex-specific adaptation drives early sex chromosome evolution in Drosophila. Science. 2012;337:341–345. doi: 10.1126/science.1225385. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Hafezi Y., Sruba S.R., Tarrash S.R., Wolfner M.F., Clark A.G. Dissecting fertility functions of Drosophila Y chromosome genes with CRISPR. Genetics. 2020;214:977–990. doi: 10.1534/genetics.120.302672. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Zhang J., Luo J., Chen J., Dai J., Montell C. The role of Y chromosome genes in male fertility in Drosophila melanogaster. Genetics. 2020;215:623–633. doi: 10.1534/genetics.120.303324. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Pisano C., Bonaccorsi S., Gatti M. The kl-3 loop of the Y chromosome of Drosophila melanogaster binds a tektin-like protein. Genetics. 1993;133:569–579. doi: 10.1093/genetics/133.3.569. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Dapper A.L., Wade M.J. Relaxed selection and the rapid evolution of reproductive genes. Trends Genet. 2020;36:640–649. doi: 10.1016/j.tig.2020.06.014. [DOI] [PubMed] [Google Scholar]
  • 61.Southern H.M., Berger M.A., Young P.G., Snook R.R. Sperm morphology and the evolution of intracellular sperm–egg interactions. Ecol. Evol. 2018;8:5047–5058. doi: 10.1002/ece3.4027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Patlar B., Jayaswal V., Ranz J.M., Civetta A. Non-adaptive molecular evolution of seminal fluid proteins in Drosophila. Evolution. 2021;75:2102–2113. doi: 10.1111/evo.14297. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Gur Y., Breitbart H. Mammalian sperm translate nuclear-encoded proteins by mitochondrial-type ribosomes. Genes Dev. 2006;20:411–416. doi: 10.1101/gad.367606. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Pitnick S., Wolfner M.F., Dorus S. Post-ejaculatory modifications to sperm (PEMS). Biol. Rev. 2020;95:365–392. doi: 10.1111/brv.12569. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Fischer B.E., Wasbrough E., Meadows L.A., Randlet O., Dorus S., Karr T.L., et al. Conserved properties of Drosophila and human spermatozoal mRNA repertoires. Proc. R. Soc. B Biol. Sci. 2012;279:2636–2644. doi: 10.1098/rspb.2012.0153. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Matos B., Publicover S.J., Castro L.F.C., Esteves P.J., Fardilha M. Brain and testis: more alike than previously thought? Open Biol. 2021;11 doi: 10.1098/rsob.200322. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Avila F.W., Sirot L.K., LaFlamme B.A., Rubinstein C.D., Wolfner M.F. Insect seminal fluid proteins: identification and function. Annu. Rev. Entomol. 2011;56:21–40. doi: 10.1146/annurev-ento-120709-144823. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Avila F.W., Wolfner M.F. Cleavage of the Drosophila seminal protein Acp36DE in mated females enhances its sperm storage activity. J. Insect Physiol. 2017;101:66–72. doi: 10.1016/j.jinsphys.2017.06.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Perez-Riverol Y., Bai J., Bandla C., García-Seisdedos D., Hewapathirana S., Kamatchinathan S., et al. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res. 2022;50:D543–D552. doi: 10.1093/nar/gkab1038. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplemental Fig S1

Volcano plots from pairwise analyses between treatments in experiment two, denoting ‘high confidence’ (yellow) and ‘low confidence/transferred’ (turquoise) Sfps or remaining sperm proteins (purple) that showed significant differences in abundance based on a |logFC| > 1 and false discovery rate corrected p-value < 0.05. Several Sfps are labelled that showed differential abundance between treatments.

mmc1.pdf (171.9KB, pdf)
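
The significance rule in the Supplemental Fig S1 caption (|logFC| > 1 and false discovery rate corrected p-value < 0.05) can be illustrated with a minimal sketch. The caption does not name the correction method, so the Benjamini-Hochberg procedure used here is an assumption, and the function names and input values are hypothetical and not taken from the study.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the caption only says 'false discovery rate corrected';
    BH is used here as one common choice.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def is_differentially_abundant(
    log_fc: Sequence[float],
    pvalues: Sequence[float],
    lfc_cutoff: float = 1.0,
    fdr_cutoff: float = 0.05,
) -> List[bool]:
    """Flag proteins with |logFC| > lfc_cutoff and adjusted p < fdr_cutoff."""
    fdr = benjamini_hochberg(pvalues)
    return [abs(l) > lfc_cutoff and q < fdr_cutoff for l, q in zip(log_fc, fdr)]


# Hypothetical log fold changes and raw p-values for four proteins.
log_fc = [2.3, -1.8, 0.4, 1.2]
pvalues = [0.001, 0.004, 0.0005, 0.30]
flags = is_differentially_abundant(log_fc, pvalues)
```

Both conditions must hold: the third protein has a small p-value but a small fold change, and the fourth has a large fold change but a large p-value, so neither is flagged.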
Supplemental Fig S2

Overlap between proteins identified in the current study. The central Euler diagram shows the overlap between proteins identified in each experiment. The surrounding diagrams show the overlap in number of proteins identified in each replicate of each treatment, coloured according to experiment as in the central diagram.

mmc2.pdf (87.5KB, pdf)
Supplemental Fig S3

Correlations in average protein abundance between each experiment in the current study. Mean protein abundance was calculated across all replicates for each experiment, except experiment two which excluded the PBST treatment. Shown are Pearson’s correlations and line of best fit.

mmc3.pdf (189.5KB, pdf)
Supplemental Fig S4

Gene–protein abundance concordance in the DmSP3. a) Heatmap of mRNA expression of DmSP3 genes (n = 2673) in the accessory glands, carcass, ovary, and testis. Data retrieved from FlyAtlas2 (46) are log2(FPKM) scaled per gene. The 7 ‘high confidence’ Sfps with higher expression in the testis than accessory glands are highlighted on the right. Labels on the right also show ‘sperm associated Sfps’ (red) and other ‘high confidence’ Sfps (blue) identified in the DmSP3. b) Linear regressions of gene expression on protein abundance in the testis (n = 1498), carcass (n = 1165), accessory glands (n = 1001), and ovary (n = 825). c) Linear regressions of gene expression on protein abundance for ‘sperm associated Sfps’ (SAS) and remaining Sfps identified in the DmSP in each tissue. b) and c) are linear regressions using z-score log2-transformed values after filtering genes with log2-FPKM < 2. Blue lines are model fits from a linear regression, dashed lines indicate a perfect correlation between gene expression and protein abundance.

mmc4.pdf (505.7KB, pdf)
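
The preprocessing described for panels b) and c) of Supplemental Fig S4 can be sketched as follows, assuming that "filtering genes with log2-FPKM < 2" means those genes were removed before z-scoring. The function names and values are hypothetical. On z-scored variables the least-squares slope equals Pearson's correlation coefficient, which is why a slope of 1 corresponds to the dashed line of perfect correlation.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def zscore(values: Sequence[float]) -> List[float]:
    """Centre on the mean and scale by the sample standard deviation."""
    mu = mean(values)
    sd = stdev(values)
    return [(v - mu) / sd for v in values]


def prepare_pairs(
    log2_fpkm: Sequence[float],
    log2_abundance: Sequence[float],
    min_log2_fpkm: float = 2.0,
) -> Tuple[List[float], List[float]]:
    """Drop genes below the expression cutoff, then z-score both variables.

    Assumption: genes with log2-FPKM below the cutoff are excluded.
    """
    kept = [(e, a) for e, a in zip(log2_fpkm, log2_abundance) if e >= min_log2_fpkm]
    expr_z = zscore([e for e, _ in kept])
    abun_z = zscore([a for _, a in kept])
    return expr_z, abun_z


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of the least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical log2(FPKM) and log2 protein abundance values for five genes.
log2_fpkm = [0.5, 3.0, 5.0, 7.0, 1.0]
log2_abundance = [20.0, 22.0, 24.0, 26.0, 25.0]

expr_z, abun_z = prepare_pairs(log2_fpkm, log2_abundance)
slope = ols_slope(abun_z, expr_z)  # gene expression regressed on protein abundance
```

In this toy input the three genes that pass the filter are perfectly linear, so the fitted slope coincides with the 1:1 line.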
Supplemental Fig S5

Nonsynonymous (dN) to synonymous (dS) nucleotide substitution rate (dN/dS) estimates for proteins in the DmSP3 or elsewhere. Asterisks represent results from Mann-Whitney U tests; n.s., non-significant; ∗∗∗, p < 0.001.

mmc5.pdf (37.8KB, pdf)
Supplemental Tables S1–S15
mmc6.xlsx (1.8MB, xlsx)

Data Availability Statement

Proteomic data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository (69) with the identifier PXD032033. All code and analyses are available from GitHub: https://martingarlovsky.github.io/DmSP3/


Articles from Molecular & Cellular Proteomics : MCP are provided here courtesy of American Society for Biochemistry and Molecular Biology